Information Retrieval (IR) research has traditionally focused on serving the best results for a single query, so-called ad hoc retrieval. However, users typically search iteratively, refining and reformulating their queries during a session. A key challenge in studying this interaction is the creation of suitable evaluation resources to assess the effectiveness of IR systems over sessions. This paper describes the TREC Session Track, which ran from 2010 through 2014 and focused on building test collections that included various forms of implicit feedback. We describe the test collections, briefly analyze how the datasets differed over the years, and present evaluation results demonstrating that the use of user session data significantly improved effectiveness.
Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC Session Track query logs and compare the performance of different lexical matching approaches for session search, investigating the viability of lexical query models in this setting. We find that naive methods based on term frequency weighting perform on par with specialized session models. We provide insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field.
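To make the idea of lexical query modeling over a session concrete, here is a minimal sketch of one naive term-frequency-weighted approach: the session's queries are aggregated into a single weighted term distribution, with more recent queries weighted more heavily. The `decay` parameter and the specific weighting scheme are illustrative assumptions, not the exact models evaluated in the paper.

```python
from collections import Counter

def session_query_model(queries, decay=0.8):
    """Build a weighted term distribution from a session's queries.

    Later queries receive higher weight via exponential recency decay.
    This is an illustrative sketch; `decay` and the recency scheme are
    assumptions, not the paper's exact formulation.
    """
    weights = Counter()
    n = len(queries)
    for i, q in enumerate(queries):
        w = decay ** (n - 1 - i)  # most recent query gets weight 1.0
        for term in q.lower().split():
            weights[term] += w
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}
```

For a session ["air travel", "cheap air travel"], terms occurring in both queries ("air", "travel") end up with higher probability mass than the term introduced only in the reformulation ("cheap"); the resulting distribution can then be used to score documents with any standard lexical ranking function.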
Evaluation has always been the cornerstone of scientific development. Scientists come up with hypotheses (models) to explain physical phenomena, and validate these models by comparing their output to observations in nature. A scientific field then consists merely of a collection of hypotheses that have not (yet) been disproved when compared against nature. Evaluation plays exactly this role in the field of information retrieval. Researchers and practitioners develop models to explain the relation between an information need expressed by a person and the information contained in available resources, and test these models by comparing their outcomes to collections of observations.
This article is a short survey of the methods, measures, and designs used in the field of Information Retrieval to evaluate the quality of search algorithms (i.e., implementations of a model) against collections of observations. The phrase “search quality” has more than one interpretation; here I will discuss only one of them: the effectiveness of a search algorithm at finding the information requested by a user. Two types of collections of observations are used for the purpose of evaluation: (a) relevance annotations, and (b) observable user behaviour. I will call the evaluation framework based on the former collection-based evaluation, and the one based on the latter in-situ evaluation. This survey is far from complete; it presents only my personal viewpoint on recent developments in the field.
Hospitals need to provide information to many external parties (e.g., tumor registrations to IKNL, statistics to clinical auditing institutes such as DICA, information to health insurance companies, etc.). This often requires filling in predefined forms for all eligible patients/cases. Being able to check eligibility and fill in these forms automatically can save hospitals time and money, since most of this work is currently done manually.
The goal of this project is to develop an algorithmic pipeline that automatically (a) extracts information from medical dossiers, (b) tests for eligibility, and (c) fills in predefined forms.
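The three pipeline stages above can be sketched as follows. Everything here is a toy illustration under stated assumptions: the field names (`age`, `diagnosis`), the regex-based extraction, and the eligibility rule are hypothetical stand-ins for the actual extraction models and registry criteria the project would use.

```python
import re

def extract_fields(dossier_text):
    """(a) Extract structured fields from free-text dossier.

    Toy regex extraction; real dossiers would need NLP, not regexes,
    and these field names are hypothetical."""
    fields = {}
    m = re.search(r"age:\s*(\d+)", dossier_text, re.I)
    if m:
        fields["age"] = int(m.group(1))
    m = re.search(r"diagnosis:\s*(\w+)", dossier_text, re.I)
    if m:
        fields["diagnosis"] = m.group(1).lower()
    return fields

def is_eligible(fields, required_diagnosis="melanoma", min_age=18):
    """(b) Test eligibility against a (hypothetical) registry criterion."""
    return fields.get("diagnosis") == required_diagnosis and fields.get("age", 0) >= min_age

def fill_form(fields, template):
    """(c) Fill a predefined form: template maps form slots to field names."""
    return {slot: fields.get(field) for slot, field in template.items()}
```

For example, running a dossier snippet such as "Age: 54\nDiagnosis: Melanoma" through the three stages yields an extracted record, a positive eligibility decision, and a filled form slot, which is the end-to-end behaviour the project aims to automate at scale.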
This is a joint project with CTcue, funded by Amsterdam Data Science.
Aldo Lipani is visiting me for a month. Aldo is a PhD student at the Vienna University of Technology working on evaluation in Information Retrieval. His main focus is improving the reliability of test-collection-based evaluation and developing an analytical approach to accessibility measures. Aldo and I will work on extending the definition and measurement of retrievability.
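For readers unfamiliar with retrievability: it measures how easily a document can be retrieved at all, given a (large) set of queries run against a system. A common formulation is r(d) = Σ_q o_q · f(k_dq, c), where k_dq is the rank of document d for query q and f is an access function. The sketch below assumes uniform query weights (o_q = 1) and the simple cutoff-based access function (f = 1 if the document ranks within the cutoff, else 0); it is a minimal illustration, not the extended measures this visit will develop.

```python
def retrievability(rankings, cutoff=10):
    """Compute retrievability r(d) over a set of ranked result lists.

    `rankings` is a list of ranked document-ID lists, one per query.
    Assumes uniform query weights and a cutoff-based access function:
    a document scores 1 for each query where it ranks within `cutoff`.
    """
    scores = {}
    for ranked_docs in rankings:  # one ranked list per query
        for rank, doc in enumerate(ranked_docs, start=1):
            if rank <= cutoff:
                scores[doc] = scores.get(doc, 0) + 1
    return scores
```

Documents that never appear within the cutoff for any query receive no score, which is exactly the accessibility bias retrievability analysis is designed to expose.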
This is a joint project with Lora Aroyo (VU), Sound & Vision, and Crowdynews, funded by Commit, on identifying controversial topics in multimodal data, with applications ranging from health to political discourse, news, etc.
The ControCurator project aims to enable modern information access systems to discover and understand controversial topics and events by bringing together different types of crowds (niches of experts, lay crowds and engaged social media contributors) and machines in a joint active learning workflow for the creation of adequate training data (real-time and offline).
Are you interested in building systems that will assist users towards task completion rather than simply showing relevant results for a query?
The primary goals of the Tasks track are (1) to evaluate a system’s understanding of the tasks users aim to complete, and (2) to evaluate how useful retrieved documents are for completing the underlying task.
Ideally, a search engine should be able to understand the reason a user submits a query, i.e., the actual task that caused the query to be issued. Rather than just showing results relevant to the submitted query, it should guide the user towards completing that task by incorporating information about the actual information need.
The Tasks track will run for a second year at TREC. For details visit the track’s website.
Google will be supporting my research on ‘Session-based Personalization: Analysis and Evaluation’. The focus of this research is personalizing search engine results on the basis of the user’s interactions with the search engine within the current session.
The Google Research Awards Program received 805 proposals and funded 113 of them, only 3 of which were in the field of Information retrieval, extraction and organization (including semantic graphs).