Maarten de Rijke

Information retrieval

FAT/WEB: Workshop on Fairness, Accountability, and Transparency on the Web

Recent academic and journalistic reviews of online web services have revealed that many systems exhibit subtle biases reflecting historic discrimination. Examples include racial and gender bias in search advertising, image recognition services, sharing economy mechanisms, pricing, and web-based delivery. The list of production systems exhibiting biases continues to grow, and such biases may be endemic to the way models are trained and the data used.

At the same time, concerns about user autonomy and fairness have been raised in the context of web-based experimentation such as A/B testing or explore/exploit algorithms. Given the ubiquity of this practice and its increasing adoption in potentially sensitive domains (e.g., health, employment), user consent and risk will become fundamental to the practice.

Finally, understanding the reasons behind predictions and outcomes of web services is important in optimizing a system and in building trust with users. However, it also has legal and ethical implications when the algorithm has an unintended or undesirable impact along social boundaries.

The objective of this full-day workshop is to study and discuss the problems of, and solutions for, algorithmic fairness, accountability, and transparency of models in the context of web-based services.


Survey on query auto completion online

A survey on query auto completion in information retrieval, written by Fei Cai and me, is now online:

  • Fei Cai and Maarten de Rijke. A survey of query auto completion in information retrieval. Foundations and Trends in Information Retrieval, 10(4):273-363, September 2016.

In information retrieval, query auto completion, also known as type-ahead and auto-complete suggestion, refers to the following functionality: given a prefix consisting of a number of characters entered into a search box, the user interface proposes alternative ways of extending the prefix to a full query. Ranking query completions is a challenging task due to the limited length of prefixes entered by users, the large volume of possible query completions matching a prefix, and the broad range of possible search intents. In recent years, a large number of query auto completion approaches have been proposed that produce ranked lists of alternative query completions by mining query logs.
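As a rough illustration of the functionality described above, the sketch below ranks completions of a prefix by how often each matching query occurs in a query log. It is a minimal frequency-based baseline under stated assumptions, not one of the specific models reviewed in the survey; the function name `complete`, the parameter `k`, and the toy log are hypothetical.

```python
from collections import Counter


def complete(prefix, query_log, k=3):
    """Return up to k logged queries that extend `prefix`, most frequent first."""
    # Count how often each (normalized) query appears in the log.
    counts = Counter(q.strip().lower() for q in query_log)
    prefix = prefix.lower()
    # Keep only queries that start with the typed prefix.
    candidates = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency; break ties alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in candidates[:k]]


# Hypothetical toy query log (illustrative only, not real data).
toy_log = [
    "information retrieval",
    "information retrieval",
    "information theory",
    "inference",
    "information retrieval evaluation",
    "information theory",
    "information retrieval",
]

suggestions = complete("inf", toy_log, k=3)
print(suggestions)
```

A baseline like this ignores who is typing and when, which is where the time-sensitive and user-specific signals discussed in the survey come in.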

In this survey, we review work on query auto completion that has been published before 2016. We focus mainly on web search and provide a formal definition of the query auto completion problem. We describe two dominant families of approaches to the query auto completion problem, one based on heuristic models and the other based on learning to rank. We also identify dominant trends in published work on query auto completion, viz. the use of time-sensitive signals and the use of user-specific signals. We describe the datasets and metrics that are used to evaluate algorithms for query auto completion. We also devote a chapter to efficiency and a chapter to presentation and interaction aspects of query auto completion. We end by discussing related tasks as well as potential research directions to further the area.

CIKM 2016 Workshop on Data-Driven Talent Acquisition

The Workshop on Data-Driven Talent Acquisition (DDTA’16) will be co-located with CIKM 2016, held in Indianapolis, USA, on October 28, 2016.

Expertise search is a well-established field in information retrieval. In recent years, the increasing availability of data enables accumulation of evidence of talent and expertise from a wide range of domains. The availability of big data significantly benefits employers and recruiters. By analyzing the massive amounts of structured and unstructured data, organizations may be able to find the exact skillsets and talent they need to grow their business. The aim of this workshop is to provide a forum for industry and academia to discuss the recent progress in talent search and management, and how the use of big data and data-driven decision making can advance talent acquisition and human resource management.

Important Dates

  • Submission deadline: September 8, 2016
  • Acceptance notification: September 22, 2016
  • Workshop date: October 28, 2016

Further details

Special issue on Neural Information Retrieval

The Information Retrieval Journal has put out a call for contributions to a special issue on neural information retrieval. Topics for this issue include the application of neural network models in IR tasks, including but not limited to:

  • Full text document retrieval, passage retrieval, question answering
  • Web search, paid search, searching social media, entity search
  • Learning to rank combined with neural network based representation learning
  • User and task modelling, personalized search and recommendations, diversity
  • Query formulation assistance, query recommendation, conversational search
  • Multimedia and cross-media retrieval

The deadline is October 15, 2016. Here’s a PDF with the call.

We’re hiring: 3 PhD positions in Academic Search

As part of a collaborative project between the University of Amsterdam, the VU University Amsterdam, the Royal Netherlands Academy of Arts and Sciences, and Elsevier, we are looking to fill three fully funded PhD student positions in the area of academic search. Please visit the University’s vacancy page for details on the project and on how and where to apply.

Algorithms are not neutral. And that is just as well.

David Graus and I wrote a piece about neutrality and algorithms for NRC and NRC.Next under the title “Algoritmen zijn niet neutraal. En dat is maar goed ook.” (“Algorithms are not neutral. And that is just as well.”) Unfortunately, the editors saw fit to put a different title above it, one that rather distracts from our message.

(NRC Handelsblad, June 17, 2016, page 16.)

Neu-IR: The SIGIR 2016 Workshop on Neural Information Retrieval

The deadline for submissions to Neu-IR: The SIGIR 2016 Workshop on Neural Information Retrieval is less than three weeks away. Neu-IR will be a highly interactive full-day workshop, featuring a mix of presentation and interaction formats. We welcome application papers, papers that address fundamental modeling challenges, and best practices papers. Please see the workshop website for details and submission instructions.

New flyer for WSDM 2017

The important dates for WSDM 2017 (paper submission deadline, notification date) are fixed. Here’s a new flyer with the details.


SIGIR 2016 tutorial on online learning to rank

Artem Grotov and I will be teaching a half-day tutorial on online learning to rank for information retrieval at SIGIR 2016.

During the past 10–15 years offline learning to rank has had a tremendous influence on information retrieval, both scientifically and in practice. Recently, as the limitations of offline learning to rank for information retrieval have become apparent, the community has paid increasing attention to online learning to rank methods. Such methods learn from user interactions rather than from a set of labeled data that is fully available for training up front.

Today’s search engines have developed into complex systems that combine hundreds of ranking criteria with the aim of producing the optimal result list in response to users’ queries. For automatically tuning optimal combinations of large numbers of ranking criteria, learning to rank (LTR) has proved invaluable. For a given query, each document is represented by a feature vector. The features may be query dependent, document dependent, or capture the relationship between the query and documents. The task of the learner is to find a model that combines these features such that, when this model is used to produce a ranking for an unseen query, user satisfaction is maximized.
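To make the feature-vector view concrete, here is a minimal sketch that scores each candidate document with a linear model and sorts by score. The linear form, the three features, and the weights are all assumptions chosen for illustration; in practice the weights would be learned and the model need not be linear.

```python
def score(weights, features):
    """Linear ranking function: dot product of weights and a feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def rank(weights, candidates):
    """Return document ids sorted by descending model score.

    `candidates` maps a document id to its feature vector for one query.
    """
    return sorted(candidates, key=lambda d: score(weights, candidates[d]), reverse=True)


# Hypothetical features per document for a single query:
#   [query-document match score, document popularity, query length]
# The last feature is query dependent, so it is the same for every document.
candidates = {
    "d1": [2.0, 0.1, 3.0],
    "d2": [1.5, 2.0, 3.0],
    "d3": [0.2, 0.4, 3.0],
}

# Hypothetical weights; a learner would normally fit these to training data.
weights = [1.0, 0.5, 0.0]

ranking = rank(weights, candidates)
print(ranking)
```

Here `d2` outranks `d1` even though its match score is lower, because the assumed weights also reward popularity. Choosing such trade-offs automatically is the learner's job.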

Traditionally, learning to rank algorithms are trained in batch mode, on a complete dataset of query and document pairs with their associated manually created relevance labels. This setting has a number of disadvantages and is impractical in many cases. First, creating such datasets is expensive and therefore infeasible for smaller search engines, such as small web-store search engines. Second, it may be impossible for experts to annotate documents, as in the case of personalized search. Third, the relevance of documents to queries can change over time, as in a news search engine.

Online learning to rank addresses all of these issues by incrementally learning from user feedback in real time. Online learning is closely related to active learning, incremental learning, and counterfactual learning. However, online learning is more difficult because the agent has to balance exploration and exploitation: actions with unknown performance have to be explored to learn better solutions.
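The exploration/exploitation trade-off mentioned above can be illustrated with a generic epsilon-greedy bandit that chooses among a few candidate rankers based on simulated clicks. This is a textbook bandit sketch, not one of the online learning to rank methods covered in the tutorial; the ranker names, click probabilities, and parameter values are hypothetical.

```python
import random


def epsilon_greedy(click_prob, rounds=5000, epsilon=0.1, seed=0):
    """Pick a ranker per round; explore with probability epsilon, else exploit.

    `click_prob` maps a ranker name to its (normally unknown) click probability,
    which is used here only to simulate user feedback.
    """
    rng = random.Random(seed)
    arms = list(click_prob)
    shown = {a: 0 for a in arms}   # how often each ranker was used
    clicks = {a: 0 for a in arms}  # how often its result list was clicked

    def estimate(arm):
        # Observed click-through rate; 0.0 for rankers not tried yet.
        return clicks[arm] / shown[arm] if shown[arm] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.choice(arms)          # explore: try a random ranker
        else:
            arm = max(arms, key=estimate)   # exploit: use the best estimate so far
        shown[arm] += 1
        if rng.random() < click_prob[arm]:  # simulated user click
            clicks[arm] += 1
    return shown, clicks


# Hypothetical rankers with made-up click probabilities.
shown, clicks = epsilon_greedy({"A": 0.2, "B": 0.5, "C": 0.8})
print(shown)
```

Without the exploration step the learner could lock onto whichever ranker happened to receive the first click and never discover a better one. With it, some traffic is always spent on rankers that currently look worse.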

There is a growing body of established methods for online learning to rank for information retrieval. The time is right to organize and present this material to a broad audience of interested information retrieval researchers, whether junior or senior, whether academic or industrial. The online learning to rank methods available today have been proposed by different communities, in machine learning and information retrieval. A key aim of the tutorial is to bring these together and offer a unified perspective. To achieve this, we illustrate the core and state-of-the-art methods in online learning to rank, their theoretical foundations and real-world applications, as well as existing online learning algorithms that have not been used by the information retrieval community so far.

Neu-IR: SIGIR 2016 Workshop on Neural Information Retrieval

SIGIR 2016 will feature a workshop on Neural Information Retrieval. In recent years, deep neural networks have yielded significant performance improvements in application areas such as speech recognition and computer vision. They have also had an impact in natural language applications such as machine translation, image caption generation, and conversational agents. Our focus with the Neu-IR workshop is on the applicability of deep neural networks to information retrieval. There are two complementary dimensions to this: one involves demonstrating performance improvements on public or private information retrieval datasets; the other concerns thinking about deep neural network architectures and what they tell us about information retrieval problems. Neu-IR (pronounced “new IR”) will be a highly interactive full-day workshop that focuses on advances and challenges along both dimensions.

See the workshop website, where further information will be shared.


© 2016 Maarten de Rijke
