A tutorial at SIGIR 2016


During the past 10-15 years offline learning to rank has had a tremendous influence on information retrieval, both scientifically and in practice. Recently, as the limitations of offline learning to rank for information retrieval have become apparent, there is increased attention for online learning to rank methods for information retrieval in the community. Such methods learn from user interactions rather than from a set of labeled data that is fully available for training up front. The time is right for an intermediate-level tutorial on online learning to rank.

Online learning to rank from user interactions is fundamentally different from currently dominant supervised learning to rank approaches for information retrieval, where training data is assumed to be randomly sampled from some underlying distribution, and where absolute and reliable labels are provided by professional annotators. When learning from user interactions, a system has no control over which queries it receives, it only receives feedback on the result lists it presents to users, and it has to present high quality result lists while learning, to satisfy user expectations.

In this tutorial we formulate online learning to rank as a reinforcement learning problem, in which an agent, the search engine, learns from interactions with an environment, the user and their interactions, by trying out actions (e.g., returning a ranked list of items) and observing rewards (e.g., interpreting user feedback as absolute or relative feedback) in multiple rounds or discrete time steps. Particularly relevant to this tutorial are methods for tackling so-called contextual bandit problems (also known as bandits with side information). A contextual bandit problem is a special case of a reinforcement learning problem in which states are independent of the agent’s actions. In information retrieval terms, the context could consist of the user and the query and the actions are the search engine result pages. A difference between typical contextual bandit formulations and online learning to rank for information retrieval is that in information retrieval (absolute) rewards cannot be observed directly. Instead, feedback for learning is inferred from observed user interactions as noisy preference indications.

An important benefit of reducing IR problems to bandit approaches is that the rich body of work on bandit approaches can be used. At the same time, information retrieval poses unique challenges that inspire additional research on bandit algorithms. The objectives of the tutorial are as follows:

  • To describe existing online LTR algorithms in a unified way, i.e., using common notation and terminology, so that different models can easily be related to each other.
  • Explain the importance of balancing exploration and exploitation in an online LTR setting.
  • To explain how to analyse the performance of online LTR algorithms and why it is worth the effort.
  • To present appropriate experimental and evaluation methodologies for online LTR in both synthetic and real world settings.
  • To describe how to deploy online LTR algorithms in an industrial setting.
  • To present online learning algorithms that have not been used yet for learning to rank and indicate how IR researchers might go about using them.
  • To discuss future directions of research in online LTR.



Artem Grotov is a PhD candidate at the Informatics Institute of the University of Amsterdam. He works on online Learning to Rank as well as other topics that deal with interpreting data obtained from user interactions with interactive systems. His work on click models and evaluating search engines based on logged user interaction have been published at SIGIR 2015 and CLEF 2015. Grotov has helped to teach multiple courses on Information Retrieval and has been a thesis supervisor for Bachelor and Master students working on algorithmic information retrieval topics.

Maarten de Rijke is a Professor of Computer Science at the Informatics Institute of the University of Amsterdam. Together with a team of PhD students and postdocs he works on problems on semantic search and on- and offline learning to rank for information retrieval. Some of their recent work on on- and offline learning to rank has been (or will be) published at ICML 2014, WSDM 2014, CIKM 2014, WSDM 2015, SIGIR 2015, NIPS 2015, WSDM 2016, ECIR 2016, WWW 2016, SIGIR 2016. De Rijke has taught extensively at all levels, from public lectures on search engine technology to advanced tutorials aimed at PhD students and researchers. Recent tutorials include SIGIR 2015 and WSDM 2016.


We are working on a survey on online learning to rank and hope to have a first mature draft in September 2016, which will be shared here.


We will be sharing the bibtex file for the survey here.