WSDM 2016 paper on Multileave Gradient Descent for Fast Online Learning to Rank now online

One of our WSDM 2016 papers is online now:

Anne Schuth, Harrie Oosterhuis, Shimon Whiteson, and Maarten de Rijke. Multileave gradient descent for fast online learning to rank. In WSDM 2016: The 9th International Conference on Web Search and Data Mining, pages 457–466. ACM, February 2016. BibTeX, PDF

@inproceedings{schuth-multileave-2016,
author = {Schuth, Anne and Oosterhuis, Harrie and Whiteson, Shimon and de Rijke, Maarten},
booktitle = {WSDM 2016: The 9th International Conference on Web Search and Data Mining},
date-added = {2015-10-12 18:46:10 +0000},
date-modified = {2016-05-22 17:59:17 +0000},
month = {February},
pages = {457--466},
publisher = {ACM},
title = {Multileave gradient descent for fast online learning to rank},
year = {2016}}

Modern search systems are based on dozens or even hundreds of ranking features. The dueling bandit gradient descent (DBGD) algorithm has been shown to effectively learn combinations of these features solely from user interactions. DBGD explores the search space by comparing a possibly improved ranker to the current production ranker. To this end, it uses interleaved comparison methods, which can infer with high sensitivity a preference between two rankings based only on interaction data. A limiting factor is that it can compare only to a single exploratory ranker.
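To make the DBGD loop described above concrete, here is a minimal sketch of a single update step. It is an illustration under stated assumptions, not the authors' implementation: the ranker is assumed to be a linear combination of features (a weight vector), the names `dbgd_step`, `candidate_wins`, `delta`, and `alpha` are hypothetical, and the interleaved comparison on real user clicks is replaced by a callback that simply reports whether the exploratory ranker won.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly at random from the unit sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def dbgd_step(weights, candidate_wins, delta=1.0, alpha=0.01, rng=None):
    """One DBGD-style update of a linear ranker's feature weights.

    candidate_wins(current, candidate) is a hypothetical stand-in for an
    interleaved comparison on live traffic: it returns True when users
    are inferred to prefer the candidate ranker.
    delta (exploration step) and alpha (learning rate) are assumed names.
    """
    if rng is None:
        rng = random.Random()
    # Propose a single exploratory ranker at distance delta.
    direction = random_unit_vector(len(weights), rng)
    candidate = [w + delta * d for w, d in zip(weights, direction)]
    # Move a small step towards the candidate only if it wins.
    if candidate_wins(weights, candidate):
        return [w + alpha * d for w, d in zip(weights, direction)]
    return list(weights)
```

Each call spends one comparison on exactly one exploratory direction, which is the limiting factor the abstract points to. In a simulation, `candidate_wins` could be a toy oracle that prefers whichever weight vector lies closer to some hidden target.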

We propose an online learning to rank algorithm called multileave gradient descent (MGD) that extends DBGD to learn from so-called multileaved comparison methods that can compare a set of rankings instead of merely a pair. We show experimentally that MGD allows for better selection of candidates than DBGD without the need for more comparisons involving users. An important implication of our results is that orders of magnitude less user interaction data is required to find good rankers when multileaved comparisons are used within online learning to rank. Hence, fewer users need to be exposed to possibly inferior rankers and our method allows search engines to adapt more quickly to changes in user preferences.
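The idea of comparing a set of rankers in one go can be sketched as follows. This is a hedged illustration rather than the method as published: the post does not spell out how the outcome of a multileaved comparison becomes an update, so the sketch assumes one plausible rule, namely stepping along the mean direction of all candidates that beat the current ranker. The names `mgd_step`, `multileave_winners`, `n_candidates`, `delta`, and `alpha` are hypothetical, and the multileaved comparison on user clicks is replaced by a callback.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly at random from the unit sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def mgd_step(weights, multileave_winners, n_candidates=9,
             delta=1.0, alpha=0.01, rng=None):
    """One multileave-style update of a linear ranker's feature weights.

    multileave_winners(current, candidates) is a hypothetical stand-in
    for a multileaved comparison: it returns the indices of candidates
    that users are inferred to prefer over the current ranker.
    """
    if rng is None:
        rng = random.Random()
    dim = len(weights)
    # Explore several directions at once instead of a single one.
    directions = [random_unit_vector(dim, rng) for _ in range(n_candidates)]
    candidates = [[w + delta * d for w, d in zip(weights, u)]
                  for u in directions]
    winners = multileave_winners(weights, candidates)
    if not winners:
        return list(weights)
    # Assumed update rule: step along the mean of the winning directions.
    mean_dir = [sum(directions[i][k] for i in winners) / len(winners)
                for k in range(dim)]
    return [w + alpha * m for w, m in zip(weights, mean_dir)]
```

One comparison now evaluates `n_candidates` exploratory rankers, so the learner gets more information per user interaction than it would from a single pairwise comparison. Other update rules, such as following one randomly chosen winner, would fit the same skeleton.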