Maarten de Rijke

Information retrieval

Month: December 2015

Nikos Voskarides wins 2014/2015 STIL Thesis Award

Congratulations to my PhD student Nikos Voskarides for winning the 2014/2015 STIL Thesis Award for his MSc thesis “Explaining relationships between entities.” The prize was awarded on December 18, 2015, at CLIN-26 in Amsterdam. His MSc thesis was supervised by Edgar Meij and Manos Tsagkias.

WSDM 2016 paper on dynamic collective entity representations for entity ranking online

The following WSDM 2016 paper is online now:

  • David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, and Maarten de Rijke. Learning dynamic collective entity representations for entity ranking. In WSDM 2016: The 9th International Conference on Web Search and Data Mining, pages 595-604. ACM, February 2016.
    @inproceedings{graus-dynamic-2016,
    Author = {Graus, David and Tsagkias, Manos and Weerkamp, Wouter and Meij, Edgar and de Rijke, Maarten},
    Booktitle = {WSDM 2016: The 9th International Conference on Web Search and Data Mining},
    Date-Added = {2015-10-12 18:42:35 +0000},
    Date-Modified = {2016-05-22 17:59:44 +0000},
    Month = {February},
    Pages = {595--604},
    Publisher = {ACM},
    Title = {Learning dynamic collective entity representations for entity ranking},
    Year = {2016}}

Entity ranking, i.e., successfully positioning a relevant entity at the top of the ranking for a given query, is inherently difficult due to the potential mismatch between the entity’s description in a knowledge base, and the way people refer to the entity when searching for it. To counter this issue we propose a method for constructing dynamic collective entity representations. We collect entity descriptions from a variety of sources and combine them into a single entity representation by learning to weight the content from different sources that are associated with an entity for optimal retrieval effectiveness. Our method is able to add new descriptions in real time and learn the best representation as time evolves so as to capture the dynamics of how people search entities. Incorporating dynamic description sources into dynamic collective entity representations improves retrieval effectiveness by 7% over a state-of-the-art learning to rank baseline. Periodic retraining of the ranker enables higher ranking effectiveness for dynamic collective entity representations.
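To make the idea in the abstract concrete, here is a minimal sketch of a collective entity representation: one text field per description source, scored as a weighted sum. It is an illustration only, not the paper's implementation. The function names, the term-count scoring, the example entity, and the hand-set weights are all assumptions; in the paper the weights are learned, and the ranker is retrained periodically.

```python
from collections import Counter


def field_score(query_terms, text):
    """Count occurrences of the query terms in one description field."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)


def entity_score(query, fields, weights):
    """Weighted sum of per-source scores; sources without a weight count as 0."""
    terms = query.lower().split()
    return sum(
        weights.get(source, 0.0) * field_score(terms, text)
        for source, text in fields.items()
    )


def add_description(fields, source, text):
    """Append newly arrived text to a source field (the 'dynamic' part)."""
    fields[source] = (fields.get(source, "") + " " + text).strip()


# Hypothetical entity with a single knowledge-base field to start with.
entity = {"kb": "Barack Obama 44th President of the United States"}
# Assumed weights, fixed by hand here; the paper learns them instead.
weights = {"kb": 1.0, "queries": 2.0}

before = entity_score("obama potus", entity, weights)
add_description(entity, "queries", "potus obama")
after = entity_score("obama potus", entity, weights)
```

In this toy setup the query term "potus" never occurs in the knowledge-base text, so the score only rises once a description from the dynamic source is added. That is the vocabulary mismatch the abstract refers to.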

Three full papers accepted at WWW 2016

Good news. Three full papers were accepted at the 25th World Wide Web Conference:

  • Alexey Borisov, Pavel Serdyukov and Maarten de Rijke: Using Metafeatures to Increase the Effectiveness of Latent Semantic Models in Web Search
  • Alexey Borisov, Ilya Markov, Maarten de Rijke and Pavel Serdyukov: A Distributed Representation Approach to Modeling User Browsing Behavior in Web Search
  • Christophe Van Gysel, Maarten de Rijke and Marcel Worring: Unsupervised, Efficient and Semantic Expertise Retrieval

NIPS workshop paper on sources of variability in large-scale machine learning experiments online

Our contribution to the NIPS LearningSys 2015 Workshop on Machine Learning Systems by Damien Lefortier, Anthony Truchet and Maarten de Rijke is available online now:

  • Damien Lefortier, Anthony Truchet, and Maarten de Rijke. Sources of variability in large-scale machine learning systems. In NIPS LearningSys 2015 Workshop on Machine Learning Systems, December 2015.
    @inproceedings{lefortier-sources-2015,
    Author = {Lefortier, Damien and Truchet, Anthony and de Rijke, Maarten},
    Booktitle = {NIPS LearningSys 2015 Workshop on Machine Learning Systems},
    Date-Added = {2015-11-03 15:49:55 +0000},
    Date-Modified = {2016-04-03 17:49:40 +0000},
    Month = {December},
    Title = {Sources of variability in large-scale machine learning systems},
    Year = {2015}}

We investigate sources of variability of a state-of-the-art distributed machine learning system for learning click and conversion prediction models for display advertising. We focus on three main sources of variability: asynchronous updates in the learning algorithm, downsampling of the data, and the non-deterministic order of examples received by each learning instance. We observe that some sources of variability can lead to significant differences between the models obtained and cause issues for, e.g., regression testing, debugging, and offline evaluation. We present effective solutions to stabilize the system and remove these sources of variability, thus fully solving the issues related to regression testing and to debugging. Moreover, we discuss potential limitations of this stabilization for drawing conclusions, in which case we may want to take the variability produced by the machine learning system into account in confidence intervals.
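Two of the sources of variability named in the abstract, downsampling and example order, can be illustrated with a small sketch. This shows one common way to make both reproducible and is not necessarily what the paper does. The function names, the hash-based sampling rule, and the fixed seed are assumptions made for the example.

```python
import hashlib
import random


def keep_example(example_id: str, rate: float) -> bool:
    """Decide deterministically whether to keep an example.

    The decision depends only on the id, not on arrival order or process.
    """
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < rate


def stable_order(example_ids, seed):
    """Sort first to cancel the arrival order, then shuffle with a fixed seed."""
    ordered = sorted(example_ids)
    random.Random(seed).shuffle(ordered)
    return ordered


# Hypothetical example ids arriving in two different orders.
ids = [f"ex-{i}" for i in range(1000)]
run_a = [i for i in stable_order(ids, seed=7) if keep_example(i, 0.1)]
run_b = [i for i in stable_order(reversed(ids), seed=7) if keep_example(i, 0.1)]
```

Both runs keep the same examples in the same order even though the inputs arrived reversed, which is the property regression testing and debugging need. The third source, asynchronous updates, is a property of the learning algorithm itself and is not covered by this sketch.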

© 2017 Maarten de Rijke
