Maarten de Rijke

Information retrieval

Month: January 2016

WWW 2016 papers online

We have three full papers at WWW this year. They are all online now:

  • Alexey Borisov, Pavel Serdyukov, and Maarten de Rijke. Using metafeatures to increase the effectiveness of latent semantic models in web search. In WWW 2016: 25th International World Wide Web Conference, pages 1081-1091. ACM, April 2016.
    @inproceedings{borisov-using-2016,
    Author = {Borisov, Alexey and Serdyukov, Pavel and de Rijke, Maarten},
    Booktitle = {WWW 2016: 25th International World Wide Web Conference},
    Date-Added = {2015-12-15 21:51:14 +0000},
    Date-Modified = {2016-05-22 18:00:45 +0000},
    Month = {April},
    Pages = {1081--1091},
    Publisher = {ACM},
    Title = {Using metafeatures to increase the effectiveness of latent semantic models in web search},
    Year = {2016}}

In web search, latent semantic models have been proposed to bridge the lexical gap between queries and documents that is due to the fact that searchers and content creators often use different vocabularies and language styles to express the same concept. Modern search engines simply use the outputs of latent semantic models as features for a so-called global ranker. We argue that this is not optimal, because a single value output by a latent semantic model may be insufficient to describe all aspects of the model’s prediction, and thus some information captured by the model is not used effectively by the search engine.

To increase the effectiveness of latent semantic models in web search, we propose to create metafeatures—feature vectors that describe the structure of the model’s prediction for a given query-document pair—and pass them to the global ranker along with the models’ scores. We provide simple guidelines to represent the latent semantic model’s prediction with more than a single number, and illustrate these guidelines using several latent semantic models.

We test the impact of the proposed metafeatures on a web document ranking task using four latent semantic models. Our experiments show that (1) through the use of metafeatures, the performance of each individual latent semantic model can be improved by 10.2% and 4.2% in NDCG scores at truncation levels 1 and 10, respectively; and (2) through the use of metafeatures, the performance of a combination of latent semantic models can be improved by 7.6% and 3.8% in NDCG scores at truncation levels 1 and 10, respectively.
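For readers unfamiliar with the metric, "NDCG at truncation levels 1 and 10" means NDCG@1 and NDCG@10. The post does not say which gain and discount functions the paper uses, so the following is only a minimal sketch of one common formulation (exponential gain, logarithmic discount). The function names and the example labels are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels.

    Assumes the common exponential-gain form (2^rel - 1) / log2(rank + 1),
    with ranks starting at 1. This choice is an assumption, not taken
    from the paper.
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels of the documents, in ranked order.
ranking = [1, 2, 0, 0]
ndcg_1 = ndcg_at_k(ranking, 1)
ndcg_10 = ndcg_at_k(ranking, 10)
```

NDCG@1 only rewards getting the top result right, while NDCG@10 looks at the whole first page, which is presumably why the reported improvements differ between the two truncation levels.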

  • Alexey Borisov, Ilya Markov, Maarten de Rijke, and Pavel Serdyukov. A neural click model for web search. In WWW 2016: 25th International World Wide Web Conference, pages 531-541. ACM, April 2016.
    @inproceedings{borisov-neural-2016,
    Author = {Borisov, Alexey and Markov, Ilya and de Rijke, Maarten and Serdyukov, Pavel},
    Booktitle = {WWW 2016: 25th International World Wide Web Conference},
    Date-Added = {2015-12-15 21:48:25 +0000},
    Date-Modified = {2016-05-22 18:00:26 +0000},
    Month = {April},
    Pages = {531--541},
    Publisher = {ACM},
    Title = {A neural click model for web search},
    Year = {2016}}

Understanding user browsing behavior in web search is key to improving web search effectiveness. Many click models have been proposed to explain or predict user clicks on search engine results. They are based on the probabilistic graphical model (PGM) framework, in which user behavior is represented as a sequence of observable and hidden events. The PGM framework provides a mathematically solid way to reason about a set of events given some information about other events. But the structure of the dependencies between the events has to be set manually. Different click models use different hand-crafted sets of dependencies.

We propose an alternative based on the idea of distributed representations: to represent the user’s information need and the information available to the user with a vector state. The components of the vector state are learned to represent concepts that are useful for modeling user behavior. And user behavior is modeled as a sequence of vector states associated with a query session: the vector state is initialized with a query, and then iteratively updated based on information about interactions with the search engine results. This approach allows us to directly understand user browsing behavior from click-through data, i.e., without the need for a predefined set of rules as is customary for PGM-based click models.
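The idea of a vector state that is initialized with a query and then updated after each result can be sketched as a small recurrent model. This is only an illustration under stated assumptions, not the architecture from the paper: the class name, the tanh update, the sigmoid click output, and the random (untrained) weights are all hypothetical choices.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class VectorStateClickSketch:
    """Toy, untrained recurrent click predictor (illustrative assumption)."""

    def __init__(self, query_dim, doc_dim, state_dim, seed=0):
        rng = random.Random(seed)
        self.w_query = random_matrix(state_dim, query_dim, rng)
        self.w_state = random_matrix(state_dim, state_dim, rng)
        # One extra input column carries the click on the previous result.
        self.w_input = random_matrix(state_dim, doc_dim + 1, rng)
        self.w_click = [rng.uniform(-0.5, 0.5) for _ in range(state_dim)]

    def click_probabilities(self, query_vec, doc_vecs, clicks):
        # The vector state is initialized from the query alone.
        state = [math.tanh(v) for v in matvec(self.w_query, query_vec)]
        probabilities = []
        previous_click = 0.0
        for doc_vec, click in zip(doc_vecs, clicks):
            # Update the state with the current document and the
            # interaction observed on the previous result.
            inputs = list(doc_vec) + [previous_click]
            recurrent = matvec(self.w_state, state)
            driven = matvec(self.w_input, inputs)
            state = [math.tanh(r + d) for r, d in zip(recurrent, driven)]
            # Read a click probability off the current state.
            logit = sum(w * s for w, s in zip(self.w_click, state))
            probabilities.append(sigmoid(logit))
            previous_click = float(click)
        return probabilities


model = VectorStateClickSketch(query_dim=3, doc_dim=2, state_dim=4)
probs = model.click_probabilities(
    [1.0, 0.0, 0.5],                        # hypothetical query vector
    [[0.2, 0.1], [0.0, 1.0], [0.5, 0.5]],   # hypothetical document vectors
    [1, 0, 0],                              # observed clicks per rank
)
```

In a real model the weights would be learned from click logs. The point of the sketch is only that no dependency structure between events is written down by hand: everything the model knows about earlier results is carried in the state.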

We illustrate our approach using a set of neural click models. Our experimental results show that the neural click model that uses the same training data as traditional PGM-based click models has better performance on the click prediction task (i.e., predicting user clicks on search engine results) and the relevance prediction task (i.e., ranking documents by their relevance to a query). An analysis of the best performing neural click model shows that it learns similar concepts to those used in traditional click models, and that it also learns other concepts that cannot be designed manually.

  • Christophe Van Gysel, Maarten de Rijke, and Marcel Worring. Unsupervised, efficient and semantic expertise retrieval. In WWW 2016: 25th International World Wide Web Conference, pages 1069-1079. ACM, April 2016.
    @inproceedings{vangysel-unsupervised-2016,
    Author = {Van Gysel, Christophe and de Rijke, Maarten and Worring, Marcel},
    Booktitle = {WWW 2016: 25th International World Wide Web Conference},
    Date-Added = {2015-12-15 21:52:52 +0000},
    Date-Modified = {2016-05-22 18:01:03 +0000},
    Month = {April},
    Pages = {1069--1079},
    Publisher = {ACM},
    Title = {Unsupervised, efficient and semantic expertise retrieval},
    Year = {2016}}

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a statistically significant improved ranking over vector space and generative models in most cases, matching the performance of supervised methods on various benchmarks. That is, by using solely text we can do as well as methods that work with external evidence and/or relevance feedback. A contrastive analysis of rankings produced by discriminative and generative approaches shows that they have complementary strengths due to the ability of the unsupervised discriminative model to perform semantic matching.
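To give a feel for what a log-linear model with profile-centric inference cost might look like, here is a minimal sketch. The post does not give the model's equations, so this is a generic softmax scorer under stated assumptions: one vector per word, one vector and bias per candidate expert, and a query score that sums per-term log-probabilities. All names and the hand-set toy vectors are hypothetical; in practice the vectors would be learned from the document collection.

```python
import math


def log_softmax(scores):
    """Numerically stable log of the softmax over a list of scores."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


def rank_experts(query_terms, word_vectors, expert_vectors, expert_bias):
    """Rank candidate experts by the summed log P(expert | term).

    Each term contributes a log-linear distribution over experts; terms
    without a vector are skipped (an assumption made for this sketch).
    """
    names = sorted(expert_vectors)
    totals = [0.0] * len(names)
    for term in query_terms:
        if term not in word_vectors:
            continue
        word = word_vectors[term]
        scores = [
            sum(w * e for w, e in zip(word, expert_vectors[name]))
            + expert_bias[name]
            for name in names
        ]
        for index, log_prob in enumerate(log_softmax(scores)):
            totals[index] += log_prob
    return sorted(zip(names, totals), key=lambda pair: pair[1], reverse=True)


# Hand-set toy vectors, purely for illustration.
word_vectors = {"retrieval": [1.0, 0.0], "vision": [0.0, 1.0]}
expert_vectors = {"alice": [2.0, 0.0], "bob": [0.0, 2.0]}
expert_bias = {"alice": 0.0, "bob": 0.0}

ranking = rank_experts(["retrieval"], word_vectors, expert_vectors, expert_bias)
```

Scoring a query touches only one vector per candidate rather than every document the candidate wrote, which is the sense in which inference is cheap in a profile-centric setup.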

NWO grant on learning to rank for information retrieval

Good news from NWO today. I received a grant from NWO to support two PhD students and a postdoc to work on learning to rank (LTR) for information retrieval. The goal is to lay the foundations for contextual LTR methods, which automatically construct the right ranking features based on the query context. The project will use data collected through natural interactions with a live system to extract features that might not be obvious to researchers who manually design the ones currently used for LTR.

Media coverage in Le Monde

On the occasion of the 15th anniversary of Wikipedia, French newspaper Le Monde has devoted a full page to Wikipedia, which includes a discussion of the role of Wikipedia in research, especially around semantic search and the knowledge graph. In this setting our work on semantic analysis of microblogs, entity linking, and linking content for live TV was discussed. Visit Le Monde or read this PDF.

© 2017 Maarten de Rijke
