“Blending Vertical and Web results: A Case Study using Video Intent” by Damien Lefortier, Pavel Serdyukov, Fedor Romanenko and Maarten de Rijke is available online now.
Modern search engines aggregate results from specialized verticals into the Web search results. We study a setting where vertical and Web results are blended into a single result list, a setting that has not been studied before. We focus on video intent and present a detailed observational study of Yandex’s two video content sources (i.e., the specialized vertical and a subset of the general web index), thus providing insights into their complementary character. By investigating how to blend results from these sources, we contrast traditional federated search and fusion-based approaches with newly proposed approaches that significantly outperform the baseline methods.
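A minimal sketch of the distinction the paper draws: federated search inserts the vertical as a block at a fixed slot, while blending merges both sources item by item. The score-based merge below is illustrative only and assumes the two sources' scores are comparable (e.g., calibrated); the paper's actual blending approaches are more sophisticated.

```python
def blend(web, vertical):
    """Blend web and vertical (e.g., video) results into a single
    ranked list by score. Each input is a list of (doc, score) pairs;
    scores are assumed to be on a comparable scale."""
    merged = sorted(web + vertical, key=lambda ds: ds[1], reverse=True)
    return [doc for doc, _ in merged]
```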
“Query-dependent contextualization of streaming data” by Nikos Voskarides, Daan Odijk, Manos Tsagkias, Wouter Weerkamp and Maarten de Rijke is available online.
We propose a method for linking entities in a stream of short textual documents that takes into account context both inside a document and inside the history of documents seen so far. Our method uses a generic optimization framework for combining several entity ranking functions, and we introduce a global control function to steer the optimization. Our results demonstrate the effectiveness of combining entity ranking functions that take context into account, which is further boosted by 6% when we use an informed global control function.
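To make the combination idea concrete, here is a toy sketch of scoring candidate entities with a weighted sum of ranking functions, scaled by a global control function. All names are illustrative; the paper's actual framework learns this combination via optimization rather than using fixed weights.

```python
def combine_rankers(candidates, rankers, weights, control):
    """Rank candidate entities by a weighted sum of ranking functions,
    scaled by a global control function (a hypothetical per-candidate
    scalar here). Returns candidates sorted by combined score."""
    scored = {}
    for c in candidates:
        base = sum(w * f(c) for f, w in zip(rankers, weights))
        scored[c] = control(c) * base
    return sorted(scored, key=scored.get, reverse=True)
```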
“The impact of semantic document expansion on cluster-based fusion for microblog search” by Shangsong Liang, Zhaochun Ren and Maarten de Rijke is available online now.
Searching microblog posts, with their limited length and creative language usage, is challenging. We frame the microblog search problem as a data fusion problem. We examine the effectiveness of a recent cluster-based fusion method on the task of retrieving microblog posts. We find that in the optimal setting the contribution of the clustering information is very limited, which we hypothesize to be due to the limited length of microblog posts. To increase the contribution of the clustering information in cluster-based fusion, we integrate semantic document expansion as a preprocessing step. We enrich the content of microblog posts appearing in the lists to be fused with Wikipedia articles, and create clusters based on the enriched content. We verify the effectiveness of our combined document expansion plus fusion method by making comparisons with microblog search algorithms and other fusion methods.
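For readers unfamiliar with data fusion: the classic CombSUM baseline simply sums a document's scores across the result lists being fused. The sketch below shows that baseline only; the cluster-based method studied in the paper additionally exploits cluster membership when combining scores.

```python
def combsum(ranked_lists):
    """CombSUM fusion: a document's fused score is the sum of its
    scores across all input lists (each list is a doc -> score map).
    Returns documents sorted by fused score, highest first."""
    scores = {}
    for results in ranked_lists:
        for doc, score in results.items():
            scores[doc] = scores.get(doc, 0.0) + score
    return sorted(scores, key=scores.get, reverse=True)
```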
“Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams” by David Graus, Manos Tsagkias, Lars Buitinck and Maarten de Rijke is available online now.
The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.
“Effects of Position Bias on Click-Based Recommender Evaluation” by Katja Hofmann, Anne Schuth, Alejandro Bellogin and Maarten de Rijke is available online now.
Measuring the quality of recommendations produced by a recommender system (RS) is challenging. Labels used for evaluation are typically obtained from users of an RS, by asking for explicit feedback, or by inferring labels from implicit feedback. Both approaches can introduce significant biases in the evaluation process. We investigate biases that may affect labels inferred from implicit feedback. Implicit feedback is easy to collect but can be prone to biases, such as position bias. We examine this bias using click models, and show how bias following these models would affect the outcomes of RS evaluation. We find that evaluation based on implicit and explicit feedback can agree well, but only when the evaluation metrics are designed to take user behavior and preferences into account, stressing the importance of understanding user behavior in deployed RSs.
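As background on the kind of click model involved: under a position-based model, a click requires both that the user examines a rank (a probability that typically decays with position) and that the item attracts a click. A minimal simulation of that assumption, with all probabilities hypothetical:

```python
import random

def simulate_clicks(attractiveness, examination, rng):
    """Simulate clicks on a ranked list under a position-based click
    model: P(click at rank i) = examination[i] * attractiveness[i].
    Returns a list of 0/1 click indicators, one per rank."""
    return [1 if rng.random() < examination[i] * attractiveness[i] else 0
            for i in range(len(attractiveness))]
```

Position bias enters through the examination probabilities: equally attractive items lower down the list receive fewer clicks, which skews labels inferred from those clicks.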
A few weeks ago I visited a local high school as part of a series of efforts to get more high school kids to maintain an interest in computer science and possibly study the subject in university. I gave a sneak preview of a new version of a demo that we’ve been working on with Manos Tsagkias and Wouter Weerkamp, Streamwatchr.
Streamwatchr offers a new way to discover and enjoy music through an innovative interface. We show, in real time, what music people around the world are listening to. Each time Streamwatchr identifies a tweet in which someone reports about the song that he or she is listening to, it shows a tile with a photo of the artist and a play button (on mouse over) that does, indeed, play the song (from YouTube). Streamwatchr collects about 500,000 music tweets per day, which is about 6 tweets per second.
I visited a class of 12 and 13 year olds at a local high school here in Amsterdam. As my laptop and the projector refused to talk to each other, I walked around the classroom with my laptop to demo Streamwatchr, with Streamwatchr running in full screen. While walking around I talked about some of the technology behind it (entity linking, data integration, open data, etc.). Occasionally, I put the laptop down on a table so that the students could interact with Streamwatchr.
I was amazed to see how addictive the interface was … a screen full of tiles, 6 of which flip and change at random every second, kept every kid glued to the laptop. Later (private) demos of Streamwatchr to friends and colleagues led to similar scenes. As I observed the high school kids interact with Streamwatchr, some interesting questions came up. What is the appeal of random changes to visual elements at a pace that seems to be slightly higher than one can actively track? Is it that there is always something on the screen that you have not seen yet? But that you think you don’t want to miss? At which speed should those random changes occur to be optimally captivating? Should the changes really be random or should they provide maximal coverage of the screen (in the obvious spatial sense) to be optimally captivating? There’s a great set of experiments to be run there — an unexpected side product of an outreach activity.
“Optimizing Base Rankers Using Clicks: A Case Study Using BM25” by Anne Schuth, Floor Sietsma, Shimon Whiteson and Maarten de Rijke is available online now.
We study the problem of optimizing an individual base ranker using clicks. Surprisingly, while there has been considerable attention to using clicks to optimize linear combinations of base rankers, the problem of optimizing an individual base ranker using clicks has been ignored. The problem is different from the problem of optimizing linear combinations of base rankers, as the scoring function of a base ranker may be highly non-linear. For the sake of concreteness, we focus on the optimization of a specific base ranker, viz. BM25. We start by showing that significant improvements in performance can be obtained when optimizing the parameters of BM25 for individual datasets. We also show that it is possible to optimize these parameters from clicks, i.e., without the use of manually annotated data, reaching or even beating manually tuned parameters.
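To see why this is a non-linear optimization problem, here is a standard BM25 scoring function with its free parameters k1 and b exposed; these are the parameters the paper tunes (the defaults below are common textbook values, not the paper's tuned settings).

```python
import math

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs,
               avg_doc_len, k1=1.2, b=0.75):
    """Score a document against a query with BM25.

    doc_freqs maps each term to the number of documents in the
    collection containing it; k1 controls term-frequency saturation
    and b controls document-length normalization."""
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freqs.get(term, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```

Because k1 and b enter inside the saturation and length-normalization terms, the score is highly non-linear in them, which is why methods for tuning linear combinations of rankers do not directly apply.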
“Relative Confidence Sampling for Efficient On-Line Ranker Evaluation” by Masrour Zoghi, Shimon Whiteson, Maarten de Rijke and Remi Munos is available online now.
A key challenge in information retrieval is that of on-line ranker evaluation: determining which one of a finite set of rankers performs the best in expectation on the basis of user clicks on presented document lists. When the presented lists are constructed using interleaved comparison methods, which interleave lists proposed by two different candidate rankers, then the problem of minimizing the total regret accumulated while evaluating the rankers can be formalized as a K-armed dueling bandits problem. In this paper, we propose a new method called relative confidence sampling (RCS) that aims to reduce cumulative regret by being less conservative than existing methods in eliminating rankers from contention. In addition, we present an empirical comparison between RCS and two state-of-the-art methods, relative upper confidence bound and SAVAGE. The results demonstrate that RCS can substantially outperform these alternatives on several large learning-to-rank datasets.