Fully funded PhD position in machine learning for information retrieval

We’re looking for a strong candidate for a fully funded four-year PhD position on a collaborative project with Microsoft Research Cambridge. The research will focus on the development of new algorithms for leveraging data reuse in order to efficiently evaluate and optimize the behavior of information retrieval systems. See this page for the advertisement, further requirements, and conditions. The deadline for applications is March 22, 2015.

NWO grant: MediaNow project on narrative search engines

Jos van Dijck, Johan Oomen and I obtained an NWO Creative Industries grant to work on next-generation search engine technologies for exploring large multimedia archives. The target users are media professionals. The proposed innovations at the interface of computer science and media studies come in three kinds. First, we will develop, test and release self-learning search algorithms that adapt and improve their behavior while being used. Second, we will create robust methods for semantically analyzing content in media archives. Third, we will develop new search engine result page presentations that provide automatically generated storylines as narratives for professionals in the creative industries. The algorithmic solutions will be implemented in the research environment of the Netherlands Institute for Sound and Vision and released as open source search solutions.

ACM TOIS paper on a comparative analysis of interleaving methods for aggregated search online

Our ACM Transactions on Information Systems paper called “A comparative analysis of interleaving methods for aggregated search” by Aleksandr Chuklin, Anne Schuth, Ke Zhou and Maarten de Rijke is available online now.

A result page of a modern search engine often goes beyond a simple list of “ten blue links.” Many specific user needs (e.g., News, Image, Video) are addressed by so-called aggregated or vertical search solutions: specially presented documents, often retrieved from specific sources, that stand out from the regular organic web search results. When it comes to evaluating ranking systems, such complex result layouts raise their own challenges. This is especially true for so-called interleaving methods that have arisen as an important type of online evaluation: by mixing results from two different result pages, interleaving can easily break the desired web layout in which vertical documents are grouped together, and hence hurt the user experience.

We conduct an analysis of different interleaving methods as applied to aggregated search engine result pages. Apart from conventional interleaving methods, we propose two vertical-aware methods: one derived from the widely used Team-Draft Interleaving method by adjusting it in such a way that it respects vertical document groupings, and another based on the recently introduced Optimized Interleaving framework. We show that our proposed methods are better at preserving the user experience than existing interleaving methods while still performing well as a tool for comparing ranking systems. To evaluate our proposed vertical-aware interleaving methods, we use real-world click data as well as simulated clicks and simulated ranking systems.
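For readers unfamiliar with the baseline, here is a minimal Python sketch of conventional Team-Draft Interleaving, the method the first vertical-aware variant is derived from. It is not the vertical-aware method from the paper: it ignores vertical groupings entirely. The function names, the `length` parameter and the click-credit helper are illustrative assumptions.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Conventional Team-Draft Interleaving (no vertical awareness).

    Returns the interleaved list and a dict mapping each shown document
    to the team ("A" or "B") that contributed it.
    """
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    interleaved = []
    teams = {}                      # document -> contributing team
    picks = {"A": 0, "B": 0}        # number of documents each team has placed

    while len(interleaved) < length:
        # Documents each ranker could still contribute, in its own order.
        remaining = {
            name: [doc for doc in ranking if doc not in teams]
            for name, ranking in rankings.items()
        }
        if not remaining["A"] and not remaining["B"]:
            break  # both rankings are exhausted

        # The team with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] < picks["B"]:
            turn = "A"
        elif picks["B"] < picks["A"]:
            turn = "B"
        else:
            turn = rng.choice(["A", "B"])
        # Fall back to the other team if the chosen one has nothing left.
        if not remaining[turn]:
            turn = "B" if turn == "A" else "A"

        doc = remaining[turn][0]
        interleaved.append(doc)
        teams[doc] = turn
        picks[turn] += 1

    return interleaved, teams


def credit_clicks(teams, clicked_docs):
    """Count clicks per team; the team with more clicks wins the comparison."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in teams:
            wins[teams[doc]] += 1
    return wins
```

Because each ranker's picks can land anywhere in the mixed list, documents that one ranker presented as a contiguous vertical block may end up scattered, which is the layout problem described above.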

IPM paper on burst-aware data fusion for microblog search online

An Information Processing & Management paper on burst-aware data fusion for microblog search by Shangsong Liang and Maarten de Rijke is online now.

We consider the problem of searching posts in microblog environments. We frame this microblog post search problem as a late data fusion problem. Previous work on data fusion has mainly focused on aggregating document lists based on retrieval status values or ranks of documents without fully utilizing temporal features of the set of documents being fused. Additionally, previous work on data fusion has often worked on the assumption that only documents that are highly ranked in many of the lists are likely to be of relevance. We propose BurstFuseX, a fusion model that not only utilizes a microblog post’s ranking information but also exploits its publication time. BurstFuseX builds on an existing fusion method and rewards posts that are published in or near a burst of posts that are highly ranked in many of the lists being aggregated. We experimentally verify the effectiveness of the proposed late data fusion algorithm, and demonstrate that in terms of mean average precision it significantly outperforms the standard, state-of-the-art fusion approaches as well as burst or time-sensitive retrieval methods.
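To make the notion of score-based late data fusion concrete, here is a minimal Python sketch of CombSUM, a standard fusion method that sums normalized retrieval status values across lists. This is not BurstFuseX and it uses no temporal information. The abstract does not say which existing method BurstFuseX builds on, so the choice of CombSUM, the min-max normalization and all names are assumptions made for illustration.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one list's retrieval scores to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """Fuse several result lists (dicts of doc -> score) with CombSUM.

    A document's fused score is the sum of its normalized scores over all
    lists; documents missing from a list contribute nothing for that list.
    Ties are broken by document id to keep the output deterministic.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

A method of this kind favors posts that score well in many lists, which is the assumption the abstract says earlier fusion work often relies on.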

ECIR 2015 paper on automatically assessing Wikipedia article quality by exploiting article-editor networks online

Our ECIR 2015 paper on automatically assessing article quality by exploiting article-editor networks by Xinyi Li, Jintao Tang, Ting Wang, Zhunchen Luo and Maarten de Rijke is online now.

We consider the problem of automatically assessing Wikipedia article quality. We develop several models to rank articles by using the editing relations between articles and editors. First, we create a basic model by modeling the article-editor network. Then we design measures of an editor's contribution and build weighted models that improve the ranking performance. Finally, we use a combination of featured article information and the weighted models to obtain the best performance. We find that using manual evaluation to assist automatic evaluation is a viable solution for the article quality assessment task on Wikipedia.

ECIR 2015 paper on multi-emotion detection in user-generated reviews online

Our ECIR 2015 paper on multi-emotion detection in user-generated reviews by Lars Buitinck, Jesse van Amerongen, Ed Tan and Maarten de Rijke is online now.

Expressions of emotion abound in user-generated content, whether it be in blogs, reviews, or on social media. Much work has been devoted to detecting and classifying these emotions, but little of it has acknowledged the fact that emotionally charged text may express multiple emotions at the same time. We describe a new dataset of user-generated movie reviews annotated for emotional expressions, and experimentally validate two algorithms that can detect multiple emotions in each sentence of these reviews.

ECIR 2015 paper on user behavior in location search on mobile devices online

Our ECIR 2015 paper on user behavior in location search on mobile devices by Yaser Norouzzadeh Ravari, Ilya Markov, Artem Grotov, Maarten Clements and Maarten de Rijke is online now.

Location search engines are an important part of GPS-enabled devices such as mobile phones and tablet computers. In this paper, we study how users behave when they interact with a location search engine by analyzing logs from a popular GPS-navigation service to find out whether mobile users' location search characteristics differ from those of regular web search. In particular, we analyze query- and session-based characteristics and the temporal distribution of location searches performed on smart phones and tablet computers. Our findings may be used to improve the design of search interfaces in order to help users perform location search more effectively and improve the overall experience on GPS-enabled mobile devices.

Update to Streamwatchr live

In the run-up to the Buma Music meets Tech Award at Noorderslag 2015, an update to our music discovery demonstrator Streamwatchr has gone live. An improved interface that is easier on your device’s battery life, some new functionality and a Twitter bot called @lyricswatchr are the most important ingredients of the update.