DESI VI paper on semantic search for e-discovery online

“Who is Involved? Semantic Search for E-Discovery,” a synthesis of some of our recent e-discovery related work, by David van Dijk, David Graus, Zhaochun Ren, Hans Henseler and Maarten de Rijke will be presented at the ICAIL 2015 Workshop on Using Machine Learning and Other Advanced Techniques to Address Legal Problems in E-Discovery and Information Governance (DESI VI Workshop). It is available online now.

E-discovery projects typically start with an assessment of the collected electronic data in order to estimate the risk to prosecute or defend a legal case. This is not a review task but is appropriately called early case assessment, which is better known as exploratory search in the information retrieval community. This paper first describes text mining methodologies that can be used for enhancing exploratory search. Based on these ideas we present a semantic search dashboard that includes entities that are relevant to investigators such as who knew who, what, where and when. We describe how this dashboard can be powered by results from our ongoing research in the “Semantic Search for E-Discovery” project on topic detection and clustering, semantic enrichment of user profiles, email recipient recommendation, expert finding and identity extraction from digital forensic evidence.

SIGIR 2015 tutorials on click models

Aleksandr Chuklin, Ilya Markov and I will be teaching an introductory and advanced tutorial on click models for web search at SIGIR 2015, on August 10, 2015.

The tutorial is based on a forthcoming book on click models for web search. Participants will have access to the book, the slides, and the code used for demo sessions during the tutorial. The advanced part of the tutorial will also feature a guest speaker who will talk about developing new click models.

We’ll announce more details as soon as we can.

Now playing: Jamie xx — Far Nearer

ICWSM 2015 paper on determining the presence of political parties in social circles online

“Determining the Presence of Political Parties in Social Circles” by Christophe Van Gysel, Bart Goethals and Maarten de Rijke is available online.

In the paper, we derive the political climate of the social circles of Twitter users using a weakly-supervised approach. By applying random walks over a sub-sample of Twitter’s social graph we infer a distribution indicating the presence of eight Flemish political parties in users’ social circles in the months before the 2014 elections. The graph structure is induced through a combination of connection and retweet features and combines information of over a million tweets and 14 million follower connections. We solely exploit the social graph structure and do not rely on tweet content. For validation we compare the affiliation of politically active Twitter users with the most influential party in their network. On a validation set of around 700 politically active individuals we achieve F1 scores of 0.85 and greater. We asked the Twitter community to evaluate our classification performance. More than half of the 2,258 users who responded reported a score higher than 60 out of 100.
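To give a feel for the general idea, here is a minimal sketch of a random walk with restart over a toy follow graph. It is not the method from the paper: the edge format, the seed accounts, the restart probability and the function name `party_presence` are all assumptions made for illustration. The walk starts at one user, and the probability mass that ends up on accounts with a known party label is normalized into a distribution over parties.

```python
from collections import defaultdict


def party_presence(edges, seeds, user, restart=0.15, iterations=50):
    """Estimate which parties are present in `user`'s social circle.

    edges: iterable of (source, target) links (hypothetical toy format)
    seeds: dict mapping accounts with a known affiliation to a party label
    """
    neighbours = defaultdict(list)
    for src, dst in edges:
        neighbours[src].append(dst)

    # All probability mass starts at the user whose circle we inspect.
    mass = {user: 1.0}
    for _ in range(iterations):
        nxt = defaultdict(float)
        nxt[user] += restart  # restart step: jump back to the user
        for node, p in mass.items():
            out = neighbours.get(node)
            if not out:
                # Dangling node: send the walker back to the user.
                nxt[user] += (1 - restart) * p
                continue
            share = (1 - restart) * p / len(out)
            for dst in out:
                nxt[dst] += share
        mass = nxt

    # Aggregate the mass that landed on labelled accounts, per party.
    presence = defaultdict(float)
    for node, p in mass.items():
        if node in seeds and node != user:
            presence[seeds[node]] += p
    total = sum(presence.values())
    return {party: p / total for party, p in presence.items()} if total else {}


# Toy graph: user "u" follows three accounts, two of which lead to party "A".
toy_edges = [("u", "a"), ("u", "b"), ("u", "c"),
             ("a", "pA"), ("b", "pA"), ("c", "pB")]
toy_seeds = {"pA": "A", "pB": "B"}
toy_result = party_presence(toy_edges, toy_seeds, "u")
```

In this toy graph, party "A" receives more mass than party "B" because two of the three followed accounts point to it. Like the approach described above, the sketch uses only graph structure and no tweet content.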

CfP: 1st Workshop on User Modeling in Heterogeneous Search Environments (HetUM 2015) in conjunction with CIKM 2015

1st Workshop on User Modeling in Heterogeneous Search Environments (HetUM 2015) in conjunction with CIKM 2015
19 October 2015, Melbourne, Australia


Regular paper submission: 19 June 2015
Special track for re-submissions of CIKM papers: 8 July 2015
Notification of acceptance: 23 July 2015
Camera ready: 7 August 2015
Workshop: 19 October 2015

When users interact with information retrieval (IR) systems, they leave rich implicit feedback in the form of clicks, mouse movements, etc. This feedback contains valuable information about users and about IR systems. Analyzing and interpreting user interactions and modeling user search behavior has become an important research direction. It enables us to better understand users, perform user simulations, improve search algorithms and build quality metrics.


We’re hiring: Fully funded PhD student and two postdocs

We’re looking for three strong candidates for a fully funded PhD student position in information retrieval, a three-year postdoc position in information retrieval, and a three-year postdoc position in media studies. This is a collaborative project, called MediaNow, between the Informatics Institute and Department of Media Studies of the University of Amsterdam and the Netherlands Institute for Sound and Vision. You can find out more about the project, plus links to the advertisements etc., at The deadline for applications is April 12, 2015. Interviews are scheduled for May 1, 2015.

Fully funded PhD position in machine learning for information retrieval

We’re looking for a strong candidate for a fully funded four-year PhD position on a collaborative project with Microsoft Research Cambridge. The research will focus on the development of new algorithms for leveraging data reuse in order to efficiently evaluate and optimize the behavior of information retrieval systems. See this page for the advertisement, further requirements, and conditions. The deadline for applications is March 22, 2015.

NWO grant: MediaNow project on narrative search engines

José van Dijck, Johan Oomen and I obtained an NWO Creative Industries grant to work on next generation search engine technologies for exploring large multimedia archives. The target users are media professionals. The proposed innovations at the interface of computer science and media studies come in three kinds. First, we will develop, test and release self-learning search algorithms that adapt and improve their behavior while being used. Second, we will create robust methods for semantically analyzing content in media archives. Third, we will develop new search engine result page presentations that provide automatically generated storylines as narratives for professionals in the creative industries. The algorithmic solutions will be implemented in the research environment of the Netherlands Institute for Sound and Vision and released as open source search solutions.

Now playing: Bright Eyes — Messenger Bird’s Song

ACM TOIS paper on a comparative analysis of interleaving methods for aggregated search online

Our ACM Transactions on Information Systems paper called “A comparative analysis of interleaving methods for aggregated search” by Aleksandr Chuklin, Anne Schuth, Ke Zhou and Maarten de Rijke is available online now.

A result page of a modern search engine often goes beyond a simple list of “ten blue links.” Many specific user needs (e.g., News, Image, Video) are addressed by so-called aggregated or vertical search solutions: specially presented documents, often retrieved from specific sources, that stand out from the regular organic web search results. When it comes to evaluating ranking systems, such complex result layouts raise their own challenges. This is especially true for so-called interleaving methods that have arisen as an important type of online evaluation: by mixing results from two different result pages, interleaving can easily break the desired web layout in which vertical documents are grouped together, and hence hurt the user experience.

We conduct an analysis of different interleaving methods as applied to aggregated search engine result pages. Apart from conventional interleaving methods, we propose two vertical-aware methods: one derived from the widely used Team-Draft Interleaving method by adjusting it in such a way that it respects vertical document groupings, and another based on the recently introduced Optimized Interleaving framework. We show that our proposed methods are better at preserving the user experience than existing interleaving methods while still performing well as a tool for comparing ranking systems. For evaluating our proposed vertical-aware interleaving methods we use real world click data as well as simulated clicks and simulated ranking systems.
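For readers unfamiliar with interleaving, here is a minimal sketch of plain Team-Draft Interleaving, the conventional baseline mentioned above. It is not the vertical-aware variant proposed in the paper: it ignores vertical groupings entirely, and the function names, the document ids and the simple click-credit rule are assumptions for illustration. The two rankers take turns, with a coin flip when they are tied, each contributing its highest-ranked document that has not been shown yet. Clicks are then credited to the team that contributed the clicked document.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Return (interleaved list, team label per position)."""
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    count = {"A": 0, "B": 0}
    interleaved, teams, used = [], [], set()

    def next_doc(team):
        # Highest-ranked document of this team that is not yet shown.
        return next((d for d in rankings[team] if d not in used), None)

    while True:
        candidates = [t for t in ("A", "B") if next_doc(t) is not None]
        if not candidates:
            break
        if len(candidates) == 1:
            team = candidates[0]
        elif count["A"] != count["B"]:
            # The team with fewer picks so far goes next.
            team = min(count, key=count.get)
        else:
            # Tie: a coin flip decides who picks.
            team = rng.choice(("A", "B"))
        doc = next_doc(team)
        interleaved.append(doc)
        teams.append(team)
        used.add(doc)
        count[team] += 1
    return interleaved, teams


def credit_clicks(teams, clicked_positions):
    """Count clicks per team; the team with more clicks wins the comparison."""
    wins = {"A": 0, "B": 0}
    for pos in clicked_positions:
        wins[teams[pos]] += 1
    return wins


# Hypothetical document ids for two rankers being compared.
demo_list, demo_teams = team_draft_interleave(
    ["d1", "d2", "d3"], ["d2", "d4", "d1"], rng=random.Random(0))
demo_wins = credit_clicks(demo_teams, [0])
```

Because each team picks independently from its own ranking, documents that belong to the same vertical block can end up scattered across the interleaved list. That is the layout problem the vertical-aware methods are designed to avoid.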