Month: August 2014

DSRC Is Now Amsterdam Data Science

The Data Science Research Center is now Amsterdam Data Science. The name change better reflects the center's goal of acting as a focal point for data science in Amsterdam, encompassing not just research but also education and innovation.

IIiX 2014 short paper on social bookmarking online

Our IIiX 2014 short paper on “A social bookmarking system to support cluster-driven archival arrangement” by Marc Bron, Shenghui Wang, Titia van der Werf and Maarten de Rijke is available online now.

Cultural heritage materials are increasingly being made available through standard search facilities. However, it is challenging to automatically organize these materials in a way that is well aligned with users’ specific interests. We report on the development of a social bookmarking system to collect human annotations that are used to measure the performance of three different clustering algorithms. We find that there is a discrepancy between the latent structure present in the data and the clusters annotated by humans. However, it is difficult to detect such discrepancies explicitly.

ACM SIGIR Workshop on Gathering Efficient Assessments of Relevance (GEAR) paper online

With some delay, because of the holidays and traveling, the short paper that Aleksandr Chuklin and I published at the ACM SIGIR Workshop on Gathering Efficient Assessments of Relevance (GEAR) is now online. It’s called “The anatomy of relevance: topical, snippet and perceived relevance in search result evaluation” and you can find it here.

Currently, the quality of a search engine is often determined using so-called topical relevance, i.e., the match between the user intent (expressed as a query) and the content of the document. In this work we want to draw attention to two aspects of retrieval system performance affected by the presentation of results: result attractiveness (“perceived relevance”) and immediate usefulness of the snippets (“snippet relevance”). Perceived relevance may influence the discoverability of good topical documents, and seemingly better rankings may in fact be less useful to the user if good-looking snippets lead to irrelevant documents or vice versa. Result items on a search engine result page (SERP) with high snippet relevance may also add to the total utility gained by the user, even without the need to click those items.

We start by motivating the need to collect different aspects of relevance (topical, perceived and snippet relevance) and explaining how these aspects can improve evaluation measures. We then discuss possible ways to collect these relevance aspects using crowdsourcing and the challenges arising from that.

CLARIAH 12M Euro Grant

NWO has awarded 12M Euro to CLARIAH, a project to build a digital infrastructure for software, data, enrichment, search and analytics in the humanities. Frank van Harmelen, Cees Snoek and I are the computer scientists on the project's core team. See for more details.

