My colleague Lars Buitinck and I received a grant from the HPC fund to support the development of a more scalable version of xTAS, our extensible text analysis service. It’s the pipeline that we (and others) use for our text mining work. This is great news as it allows us to port modules to the…
The autonomous search engine
After Tuesday’s talk on personal data mining, I gave another talk to non-experts on Thursday. This time the topic was “The Autonomous Search Engine”. The backbone of the story is the move from supervised to weakly supervised technology development of one of the core components of search engines: rankers. Weak supervision in this context means…
Life mining talk
I gave a talk aimed at the general public on personal data mining last night, in Maastricht. The talk explains what type of information can be mined from the content of open sources (news, social media, etc.) using state-of-the-art search and text mining technology. The focus is on extracting personal information,…
Microsoft PhD Fellowship
For a proposal entitled “Leveraging Data Reuse for Efficient Ranker Evaluation in Information Retrieval”, my colleague Shimon Whiteson and I received funding. The proposal was submitted to the Microsoft Research PhD Scholarship Programme. The project is a collaboration with Filip Radlinski and will run for three years, with a start planned in the fall. We’ll…
ECIR 2014 paper on blending vertical and web results online
“Blending Vertical and Web results: A Case Study using Video Intent” by Damien Lefortier, Pavel Serdyukov, Fedor Romanenko and Maarten de Rijke is available online now. Modern search engines aggregate results from specialized verticals into the Web search results. We study a setting where vertical and Web results are blended into a single result list, a setting…
ECIR 2014 paper on query-dependent contextualization of streaming data online
“Query-dependent contextualization of streaming data” by Nikos Voskarides, Daan Odijk, Manos Tsagkias, Wouter Weerkamp and Maarten de Rijke is available online. We propose a method for linking entities in a stream of short textual documents that takes into account context both inside a document and inside the history of documents seen so far. Our method uses a…
ECIR 2014 paper on cluster-based fusion for microblog search online
“The impact of semantic document expansion on cluster-based fusion for microblog search” by Shangsong Liang, Zhaochun Ren and Maarten de Rijke is available online now. Searching microblog posts, with their limited length and creative language usage, is challenging. We frame the microblog search problem as a data fusion problem. We examine the effectiveness of a recent cluster-based…
ECIR 2014 paper on predicting new concepts in social streams online
“Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams” by David Graus, Manos Tsagkias, Lars Buitinck and Maarten de Rijke is available online now. The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and…
ECIR 2014 paper on click-based recommender evaluation online
“Effects of Position Bias on Click-Based Recommender Evaluation” by Katja Hofmann, Anne Schuth, Alejandro Bellogin and Maarten de Rijke is available online now. Measuring the quality of recommendations produced by a recommender system (RS) is challenging. Labels used for evaluation are typically obtained from users of an RS, by asking for explicit feedback, or inferring labels from…
Going out with Streamwatchr
A few weeks ago I visited a local high school as part of a series of efforts to get more high school kids to maintain an interest in computer science and possibly study the subject at university. I gave a sneak preview of a new version of a demo that we’ve been working on with…