Maarten de Rijke

Information retrieval

The Birth of Collective Memories

David Graus, Daan Odijk and I just uploaded a manuscript to arXiv on “The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams.”

We study how collective memories are formed online. We do so by tracking entities that emerge in public discourse, that is, in online text streams such as social media and news streams, before they are incorporated into Wikipedia, which, we argue, can be viewed as an online place for collective memory. By tracking how entities emerge in public discourse, i.e., the temporal patterns between their first mention in online text streams and subsequent incorporation into collective memory, we gain insights into how the collective remembrance process happens online. Specifically, we analyze nearly 80,000 entities as they emerge in online text streams before they are incorporated into Wikipedia. The online text streams we use for our analysis comprise social media and news streams, and span over 579 million documents across a period of 18 months.

We discover two main emergence patterns. Some entities emerge in a “bursty” fashion: they appear in public discourse without precedent, blast into activity, and transition into collective memory. Other entities display a “delayed” pattern: they appear in public discourse, experience a period of inactivity, and then resurface before transitioning into our cultural collective memory.

Please see https://arxiv.org/abs/1701.04039 and let us know what you think.

We’re hiring (again): Fully funded PhD position in responsible people search

We have another vacancy for a fully funded four-year PhD position in responsible data science and HR analytics. The starting date is “as soon as possible.” Please visit the UvA vacancies pages for details on how to apply.

We’re hiring: Fully funded PhD student in academic search

We have a vacancy for a fully funded four-year PhD position in academic search. The starting date is “as soon as possible.” Please visit the UvA vacancies pages for details on how to apply.

FAT/WEB: Workshop on Fairness, Accountability, and Transparency on the Web

Recent academic and journalistic reviews of online web services have revealed that many systems exhibit subtle biases reflecting historic discrimination. Examples include racial and gender bias in search advertising, image recognition services, sharing economy mechanisms, pricing, and web-based delivery. The list of production systems exhibiting biases continues to grow and may be endemic to the way models are trained and the data used.

At the same time, concerns about user autonomy and fairness have been raised in the context of web-based experimentation such as A/B testing or explore/exploit algorithms. Given the ubiquity of this practice and its increasing adoption in potentially sensitive domains (e.g., health, employment), user consent and risk will become fundamental to the practice.

Finally, understanding the reasons behind predictions and outcomes of web services is important in optimizing a system and in building trust with users. However, it also has legal and ethical implications when the algorithm has an unintended or undesirable impact along social boundaries.

The objective of this full-day workshop is to study and discuss the problems and solutions with algorithmic fairness, accountability, and transparency of models in the context of web-based services.

See https://fatweb.github.io.

Survey on query auto completion online

A survey on query auto completion in information retrieval, written by Fei Cai and me, is now online:

  • Fei Cai and Maarten de Rijke. A survey of query auto completion in information retrieval. Foundations and Trends in Information Retrieval, 10(4):273-363, September 2016.
    @article{cai-survey-2016,
    Author = {Cai, Fei and de Rijke, Maarten},
    Date-Added = {2016-01-27 11:52:34 +0000},
    Date-Modified = {2016-09-20 14:20:08 +0000},
    Journal = {Foundations and Trends in Information Retrieval},
    Month = {September},
    Number = {4},
    Pages = {273--363},
    Title = {A survey of query auto completion in information retrieval},
    Volume = {10},
    Year = {2016}}

In information retrieval, query auto completion, also known as type-ahead and auto-complete suggestion, refers to the following functionality: given a prefix consisting of a number of characters entered into a search box, the user interface proposes alternative ways of extending the prefix to a full query. Ranking query completions is a challenging task due to the limited length of prefixes entered by users, the large volume of possible query completions matching a prefix, and the broad range of possible search intents. In recent years, a large number of query auto completion approaches have been proposed that produce ranked lists of alternative query completions by mining query logs.
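To make the functionality described above concrete, here is a minimal sketch of a frequency-based completer: it counts past queries in a log and ranks the ones that extend a given prefix by how often they were issued. This is only an illustration under stated assumptions, not the method of any particular system or of the survey; the function names (`build_query_counts`, `complete`) and the toy log are made up for this example.

```python
from collections import Counter


def build_query_counts(query_log):
    """Count how often each query occurs in the log (case-insensitive)."""
    return Counter(q.strip().lower() for q in query_log if q.strip())


def complete(prefix, query_counts, k=3):
    """Return up to k logged queries that extend the prefix, most frequent first.

    Ties in frequency are broken alphabetically so the output is deterministic.
    """
    prefix = prefix.lower()
    candidates = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in candidates[:k]]


# Hypothetical toy query log, invented purely for illustration.
toy_log = [
    "information retrieval",
    "information retrieval",
    "information theory",
    "information retrieval",
    "informal letter",
    "information theory",
    "query auto completion",
]

counts = build_query_counts(toy_log)
print(complete("inf", counts, k=2))
print(complete("que", counts))
```

A linear scan over all logged queries is fine for a toy log; at realistic log sizes one would typically index the candidates (for example in a trie) so that prefix lookups stay fast.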

In this survey, we review work on query auto completion that has been published before 2016. We focus mainly on web search and provide a formal definition of the query auto completion problem. We describe two dominant families of approaches to the query auto completion problem, one based on heuristic models and the other based on learning to rank. We also identify dominant trends in published work on query auto completion, viz. the use of time-sensitive signals and the use of user-specific signals. We describe the datasets and metrics that are used to evaluate algorithms for query auto completion. We also devote a chapter to efficiency and a chapter to presentation and interaction aspects of query auto completion. We end by discussing related tasks as well as potential research directions to further the area.

CIKM 2016 Workshop on Data-Driven Talent Acquisition

The Workshop on Data-Driven Talent Acquisition (DDTA’16) will be co-located with CIKM 2016, held in Indianapolis, USA, on October 28, 2016.

Expertise search is a well-established field in information retrieval. In recent years, the increasing availability of data enables accumulation of evidence of talent and expertise from a wide range of domains. The availability of big data significantly benefits employers and recruiters. By analyzing the massive amounts of structured and unstructured data, organizations may be able to find the exact skillsets and talent they need to grow their business. The aim of this workshop is to provide a forum for industry and academia to discuss the recent progress in talent search and management, and how the use of big data and data-driven decision making can advance talent acquisition and human resource management.

Important Dates

  • Submission deadline: September 8, 2016
  • Acceptance notification: September 22, 2016
  • Workshop date: October 28, 2016

Further details

Special issue on Neural Information Retrieval

The Information Retrieval Journal has put out a call for contributions to a special issue on neural information retrieval. Topics for this issue include the application of neural network models in IR tasks, including but not limited to:

  • Full text document retrieval, passage retrieval, question answering
  • Web search, paid search, searching social media, entity search
  • Learning to rank combined with neural network based representation learning
  • User and task modelling, personalized search and recommendations, diversity
  • Query formulation assistance, query recommendation, conversational search
  • Multimedia and cross-media retrieval

The deadline is October 15, 2016. Here’s a PDF with the call.

We’re hiring: 3 PhD positions in Academic Search

As part of a collaborative project between the University of Amsterdam, the VU University Amsterdam, the Royal Netherlands Academy of Arts and Sciences, and Elsevier, we are looking to fill three fully funded PhD student positions in the area of academic search. Please visit the University’s vacancy page for details on the project and on how and where to apply.

Algorithms are not neutral. And that is just as well.

David Graus and I wrote a piece about neutrality and algorithms for NRC and NRC.Next, titled “Algoritmen zijn niet neutraal. En dat is maar goed ook.” (“Algorithms are not neutral. And that is just as well.”) Unfortunately, the editors saw fit to put a different title above it, one that rather distracts from our message.

(NRC Handelsblad, June 17, 2016, page 16.)

Neu-IR: The SIGIR 2016 Workshop on Neural Information Retrieval

The deadline for submissions to Neu-IR: The SIGIR 2016 Workshop on Neural Information Retrieval is less than three weeks away. Neu-IR will be a highly interactive full-day workshop, featuring a mix of presentation and interaction formats. We welcome application papers, papers that address fundamental modeling challenges, and best practices papers. Please see http://research.microsoft.com/en-us/events/neuir2016/ for details and submission instructions.


© 2017 Maarten de Rijke
