Maarten de Rijke

Information retrieval

Author: mdr

FAT* Conference on Fairness, Accountability, and Transparency

FAT* is a multi-disciplinary conference that brings together researchers and practitioners interested in fairness, accountability, and transparency in socio-technical systems.

Artificial intelligence, automation, and machine learning are being adopted in a growing number of contexts. Fueled by big data, these systems filter, sort, score, recommend, personalize, and otherwise shape human experiences of socio-technical systems. Although these systems bring myriad benefits, they also contain inherent risks, such as codifying and entrenching biases; reducing accountability and hindering due process; and increasing the information asymmetry between data producers and data holders.

FAT* is an annual conference dedicated to bringing together a diverse community to investigate and tackle issues in this emerging area. FAT* builds upon several years of successful workshops on the topics of fairness, accountability, transparency, ethics, and interpretability in machine learning, recommender systems, the web, and other technical disciplines.

The inaugural 2018 FAT* Conference will be held on February 23 and 24, 2018, at New York University, NYC. Details will be announced at https://www.fatconference.org/2018/index.html.

Material from NN4IR tutorial online

The material from our highly popular tutorial on Neural Networks for Information Retrieval (NN4IR), presented during SIGIR 2017 in Tokyo, is available online at http://nn4ir.com.

Neural Networks for Information Retrieval tutorial at SIGIR 2017

Title: Neural Networks for Information Retrieval (NN4IR)

Description: Machine learning plays an important role in many aspects of modern IR systems, and deep learning is applied to all of those. The fast pace of modern-day research into deep learning has given rise to many different approaches to many different IR problems. What are the underlying key technologies and what key insights into IR problems are they able to give us? This full-day tutorial gives a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research and our understanding of IR problems. Additionally, we peek into the future by examining recently introduced paradigms as well as current challenges. Expect to learn about neural networks in semantic matching, ranking, user interaction, and response generation in a highly interactive tutorial.

Presenters: Tom Kenter, Alexey Borisov, Christophe van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra

Where: SIGIR 2017, Tokyo

When: August 7, 2017

Neu-IR: The SIGIR 2017 Workshop on Neural Information Retrieval

After a very successful first edition of Neu-IR at SIGIR 2016, we are happy to organize a second edition of the workshop at SIGIR 2017. Key facts: the web site for Neu-IR 2017 is now online at http://neu-ir.weebly.com; the deadline for submissions is June 11, 2017; and the workshop itself will take place on Friday, August 11, 2017, in Tokyo, Japan.

The Birth of Collective Memories

David Graus, Daan Odijk and I just uploaded a manuscript to arXiv on “The Birth of Collective Memories: Analyzing Emerging Entities in Text Streams.”

We study how collective memories are formed online. We do so by tracking entities that emerge in public discourse, that is, in online text streams such as social media and news streams, before they are incorporated into Wikipedia, which, we argue, can be viewed as an online place for collective memory. By tracking how entities emerge in public discourse, i.e., the temporal patterns between their first mention in online text streams and subsequent incorporation into collective memory, we gain insights into how the collective remembrance process happens online. Specifically, we analyze nearly 80,000 entities as they emerge in online text streams before they are incorporated into Wikipedia. The online text streams we use for our analysis comprise social media and news streams, and cover over 579 million documents spanning 18 months.

We discover two main emergence patterns. Some entities emerge in a “bursty” fashion, i.e., they appear in public discourse without precedent, blast into activity, and transition into collective memory. Other entities display a “delayed” pattern, where they appear in public discourse, experience a period of inactivity, and then resurface before transitioning into our cultural collective memory.

Please see https://arxiv.org/abs/1701.04039 and let us know what you think.

We’re hiring (again): Fully funded PhD position in responsible people search

We have another vacancy for a fully funded four-year PhD position in responsible data science and HR analytics. The starting date is “as soon as possible.” Please visit the UvA vacancies pages for details on how to apply.

We’re hiring: Fully funded PhD student in academic search

We have a vacancy for a fully funded four-year PhD position in academic search. The starting date is “as soon as possible.” Please visit the UvA vacancies pages for details on how to apply.

FAT/WEB: Workshop on Fairness, Accountability, and Transparency on the Web

Recent academic and journalistic reviews of online web services have revealed that many systems exhibit subtle biases reflecting historic discrimination. Examples include racial and gender bias in search advertising, image recognition services, sharing economy mechanisms, pricing, and web-based delivery. The list of production systems exhibiting biases continues to grow and may be endemic to the way models are trained and the data used.

At the same time, concerns about user autonomy and fairness have been raised in the context of web-based experimentation such as A/B testing or explore/exploit algorithms. Given the ubiquity of this practice and its increasing adoption in potentially sensitive domains (e.g., health, employment), user consent and risk will become fundamental to the practice.

Finally, understanding the reasons behind predictions and outcomes of web services is important in optimizing a system and in building trust with users. However, it also has legal and ethical implications when the algorithm has an unintended or undesirable impact along social boundaries.

The objective of this full-day workshop is to study and discuss the problems of, and solutions for, algorithmic fairness, accountability, and transparency of models in the context of web-based services.

See https://fatweb.github.io.

Survey on query auto completion online

A survey on query auto completion in information retrieval written by Fei Cai and myself is online now:

  • Fei Cai and Maarten de Rijke. A survey of query auto completion in information retrieval. Foundations and Trends in Information Retrieval, 10(4):273-363, September 2016.
    @article{cai-survey-2016,
    Author = {Cai, Fei and de Rijke, Maarten},
    Date-Added = {2016-01-27 11:52:34 +0000},
    Date-Modified = {2016-09-20 14:20:08 +0000},
    Journal = {Foundations and Trends in Information Retrieval},
    Month = {September},
    Number = {4},
    Pages = {273--363},
    Title = {A survey of query auto completion in information retrieval},
    Volume = {10},
    Year = {2016}}

In information retrieval, query auto completion, also known as type-ahead and auto-complete suggestion, refers to the following functionality: given a prefix consisting of a number of characters entered into a search box, the user interface proposes alternative ways of extending the prefix to a full query. Ranking query completions is a challenging task due to the limited length of prefixes entered by users, the large volume of possible query completions matching a prefix, and the broad range of possible search intents. In recent years, a large number of query auto completion approaches have been proposed that produce ranked lists of alternative query completions by mining query logs.
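To make the functionality concrete, here is a minimal sketch of a frequency-based baseline: candidate completions are past queries from a log that start with the typed prefix, ranked by how often they were issued. This is only an illustration and not a method taken from the survey itself; the function names, the lowercasing step, and the toy log are all assumptions made for this example.

```python
from collections import Counter


def build_query_counts(query_log):
    """Count how often each normalized query occurs in a log (an iterable of strings)."""
    return Counter(q.strip().lower() for q in query_log if q.strip())


def complete(prefix, query_counts, k=3):
    """Return up to k past queries that extend `prefix`, most frequent first."""
    # Only strip leading whitespace: a trailing space in a prefix is meaningful.
    prefix = prefix.lstrip().lower()
    candidates = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    # Sort by descending frequency; break ties alphabetically for a stable order.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in candidates[:k]]


# Toy query log, invented for illustration only.
toy_log = [
    "information retrieval",
    "information retrieval",
    "information theory",
    "informal logic",
    "neural networks",
    "information retrieval conference",
]
counts = build_query_counts(toy_log)
suggestions = complete("inf", counts, k=3)
print(suggestions)
```

A linear scan over all logged queries is fine for a toy log; with a large log one would typically index the queries (for example, in a trie) so that matching a prefix does not require touching every entry.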

In this survey, we review work on query auto completion that has been published before 2016. We focus mainly on web search and provide a formal definition of the query auto completion problem. We describe two dominant families of approaches to the query auto completion problem, one based on heuristic models and the other based on learning to rank. We also identify dominant trends in published work on query auto completion, viz. the use of time-sensitive signals and the use of user-specific signals. We describe the datasets and metrics that are used to evaluate algorithms for query auto completion. We also devote a chapter to efficiency and a chapter to presentation and interaction aspects of query auto completion. We end by discussing related tasks as well as potential research directions to further the area.

CIKM 2016 Workshop on Data-Driven Talent Acquisition

The Workshop on Data-Driven Talent Acquisition (DDTA’16) will be co-located with CIKM 2016, held in Indianapolis, USA, on October 28, 2016.

Expertise search is a well-established field in information retrieval. In recent years, the increasing availability of data enables accumulation of evidence of talent and expertise from a wide range of domains. The availability of big data significantly benefits employers and recruiters. By analyzing the massive amounts of structured and unstructured data, organizations may be able to find the exact skillsets and talent they need to grow their business. The aim of this workshop is to provide a forum for industry and academia to discuss the recent progress in talent search and management, and how the use of big data and data-driven decision making can advance talent acquisition and human resource management.

Important Dates

  • Submission deadline: September 8, 2016
  • Acceptance notification: September 22, 2016
  • Workshop date: October 28, 2016



© 2017 Maarten de Rijke
