Maarten de Rijke

Information retrieval


Open Science

I’m a professor. My job description is very simple: to create new knowledge and to transfer it. To students, colleagues, and anyone else, really. To academia, industry, governments, and the rest of society. I do my job by working with a large team of very talented PhD students and postdocs from around the planet and by presenting our work as broadly as possible, to different types of audiences and around the world.

Throughout my career, my research has centered on the concept of information. Early on, I worked on representing and reasoning with information. For the past 15 years, I have focused on information retrieval – technology to connect people to information. It’s a great area to work in: the research problems are challenging, the area has global impact, and it attracts amazing talent from around the world. Access to information is a human right, and information retrieval is the basis of a critical technology for providing that access.

In a few days, the main conference in my research area, SIGIR, the ACM International Conference on Research and Development in Information Retrieval, will take place in Ann Arbor, Michigan, US. It promises to be a really good edition. The community is expanding and experimenting with new ideas, new directions, and new formats. Developments in machine learning, in conversational agents, and in the societal implications of the technology that we develop are creating new energy and excitement. My team and I are scheduled to present a large number of papers, both full papers and short papers. We are also organizing a workshop and presenting a tutorial. I’m on a panel on future directions in information retrieval, and I’m taking part in a lunch session on diversity, a student lunch session, and editorial board meetings for journals for which I have editorial responsibility. And, of course, there will be a large number of exciting, high-quality paper presentations by colleagues. To top it off, the conference takes place on a university campus, which I much prefer to a hotel as a conference location. This promises to be a fantastic edition of SIGIR.

But I won’t be going.

In November 2017 I visited Iran. I gave two talks: a strategy talk on our experience in Amsterdam in bringing together data scientists from diverse institutions and with diverse disciplinary backgrounds, and a science talk on the interface between data science and information retrieval. It was a two-day trip with two talks, plus the opportunity to meet some fantastic students. Of course, the trip to Iran meant that I was no longer authorized to travel to the US under the ESTA visa waiver program. I knew this in advance, so I applied for a DS-160 nonimmigrant visa on December 2, 2017, a few days after my return from Iran. Unfortunately, my visa application was refused. Further information was requested from me (my CV), which I submitted in mid-December 2017. Seven months have passed since I started the process. I have had to skip the WSDM 2018 conference, which took place in Los Angeles, and the FAT* 2018 conference, which took place in New York. As of a few days ago, the latest status update on my visa application was “no news yet, unfortunately,” which means that I will have to skip SIGIR 2018 as well.

What’s next?

There is no point in lowering myself to the standards of a policy that I oppose by no longer submitting papers to conferences held in the US. Halting the conversation is never the answer. It’s neither helpful nor effective in promoting what is essential to my job as a professor. The pursuit of science “requires freedom of movement, association, expression and communication for scientists” [2]. With a Dutch passport, I’m one of the lucky few: even after having my ESTA privilege revoked, I can still travel visa-free to 185 destinations. The majority of the PhD students and postdocs in my team are from China, Iran and Russia, all of which are ranked lower, or even much lower, on the Passport Index [3] than the Netherlands. I regularly see how they face impenetrable walls in their education or career simply because of the country on their passport.

Scientific progress and human and environmental well-being are our collective responsibility. Open exchange of people and ideas is key to fulfilling that responsibility. If you have the opportunity, please help to facilitate more open exchange, of talent and of knowledge. If you are in academia, share your publications, share your teaching materials, share your code, share your data, share your time. Support and encourage visits to your institution, based on scientific excellence only. If you’re from a country that’s ranked high on the Passport Index, go and visit other institutions around the planet and share your expertise. I believe that every bit helps. For my part, I’ll try to raise 100K euros per year, over the next 10 years, to bring an information retrieval researcher to my university for a 12-month visiting position. Scientific excellence will be the entry ticket, not the country on someone’s passport.

[1] Justin Zobel. What We Talk About When We Talk About Information Retrieval. SIGIR Forum 51(3):18–26, 2017.
[2] ICSU. Freedom, Responsibility and Universality of Science. 2014.
[3] Henley & Partners. Passport Index. 2018.

Now on arXiv: Explainable Fashion Recommendation with Joint Outfit Matching and Comment Generation

Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, and I published “Explainable Fashion Recommendation with Joint Outfit Matching and Comment Generation” on arXiv. Most previous work on fashion recommendation focuses on designing visual features to enhance recommendations. Existing work neglects user comments on fashion items, which have been shown to be effective in generating explanations along with better recommendation results. We propose a novel neural network framework, neural fashion recommendation (NFR), that simultaneously provides fashion recommendations and generates abstractive comments. NFR consists of two parts: outfit matching and comment generation. For outfit matching, we propose a convolutional neural network with a mutual attention mechanism to extract visual features of outfits. The visual features are then decoded into a rating score for the matching prediction. For abstractive comment generation, we propose a gated recurrent neural network with a cross-modality attention mechanism to transform visual features into a concise sentence. The two parts are jointly trained based on a multi-task learning framework in an end-to-end back-propagation paradigm. Extensive experiments conducted on an existing dataset and a collected real-world dataset show that NFR achieves significant improvements over state-of-the-art baselines for fashion recommendation. Meanwhile, our generated comments achieve impressive ROUGE and BLEU scores when compared against human-written comments. The generated comments can be regarded as explanations for the recommendation results. We release the dataset and code to facilitate future research. You can find the paper here.
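To make the idea of joint training a little more concrete, here is a minimal sketch of a multi-task objective. It assumes the two parts are combined as a weighted sum of a rating loss (mean squared error on matching scores) and a comment-generation loss (average negative log-likelihood of the reference comment tokens). The function names and the weights lam_rating and lam_gen are hypothetical; this is not the NFR code, where neural encoders and decoders would produce these quantities.

```python
import math


def rating_loss(predicted, observed):
    """Mean squared error between predicted and observed outfit-matching ratings."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def generation_loss(reference_token_probs):
    """Average negative log-likelihood of a reference comment, given the
    probability the decoder assigned to each reference token."""
    return -sum(math.log(p) for p in reference_token_probs) / len(reference_token_probs)


def joint_loss(predicted, observed, reference_token_probs, lam_rating=1.0, lam_gen=1.0):
    """Hypothetical multi-task objective: a weighted sum of both task losses."""
    return (lam_rating * rating_loss(predicted, observed)
            + lam_gen * generation_loss(reference_token_probs))


if __name__ == "__main__":
    # Two outfits (predicted vs. observed rating) and a two-token reference comment.
    print(joint_loss([4.0, 2.0], [5.0, 2.0], [0.5, 0.25]))  # 0.5 + 1.5 * ln(2), roughly 1.54
```

Because both terms feed into a single scalar, one back-propagation pass updates any parameters the two tasks share, such as a common visual feature extractor.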

Five more vacancies for fully funded PhD students

We just opened five more vacancies for PhD students. The general area is AI for Retail, and the positions are part of the new AIRLab. Areas range from recommendation to federated search and conversational search to replenishment. Deadline: July 16, 2018. Please visit this page for all the details.

Three more papers on arXiv

We’ve just put three more papers on arXiv.

Earlier in June, Sapna Negi, Paul Buitelaar, and I put “Open Domain Suggestion Mining: Problem Definition and Datasets” on arXiv. In the paper we propose a formal definition for the task of suggestion mining in the context of a wide range of open domain applications. Human perception of the term “suggestion” is subjective, and this affects the preparation of hand-labeled datasets for the task of suggestion mining. Existing work either lacks a formal problem definition and annotation procedure, or provides domain- and application-specific definitions. Moreover, many previously used manually labeled datasets remain proprietary. We first present an annotation study and, based on our observations, propose a formal task definition and annotation procedure for creating benchmark datasets for suggestion mining. With this study, we also provide publicly available labeled datasets for suggestion mining in multiple domains. You can find the paper here.

Then, in mid June, Branislav Kveton, Chang Li, Tor Lattimore, Ilya Markov, Csaba Szepesvari, Masrour Zoghi, and I put “BubbleRank: Safe Online Learning to Rerank” on arXiv. We study the problem of online learning to re-rank, where users provide feedback to improve the quality of displayed lists. Learning to rank has been traditionally studied in two settings. In the offline setting, rankers are typically learned from relevance labels of judges. These approaches have become the industry standard. However, they lack exploration, and thus are limited by the information content of offline data. In the online setting, an algorithm can propose a list and learn from the feedback on it in a sequential fashion. Bandit algorithms developed for this setting actively experiment, and in this way overcome the biases of offline data. But they also tend to ignore offline data, which results in a high initial cost of exploration. We propose BubbleRank, a bandit algorithm for re-ranking that combines the strengths of both settings. The algorithm starts with an initial base list and improves it gradually by swapping higher-ranked less attractive items for lower-ranked more attractive items. We prove an upper bound on the n-step regret of BubbleRank that degrades gracefully with the quality of the initial base list. Our theoretical findings are supported by extensive numerical experiments on a large real-world click dataset. The paper can be found here.
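The idea of improving a base list by swapping neighbours can be illustrated with a small simulation. The sketch below is a simplified toy, not the published BubbleRank algorithm: it assumes a position-based click model with hypothetical attraction and position_bias values, randomly exchanges neighbouring pairs in the displayed list (so both items are observed at both positions), and swaps a pair in the base list only once the lower-ranked item has won clearly more of the rounds in which exactly one of the two was clicked. The Hoeffding-style threshold and its delta parameter are assumptions of this sketch.

```python
import math
import random


def simulate_clicks(displayed, attraction, position_bias, rng):
    """Hypothetical position-based click model: the item at position k is
    clicked with probability attraction[item] * position_bias[k]."""
    return {item for k, item in enumerate(displayed)
            if rng.random() < attraction[item] * position_bias[k]}


def bubble_rerank(base_list, attraction, position_bias, n_steps, delta=0.05, seed=0):
    """Simplified sketch of safe online re-ranking starting from base_list."""
    rng = random.Random(seed)
    ranking = list(base_list)
    wins = {}  # wins[(x, y)]: rounds where x was clicked and its neighbour y was not
    for _ in range(n_steps):
        # Pair up neighbours as (0,1),(2,3),... or (1,2),(3,4),...
        offset = rng.randint(0, 1)
        pairs = [(k, k + 1) for k in range(offset, len(ranking) - 1, 2)]
        # Randomly exchange each pair in the displayed list only; the base
        # ranking itself is not changed by this exploration.
        displayed = list(ranking)
        for k, l in pairs:
            if rng.random() < 0.5:
                displayed[k], displayed[l] = displayed[l], displayed[k]
        clicked = simulate_clicks(displayed, attraction, position_bias, rng)
        # Only rounds in which exactly one item of a pair is clicked carry evidence.
        for k, l in pairs:
            upper, lower = ranking[k], ranking[l]
            if (upper in clicked) != (lower in clicked):
                winner, loser = (upper, lower) if upper in clicked else (lower, upper)
                wins[(winner, loser)] = wins.get((winner, loser), 0) + 1
        # Permanently swap neighbours only when the lower-ranked item is clearly
        # more attractive than the item above it.
        for k in range(len(ranking) - 1):
            upper, lower = ranking[k], ranking[k + 1]
            w_low, w_up = wins.get((lower, upper), 0), wins.get((upper, lower), 0)
            n = w_low + w_up
            if n > 0 and w_low - w_up > math.sqrt(2 * n * math.log(1 / delta)):
                ranking[k], ranking[k + 1] = lower, upper
    return ranking


if __name__ == "__main__":
    attraction = {"a": 0.9, "b": 0.5, "c": 0.1}
    print(bubble_rerank(["c", "b", "a"], attraction, [1.0, 0.8, 0.6], n_steps=5000))
```

Starting from a poor base list, the sketch should end with the items ordered by attraction. Because a swap needs a sustained margin of evidence, a good base list is rarely disturbed, which loosely mirrors the “safe” behaviour described in the abstract.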

And, third, Svitlana Vakulenko, Michael Cochez, Vadim Savenkov, Axel Polleres, and I put “Measuring Semantic Coherence of a Conversation” on arXiv. Conversational systems have become increasingly popular as a way for humans to interact with computers. To be able to provide intelligent responses, conversational systems must correctly model the structure and semantics of a conversation. We introduce the task of measuring semantic (in)coherence in a conversation with respect to background knowledge, which relies on the identification of semantic relations between concepts introduced during a conversation. We propose and evaluate graph-based and machine learning-based approaches for measuring semantic coherence using knowledge graphs, their vector space embeddings and word embedding models, as sources of background knowledge. We demonstrate how these approaches are able to uncover different coherence patterns in conversations on the Ubuntu Dialogue Corpus. The paper can be found here.
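As a rough illustration of the graph-based idea, the sketch below scores a conversation by how close consecutive concepts are in a knowledge graph, using breadth-first search for shortest paths. The toy graph, the concept sequences, and the 1 / (1 + d) closeness score are hypothetical choices for illustration only; the paper proposes and evaluates its own graph-based and embedding-based measures on real knowledge graphs.

```python
from collections import deque

# Hypothetical toy knowledge graph (undirected, as an adjacency dict).
TOY_GRAPH = {
    "ubuntu": ["linux", "apt"],
    "linux": ["ubuntu", "kernel"],
    "kernel": ["linux"],
    "apt": ["ubuntu", "package"],
    "package": ["apt"],
    "pizza": ["food"],
    "food": ["pizza"],
}


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns None when the concepts are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def coherence(graph, concepts):
    """Average closeness 1 / (1 + d) between consecutive concepts in a
    conversation; unconnected pairs contribute 0."""
    scores = []
    for a, b in zip(concepts, concepts[1:]):
        d = shortest_path_length(graph, a, b)
        scores.append(0.0 if d is None else 1.0 / (1 + d))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(coherence(TOY_GRAPH, ["apt", "package", "ubuntu"]))  # (1/2 + 1/3) / 2, about 0.417
    print(coherence(TOY_GRAPH, ["apt", "pizza", "kernel"]))    # 0.0
```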

Another IR vacancy

We just opened another vacancy, for a PhD student to work on dataset search. Come and join us to work in the area of AI & IR, together with great colleagues at Elsevier, VU Amsterdam and KNAW. See https://t.co/um5ldIQAZg for more details.

IR positions at the University of Amsterdam

We have several positions open at the interface of information retrieval, language technology and artificial intelligence, at different levels at the University of Amsterdam.

Feel free to contact me with any questions.

SIGIR 2018 papers online

The SIGIR 2018 papers that I contributed to are online now:

  • Alexey Borisov, Martijn Wardenaar, Ilya Markov, and Maarten de Rijke. A click sequence model for web search. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval, pages 45–54. ACM, July 2018.
    @inproceedings{borisov-click-2018,
    Author = {Borisov, Alexey and Wardenaar, Martijn and Markov, Ilya and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-12 05:45:13 +0000},
    Date-Modified = {2018-08-25 17:13:02 +0200},
    Month = {July},
    Pages = {45--54},
    Publisher = {ACM},
    Title = {A click sequence model for web search},
    Year = {2018}}
  • Wanyu Chen, Fei Cai, Honghui Chen, and Maarten de Rijke. Attention-based hierarchical neural query suggestion. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval, pages 1093–1096. ACM, July 2018.
    @inproceedings{chen-attention-based-2018,
    Author = {Chen, Wanyu and Cai, Fei and Chen, Honghui and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-11 23:31:34 +0000},
    Date-Modified = {2018-08-25 17:14:43 +0200},
    Month = {July},
    Pages = {1093--1096},
    Publisher = {ACM},
    Title = {Attention-based hierarchical neural query suggestion},
    Year = {2018}}
  • Paul Groth, Laura Koesten, Philipp Mayr, Maarten de Rijke, and Elena Simperl. DATA:SEARCH’18 – Searching data on the web. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval, pages 1419–1422. ACM, July 2018.
    @inproceedings{groth-data-2018,
    Author = {Groth, Paul and Koesten, Laura and Mayr, Philipp and de Rijke, Maarten and Simperl, Elena},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-05-05 10:58:53 +0000},
    Date-Modified = {2018-08-25 17:17:04 +0200},
    Month = {July},
    Pages = {1419--1422},
    Publisher = {ACM},
    Title = {DATA:SEARCH'18 -- Searching data on the web},
    Year = {2018}}
  • Harrie Oosterhuis and Maarten de Rijke. Ranking for relevance and display preferences in complex presentation layouts. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval, pages 845–854. ACM, July 2018.
    @inproceedings{oosterhuis-ranking-2018,
    Author = {Oosterhuis, Harrie and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-12 05:41:46 +0000},
    Date-Modified = {2018-08-25 17:13:58 +0200},
    Month = {July},
    Pages = {845--854},
    Publisher = {ACM},
    Title = {Ranking for relevance and display preferences in complex presentation layouts},
    Year = {2018}}
  • Zhaochun Ren, Xiangnan He, Dawei Yin, and Maarten de Rijke. Information discovery in e-commerce. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval, pages 1379–1382. ACM, July 2018.
    @inproceedings{ren-information-2018,
    Author = {Ren, Zhaochun and He, Xiangnan and Yin, Dawei and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-05-05 10:53:55 +0000},
    Date-Modified = {2018-08-25 17:16:43 +0200},
    Month = {July},
    Pages = {1379--1382},
    Publisher = {ACM},
    Title = {Information discovery in e-commerce},
    Year = {2018}}
  • Christophe Van Gysel and Maarten de Rijke. Pytrec_eval: An extremely fast Python interface to trec_eval. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval, pages 873–876. ACM, July 2018.
    @inproceedings{vangysel-pytrec-2018,
    Author = {Van Gysel, Christophe and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-11 22:02:31 +0000},
    Date-Modified = {2018-08-25 17:14:19 +0200},
    Month = {July},
    Pages = {873--876},
    Publisher = {ACM},
    Title = {Pytrec\_eval: An extremely fast Python interface to trec\_eval},
    Year = {2018}}
  • Nikos Voskarides, Edgar Meij, Ridho Reinanda, Abhinav Khaitan, Miles Osborne, Giorgio Stefanoni, Prabhanjan Kambadur, and Maarten de Rijke. Weakly-supervised contextualization of knowledge graph facts. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval, pages 765–774. ACM, July 2018.
    @inproceedings{voskarides-weakly-supervised-2018,
    Author = {Voskarides, Nikos and Meij, Edgar and Reinanda, Ridho and Khaitan, Abhinav and Osborne, Miles and Stefanoni, Giorgio and Kambadur, Prabhanjan and de Rijke, Maarten},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-12 05:42:50 +0000},
    Date-Modified = {2018-08-25 17:13:40 +0200},
    Month = {July},
    Pages = {765--774},
    Publisher = {ACM},
    Title = {Weakly-supervised contextualization of knowledge graph facts},
    Year = {2018}}
  • Xiaohui Xie, Jiaxin Mao, Maarten de Rijke, Ruizhe Zhang, Min Zhang, and Shaoping Ma. Constructing an interaction behavior model for web image search. In SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval, pages 425–434. ACM, July 2018.
    @inproceedings{xie-constructing-2018,
    Author = {Xie, Xiaohui and Mao, Jiaxin and de Rijke, Maarten and Zhang, Ruizhe and Zhang, Min and Ma, Shaoping},
    Booktitle = {SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval},
    Date-Added = {2018-04-11 22:25:25 +0000},
    Date-Modified = {2018-08-25 17:13:23 +0200},
    Month = {July},
    Pages = {425--434},
    Publisher = {ACM},
    Title = {Constructing an interaction behavior model for web image search},
    Year = {2018}}

Hello World: Innovation Center for Artificial Intelligence

Yesterday, ICAI, the national Innovation Center for Artificial Intelligence, was launched. ICAI is a national initiative focused on joint technology development between academia and industry in the area of artificial intelligence.

Artificial intelligence (AI) is a key technology that is rapidly disrupting every economic sector. Given its impact, AI also raises many societal challenges. It has proven to attract investment in countries around the globe, and it could do the same in the Netherlands. And it is likely to be a major driver of change in the world of work, today and tomorrow.

The Netherlands needs to do more to drive innovation through AI, most importantly by increasing its ability to attract, train and retain top artificial intelligence scientists and by connecting them to the business world. Without Dutch business and Dutch data, Dutch AI knowledge cannot be developed. Conversely, without Dutch AI knowledge, Dutch business faces a serious competitive disadvantage.

The Netherlands has all the required assets to occupy a prominent place in the international AI arena. We have talent, we have world-class research, we have a longstanding tradition in AI education at all levels, and we are one of the world’s top ranked countries in terms of innovation power. ICAI brings these positive forces together in a unique national initiative. Focused on AI innovation through public-private collaborations, ICAI is an open national consortium of academic partners that is based at Amsterdam Science Park and launched by the University of Amsterdam and the VU University Amsterdam.

ICAI’s innovation strategy is organized around industry labs: multi-year strategic collaborations between academic and industrial partners with a focus on technology and talent development. Our mantra is that it takes AI innovation talent to make data actionable. By establishing a research lab under the ICAI umbrella, participating companies invest in AI research and innovation, custom-made AI training programs, and an ambitious talent pipeline that builds on educational strengths in AI.

ICAI builds on the success of a long-standing tradition of public-private cooperation in research. For companies, it is important to absorb AI knowledge and know-how, as AI is close to the core of their business processes and to their future prospects. Internationally, the need for public-private cooperation has already been recognized and acted upon, setting an example for the Netherlands to follow.

Visit http://icai.ai or contact me for more information.

Now on arXiv: Finding influential training samples for gradient boosted decision trees

Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, and I have released a new pre-print on “finding influential training samples for gradient boosted decision trees” on arXiv. In the paper we address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency. You can find the paper here.
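The leave-one-out formulation can be made concrete with a deliberately tiny model. The sketch below uses a single regression stump whose split threshold is held fixed, echoing the assumption that tree structures remain fixed, and computes the exact influence of each training sample on one test prediction by brute-force retraining. The data and function names are hypothetical; the methods in the paper approximate this kind of quantity efficiently for full GBDT ensembles instead of retraining once per sample.

```python
def _mean(values):
    return sum(values) / len(values) if values else 0.0


def fit_stump(xs, ys, threshold):
    """Fixed-structure regression stump: the split threshold (the 'structure')
    is given, and only the two leaf values are fit, as the mean target per leaf."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return _mean(left), _mean(right)


def predict(stump, threshold, x):
    left_value, right_value = stump
    return left_value if x <= threshold else right_value


def loo_influence(xs, ys, threshold, x_test):
    """Change in the prediction for x_test when each training sample is left
    out and the leaf values are refit (brute-force leave-one-out retraining)."""
    base = predict(fit_stump(xs, ys, threshold), threshold, x_test)
    influences = []
    for i in range(len(xs)):
        stump_i = fit_stump(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], threshold)
        influences.append(predict(stump_i, threshold, x_test) - base)
    return influences


if __name__ == "__main__":
    xs = [1, 2, 3, 10, 11, 12]
    ys = [1.0, 2.0, 6.0, 20.0, 21.0, 22.0]
    print(loo_influence(xs, ys, threshold=5, x_test=2.5))  # [1.0, 0.5, -1.5, 0.0, 0.0, 0.0]
```

With the split fixed, removing sample i from a leaf with n samples and mean m shifts that leaf's value by (m - y_i) / (n - 1), so here only samples that share the test point's leaf have non-zero influence, and the outlying target 6.0 is the most influential one.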

Now on arXiv: Optimizing interactive systems with data-driven objectives

Ziming Li, Artem Grotov, Julia Kiseleva, Harrie Oosterhuis and I have just released a new preprint on “optimizing interactive systems with data-driven objectives” on arXiv. Effective optimization is essential for interactive systems to provide a satisfactory user experience. However, it is often challenging to find an objective to optimize for. Generally, such objectives are manually crafted and rarely capture complex user needs accurately. Instead, we propose an approach that infers the objective directly from observed user interactions. These inferences can be made regardless of prior knowledge and across different types of user behavior. We then introduce Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization. Our main contribution is a new, general, principled approach to optimizing interactive systems using data-driven objectives. We demonstrate the high effectiveness of ISO in several GridWorld simulations. Rush over to arXiv to download the paper.


© 2018 Maarten de Rijke
