Maarten de Rijke

Information retrieval

Month: February 2019

WWW 2019 paper on evaluation metrics for web image search online

Grid-based Evaluation Metrics for Web Image Search by Xiaohui Xie, Jiaxin Mao, Yiqun Liu, Maarten de Rijke, Yunqiu Shao, Zixin Ye, Min Zhang, and Shaoping Ma is online now at this location.

Compared to general web search engines, web image search engines display results in a different way. In web image search, results are typically placed in a grid rather than a sequential result list. In this scenario, users can view results not only in the vertical direction but also in the horizontal direction. Moreover, pagination is usually not (explicitly) supported on image search engine result pages (SERPs); users can view results by scrolling down without having to click a “next page” button. These differences lead to different interaction mechanisms and user behavior patterns, which, in turn, create challenges for evaluation metrics that were originally developed for general web search. While considerable effort has been invested in developing evaluation metrics for general web search, relatively little effort has gone into constructing grid-based evaluation metrics.

To inform the development of grid-based evaluation metrics for web image search, we conduct a comprehensive analysis of user behavior so as to uncover how users allocate their attention in a grid-based web image search result interface. We obtain three findings: (1) “Middle bias”: Confirming previous studies, we find that image results in the horizontal middle positions may receive more attention from users than those in the leftmost or rightmost positions. (2) “Slower decay”: Unlike web search, users’ attention does not decrease monotonically or dramatically with the rank position in image search, especially within a row. (3) “Row skipping”: Users may ignore particular rows and directly jump to results at some distance. Motivated by these observations, we propose corresponding user behavior assumptions to capture users’ search interaction processes and evaluate their search performance. We show how to derive new metrics from these assumptions and demonstrate that they can be adopted to revise traditional list-based metrics like Discounted Cumulative Gain (DCG) and Rank-Biased Precision (RBP). To show the effectiveness of the proposed grid-based metrics, we compare them against a number of list-based metrics in terms of their correlation with user satisfaction. Our experimental results show that the proposed grid-based evaluation metrics better reflect user satisfaction in web image search.
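
To make the contrast with list-based metrics concrete, here is a minimal sketch (not the formulation from the paper) of standard DCG and RBP next to a grid-based DCG-style variant in which attention decays per row rather than per item and horizontally central columns receive a small extra weight, loosely mirroring the “slower decay” and “middle bias” observations. The discount forms and parameter values are illustrative assumptions.

```python
import math

def dcg(gains):
    """Standard list-based DCG: gains discounted logarithmically by rank."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def rbp(gains, p=0.8):
    """Standard Rank-Biased Precision with persistence parameter p."""
    return (1 - p) * sum(g * p ** i for i, g in enumerate(gains))

def grid_dcg(gain_grid, row_discount=0.9, middle_bonus=0.1):
    """Hypothetical grid-based variant: gains are given per row of the grid.

    Attention decays per row rather than per item (reflecting "slower decay"
    within a row), and columns closer to the horizontal centre get slightly
    more weight ("middle bias"). Parameters and functional forms are
    placeholders, not the paper's definitions.
    """
    score = 0.0
    n_cols = max(len(row) for row in gain_grid)
    half_span = (n_cols - 1) / 2 or 1  # avoid division by zero for single-column grids
    for r, row in enumerate(gain_grid):
        for c, gain in enumerate(row):
            centrality = 1.0 - abs(c - (n_cols - 1) / 2) / half_span  # 1 at centre, 0 at edges
            score += gain * (1.0 + middle_bonus * centrality) * (row_discount ** r)
    return score

# Example: a 3x4 grid of graded relevance judgments.
grid = [[2, 3, 3, 1],
        [1, 2, 2, 0],
        [0, 1, 1, 0]]
flat = [g for row in grid for g in row]
print(dcg(flat), rbp(flat), grid_dcg(grid))
```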

The paper will be presented at The Web Conference 2019.

WWW 2019 paper on outfit recommendation online

Improving Outfit Recommendation with Co-supervision of Fashion Generation by Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, and Maarten de Rijke is now available at this location.

The task of fashion recommendation includes two main challenges: visual understanding and visual matching. Visual understanding aims to extract effective visual features. Visual matching aims to model a human notion of compatibility to compute a match between fashion items. Most previous studies rely on a recommendation loss alone to guide visual understanding and matching. Although the features captured by these methods describe basic characteristics (e.g., color, texture, shape) of the input items, they are not directly related to the visual signals of the output items (to be recommended). This is problematic because the aesthetic characteristics (e.g., style, design), from which we can directly infer the output items, are lacking: features are learned under the recommendation loss alone, where the supervision signal is simply whether two given items match or not.

To address this problem, we propose a neural co-supervision learning framework, called the FAshion Recommendation Machine (FARM). FARM improves visual understanding by incorporating the supervision of a generation loss, which we hypothesize encodes aesthetic information better. FARM enhances visual matching by introducing a novel layer-to-layer matching mechanism that fuses aesthetic information more effectively, while avoiding overemphasizing generation quality at the expense of recommendation performance.
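
As a rough illustration of the co-supervision idea (a toy stand-in, not FARM’s actual architecture or its layer-to-layer matching mechanism), the sketch below shares one visual encoder between a recommendation head and a generation decoder, so that the generation loss also shapes the features used for matching. The layer sizes, image resolution, and loss weighting are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoSupervisedOutfitModel(nn.Module):
    """Toy co-supervised outfit model: one shared visual encoder feeds both a
    matching (recommendation) head and a generation decoder."""

    def __init__(self, img_dim=3 * 64 * 64, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(img_dim, feat_dim), nn.ReLU())
        self.match_head = nn.Linear(2 * feat_dim, 1)  # compatibility score for a (top, bottom) pair
        self.decoder = nn.Linear(feat_dim, img_dim)   # generates the item to be recommended

    def forward(self, top_img, bottom_img):
        top, bottom = self.encoder(top_img), self.encoder(bottom_img)
        score = self.match_head(torch.cat([top, bottom], dim=-1)).squeeze(-1)
        generated = self.decoder(top)                 # attempt to generate the matching bottom
        return score, generated

def co_supervised_loss(score, label, generated, target_img, alpha=0.5):
    """Recommendation loss plus a generation loss; alpha balances the two terms."""
    rec_loss = F.binary_cross_entropy_with_logits(score, label)
    gen_loss = F.mse_loss(generated, target_img.flatten(1))
    return rec_loss + alpha * gen_loss

# Usage with random tensors standing in for outfit images and match labels.
model = CoSupervisedOutfitModel()
tops, bottoms = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,)).float()
score, generated = model(tops, bottoms)
loss = co_supervised_loss(score, labels, generated, bottoms)
```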

Extensive experiments on two publicly available datasets show that FARM outperforms state-of-the-art models on outfit recommendation, in terms of AUC and MRR. Detailed analyses of generated and recommended items demonstrate that FARM can encode better features and generate high quality images as references to improve recommendation performance.

The paper will be presented at The Web Conference 2019.

WWW 2019 paper on diversity of dialogue response generation online

Improving Neural Response Diversity with Frequency-Aware Cross-Entropy Loss by Shaojie Jiang, Pengjie Ren, Christof Monz, and Maarten de Rijke is online now at this location.

Sequence-to-Sequence (Seq2Seq) models have achieved encouraging performance on the dialogue response generation task. However, existing Seq2Seq-based response generation methods suffer from a low-diversity problem: they frequently generate generic responses, which make the conversation less interesting. In this paper, we address the low-diversity problem by investigating its connection with model over-confidence reflected in predicted distributions. Specifically, we first analyze the influence of the commonly used Cross-Entropy (CE) loss function, and find that the CE loss function prefers high-frequency tokens, which results in low-diversity responses. We then propose a Frequency-Aware Cross-Entropy (FACE) loss function that improves over the CE loss function by incorporating a weighting mechanism conditioned on token frequency. Extensive experiments on benchmark datasets show that the FACE loss function is able to substantially improve the diversity of existing state-of-the-art Seq2Seq response generation methods, in terms of both automatic and human evaluations.
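
The core idea of frequency-aware weighting can be sketched as follows: weight each target token’s cross-entropy term by the inverse of its relative frequency, so that high-frequency generic tokens contribute less to the loss and rare tokens more. FACE’s actual weighting and normalization differ in their details; the snippet below is an illustrative approximation, and the padding handling and scaling are assumptions.

```python
import torch
import torch.nn.functional as F

def frequency_aware_ce(logits, targets, token_freq, pad_id=0):
    """Illustrative frequency-aware cross-entropy (a sketch, not FACE itself).

    logits:     (batch, seq_len, vocab) decoder outputs
    targets:    (batch, seq_len) target token ids
    token_freq: (vocab,) token counts collected from the training data
    """
    rel_freq = token_freq.float() / token_freq.sum()
    weights = 1.0 / (rel_freq + 1e-8)       # rare tokens get larger weights
    weights = weights / weights.mean()      # keep the loss on a scale comparable to plain CE
    flat_targets = targets.reshape(-1)
    per_token = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                flat_targets, reduction="none")
    per_token = per_token * weights[flat_targets]
    mask = (flat_targets != pad_id).float()  # ignore padding positions
    return (per_token * mask).sum() / mask.sum()
```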

The paper will be presented at The Web Conference 2019.

WWW 2019 paper on visual learning to rank online

ViTOR: Learning to Rank Webpages Based on Visual Features by Bram van den Akker, Ilya Markov, and Maarten de Rijke is available online now at this location.

The visual appearance of a webpage carries valuable information about the page’s quality and can be used to improve the performance of learning to rank (LTR). We introduce the Visual learning TO Rank (ViTOR) model that integrates state-of-the-art visual feature extraction methods: (i) transfer learning from a pre-trained image classification model, and (ii) synthetic saliency heat maps generated from webpage snapshots. Since there is currently no public dataset for the task of LTR with visual features, we also introduce and release the ViTOR dataset, containing visually rich and diverse webpages. The ViTOR dataset consists of visual snapshots, non-visual features, and relevance judgments for ClueWeb12 webpages and TREC Web Track queries. We experiment with the proposed ViTOR model on the ViTOR dataset and show that it significantly improves the performance of LTR with visual features.
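
For the transfer-learning component, a minimal sketch of how visual features could be extracted from webpage snapshots with a pre-trained image classification network and concatenated with conventional LTR features is shown below. The ResNet-50 backbone, 224×224 input size, and feature dimensions are assumptions rather than the paper’s exact configuration.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Reuse a CNN pre-trained on image classification as a fixed feature extractor
# for webpage snapshots, then concatenate the result with existing non-visual
# LTR features (backbone and input size are assumptions, not the paper's setup).
backbone = models.resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()  # drop the classification head; output is a 2048-d vector
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def document_features(snapshot_path, nonvisual_features):
    """Feature vector for one query-document pair: snapshot features
    concatenated with conventional (non-visual) LTR features."""
    img = Image.open(snapshot_path).convert("RGB")
    with torch.no_grad():
        visual = backbone(preprocess(img).unsqueeze(0)).squeeze(0)  # (2048,)
    return torch.cat([visual, torch.as_tensor(nonvisual_features, dtype=torch.float32)])
```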

The paper will be presented at The Web Conference 2019.

© 2019 Maarten de Rijke
