Maarten de Rijke

Information retrieval

Month: September 2018

Paper on incremental sparse Bayesian ordinal regression published in Neural Networks

“Incremental sparse Bayesian ordinal regression” by Chang Li and Maarten de Rijke has been published in the October 2018 issue of Neural Networks. See the journal’s site.

Ordinal Regression (OR) aims to model the ordering information between different data categories, which is a crucial topic in multi-label learning. An important class of approaches to OR models the problem as a linear combination of basis functions that map features to a high-dimensional non-linear space. However, most of the basis function-based algorithms are time consuming. We propose an incremental sparse Bayesian approach to OR tasks and introduce an algorithm to sequentially learn the relevant basis functions in the ordinal scenario. Our method, called Incremental Sparse Bayesian Ordinal Regression (ISBOR), automatically optimizes the hyper-parameters via the type-II maximum likelihood method. By exploiting fast marginal likelihood optimization, ISBOR can avoid big matrix inverses, which is the main bottleneck in applying basis function-based algorithms to OR tasks on large-scale datasets. We show that ISBOR can make accurate predictions with parsimonious basis functions while offering automatic estimates of the prediction uncertainty. Extensive experiments on synthetic and real word datasets demonstrate the efficiency and effectiveness of ISBOR compared to other basis function-based OR approaches.

CIKM 2018 paper on Web-based Startup Success Prediction online

Web-based Startup Success Prediction by Boris Sharchilev, Michael Roizner, Andrey Rumyantsev, Denis Ozornin, Pavel Serdyukov, Maarten de Rijke is online now at this page.

In the paper we consider the problem of predicting the success of startup companies at their early development stages. We formulate the task as predicting whether a company that has already secured initial (seed or angel) funding will attract a further round of investment in a given period of time. Previous work on this task has mostly been restricted to mining structured data sources, such as databases of the startup ecosystem consisting of investors, incubators and startups. Instead, we investigate the potential of using web-based open sources for the startup success prediction task and model the task using a very rich set of signals from such sources. In particular, we enrich structured data about the startup ecosystem with information from a business- and employment-oriented social networking service and from the web in general. Using these signals, we train a robust machine learning pipeline encompassing multiple base models using gradient boosting. We show that utilizing companies’ mentions on the Web yields a substantial performance boost in comparison to only using structured data about the startup ecosystem. We also provide a thorough analysis of the obtained model that allows one to obtain insights into both the types of useful signals discoverable on the Web and market mechanisms underlying the funding process.

The paper will be presented at CIKM 2018 in October 2018.

© 2018 Maarten de Rijke

Theme by Anders NorenUp ↑