Our contribution to the NIPS LearningSys 2015 Workshop on Machine Learning Systems, by Damien Lefortier, Anthony Truchet, and Maarten de Rijke, is now available online:

  • Damien Lefortier, Anthony Truchet, and Maarten de Rijke. Sources of variability in large-scale machine learning systems. In NIPS LearningSys 2015 Workshop on Machine Learning Systems, December 2015.
    @inproceedings{lefortier-sources-2015,
    Author = {Lefortier, Damien and Truchet, Anthony and de Rijke, Maarten},
    Booktitle = {NIPS LearningSys 2015 Workshop on Machine Learning Systems},
    Date-Added = {2015-11-03 15:49:55 +0000},
    Date-Modified = {2016-04-03 17:49:40 +0000},
    Month = {December},
    Title = {Sources of variability in large-scale machine learning systems},
    Year = {2015}}

We investigate sources of variability of a state-of-the-art distributed machine learning system for learning click and conversion prediction models for display advertising. We focus on three main sources of variability: asynchronous updates in the learning algorithm, downsampling of the data, and the non-deterministic order of examples received by each learning instance. We observe that some sources of variability can lead to significant differences between the models obtained and cause issues for, e.g., regression testing, debugging, and offline evaluation. We present effective solutions to stabilize the system and remove these sources of variability, thus fully solving the issues related to regression testing and to debugging. Moreover, we discuss potential limitations of this stabilization for drawing conclusions, in which case we may want to take the variability produced by the machine learning system into account in confidence intervals.
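
To make two of the three sources concrete, here is a minimal Python sketch of one generic way to remove the randomness from downsampling and from example order. It is not the method from the paper, whose solutions are not described in this post. The names keep_example and stable_stream, the salt parameter, and the "id" field are all hypothetical, and the sketch does not address asynchronous updates.

```python
import hashlib
import random


def keep_example(example_id: str, rate: float, salt: str = "v1") -> bool:
    """Decide whether to keep an example from a hash of its ID, not from an RNG."""
    digest = hashlib.sha256(f"{salt}:{example_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a roughly uniform value in [0, 1].
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def stable_stream(examples, rate):
    """Downsample, then sort by ID so the arrival order cannot affect the result."""
    kept = [e for e in examples if keep_example(e["id"], rate)]
    return sorted(kept, key=lambda e: e["id"])


# Hypothetical data: the same examples, arriving in two different orders.
examples = [{"id": f"ex-{i}", "clicked": i % 7 == 0} for i in range(1000)]
shuffled = examples[:]
random.Random(0).shuffle(shuffled)

run_a = stable_stream(examples, rate=0.1)
run_b = stable_stream(shuffled, rate=0.1)
```

Under these assumptions, run_a and run_b contain the same examples in the same order, so a learner fed from either one sees identical input. Changing the salt draws a different but still reproducible sample, which is one way to reintroduce sampling variability on purpose when estimating confidence intervals.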