The Bayes Club
The Bayes Club is an informal meeting of researchers in the field of Bayesian statistics. Although the interest is general, most of the presentations concern subjects in mathematical statistics, such as non-parametric priors and the asymptotic behaviour of posterior distributions. Our aim is to exchange ideas, to present our work and research, and to discuss other developments in the field. Meetings are held on Fridays, 16:00-17:00h, (usually) at the Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Science Park, Amsterdam.
This semester's talks
- Fri 24 Feb, 2017 (16:00 - 17:00h)
Johannes Schmidt-Hieber (MI, Leiden) and Markus Reiss (Humboldt, Berlin) Nonparametric Bayesian analysis of irregular models (Location: Sciencepark 105-107, Amsterdam, KdVI, Room F3.20) (abstract)
- Fri 17 Mar, 2017 (16:00 - 17:00h)
Moritz Schauer (Statistics, TU Delft) TBA (Location: Niels Bohrweg 1, Leiden, MI, Room 408)
- Fri 7 Apr, 2017 (16:00 - 17:00h)
Eduard Belitser (Mathematics, Free University Amsterdam) TBA (Location: Niels Bohrweg 1, Leiden, MI, Room 402)
- Fri 12 May, 2017 (16:00 - 17:00h)
Alexander Ly (FMG, Psychological Methods, University of Amsterdam) TBA (Location: Niels Bohrweg 1, Leiden, MI, Room 408)
- Wed 24 May, 2017 (16:00 - 17:00h)
Sonia Petrone (Bocconi, Milano, Italy) TBA (Location: Sciencepark 105-107, Amsterdam, KdVI, Room F3.20)
- Tue 4 July, 2017 (16:00 - 17:00h)
Minwoo Chae (Univ. Austin, Texas, USA) TBA (Location: Sciencepark 105-107, Amsterdam, KdVI, Room F3.20)
Past talks
- Dec 9, 2016 - Frank van der Meulen - Bayesian estimation for hypo-elliptic diffusions (abstract)
- Nov 4, 2016 - Joris Bierkens - The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data (abstract)
- Oct 21, 2016 - William Weimin Yoo - Making Bayesian inference and quantifying uncertainty through the sup-norm distance (abstract)
- Oct 7, 2016 - Julyan Arbel - Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics (abstract)
- Jun 3, 2016 - Subhashis Ghosal - Bayesian estimation and uncertainty quantification for differential equation models (abstract)
- May 13, 2016 - Paulo Serra - Regression with correlated noise, a non-parametric approach (abstract)
- Apr 22, 2016 - Stephanie van der Pas - Conditions for posterior contraction in the sparse normal means problem (abstract)
- Apr 15, 2016 - Eduard Belitser - Needles and straw in a haystack: robust empirical Bayes confidence for possibly sparse sequences (abstract)
- Mar 18, 2016 - Bas Kleijn - Posterior consistency revisited (abstract)
- Dec 4, 2015 - Peter Gruenwald - Generalized Bayesian inference (abstract)
- Nov 20, 2015 - Botond Szabo - How many needles in a haystack? (abstract)
- Nov 6, 2015 - Moritz Schauer - Bayesian inference for partially observed diffusion processes (abstract)
- Oct 23, 2015 - Jan van Waaij - Adaptive posterior contraction results for computationally efficient priors for diffusion models (abstract)
- Oct 16, 2015 - Marjan Sjerps - Assessing and reporting the strength of forensic evidence (abstract)
- May 29, 2015 - Stephen Walker - Recursive Bayesian predictive distributions (abstract)
- May 1, 2015 - Botond Szabo - Asymptotic behaviour of the empirical Bayes posteriors associated to maximum marginal likelihood estimator (abstract)
- April 24, 2015 - Bartek Knapik - Posterior contraction and nonparametric inverse problems (abstract)
- April 17, 2015 - Antonio Lijoi - Bayesian nonparametrics with heterogeneous data (abstract)
- March 20, 2015 - Alicia Kirichenko - Estimating a smoothly varying function on a large graph (abstract)
- February 27, 2015 - Jan van Waaij - Using random scaling to a Gaussian prior gives minimax adapted convergence rates for a diffusion model (abstract)
- June 6, 2014 - Fengnan Gao - Posterior contraction rates for deconvolution of Dirichlet-Laplace mixtures (abstract)
- May 9, 2014 - Bas Kleijn - Bayesian testability and consistency (abstract)
- April 4, 2014 - Stephanie van der Pas - The Horseshoe Estimator: Posterior Concentration around Nearly Black Vectors (abstract)
- March 14, 2014 - Jean-Bernard Salomond - Adaptive Bayes test for monotonicity (abstract)
- February 27, 2014 - Harrison Zhou - Rate-optimal Posterior Contraction for Sparse PCA (abstract)
- February 18, 2014 - Richard Nickl - On nonparametric Bernstein-von Mises (BvM) theorems (abstract)
- January 10, 2014 - Peter Orbanz - Nonparametric priors for graphs and arrays (abstract)
- December 20, 2013 - Yanyun Zhao - Alternatives for Ghosh-Ghosal-van der Vaart priors (abstract)
- November 22, 2013 - Johannes Schmidt-Hieber - On adaptive posterior contraction rates (abstract)
- October 25, 2013 - Aad van der Vaart - A review of Bayesian species sampling models (abstract)
- March 6, 2013 - Levi Boyles - Bayesian Hierarchical Clustering with the Coalescent Prior (abstract)
- March 6, 2013 - Judith Rousseau - On Bayesian nonparametric adaptive estimation under uniform loss (abstract)
- December 21, 2012 - Shota Gugushvili - Posterior consistency for a rescaled Brownian motion (abstract)
- December 14, 2012 - Bas Kleijn - Criteria for Bayesian consistency (abstract)
- November 30, 2012 - Peter Gruenwald - Inconsistency of Bayesian Inference When the Model Is Wrong (abstract)
- November 2, 2012 - Bartek Knapik - Semiparametric posterior limits revisited (abstract)
- October 19, 2012 - Max Welling - Bayesian Posterior Sampling with Stochastic Gradients for "Big Data" Problems (abstract)
- June 15, 2012 - Catia Scricciolo - Bayes and empirical Bayes: do they merge? (abstract)
- May 25, 2012 - Botond Szabó - Adaptive Bayesian techniques for inverse problems (abstract)
- May 11, 2012 - Suzanne Sniekers - Credible sets in the fixed design model with Brownian motion prior (abstract)
- April 27, 2012 - Eduard Belitser - On Bayesian construction of exact confidence sets (abstract)
- April 20, 2012 - Johannes Schmidt-Hieber - Posterior concentration in high-dimensional regression under sparsity (abstract)
- December 16, 2011 - Eduard Belitser - On estimation of high dimensional vector of binomial proportions (abstract)
- November 25, 2011 - Harry van Zanten - A differential equations approach to nonparametric Bayesian drift estimation for diffusions on the circle (abstract)
- September 16, 2011 - Tim van Erven - Bayesian Methods in Online Learning (abstract)
- June 17, 2011 - Frank van der Meulen (abstract)
- June 10, 2011 - Subhashis Ghosal - Predicting Proportion of False Discovery Proportions in Dependent Multiple Tests
- May 13, 2011 - Haralambie Leahu - On the BvM phenomenon in the Gaussian white noise model (abstract)
- April 8, 2011 - Bartek Knapik - Semiparametric posterior limits under local asymptotic exponentiality (abstract)
- April 1, 2011 - Bas Kleijn - Bayesian efficiency in mixture models
- March 18, 2011 - Aad van der Vaart (abstract)
Abstracts
Given a sample of a Poisson point process with intensity \lambda_f(x,y) = n 1(f(x) \leq y), we study recovery of the boundary function f from a nonparametric Bayes perspective. Because of the irregularity of this model, the standard approach for posterior contraction rates cannot be applied. We derive a general result for posterior contraction with respect to the Hellinger distance. This result is applied to several classes of priors, including Gaussian priors, priors based on random series, compound Poisson processes, and subordinators. We also investigate the limiting shape of the posterior distribution and derive a nonparametric version of the Bernstein-von Mises theorem for irregular models. We show that for piecewise constant functions, the marginal posterior of the functional \theta = \int f does some automatic bias correction and contracts with a faster rate than the MLE. We also show that this property is lost if the true underlying function comes from a more general class of functions. [Joint work with M. Reiss]
Suppose X is a discretely observed diffusion process and we wish to sample from the posterior distribution of parameters appearing in either the drift coefficient or the diffusion coefficient. As the likelihood is intractable, a common approach is to derive an MCMC algorithm where the missing diffusion paths in between the observations are augmented to the state space. This requires efficient sampling of diffusion bridges. In recent years some results have appeared in the "uniformly elliptic" case, which is characterised by nondegeneracy of the covariance matrix of the noise. The "hypo-elliptic" case refers to the situation where the covariance matrix of the noise is degenerate and where observations are made only of variables that are not directly forced by white noise. As far as I am aware, not much is known about how to sample bridges in this case. In this talk I will share some recent ideas on extending earlier results with Harry van Zanten (UvA) and Moritz Schauer (Leiden), derived under the assumption of uniform ellipticity, to this setting. This concerns "work in progress", so I won't be able to provide a full solution to the problem.
Standard MCMC methods can scale poorly to big data settings due to the need to evaluate the likelihood at each iteration. There have been a number of approximate MCMC algorithms that use sub-sampling ideas to reduce this computational burden, but with the drawback that these algorithms no longer target the true posterior distribution. We introduce a new family of Monte Carlo methods based upon a multi-dimensional version of the Zig-Zag process of (Bierkens, Roberts, 2016), a continuous time piecewise deterministic Markov process. While traditional MCMC methods are reversible by construction, the Zig-Zag process offers a flexible non-reversible alternative. The dynamics of the Zig-Zag process correspond to a constant velocity model, with the velocity of the process switching at events from a point process. The rate of this point process can be related to the invariant distribution of the process. If we wish to target a given posterior distribution, then rates need to be set equal to the gradient of the log of the posterior. We show how the Zig-Zag process can be simulated without discretisation error, and give conditions for the process to be ergodic. Most importantly, we introduce a sub-sampling version of the Zig-Zag process that is an example of an exact approximate scheme. That is, if we replace the true gradient of the log posterior with an unbiased estimator, obtained by sub-sampling, then the resulting approximate process still has the posterior as its stationary distribution. Furthermore, if we use a control-variate idea to reduce the variance of our unbiased estimator, then both heuristic arguments and empirical observations show that Zig-Zag can be super-efficient: after an initial pre-processing step, essentially independent samples from the posterior distribution are obtained at a computational cost which does not depend on the size of the data.
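As a rough illustration of the constant-velocity dynamics described in the abstract above, the following minimal sketch simulates a one-dimensional Zig-Zag process. The standard normal target, the function name `zigzag_standard_normal`, and the switching rate max(0, theta * x) (the usual choice for this particular target) are assumptions made for illustration and are not taken from the talk; no sub-sampling or control variates are used.

```python
import math
import random


def zigzag_standard_normal(n_events, seed=0):
    """Toy 1D Zig-Zag process targeting N(0, 1) (illustrative assumption).

    The state moves with velocity theta in {-1, +1} and flips velocity at
    the events of a point process with rate max(0, theta * x).
    Returns time-averaged estimates of E[X] and E[X^2].
    """
    rng = random.Random(seed)
    x, theta = 0.0, 1.0
    total_time = 0.0
    int_x = 0.0   # integral of x(t) along the trajectory
    int_x2 = 0.0  # integral of x(t)^2 along the trajectory
    for _ in range(n_events):
        a = theta * x
        e = rng.expovariate(1.0)
        # Exact event time: solve integral_0^tau max(0, a + s) ds = e,
        # so no time discretisation is involved.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # Closed-form integrals over the linear segment x + theta * s.
        int_x += x * tau + theta * tau ** 2 / 2.0
        int_x2 += x * x * tau + x * theta * tau ** 2 + tau ** 3 / 3.0
        total_time += tau
        x += theta * tau   # move to the event location
        theta = -theta     # flip the velocity
    return int_x / total_time, int_x2 / total_time


if __name__ == "__main__":
    mean, second_moment = zigzag_standard_normal(50000, seed=1)
    print(mean, second_moment)
```

For a long enough run the two time averages should be close to 0 and 1, the first two moments of the assumed target.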
In the context of nonparametric regression with unknown errors, we propose Bayesian methods to estimate the regression function f. We investigate frequentist properties of the resulting posterior distribution using the supremum-norm (sup-norm) distance. In particular, we study sup-norm posterior contraction rates and coverage of credible bands for f and its derivatives. We further study issues concerning adaptation in sup-norm, and provide adaptive Bayesian procedures that achieve the minimax sup-norm rate. We found that priors based on Lepski's method, spike and slab priors, and scaled integrated Brownian motion priors will work. The study of posterior contraction rates and credible sets in sup-norm is important, for its natural interpretation and implications for other problems such as function mode estimation.
Given a sample of size n from a population of individuals belonging to different species with unknown proportions, a popular problem of practical interest consists in making inference on the probability D_{n}(l) that the (n+1)-th draw coincides with a species with frequency l in the sample, for any l=0,1,...,n. This paper contributes to the methodology of Bayesian nonparametric inference for D_{n}(l). Specifically, under the general framework of Gibbs-type priors we show how to derive credible intervals for a Bayesian nonparametric estimation of D_{n}(l), and we investigate the large n asymptotic behaviour of such an estimator. Of particular interest are special cases of our results obtained under the specification of the two parameter Poisson-Dirichlet prior and the normalized generalized Gamma prior, which are two of the most commonly used Gibbs-type priors. With respect to these two prior specifications, the proposed results are illustrated through a simulation study and a benchmark Expressed Sequence Tags dataset. To the best of our knowledge, this illustration provides the first comparative study between the two parameter Poisson-Dirichlet prior and the normalized generalized Gamma prior in the context of Bayesian nonparametric inference for D_{n}(l). [Joint work with S. Favaro (University of Torino); Bernardo Nipoti (Trinity College, Dublin); Yee Whye Teh (University of Oxford)]
In several fields like genetics, viral dynamics, pharmacokinetics and pharmacodynamics, population studies and so on, regression models are often given by differential equations which are not analytically solvable. In this talk, Bayesian estimation and uncertainty quantification are addressed in such models. The approach is based on embedding the parametric model in a nonparametric regression model and extending the definition of the parameter beyond the original model. The nonparametric regression function is expanded in a basis and normal priors are given on coefficients leading to a normal posterior, which then induces a posterior distribution on the model parameters through a projection map. The posterior can be obtained by a simple direct sampling. We establish Bernstein-von Mises type theorems for the induced posterior distribution of the model parameters. We consider different choices of the projection map and study its impact on the asymptotic efficiency of the Bayesian estimator. We further show that posterior credible regions have asymptotically correct frequentist coverage. A simulation study and applications to some real data sets show practical usefulness of the method. Ideas of extending the methods to higher order differential equations and partial differential equations will also be discussed. [This talk is based on joint work with Prithwish Bhaumik.]
Regression models, particularly of the signal-plus-noise variant, play a central role in statistics and are a fundamental tool in many applied fields. Typically, the noise terms are assumed to be independent but this is often not a realistic assumption. Methods for selecting bandwidths/smoothing parameters for kernel/spline estimators like generalised cross-validation (GCV) break down even if the correlation is mild. To deal with this, two common approaches are either to "robustify" the criteria for selecting bandwidth/smoothing parameters, or to make a parametric assumption on the noise. Unfortunately, both approaches are very sensitive to misspecification. The approach I will talk about is fully non-parametric. In this talk I will focus on penalised spline estimators, essentially smoothing splines with relatively few knots. I will show how they can be interpreted as Bayesian estimators (corresponding to a certain prior on the regression function). An alternative interpretation is as best linear unbiased predictors (BLUPs) in a linear mixed-effects model (LMM). The spline parameters are estimated via the empirical Bayes approach. I will talk a bit about some implementation issues, and about the asymptotics of the estimators. These asymptotics make explicit the influence of the correlation structure on the smoothing parameters of the penalised spline, and introduce some non-trivial constraints on the order of the splines. I will close with some numerical experiments where I compare our approach to two kernel estimators, and to a standard R procedure based on a (parametric) assumption on the noise structure. [Joint work with Tatyana Krivobokova and Francisco Rosales (Univ. of Goettingen)]
A large number of continuous shrinkage priors has been proposed to tackle the sparse normal means problem. Many of these shrinkage priors can be written as a scale mixture of normals, which makes them particularly easy to implement. We propose general conditions on the prior on the local variance in scale mixtures of normals, such that posterior contraction at the minimax rate is assured. The conditions require tails at least as heavy as Laplace, but not too heavy, and a large amount of mass around zero relative to the tails, more so as the sparsity increases. These conditions give some general guidelines for choosing a shrinkage prior for estimation under a nearly black sparsity assumption. We verify these conditions for the class of priors considered by Ghosh and Chakrabarti (2015), which includes the horseshoe and the normal-exponential gamma priors, and for the horseshoe+, the inverse-Gaussian prior, the normal-gamma prior, and the spike-and-slab Lasso, and thus extend the number of shrinkage priors which are known to lead to posterior contraction at the minimax estimation rate.
In the many normal means model we construct an empirical Bayes posterior which we then use for uncertainty quantification for the unknown (possibly sparse) parameter by constructing an estimator and a confidence set around it as an empirical Bayes credible ball. We allow the model to be misspecified (the normality assumption can be dropped, with some moment conditions instead), leading to robust empirical Bayes inference. An important step in assessing the uncertainty is the derivation of the fact that the empirical Bayes posterior contracts to the parameter with a local (i.e., depending on the parameter) rate which is the best over a certain family of local rates, and is therefore called the oracle rate. We introduce the so-called Excessive Bias Restriction under which we establish the local (oracle) confidence optimality of the empirical Bayes credible ball. Adaptive minimax results (for the estimation and posterior contraction problems) over sparsity classes follow from our local results. An extra (square root of) log factor appears in the radial rate of the confidence ball; it is not known whether this is an artifact or not.
Frequentist conditions for asymptotic suitability of Bayesian procedures focus on lower bounds for prior mass in Kullback-Leibler neighbourhoods of the data distribution. In this talk, we investigate the flexibility in criteria for posterior consistency with i.i.d. data. We formulate a new posterior consistency theorem that applies both to well- and mis-specified models and which we use to re-derive Schwartz's theorem, consider Kullback-Leibler consistency and formulate consistency theorems in which priors charge metric balls. We also generalize to sieved models with Barron's negligible prior mass condition and to separable models with variations on Walker's consistency theorem. Results also apply to marginal semi-parametric consistency: support boundary estimation is considered explicitly and consistency is proved in a model where the Kullback-Leibler priors do not exist. Other applications include Hellinger consistent density estimation in mixture models with Dirichlet or Gibbs-type priors of full weak support. Regarding posterior convergence at a rate, it is shown that under a mild integrability condition, the second-order Ghosal-Ghosh-van der Vaart prior mass condition can be relaxed to a lower bound to the prior mass in Schwartz's Kullback-Leibler neighbourhoods. The posterior rate of convergence is derived in a simple model for heavy-tailed distributions in which the Ghosal-Ghosh-van der Vaart condition cannot be satisfied by any prior.
We develop a theory of 'generalized Bayesian inference' covering both standard Bayesian inference under misspecification and PAC-Bayesian inference. We define the \eta-generalized posterior, \eta=1 corresponding to standard Bayes, smaller \eta weighing the prior more strongly. We also define the \eta-convex hull of a probability model M, which for \eta=1 coincides with the standard convex hull but for smaller \eta shrinks towards M. Generalizing a construction due to Li and Barron, we show that for all \eta, there exists a distribution Q closest to the 'true' P* in KL divergence within the \eta-convex hull of M, and we define the *critical learning rate* \eta* as the largest \eta for which Q is not just in the \eta-convex hull but also in the model M itself. We show that generalized Bayes with any learning rate < \eta* concentrates as long as the prior puts sufficient mass in KL neighborhoods of Q, under no further conditions. A simple regression example shows that if generalized Bayes is run with a larger learning rate, it may not concentrate at all. We also show that conditions from the learning theory literature that ensure fast learning rates, such as Tsybakov and Bernstein conditions, mixability and 'stochastic exp-concavity', can all be understood as special cases of the generic condition 'the critical learning rate should be large'. [Partially based on joint work with N. Mehta, T. van Ommen, T. van Erven, B. Williamson and M. Reid]
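To make the role of \eta in the abstract above concrete, here is a minimal sketch that assumes the usual definition of the \eta-generalized posterior, in which the likelihood is raised to the power \eta before being combined with the prior. The Beta-Bernoulli model and the function names are illustrative assumptions, not part of the talk. In this conjugate setting, a Beta(a, b) prior with s successes in n trials gives a generalized posterior Beta(a + \eta s, b + \eta (n - s)).

```python
def generalized_beta_posterior(a, b, successes, n, eta):
    """Parameters of the eta-generalized posterior in a Beta-Bernoulli model.

    Assumes posterior density proportional to prior * likelihood**eta,
    so the observed counts are simply down-weighted by eta.
    """
    if eta < 0:
        raise ValueError("eta must be non-negative")
    failures = n - successes
    return a + eta * successes, b + eta * failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Uniform prior, 8 successes in 10 trials, three learning rates.
    for eta in (1.0, 0.5, 0.0):
        a_post, b_post = generalized_beta_posterior(1.0, 1.0, 8, 10, eta)
        print(eta, beta_mean(a_post, b_post))
```

With \eta=1 this is the standard posterior. As \eta decreases, the posterior mean moves from 0.75 back towards the prior mean 0.5, which is the sense in which a smaller \eta weighs the prior more strongly.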
In our work we investigate the frequentist properties of the hierarchical Bayes and the maximum marginal likelihood empirical Bayes methods in the sparse multivariate mean model with unknown sparsity level. We consider the popular horseshoe prior introduced in Carvalho, Polson, and Scott (2008) and show that both adaptive Bayesian techniques lead to rate optimal posterior contraction without using any information on the sparsity level. Furthermore, we also investigate the frequentist coverage properties of Bayesian credible sets resulting from the horseshoe prior both in the non-adaptive and adaptive setting. We show that the credible sets have good frequentist coverage and optimal size for appropriate choice of the tuning parameter (using information about the sparsity level). In case this information is not available the construction of adaptive and honest confidence sets is not possible, hence we have to introduce some additional restriction. We show that under a self-similarity type of assumption both the (slightly modified) hierarchical and empirical Bayes credible sets have (almost) rate adaptive size and good coverage. [Joint work with Stéphanie van der Pas and Aad van der Vaart.]
A multivariate, non-linear diffusion process with unknown parameters in drift and diffusion coefficient is partially observed with error at fixed times. We introduce a process which closely approximates a diffusion bridge conditional on partial information about the location of the diffusion bridge at an intermediate time. We show that the distribution of this approximation and the conditional distribution of the diffusion bridge given the noisy intermediate observation are stochastically equivalent and we find the corresponding Girsanov likelihood in closed form. This leads to a Markov chain Monte Carlo procedure to sample from the joint distribution of the unobserved diffusion trajectory and the model parameters given the noisy, discrete, partial observations. This is illustrated using the stochastic FitzHugh-Nagumo model for spike generation in squid axons, which models the axon membrane potential and a recovery variable, where only the membrane potential is observed.
Suppose we have continuous time observations $X^T=\{X_t:t\in[0,T]\}$ of a diffusion process $dX_t=b(X_t)dt+dW_t,$ with unknown drift parameter $b,$ which is 1-periodic and square integrable on $[0,1].$ In the Bayesian setting, Gaussian process priors have been considered (see for instance Pokern, Stuart and van Zanten (2013), van Waaij and van Zanten (2015)). A randomly scaled and truncated wavelet series prior with Gaussian coefficients was proposed by van der Meulen, Schauer and van Zanten (2014), for which they develop an efficient algorithm to sample from the posterior. In this talk we will discuss our recent work on the asymptotic properties of this prior. Optimal rates up to a log factor and adaptivity to every Besov smoothness bigger than 1/2 will be shown. [Joint work with Frank van der Meulen, Harry van Zanten and Moritz Schauer.]
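For readers who want to see what data from the model $dX_t=b(X_t)dt+dW_t$ in the abstract above might look like, the following sketch generates an approximate path with the Euler-Maruyama scheme. This is a simulation aid only and has nothing to do with the prior or the posterior sampler discussed in the talk; the drift `example_drift`, the step count and the function names are hypothetical choices.

```python
import math
import random


def simulate_diffusion(drift, T, n_steps, x0=0.0, seed=0):
    """Euler-Maruyama approximation of dX_t = drift(X_t) dt + dW_t on [0, T].

    Returns the list of n_steps + 1 path values at equally spaced times.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over a step of length dt has variance dt.
        x += drift(x) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def example_drift(x):
    """A hypothetical 1-periodic drift, chosen only for illustration."""
    return math.sin(2.0 * math.pi * x)


if __name__ == "__main__":
    path = simulate_diffusion(example_drift, T=10.0, n_steps=10000, seed=42)
    print(len(path), path[-1])
```

The discretised path only approximates the continuous-time observations assumed in the abstract; a finer grid reduces the discretisation error.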
The so-called "Bayesian framework for interpreting evidence" uses Bayes' rule to define the roles of (forensic) experts and lawyers. The experts are supposed to derive a likelihood ratio (LR) / Bayes Factor, defined as the ratio of the probabilities of observing the evidence under two competing hypotheses. The lawyer can use this LR to update his prior beliefs in the hypotheses. Thus, the LR is seen as a numerical measure of evidential strength. This framework is currently considered the state of the art in forensic science, and is implemented in several forensic laboratories including the Netherlands Forensic Institute. I will explain the basic ideas and then highlight some difficulties when trying to actually calculate LRs in forensic casework (DNA, fingerprints, chemical analyses of e.g. glass and fire debris). One of these is a discussion about the fundamental question whether it makes sense to consider the uncertainty of an LR. I would value the opinion of the audience on my answer to this question.
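The updating step described in the abstract above is the odds form of Bayes' rule: posterior odds equal the likelihood ratio times the prior odds. The sketch below spells this out; the function name and the numbers in the usage example are made up for illustration and do not come from the talk or from any casework.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Update a prior probability for hypothesis H1 (versus H2) with an LR.

    Uses the odds form of Bayes' rule:
        posterior odds = likelihood ratio * prior odds.
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical numbers: prior belief 1%, expert reports LR = 1000.
    print(posterior_probability(0.01, 1000.0))
```

With these hypothetical numbers the posterior probability is 1000/1099, about 0.91. The expert supplies only the LR, while the prior (and hence the posterior) belongs to the fact-finder, which is the division of roles the abstract describes.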
The talk will discuss ideas for the construction of Bayesian predictive distributions; in particular, a recursive expression for fast estimation of the predictive, which avoids the need to compute the posterior distribution, will be presented. The key to the construction is the bivariate Gaussian copula.
In Bayesian nonparametrics it is common to consider a family of prior distributions indexed by some hyper-parameters. The best choice of the prior out of this collection crucially depends on certain characteristics (e.g. smoothness, sparseness,...) of the unknown function of interest, which are usually not available. Therefore in practice it is common to apply data-dependent choices for the hyper-parameters. Arguably, the marginal likelihood empirical Bayes method is the best known data-dependent Bayesian procedure. The performance of this method has been investigated only in specific models. Our aim is to investigate the performance of this method in a general nonparametric framework. We provide general theorems describing the frequentist behaviour of the empirical Bayes posterior distribution under "standard" assumptions. Then we apply the main theorem to various examples, recovering some of the existing results in the literature, alongside new models. [This is joint work with Judith Rousseau.]
General posterior contraction theorems are not suitable to deal with truly ill-posed inverse problems, as they lead to properties of the posterior for Kf rather than f. In other words, we obtain bounds on contraction rates in some natural metric measuring the distance between Kf and Kf_0, whereas the interest lies in the distance between f and f_0, and these two metrics are not equivalent. In this talk we review (a part of) the existing literature on the Bayesian approach to nonparametric inverse problems, and present a general contraction theorem. Our general result allows us to obtain minimax adaptive concentration rates in several settings, including a fixed-design nonparametric inverse regression example. [Joint work with JB Salomond (CWI Amsterdam)]
The talk surveys some recent work on random probability measure vectors and their role in Bayesian statistics. Indeed, dependent nonparametric priors are useful tools for drawing inferences on data that arise from different studies or experiments and for which the usual exchangeability assumption is not satisfied. The specific proposal that will be displayed gives rise to dependent discrete random probability measures and the talk will focus on their application to the analysis of right-censored survival data and to species sampling problems. The theoretical results to be presented are also relevant for devising Gibbs sampling schemes that will be applied to simulated and real datasets.
We propose a nonparametric Bayesian procedure for estimating a smooth function on an expanding graph. In particular, we investigate how the convergence rates of such procedures depend on the smoothness of the function and the geometry of the graph. Here both notions of "geometry" and "smoothness" are quantified using the Laplacian of the graph. We prove that using a rescaled Gaussian prior we can obtain an estimator that adapts to the degree of smoothness of the unknown function. Finally, we discuss the families of graphs that satisfy our condition on the spectrum of the Laplacian.
Suppose we have continuous observations of a one-dimensional diffusion process whose drift function has some Sobolev smoothness. If the Sobolev smoothness is known, put a Gaussian prior on the drift function. This gives minimax convergence rates and hence improves the results of Pokern, Stuart and Van Zanten (2012). If the Sobolev smoothness is known to be bounded from above, apply random scaling to a sufficiently smooth Gaussian prior. In this case we also obtain minimax convergence rates. If the Sobolev smoothness is unknown, apply random scaling to a Gaussian prior whose smoothness increases with the time T; this gives minimax rates up to a log factor. In this talk you will see how random scaling applied to a Gaussian process can be used to obtain a prior with adaptive optimal convergence rates.
We study nonparametric Bayesian inference with location mixtures of the Laplace density and a Dirichlet process prior on the mixing distribution. We derive a contraction rate of the corresponding posterior distribution, both for the mixing distribution relative to the Wasserstein metric and for the mixed density relative to the Hellinger and Lq metrics.
Bayesian consistency theorems come in (at least) three distinct types, e.g. Doob's prior-almost-sure consistency on Polish spaces (Doob, 1948), Schwartz's Hellinger consistency with KL-priors (Schwartz, 1965) and the 'tailfree' weak consistency of Dirichlet posteriors. We ask the question how these notions are related and argue that one characterises them most conveniently using tests. We show that the existence of Bayesian tests is equivalent to Doob-like consistency of the posterior and show that Bayesian tests exist in much greater abundance than uniform tests. As examples we consider hypothesis testing problems like Cover's rational mean problem (Cover, 1973), tests for smoothness in Sobolev classes and tests for connectedness or cyclicality in networks. To achieve frequentist posterior consistency, we combine Bayesian tests with a prior condition that generalises Schwartz's KL-condition and accommodates weak consistency, e.g. involving the 'tailfree' property of the Dirichlet distribution and others.
Carvalho, Polson and Scott (2010) introduced the horseshoe prior for the multivariate normal mean model in the situation that the mean vector is sparse in the nearly black sense. The corresponding posterior mean is used as an estimator of the underlying mean vector. We assume the frequentist framework where the data is generated according to a fixed mean vector. I will discuss some results on the $\ell_2$ risk and the rate of contraction of the posterior distribution around the horseshoe estimator. [Joint work with Bas Kleijn and Aad van der Vaart.]
We propose a Bayesian nonparametric approach to test for monotonicity in a regression setting. In that context, the usual Bayes factor approach gives poor results in practice. We thus study an alternative approach that is both efficient and straightforward to implement, which is a great improvement compared to the existing frequentist procedures. Furthermore we study its asymptotic properties and prove that our procedure attains the adaptive minimax separation rate for a wide variety of Hoelder smooth alternatives.
Principal component analysis (PCA) is possibly one of the most widely used statistical tools to recover a low rank structure of the data. In high-dimensional settings, the leading eigenvector of the sample covariance can be nearly orthogonal to the true eigenvector. A sparse structure is then commonly assumed along with a low rank structure. Recently, minimax estimation rates of sparse PCA were established under various interesting settings. On the other hand, Bayesian methods are becoming more and more popular in high-dimensional estimation. However, there is little work connecting frequentist properties and Bayesian methodologies for high-dimensional data analysis. In this talk, we propose a prior for the sparse PCA problem, and analyze its theoretical properties. The prior adapts to both sparsity and rank. The posterior distribution is shown to contract to the truth at optimal minimax rates. In addition, a computational strategy for the rank-one case is discussed.
I will discuss the following aspects of recent work with Ismael Castillo on Bernstein-von Mises theorems in nonparametric models:
1) Unlike in finite-dimensional models, whether a nonparametric Bayesian credible set is an exact frequentist confidence set depends crucially on the geometry of the set. 2) The geometry, or 'spaces', in which exact posterior asymptotics can be obtained has a natural interpretation in terms of `multi-scale statistics', used commonly in the frequentist literature. 3) The multi-scale results of this kind that we could obtain for i.i.d. sampling models, including applications to Donsker-Kolmogorov-Smirnov theorems and confidence bands for random histograms.
Suppose we observe data that aggregates into a graph, or more generally, a matrix or a higher-order array. As more data becomes available, the size of the graph increases. I will explain how Bayesian models of such data can be derived if the graph is exchangeable, and what exchangeability means for this type of data. I will then discuss why exchangeable models are misspecified for network data, and summarize what we know about the (so far completely open) problem of finding an alternative concept suitable for networks.
Conditions for the rate of contraction of the posterior distribution always require sufficient prior mass in sharpened Kullback-Leibler neighborhoods (Ghosal, Ghosh and Van der Vaart (2000)). In this talk, we try to accommodate a larger class of priors and reformulate the part of theorem 1.2 in Kleijn (2013) that concerns the rate of convergence of the posterior distribution, based on a more relaxed assumption on the prior and some more stringent conditions on the model. We are currently working on an application to support boundary estimation; this work is in progress. [This is joint work with Bas Kleijn.]
We investigate the problem of deriving posterior contraction rates under different loss functions in nonparametric Bayes. In a first part, we derive lower bounds on posterior coverages of shrinking neighbourhoods and discuss implications for proof strategies to derive posterior contraction rates. In a second part, feasible priors are constructed that lead to adaptive rates of contraction under $L_2$ or $L_\infty$ metrics and that moreover achieve our lower bound. As an outlook, we discuss some consequences for the asymptotic behaviour of posterior credible balls.
[This is joint work with Marc Hoffmann and Judith Rousseau.]
In Bayesian statistics species sampling models are random discrete distributions that can serve as priors for the distribution of the data. The Dirichlet prior is the most famous example. A random sample of size n from a discrete distribution will induce a partition of {1,...,n} by the pattern of ties in the sample (the `distinct species'). This links species sampling models to exchangeable partitions of {1,...,n}, of which there is a rich probabilistic theory. For the Dirichlet process this is the famous `Chinese restaurant process'. This talk is a review of some of this theory, the Bayesian applications, and some open problems.
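As a small illustration of how the Dirichlet process induces a random partition of {1,...,n}, the sketch below simulates the Chinese restaurant process in Python. The function name, the concentration parameter `alpha` and the seed are assumptions made for this example; they do not come from the talk.

```python
import random


def chinese_restaurant_partition(n, alpha, seed=0):
    """Sample a partition of {1,...,n} from the Chinese restaurant process.

    alpha > 0 is the (assumed) concentration parameter of the Dirichlet process.
    Returns a list of blocks ("tables"), each a list of customer labels.
    """
    rng = random.Random(seed)
    tables = []
    for i in range(1, n + 1):
        # Customer i joins an existing table with probability proportional
        # to its current size, or opens a new table with probability
        # proportional to alpha.
        weights = [len(table) for table in tables] + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append([i])
        else:
            tables[k].append(i)
    return tables


if __name__ == "__main__":
    for block in chinese_restaurant_partition(10, alpha=1.0):
        print(block)
```

Each block collects the indices that would share a value (a `species') in a sample from the random discrete distribution; larger values of `alpha` tend to produce more blocks.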
The past ten years have seen a growing literature on asymptotic properties of Bayesian nonparametric procedures, initiated mostly by the work of Ghosal, Ghosh and van der Vaart (1999) on posterior concentration rates for density estimation. Most of this literature deals with measures of concentration in terms of losses that are "natural", like the $L_2$ loss in Gaussian regression models or the Hellinger or the $L_1$ metric in density estimation models. Recently some negative results have been obtained showing that bias might appear when some other types of losses are considered. In this work we first give some general results linking the control on the posterior concentration rate to the considered loss and some "natural loss" in a way similar to Cai and Low (2006). Then we study more precisely the case of the $L_\infty$ loss, for which we exhibit both lower and upper bounds, and we propose an adaptive Bayesian nonparametric prior in the case of the white noise model.
[Joint work with Marc Hoffmann and Johannes Schmidt-Hieber (Paris Dauphine and CREST - ENSAE)]
Bayesian hierarchical clustering priors such as the Dirichlet Diffusion Tree and Kingman's Coalescent are flexible modelling tools in which a datum can belong to a family of nested clusters, rather than a single cluster. This representation has many advantages; most notably a better sharing of statistical information among data. However, inference in these models is often difficult and computationally expensive, largely due to a tying of the tree topology and branch lengths in the prior. Through a connection of the Coalescent to Aldous' Beta-splitting model, we can construct priors where the tree topology and branch lengths factorize. This provides two benefits: there is a more flexible choice in the priors that can be constructed and more efficient Gibbs type inference can be used. We demonstrate this on an example model for density estimation and show the model achieves competitive experimental results.
A usual path for establishing posterior consistency for specific statistical models is through application of one of the general posterior consistency results (Schwartz (1965), Barron et al. (1999), Walker (2004)). For many (simpler) models, however, such an approach appears to be overkill. In this talk, assuming that a sample $X_{i/n},i=0,1,\ldots,n$ from a rescaled Brownian motion $X_t=\int_0^t \sigma(s)dW_s$ is available (here $\sigma$ is a deterministic function that parametrises the model), we will show how to establish posterior consistency for non-parametric Bayesian estimation of $\sigma$ using direct arguments that bypass general posterior consistency theorems.
[The talk is based on joint work with Peter Spreij.]
An unconventional application of the minimax theorem gives rise to a versatile sufficient condition for posterior consistency. We apply this condition to re-derive Schwartz' consistency theorem (Schwartz (1965)), sharpen its assertion somewhat and formulate several other consistency theorems. The main benefit of the proposed approach is enhanced flexibility in the choice of the prior: example consistency theorems formulate priors that charge Kullback-Leibler balls (as in Schwartz' theorem), as well as Hellinger and other metric balls. Marginal consistency in semi-parametric estimation problems falls within the range of application and an example is considered.
We present a family ('model') M of probability distributions, a distribution P outside M and a Bayesian prior distribution on M, such that
- M contains a distribution Q within a small (Hellinger, KL or L2) distance \delta from P. Nevertheless:
- when data are sampled according to P, then, no matter how many data are observed, the Bayesian posterior puts nearly all its mass on distributions that are at a distance from P that is much larger than \delta.
The result is fundamentally different from earlier Bayesian inconsistency results by Diaconis and Freedman, since we can choose M countable and the prior on Q to be > 0; if the model M were well-specified (`true'), then by Doob's theorem this would immediately imply consistency. We also discuss how the results can coexist with the Bayesian misspecification consistency results of Kleijn and Van der Vaart (2006). Partially based on joint work with John Langford (Microsoft Research New England). A preliminary version was presented at ISBA Valencia 2006 and published in the Machine Learning Journal, 2007.
In my talk I will go back to my first Bayes Club Seminar presentation in April 2011. Estimation of the end point of a distribution can be viewed as a shift or a scaling problem (or sometimes both). Assuming the underlying distribution possesses a density function, the behaviour of the density at the end point may simplify the estimation problem. I will show that densities with jumps give rise to a weakly converging expansion of the likelihood called local asymptotic exponentiality (LAE). This type of asymptotic behaviour of the likelihood results in a one-sided, exponential posterior limit satisfying the irregular Bernstein-von Mises theorem. This can then be generalized to a semiparametric version of the irregular Bernstein-von Mises theorem. I will present a version of the theorem in a semiparametric LAE model for a shift parameter. Another semiparametric LAE example, for a scale parameter, will also be presented. In the latter setting one of the conditions of the irregular BvM seems to be much harder to verify than in the former. This might be surprising, since the two semiparametric problems studied here are seemingly quite similar. [This is based on joint work with B. Kleijn]
The volume of data is increasing on an exponential curve similar to Moore's law. Bayesian methods are at risk of becoming too computationally expensive for these large-scale datasets. Stochastic gradient methods have served frequentists well in dealing with these issues. I claim that Bayesian inference methods, and in particular MCMC sampling, can also be adapted to reap the benefits of stochastic approximations. I will discuss different versions of "Stochastic Gradient Langevin Dynamics" that start off as stochastic gradient descent and then automatically transition into posterior samplers with the correct equilibrium distribution as the stepsize decreases. An improved version called "Stochastic Gradient Fisher Scoring" uses a preconditioning matrix to sample from the best Gaussian approximation of the posterior for large stepsizes. Experiments show that these samplers are practical and indeed perform well for large datasets.
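To make the idea of a stochastic gradient step with injected noise concrete, here is a minimal Python sketch of Stochastic Gradient Langevin Dynamics for a toy problem. Everything specific in it is an assumption for illustration, not taken from the talk: the model (observations N(theta, 1) with a N(0, prior_var) prior on theta), the function name, and the polynomially decreasing stepsize schedule eps_t = a (b + t)^(-gamma).

```python
import random


def sgld_gaussian_mean(data, n_iter=5000, batch_size=10, prior_var=100.0,
                       a=0.001, b=10.0, gamma=0.55, seed=0):
    """SGLD draws for the mean theta of N(theta, 1) data (toy model)."""
    rng = random.Random(seed)
    n_total = len(data)
    theta = 0.0
    samples = []
    for t in range(n_iter):
        eps = a * (b + t) ** (-gamma)      # decreasing stepsize
        batch = rng.sample(data, batch_size)
        grad_log_prior = -theta / prior_var
        # Minibatch estimate of the full-data log-likelihood gradient,
        # rescaled by n_total / batch_size so that it is unbiased.
        grad_log_lik = (n_total / batch_size) * sum(x - theta for x in batch)
        # Half a gradient step plus Gaussian noise with variance eps.
        theta += 0.5 * eps * (grad_log_prior + grad_log_lik)
        theta += rng.gauss(0.0, eps ** 0.5)
        samples.append(theta)
    return samples


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
    draws = sgld_gaussian_mean(data)
    tail = draws[len(draws) // 2:]
    print("average of later draws:", sum(tail) / len(tail))
    print("sample mean of data:   ", sum(data) / len(data))
```

With large stepsizes the minibatch gradient noise dominates and the iteration behaves like stochastic gradient descent; as the stepsize shrinks, the injected Gaussian noise takes over and the iterates behave like approximate posterior draws.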
Bayesian inference is attractive for its coherence and good frequentist properties. However, it is a common experience that eliciting an honest prior may be difficult and, in practice, people often take an empirical Bayes approach, plugging empirical estimates of the prior hyperparameters into the posterior distribution. Even if not rigorously justified, the underlying idea is that, when the sample size is large, empirical Bayes leads to "similar" inferential answers. Yet, precise mathematical results seem to be missing. In this work, we give a more rigorous justification in terms of merging of Bayes and empirical Bayes posterior distributions. We consider two notions of merging: Bayesian weak merging and frequentist merging in total variation. Since weak merging is related to consistency, we provide sufficient conditions for consistency of empirical Bayes posteriors. Also, we show that, under regularity conditions, the empirical Bayes procedure asymptotically selects the value of the hyperparameter for which the prior mostly favors the "truth". Joint work with Sonia Petrone and Judith Rousseau.
In my presentation I will talk about adaptive Bayesian techniques, such as empirical and hierarchical Bayes methods, and apply them to solve mildly ill-posed inverse problems. I will investigate the behaviour of the adaptive posterior distribution from a frequentist point of view and show that the posterior distribution achieves the minimax rate of contraction up to a slowly varying term. Furthermore, if time allows, I will examine how much confidence we can put in the adaptive credible sets and try to construct the largest set on which we can trust adaptive credible sets as a measure of uncertainty. This is ongoing joint work with Aad, Harry, and Bartek.
We will consider the problem of estimating an unknown function f at a given point in a fixed design setting. A Bayesian approach with Brownian motion prior will be used to derive an estimator, which we will study using frequentist methods. We will investigate how the bias depends on the Hölder smoothness of the function f. This result will be used to study the coverage of Bayesian credible sets for this model.
We consider one general, simple theorem which gives a recipe for constructing exact (i.e., non-asymptotic) confidence sets under certain conditions. Next we discuss some applications.
Recently, high-dimensional problems and sparsity constraints have gained a lot of attention due to their applicability in biology. In this talk, we consider a fully Bayesian approach to high-dimensional regression. For a class of priors which accounts for sparsity, we provide results for the contraction rate of the posterior. We discuss the assumptions on the design matrix and relate them to existing work. For specific situations, we are able to determine the behavior of credible intervals. This is ongoing joint work with Aad and Ismael.
We consider the problem of (minimax and oracle) estimation of a high (infinite) dimensional vector of binomial proportions. Under some conditions we derive the asymptotic behavior of the minimax risk over some nonparametric classes, in particular, a "binomial version" of Pinsker's result. Further, we might touch upon the issue of (empirical) Bayesian adaptation and the problem of optimal allocation of observations.
I will report on a joint project with Andrew Stuart and Yvo Pokern in which we study a Bayesian approach to nonparametric estimation of the periodic drift function of a one-dimensional diffusion from continuous-time data. We specify a centered Gaussian prior on the drift with a precision operator that is of differential form. It is proved that the posterior is Gaussian as well and we give an explicit expression for the posterior precision operator and show that the posterior mean is the solution of a certain differential equation. Moreover, we bound the rate at which the posterior contracts around the true drift function. The results rely on tools from the analysis of differential equations and new functional limit theorems for the local time of diffusions on the circle.
I will give an introduction to online learning, which deals with decision problems that can be formulated as a repeated game between the statistician and an adversary. This is a natural way to model problems like spam detection and optimization of financial portfolios, which have an adversarial component. But there are also applications to data compression, for which particularly strong performance guarantees are possible.
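A minimal Python sketch of such a repeated game, under assumptions chosen for illustration only: binary outcomes, a finite set of `experts' that each announce a probability for outcome 1, logarithmic loss, and the exponential weights algorithm with learning rate 1. In that particular setting the exponential weights coincide with a Bayesian posterior over the experts under a uniform prior. The function name and the data layout are hypothetical.

```python
import math


def bayes_mixture_log_loss(expert_probs, outcomes):
    """Play the prediction game with exponential weights under log loss.

    expert_probs[t][k] is the probability that expert k assigns to
    outcome 1 in round t; outcomes[t] is the revealed outcome (0 or 1).
    Returns the final weights and the statistician's cumulative log loss.
    """
    n_experts = len(expert_probs[0])
    weights = [1.0 / n_experts] * n_experts   # uniform prior over experts
    total_loss = 0.0
    for probs, y in zip(expert_probs, outcomes):
        # The statistician predicts with the weighted mixture of experts.
        p = sum(w * q for w, q in zip(weights, probs))
        total_loss += -math.log(p if y == 1 else 1.0 - p)
        # Multiplying by exp(-log loss) is multiplying by the likelihood,
        # so this update is exactly Bayes' rule.
        likelihoods = [q if y == 1 else 1.0 - q for q in probs]
        weights = [w * l for w, l in zip(weights, likelihoods)]
        normaliser = sum(weights)
        weights = [w / normaliser for w in weights]
    return weights, total_loss


if __name__ == "__main__":
    outcomes = [1, 1, 0, 1, 1, 1, 0, 1]
    expert_probs = [[0.9, 0.5, 0.1] for _ in outcomes]
    weights, loss = bayes_mixture_log_loss(expert_probs, outcomes)
    print("final weights:", weights)
    print("cumulative log loss:", loss)
```

Whatever sequence the adversary chooses, the cumulative log loss of this strategy exceeds that of the best expert by at most the logarithm of the number of experts, which is the kind of performance guarantee that carries over to data compression.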
Many standard algorithms in online learning can be interpreted as Bayesian methods, or approximations thereof. I will work out several examples. Time permitting, I will also present recent work with Peter Gruenwald, Wouter Koolen and Steven de Rooij, in which we analyse a new algorithm by viewing it as an approximation to Bayes, and show that fast convergence of the Bayesian posterior implies better decisions.
I will review some methods to deal with MCMC on spaces of varying dimension, in particular the reversible jump algorithm by Green.
As an application, I will show how this method can be used to estimate the drift of a discretely observed diffusion. The particular hierarchical prior that we propose requires a slight adaptation of the basic algorithm. This concerns joint work with Moritz and Harry.
One of the most interesting consequences of the (parametric) BvM Theorem is that the Bayesian and frequentist distributions of the estimation error are asymptotically the same, having asymptotic Gaussian shape; in particular, Bayesian credible sets and frequentist confidence regions must coincide asymptotically. This talk investigates the occurrence of this "phenomenon" in the Gaussian white noise model.
I will first give a 15-minute presentation on Bayesian inverse problems, as preparation for the Philips Award session during NMC 2011 in Enschede. Then I will talk about semiparametric posterior limits under the condition of local asymptotic exponentiality. Consider the model indexed by a real-valued parameter \theta and a nuisance parameter \eta. Every element of the model has a density p_{\theta,\eta}(x) given by \eta(x-\theta), where \eta is a density function, supported on the positive reals, and \eta(0) is positive and finite. I will show some results on the asymptotic behavior of the marginal posterior for the parameter of interest \theta in this semiparametric model.
My intention is to review some computational methods for posteriors based on Gaussian priors: expectation propagation and Laplace approximation. The first I learned (mostly) from a thesis in Nijmegen; regarding the second, I review the paper on INLA in JRSS B 2009(?) by Chopin, Rue et al. They are claimed to be much faster than MCMC and just as good/bad.
No new stuff, and no theory, but if I find the time I'll prepare some pictures, and there may be possibilities of implementing similar algorithms on other models of interest.
If you have any comments, questions, requests, suggestions etc. please contact B. Kleijn at the University of Amsterdam.
B. Kleijn - Mon Sep 24, 2012