While Bayesian methods are extremely popular in statistics and machine
learning, their application to massive data sets is often challenging, when
possible at all. Classical MCMC algorithms are prohibitively slow when
both the model dimension and the sample size are large. Variational Bayesian
methods aim at approximating the posterior by a distribution in a tractable
family F. Thus, MCMC is replaced by an optimization algorithm that is
orders of magnitude faster. VB methods have been applied in such computationally demanding applications as collaborative filtering, image and video
processing, and NLP, to name a few. However, despite nice results in practice,
the theoretical properties of these approximations are not known. We propose
a general oracle inequality that relates the quality of the VB approximation to
the prior π and to the structure of F. We provide a simple condition that allows one to derive rates of convergence from this oracle inequality. We apply our
theory to various examples. First, we show that for parametric models with
log-Lipschitz likelihood, Gaussian VB leads to efficient algorithms and consistent estimators. We then study a high-dimensional example: matrix completion, and a nonparametric example: density estimation.
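For readers unfamiliar with the setup, the VB approximation referred to above can be sketched as an optimization problem over the family F. The notation here is an assumption and does not appear in the abstract: L_n(θ) denotes the likelihood of the n observations, π_n the resulting posterior, and KL the Kullback-Leibler divergence. This is the standard textbook formulation, not necessarily the exact objective analysed in the paper.

```latex
% Assumed notation: L_n(\theta) is the likelihood, \pi the prior,
% \pi_n(d\theta) \propto L_n(\theta)\,\pi(d\theta) the posterior,
% and \mathcal{F} the tractable family.
\[
  \tilde{\pi}_n
  \;=\; \operatorname*{arg\,min}_{\rho \in \mathcal{F}}
        \mathrm{KL}\bigl(\rho \,\|\, \pi_n\bigr)
  \;=\; \operatorname*{arg\,max}_{\rho \in \mathcal{F}}
        \left\{ \int \log L_n(\theta)\,\rho(d\theta)
                \;-\; \mathrm{KL}\bigl(\rho \,\|\, \pi\bigr) \right\}.
\]
% The two forms agree because
% \mathrm{KL}(\rho \| \pi_n) = -\int \log L_n \, d\rho + \mathrm{KL}(\rho \| \pi)
%                              + \log \int L_n \, d\pi,
% and the last term does not depend on \rho.
```

The second form makes the roles of the prior π and of the family F explicit: the first term rewards fit to the data, while the KL term penalizes departure from the prior, and the optimization is restricted to F.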
ALQUIER, P. and RIDGWAY, J. (2020). Concentration of tempered posteriors and of their variational approximations. Annals of Statistics, 48(3), pp. 1475-1497.