
Taking the high road

Sanjay CHAUDHURI (Group Leader, Statistics and Applied Probability), September 05, 2016

NUS statisticians have developed a “slope-assisted Hamiltonian Monte Carlo” sampling method to draw samples automatically from complex posterior distributions.

It is well known in statistics that the set of parameter values on which the “empirical likelihood” (a constrained data reweighting method) is strictly positive is, in general, non-convex. A team led by Prof Sanjay CHAUDHURI, comprising NUS graduate student YIN Teng and Prof Debashis MONDAL from Oregon State University, USA, showed that at least one component of the slope vector (the gradient) of the log-empirical likelihood becomes extremely steep near the boundary of this set. They used this result to develop a method that automatically draws samples from the posterior distribution obtained using a Bayesian empirical likelihood procedure.
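The steep slope can be seen in the simplest possible setting. The Python sketch below computes the log-empirical likelihood for a scalar mean: the data are reweighted with weights w_i that sum to one and satisfy sum w_i (x_i - theta) = 0, and the product of the weights is maximised. It is an illustration under simplifying assumptions, not the team's implementation; the function name `log_el_mean`, the toy data and the bisection solver are choices made here. In this one-dimensional case the support is just the open interval between the smallest and largest observation (so it happens to be convex), and the slope of the log-empirical likelihood equals n times the Lagrange multiplier.

```python
import math


def log_el_mean(x, theta, n_bisect=200):
    """Log empirical likelihood of a scalar mean `theta`, and its slope in theta.

    Hypothetical helper for illustration. Returns (-inf, nan) when theta lies
    outside the open interval (min(x), max(x)), where the likelihood is zero.
    """
    n = len(x)
    z = [xi - theta for xi in x]
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        return float("-inf"), float("nan")

    # Optimal weights are w_i = 1 / (n * (1 + lam * z_i)), where the Lagrange
    # multiplier lam solves g(lam) = sum_i z_i / (1 + lam * z_i) = 0.
    # g is strictly decreasing on (lo, hi), so bisection finds the root.
    lo, hi = -1.0 / z_max, -1.0 / z_min

    def g(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    log_el = -sum(math.log(n * (1.0 + lam * zi)) for zi in z)
    slope = n * lam  # d(log EL)/d(theta) in the scalar-mean case
    return log_el, slope


data = [1.0, 2.0, 4.0, 7.0]  # toy data, sample mean 3.5
for theta in (3.5, 5.0, 6.9, 6.99):
    log_el, slope = log_el_mean(data, theta)
    print(f"theta={theta:5.2f}  log EL={log_el:9.3f}  slope={slope:9.3f}")
```

As theta approaches the largest observation, the magnitude of the slope grows without bound and its sign points back towards the interior of the support.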

Because of their flexibility across applications, “empirical likelihood”-based methods are advantageous for statistical modelling in the Bayesian paradigm. In this paradigm, any inference is drawn from the so-called “posterior distribution”, which represents the statistician’s belief about the parameter of interest after observing the data. Here, this distribution cannot be written in closed analytic form, so statisticians must generate samples from it for it to be useful.
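For readers who prefer symbols, one common way to write the Bayesian empirical likelihood posterior is sketched below. The notation is assumed here for illustration and does not come from the article: x_1, ..., x_n are the observations, h(x, θ) is an estimating function with mean zero at the true parameter, w_i are the data weights and π(θ) is the prior density.

```latex
\[
L(\theta) = \max_{w \in \mathcal{W}_\theta} \prod_{i=1}^{n} w_i,
\qquad
\mathcal{W}_\theta = \Big\{ w : w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i\, h(x_i, \theta) = 0 \Big\},
\]
\[
\Pi(\theta \mid x_1, \dots, x_n) \propto L(\theta)\, \pi(\theta),
\]
```

with L(θ) taken to be zero whenever the constraint set is empty. Because L(θ) is itself the solution of an optimisation problem, the posterior has no closed form, which is why sampling is needed.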

An efficient sampling procedure should be able to move between the regions where the posterior density is high. One problem with using the empirical likelihood method is that the regions with high posterior density form a non-convex set. The “non-convexity” implies that the sampler cannot move easily and gets stuck at narrow, high-density ridges. Specific tuning is required for the sampler to move along the ridge in such cases, and this cannot be automated easily. The research team has shown that, under mild assumptions and primarily due to the slope of the log-empirical likelihood, the slope of the log-posterior density is steep at the boundaries. Any sampler which uses this information has a high chance of staying on the ridge, automatically moving along the “high road” and efficiently sampling from the posterior.
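How a slope-using sampler stays inside the support can be sketched with a generic Hamiltonian Monte Carlo step in Python. This is a textbook leapfrog scheme, not the team's algorithm, and the target is a stand-in chosen here: an unnormalised Beta(2, 2) log-density on (0, 1), whose slope also blows up at the boundary of its support. The names `log_density`, `slope`, `hmc_step` and `run_hmc`, and the step size and trajectory length, are all assumptions made for illustration.

```python
import math
import random


def log_density(theta):
    """Toy unnormalised log-density on (0, 1): log(theta) + log(1 - theta)."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return math.log(theta) + math.log(1.0 - theta)


def slope(theta):
    """Derivative of log_density; its magnitude is unbounded near 0 and 1."""
    return 1.0 / theta - 1.0 / (1.0 - theta)


def hmc_step(theta, step_size, n_leapfrog, rng):
    """One HMC transition using leapfrog integration and a Metropolis test."""
    p0 = rng.gauss(0.0, 1.0)  # fresh momentum, unit mass
    q, p = theta, p0
    p += 0.5 * step_size * slope(q)  # initial half step for momentum
    for i in range(n_leapfrog):
        q += step_size * p  # full step for position
        if not 0.0 < q < 1.0:
            return theta, False  # trajectory left the support: reject
        if i < n_leapfrog - 1:
            p += step_size * slope(q)  # full step for momentum
    p += 0.5 * step_size * slope(q)  # final half step for momentum

    # Accept or reject based on the change in total energy.
    log_accept = (log_density(q) - 0.5 * p * p) - (log_density(theta) - 0.5 * p0 * p0)
    if rng.random() < math.exp(min(0.0, log_accept)):
        return q, True
    return theta, False


def run_hmc(n_samples, step_size=0.05, n_leapfrog=20, seed=0):
    rng = random.Random(seed)
    theta = 0.5  # start in the middle of the support
    samples, n_accept = [], 0
    for _ in range(n_samples):
        theta, accepted = hmc_step(theta, step_size, n_leapfrog, rng)
        n_accept += accepted
        samples.append(theta)
    return samples, n_accept / n_samples


samples, accept_rate = run_hmc(5000)
print("sample mean:", sum(samples) / len(samples))
print("acceptance rate:", accept_rate)
```

When a trajectory drifts towards 0 or 1, the large slope gives the momentum a strong kick back towards the interior, so most proposals remain inside the support without any problem-specific tuning. The rare trajectories that overshoot the boundary are simply rejected.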

Besides establishing this fundamental property of the empirical likelihood method, the research findings provide an automated means of obtaining samples using a Bayesian empirical likelihood (BayesEL) procedure. Difficulty in sampling has been a major bottleneck in using empirical likelihood-based methods in the Bayesian paradigm. The findings provide a solution which can lead to the application of the BayesEL procedure in many areas of interest, such as cancer research, epidemiology, personalised medicine, genetics, social networks and public policy research.

The proposed method assumes that the slope of the log-posterior can be defined. This may not always be the case. The research team plans to extend the method to handle situations in which the log-posterior surface may be a step function.


Figure: The procedure efficiently samples the posterior by either climbing the high-density ridges (i.e. taking the high road) or, in many cases, jumping from one ridge to another. Should the sampler move to a low-density valley, the high value of the slope makes it bounce back to the top of the ridge.

Reference

Chaudhuri, S., Mondal, D. and Teng, Y. “Hamiltonian Monte Carlo sampling in Bayesian empirical likelihood computation.” Journal of the Royal Statistical Society Series B (Statistical Methodology). Available online, DOI: 10.1111/rssb.1216