Bayesian blocks via PELT in R

The Bayesian blocks algorithm of Scargle, Jackson, Norris, and Chiang has an enthusiastic user community in astrostatistics, in data mining, and among some in machine learning. It is a dynamic programming algorithm (see VanderPlas referenced below) and, so, exhibits optimality when used without performance-related shortcuts. There is an implementation by Scargle and others in MATLAB, and a popular version exists in the astropy and astroML modules for Python. Scargle introduced the idea about 2001, and has repeatedly improved it. I searched for a comparable implementation for R, the programming language I use most heavily and could not find one.

In time, however, I discovered the work of Killick, Fearnhead, and Eckley, led to it by an oceanographer colleague. They build upon the work of Jackson, Scargle, Barnes, Arabhi, Alt, Gioumousis, Gwin, Sangtrakulcharoen, Tan, and Tsai, but find a pruning step which lets the algorithm achieve the same result but in linear time. Their algorithm, known as PELT, has been incorporated into two R packages on CRAN, changepoint and changepoint.np, and has found use in oceanography and other areas. Still, a one-for-one substitute for the Scargle-Jackson-Norris-Chiang Bayesian blocks was missing, and it is the purpose of this post to remedy that.

  • R. Killick, P. Fearnhead, And I. A. Eckley, “Optimal detection of changepoints with a linear computational cost”, Journal of the American Statistical Association, December 2012, 107(500), DOI: 10.1080/01621459.2012.737745
  • R. Killick, I. A. Eckley, K. Ewans, P. Johnson, “Detection of changes in the characteristics of oceanographic time-series using changepoint analysis”, Ocean Engineering, 2010, 37(13), 10.1016/j.oceaneng.2010.04.009
  • R. Killick, I. A. Eckley, “changepoint: An R package for changepoint analysis”, Journal of Statistical Software, June 2014, 58(3), 10.18637/jss.v058.i03
  • B. Jackson, J. D. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan, T. T. Tsai, “An algorithm for optimal partitioning of data on an interval”, IEEE Signal Processing Letters, February 2005, 12(2), 10.1109/LSP.2001.838216
  • J. D. Scargle, J. P. Norris, B. Jackson, J. Chiang, “Studies in astronomical time series analysis. VI. Bayesian Block representations”, The Astrophysical Journal, 2013 February 20, 764(167), 10.1088/0004-637X/764/2/167
  • J. VanderPlas, “Dynamic programming in Python: Bayesian Blocks”, Pythonic Perambulations, 12 September 2012, a blog entry
  • bayesian_blocks, a component of astropy.stats, the astropy module for Python
  • 4.1 Bayesian Blocks: Histograms the right way, a component of 4. Unsupervised Learning: Density Estimation, the astroML module for Python
  • J. D. Scargle, “Bayesian Blocks in two or more dimensions: Image segmentation and cluster analysis”, arXiv:math/0111128v1

Jake VanderPlas is one of the astroML authors and, specifically that of the Bayesian blocks implementation therein. He offers a description of the Scargle algorithm, explaining it as an instance of a dynamic programming code pattern. He also offers a clever mixture distribution consisting of five Cauchy distributions for illustrating his code, a mixture distribution I have adopted for illustrating this, my implementation of Bayesian blocks in R. (R code is at link.)

Comparing its performance to that of a standard histogramming facility, truehist in the MASS package, we see the following:
bb_histogram_comparison_2016-08-01_195029
(Click on figure to see a larger image, then use browser Back Button to return to blog.)

The result from VanderPlas’ implementation can be seen at his blog, just above the Conclusion at its bottom.

But VanderPlas’ mixture also can serve as a series for another test. Bayesian blocks has been proposed as a signals representation as well, one which can be used for compression as well as specifying features. The implementation here, based upon the Killick-Fearnhead-Eckley PELT, can do that as well, as can be seen by the figure below:
bb_segmentation_of_series_2016-08-01_195518
(Click on figure to see a larger image, then use browser Back Button to return to blog.)

The code is available from my Google space as a gzip‘d R file. Please feel free to use for anything you like, and, not only would I be happy to answer questions about it, I’d like to know what kinds of things you are using it for, or receiving suggestions for improvements. Please comment below!

I am also intending to generalize the code so it can be used for multidimensional Bayesian blocks, as Scargle has suggested (in J. D. Scargle, “Bayesian Blocks in two or more dimensions: Image segmentation and cluster analysis”, arXiv:math/0111128v1), but, of course, based upon the Killick-Fearnhead-Eckley PELT. My particular interest is cluster-finding.

About hypergeometric

See http://www.linkedin.com/in/deepdevelopment/ and http://667-per-cm.net
This entry was posted in American Statistical Association, AMETSOC, anomaly detection, astrophysics, Cauchy distribution, changepoint detection, engineering, geophysics, multivariate statistics, numerical analysis, numerical software, numerics, oceanography, population biology, population dynamics, Python 3, quantitative biology, quantitative ecology, R, Scargle, spatial statistics, square wave approximation, statistics, stepwise approximation, time series, Woods Hole Oceanographic Institution. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s