A new feature: Technical publications of the week

I’m beginning a new style of column, called technical publications of the week. While I can’t promise these will be weekly, I will, from time to time, highlight technical publications I’ve recently read which I consider to be noteworthy. I am going to read them all again.

My recent professional focus, at Akamai Technologies, has been on the many adaptations of random projection methods (see also), generally based upon direct application of the Johnson-Lindenstrauss lemma or one of its several improvements. Many of these are collected under the rubric of locality sensitive hashing, or LSH.
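To make the idea concrete, here is a minimal sketch of one LSH family built on random projections: random hyperplane hashing, where each bit of a signature records which side of a random Gaussian direction a vector falls on. This is an illustration only, not the method of any paper mentioned here, and the function names and parameters (`make_hyperplanes`, `signature`, `estimated_angle`, `true_angle`, the seeds and bit counts) are my own hypothetical choices.

```python
import math
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """One bit per hyperplane: which side of it the vector falls on."""
    return [sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes]

def estimated_angle(sig_a, sig_b):
    """The fraction of differing bits, times pi, estimates the angle."""
    differing = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.pi * differing / len(sig_a)

def true_angle(a, b):
    """Exact angle between two vectors, for comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))

if __name__ == "__main__":
    rng = random.Random(1)
    a = [rng.gauss(0.0, 1.0) for _ in range(50)]
    b = [x + 0.3 * rng.gauss(0.0, 1.0) for x in a]  # a noisy copy of a
    planes = make_hyperplanes(dim=50, n_bits=2048, seed=2)
    print("estimated:", estimated_angle(signature(a, planes), signature(b, planes)))
    print("exact:    ", true_angle(a, b))
```

The point of the exercise is that the bit signatures are far cheaper to store and compare than the original vectors, while the Hamming distance between them still tracks the angle between the vectors, with an error that shrinks as the number of bits grows.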

A first paper is “Earthquake detection through computationally efficient similarity search,” by C. E. Yoon, O. O’Reilly, K. J. Bergen, and G. C. Beroza, which appeared in 2015 in Science Advances. It also has supporting online material. Using a technique for audio fingerprinting by Baluja and Covell, the authors develop fingerprints for earthquakes and convert these to signatures using LSH. The signatures were used to assess classification accuracy for uncatalogued and catalogued earthquakes relative to a manually identified set for the Calaveras Fault in California. Performance was compared with the catalogue and with autocorrelation, a well-known but slower and more computationally expensive technique.
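The fingerprint-to-signature step can be sketched roughly as follows. I am assuming here that the LSH family is MinHash with banding, a common choice for sparse binary fingerprints; the names (`PRIME`, `make_hash_params`, `minhash_signature`, `candidate_pairs`), the hash family, and the band sizes are illustrative assumptions and are not taken from the paper.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # 2**31 - 1; fingerprint indices are assumed smaller than this

def make_hash_params(n_hashes, seed=0):
    """Random (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n_hashes)]

def minhash_signature(active_bits, hash_params):
    """MinHash of a non-empty set of nonzero fingerprint indices."""
    return [min((a * x + b) % PRIME for x in active_bits) for a, b in hash_params]

def candidate_pairs(signatures, bands, rows):
    """Items whose signatures agree on every row of at least one band."""
    pairs = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for item_id, sig in signatures.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(item_id)
        for ids in buckets.values():
            for i in range(len(ids)):
                for j in range(i + 1, len(ids)):
                    pairs.add(tuple(sorted((ids[i], ids[j]))))
    return pairs

if __name__ == "__main__":
    params = make_hash_params(n_hashes=64, seed=1)
    fingerprints = {
        "A": set(range(0, 200)),      # toy fingerprint: indices of nonzero bits
        "B": set(range(2, 202)),      # nearly identical to A
        "C": set(range(1000, 1200)),  # shares nothing with A or B
    }
    sigs = {name: minhash_signature(bits, params) for name, bits in fingerprints.items()}
    print(candidate_pairs(sigs, bands=16, rows=4))
```

Only the candidate pairs need a closer look afterwards, which is where the savings over an all-pairs comparison such as autocorrelation would come from.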

Yoon, O’Reilly, Bergen, and Beroza report very promising results, despite the great reduction in computation needed. Of greater interest to me is how they fit LSH into a larger signal processing task, including prefiltering beforehand and interpreting results afterwards. They document the progress of a canonical data science project, offering the finished product but strongly suggesting the pitfalls and backtracking they had to go through to bring it to success. That kind of experience is instructive both for students of data science and for the managers who expect results from these investigations.

Second, two papers applying LSH to health-related time series, with nice discussion of engineering tradeoffs for these applications:

Third, a paper, C. Luo, A. Shrivastava, “SSH (Sketch, Shingle, & Hash) for indexing massive-scale time series,” NIPS Time Series Workshop 2016, which offers an LSH-derived technique for preconditioning time series comparisons and lookups that use dynamic time warping, resulting in a net improvement in speed.
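As I read the three steps in the title, a toy version of the first two might look like the following: slide a random filter along the series and keep one sign bit per window (sketch), then count the n-grams of the resulting bit string (shingle). For the third step I substitute a direct weighted Jaccard comparison of the shingle counts, which is the kind of similarity a hashing step would be approximating; that substitution, the function names, and all parameter values are my own assumptions and not the authors' implementation.

```python
import math
import random
from collections import Counter

def sketch(series, filt, step):
    """Slide the filter along the series; keep the sign of each inner product."""
    width = len(filt)
    bits = []
    for start in range(0, len(series) - width + 1, step):
        dot = sum(f * x for f, x in zip(filt, series[start:start + width]))
        bits.append(1 if dot >= 0 else 0)
    return bits

def shingle(bits, n):
    """Count the n-grams (shingles) of the bit string."""
    return Counter(tuple(bits[i:i + n]) for i in range(len(bits) - n + 1))

def weighted_jaccard(c1, c2):
    """Sum of minimum counts over sum of maximum counts (stand-in for the hash step)."""
    keys = set(c1) | set(c2)
    num = sum(min(c1[k], c2[k]) for k in keys)
    den = sum(max(c1[k], c2[k]) for k in keys)
    return num / den if den else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    filt = [rng.gauss(0.0, 1.0) for _ in range(16)]           # random filter
    full = [math.sin(0.07 * t) + 0.5 * math.sin(0.31 * t) for t in range(404)]
    a, b = full[:400], full[4:404]                            # b is a shifted by one step
    sa = shingle(sketch(a, filt, step=4), n=8)
    sb = shingle(sketch(b, filt, step=4), n=8)
    print("similarity of shifted copies:", weighted_jaccard(sa, sb))
```

Because the shingle counts ignore where in the series a pattern occurs, two copies of a series that differ by a small shift produce nearly the same counts, which is what makes such a representation usable as a cheap filter ahead of the expensive dynamic time warping comparison.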

Fourth, not a paper, but an interview, from Dr Stephen Chu:

About ecoquant

See https://wordpress.com/view/667-per-cm.net/ Retired data scientist and statistician. Now working projects in quantitative ecology and, specifically, phenology of Bryophyta and technical methods for their study.

1 Response to A new feature: Technical publications of the week

  1. Eamonn Keogh says:

    I agree Yoon’s paper is very nice. But it approximately solves the problem. You can EXACTLY solve the problem faster, see….
    http://www.cs.ucr.edu/~eamonn/STOMP_GPU_final_submission_camera_ready.pdf


    The SSH paper is also approximately solving a problem that you can solve faster, but EXACTLY, see http://www.cs.ucr.edu/~eamonn/SIGKDD_trillion.pdf or watch video https://www.youtube.com/watch?v=d_qLzMMuVQg

    The Yongwook Bryce Kim paper is also approximately solving a problem that you can solve faster, but EXACTLY, see http://www.cs.ucr.edu/~eamonn/SIGKDD_trillion.pdf or watch video https://www.youtube.com/watch?v=d_qLzMMuVQg
