A new feature: Technical publications of the week

Posted on 18 February 2017 by ecoquant

I’m beginning a new style of column, called technical publications of the week. While I can’t promise these will be weekly, I will, from time to time, highlight technical publications I’ve recently read which I consider to be noteworthy. I am going to read them all again.

My professional emphasis, recently, for Akamai Technologies, has been on the plethora of adaptations of random projection methods, generally based upon direct application of the Johnson-Lindenstrauss lemma or its several improvements. Many of these are collected under the rubric of locality sensitive hashing or LSH.
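For readers who have not met random projections before, here is a minimal sketch of the basic idea behind the Johnson-Lindenstrauss lemma: multiply high-dimensional vectors by a scaled Gaussian random matrix and observe that pairwise distances are roughly preserved. The function names, the dimensions (1000 down to 200), and the seeds are my own illustrative choices, not anything taken from the papers below, and the code uses only the Python standard library for portability rather than speed.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional vector x to k dimensions: y = R x."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def euclidean(a, b):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distortion_demo(d=1000, k=200, seed=1):
    """Return (original distance, projected distance) for two random vectors."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = [rng.gauss(0.0, 1.0) for _ in range(d)]
    R = random_projection_matrix(d, k, seed=seed + 1)
    return euclidean(x, y), euclidean(project(R, x), project(R, y))


if __name__ == "__main__":
    original, projected = distortion_demo()
    print("original distance: ", original)
    print("projected distance:", projected)
    print("ratio:             ", projected / original)
```

The ratio printed at the end should be close to 1, and it tends to tighten as the target dimension k grows. That trade between dimension and distortion is what the lemma quantifies.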

A first paper is called Earthquake detection through computationally efficient similarity search, by C. E. Yoon, O. O’Reilly, K. J. Bergen, and G. C. Beroza, and appeared in 2015 in Science Advances. It also has supporting online material. Using a technique for audio fingerprinting by Baluja and Covell, the authors develop fingerprints for earthquakes and convert these to signatures using LSH. They use the signatures to assess classification accuracy for uncatalogued and catalogued earthquakes against a manually identified set for the Calaveras Fault in California. They compare performance both to the existing catalogue and to autocorrelation, a well-known but slower and more computationally expensive technique.
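To give a feel for the fingerprint-to-signature step, here is a minimal MinHash sketch. It assumes each fingerprint is a sparse binary vector represented as the set of indices of its nonzero bits, and it is not the authors' implementation: the hash construction (salted BLAKE2b), the number of hashes, and the toy fingerprints are all my own assumptions for illustration. The fraction of matching signature entries estimates the Jaccard similarity of the two fingerprints.

```python
import hashlib


def salted_hash(bit_index, salt):
    """64-bit hash of a bit index; each salt acts as a different hash function."""
    digest = hashlib.blake2b(
        bit_index.to_bytes(8, "little"),
        digest_size=8,
        salt=salt.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(active_bits, num_hashes=256):
    """MinHash signature of a non-empty set of active bit indices."""
    return [
        min(salted_hash(b, salt) for b in active_bits)
        for salt in range(num_hashes)
    ]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which two signatures agree."""
    return sum(s == t for s, t in zip(sig_a, sig_b)) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy "fingerprints": sets of active bit positions (hypothetical data).
    fp_a = set(range(0, 100))
    fp_b = set(range(20, 120))
    sig_a = minhash_signature(fp_a)
    sig_b = minhash_signature(fp_b)
    print("exact Jaccard:    ", jaccard(fp_a, fp_b))  # 80 / 120
    print("MinHash estimate: ", estimate_jaccard(sig_a, sig_b))
```

In a full LSH scheme the signatures would then be cut into bands and hashed into buckets, so that only fingerprints sharing a bucket are compared. That step is omitted here for brevity.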

Yoon, O’Reilly, Bergen, and Beroza report very promising results, despite the great reduction in computation needed. Of greater interest to me is how they fit the LSH into a larger signal processing task, including prefiltering beforehand and interpreting results afterwards. They document the progress of a canonical data science project, offering the finished product, but strongly suggesting the pitfalls and backtracking they needed to undertake to bring it to success. That kind of experience is instructive both for students of data science and for the managers who expect results from these investigations.

Second, two papers applying LSH to health-related time series, with nice discussion of engineering tradeoffs for these applications:

Y. B. Kim, E. Hemberg, U.-M. O’Reilly, “Stratified Locality-Sensitive Hashing for accelerated physiological time series retrieval,” 2016 IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society, October 2016
D. C. Kale, D. Gong, Z. Che, R. Wetzel, P. Ross, Y. Liu, G. Medioni, “An examination of multivariate time series hashing with applications to health care,” 2014 IEEE Conference on Data Mining, December 2014

Third, a paper, C. Luo, A. Shrivastava, “SSH (Sketch, Shingle, & Hash) for indexing massive-scale time series,” NIPS Time Series Workshop 2016, which offers an LSH-derived technique for preconditioning problems of time series comparison and lookup using dynamic time warping, resulting in a net improvement in speed.

Fourth, not a paper, but an interview, from Dr Stephen Chu:

About ecoquant

See https://wordpress.com/view/667-per-cm.net/ Retired data scientist and statistician. Now working projects in quantitative ecology and, specifically, phenology of Bryophyta and technical methods for their study, notably Macrophotography. Some photos of mine: https://www.flickr.com/photos/198372469@N03/


This entry was posted in Anthropocene, big data, climate change, climate disruption, data science, data streams, earthquakes, geophysics, global warming, Hyper Anthropocene, Locality Sensitive Hashing, LSH, MinHash, numerical algorithms, numerical analysis, random projections, seismology, subspace projection methods, SVD, the right to be and act stupid, the tragedy of our present civilization, the value of financial assets.

1 Response to A new feature: Technical publications of the week

Eamonn Keogh says:

23 February 2017 at 22:54

I agree Yoon’s paper is very nice. But it approximately solves the problem. You can EXACTLY solve the problem faster, see:
http://www.cs.ucr.edu/~eamonn/STOMP_GPU_final_submission_camera_ready.pdf

—
The SSH paper is also approximately solving a problem that you can solve faster, but EXACTLY, see http://www.cs.ucr.edu/~eamonn/SIGKDD_trillion.pdf or watch video https://www.youtube.com/watch?v=d_qLzMMuVQg

The Yongwook Bryce Kim paper is also approximately solving a problem that you can solve faster, but EXACTLY, see http://www.cs.ucr.edu/~eamonn/SIGKDD_trillion.pdf or watch video https://www.youtube.com/watch?v=d_qLzMMuVQg
