Algorithmics of Data Streams

EECS 684, Current Topics in Databases, Winter 2005

Original announcement

Announcements

Presentations

Presenter Paper/Topic Date
Joel Lepak An Improved Data Stream Summary: The Count-Min Sketch and its Applications, G. Cormode and S. Muthukrishnan. Feb 22
Yufan Zhu M. Greenwald, S. Khanna. "Space-Efficient Online Computation of Quantile Summaries", Proceedings of the 2001 ACM SIGMOD Intl. Conference on Management of Data, pp. 58-66, Santa Barbara, CA, May 21-24, 2001. Feb 24 and March 8
Spring Break March 1 to March 3
Yufan Zhu continued March 8
Mark Iwen Sliding Window/Decay March 10
Xuan Zheng Workload-Optimal Histograms on Streams, Muthukrishnan, Strauss, and Zheng. March 15
Young Ham TAG: A Tiny AGgregation Service for Ad-Hoc Sensor Networks, by Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong March 17
Smita Krishnaswamy Clustering Data Streams: Theory and Practice, by S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan March 22
Zhigang Chen Model-Driven Data Acquisition in Sensor Networks March 24
Martin Strauss Moses Charikar, Liadan O'Callaghan, Rina Panigrahy: Better streaming algorithms for clustering problems. STOC 2003: 30-39. March 31
Xuan Zheng M. B. Greenwald and S. Khanna, Power-Conserving Computation of Order-Statistics over Sensor Networks, in the 23rd ACM Symposium on Principles of Database Systems (PODS 2004), pp. 275-285, Paris, France, June 13-18, 2004. April 5
Mark Iwen Edith Cohen, Haim Kaplan: Spatially-decaying aggregation over a network: model and algorithms. SIGMOD Conference 2004: 707-718. April 7
Joel Lepak Suman Nath, Phillip B. Gibbons, Srinivasan Seshan, Zachary R. Anderson: Synopsis Diffusion for Robust Aggregation in Sensor Networks April 12
Young Ham Madden et al., SIGMOD 2003: The Design of an Acquisitional Query Processor for Sensor Networks (TinyDB) April 14
Yufan Zhu What's new: Finding significant differences in network data streams. G. Cormode and S. Muthukrishnan. INFOCOM 2004. April 19
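The first presentation covers the Count-Min sketch. As a study aid, here is a minimal Python sketch of its update and point-query operations, assuming integer keys, non-negative count updates, and a ((a*x + b) mod p) mod w hash family with p = 2^61 - 1 chosen for illustration; the class and method names are hypothetical. The width w = ceil(e/eps) and depth d = ceil(ln(1/delta)) follow the parameter choices in the Cormode and Muthukrishnan analysis, under which a point query never underestimates and, with probability at least 1 - delta, overestimates by at most eps times the total count.

```python
import math
import random


class CountMinSketch:
    """Count-Min sketch for non-negative updates over integer keys (illustrative sketch)."""

    PRIME = (1 << 61) - 1  # Mersenne prime used by the assumed hash family

    def __init__(self, eps, delta, seed=0):
        self.w = math.ceil(math.e / eps)           # counters per row
        self.d = math.ceil(math.log(1.0 / delta))  # number of rows
        rng = random.Random(seed)
        # One (a, b) pair per row: h_j(x) = ((a*x + b) mod p) mod w
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(self.d)]
        self.table = [[0] * self.w for _ in range(self.d)]

    def _bucket(self, j, x):
        a, b = self.hashes[j]
        return ((a * x + b) % self.PRIME) % self.w

    def update(self, x, count=1):
        # Add the count to one counter in every row.
        for j in range(self.d):
            self.table[j][self._bucket(j, x)] += count

    def query(self, x):
        # Collisions can only add to a counter, so every row overestimates;
        # the minimum over rows is the tightest estimate.
        return min(self.table[j][self._bucket(j, x)] for j in range(self.d))
```

For example, after inserting each of the keys 0 to 999 once and key 7 another 100 times, `query(7)` returns at least 101; any excess comes from hash collisions in every row.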

Tentative Reading List

Topic Paper(s)
Basic histogram building, aggregate model (including orthogonality and Haar wavelets and one-dimensional range queries) Approximate Histogram and Wavelet Summaries of Streaming Data, draft book chapter, S. Muthukrishnan and M. Strauss.
Histograms under Non-uniform workload Workload-Optimal Histograms on Streams, S. Muthukrishnan, M. Strauss, and X. Zheng.
Basic sketches (randomized linear projections) and their use in histograms for dynamic data Class notes
Count-min sketch An Improved Data Stream Summary: The Count-Min Sketch and its Applications, G. Cormode and S. Muthukrishnan.
Quantiles
  1. M. Greenwald, S. Khanna. "Space-Efficient Online Computation of Quantile Summaries", Proceedings of the 2001 ACM SIGMOD Intl. Conference on Management of Data, pp. 58-66, Santa Barbara, CA, May 21-24, 2001.
  2. How to Summarize the Universe: Dynamic Maintenance of Quantiles, A. Gilbert, Y. Kotidis, S. Muthukrishnan and M. Strauss.
Clustering
  1. Clustering Data Streams: Theory and Practice (S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan).
  2. Moses Charikar, Liadan O'Callaghan, Rina Panigrahy: Better streaming algorithms for clustering problems. STOC 2003: 30-39.
Sliding windows and decay
  1. Maintaining Stream Statistics over Sliding Windows, M. Datar, A. Gionis, P. Indyk, and R. Motwani.
  2. Maintaining time-decaying stream aggregates, E. Cohen and M. Strauss, PODS 2003.
Sensor networks
  1. Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong: TAG: A Tiny AGgregation Service for Ad-Hoc Sensor Networks
  2. Suman Nath, Phillip B. Gibbons, Srinivasan Seshan, Zachary R. Anderson: Synopsis Diffusion for Robust Aggregation in Sensor Networks
  3. Edith Cohen, Haim Kaplan: Spatially-decaying aggregation over a network: model and algorithms. SIGMOD Conference 2004: 707-718.
  4. More to be determined; see below.
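The first reading-list topic pairs histograms with Haar wavelets. Below is a minimal sketch, assuming an in-memory signal whose length is a power of two (a streaming version would maintain the coefficients incrementally instead): the orthonormal Haar transform, its inverse, and a B-term synopsis that keeps the B largest-magnitude coefficients. Because the basis is orthonormal, the sum-squared reconstruction error equals the sum of squares of the dropped coefficients, so this greedy choice is optimal for that error measure. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two."""
    n = len(signal)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    coeffs = [float(v) for v in signal]
    length = n
    while length > 1:
        half = length // 2
        # Scaled pairwise averages go to the front, scaled differences after them.
        avgs = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        diffs = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:length] = avgs + diffs
        length = half
    return coeffs


def inverse_haar(coeffs):
    """Undo haar() level by level, from the coarsest coefficient outward."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * length] = merged
        length *= 2
    return out


def best_b_term(signal, b):
    """Reconstruct from the b largest-magnitude coefficients (others set to zero)."""
    c = haar(signal)
    keep = set(sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)[:b])
    return inverse_haar([c[i] if i in keep else 0.0 for i in range(len(c))])
```

On the signal [2, 2, 0, 2, 3, 5, 4, 4], keeping one coefficient yields the constant signal 2.75 (the mean), and keeping two leaves a sum-squared error of 5.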

Additional papers

Topic Paper(s)
Histograms
  1. Approximation Algorithms for Histogram Construction Problems (S. Guha, N. Koudas and K. Shim).
  2. REHIST: Relative Error Histogram Construction Algorithms (S. Guha, K. Shim and J. Woo).
Quantiles / sensor networks M. B. Greenwald and S. Khanna, "Power-Conserving Computation of Order-Statistics over Sensor Networks", in the 23rd ACM Symposium on Principles of Database Systems (PODS 2004), pp. 275-285, Paris, France, June 13-18, 2004.
Sketches "Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation", P. Indyk. 41st Symposium on Foundations of Computer Science, 2000.
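Indyk's paper estimates norms of a stream of updates by projecting onto random vectors drawn from stable distributions. Below is a minimal sketch of the L1 case, assuming standard Cauchy (1-stable) entries regenerated per coordinate from Python's seeded `random.Random` rather than the paper's pseudorandom generator (a simplification for illustration; the class and method names are hypothetical). Each accumulator then holds ||x||_1 times a standard Cauchy variable, and since the median of |Cauchy| is 1, the median of the absolute accumulators estimates ||x||_1.

```python
import math
import random
import statistics


class CauchyL1Sketch:
    """L1-norm sketch via 1-stable (Cauchy) random projections (illustrative sketch)."""

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed
        self.acc = [0.0] * k  # acc[j] = sum_i x_i * C[i][j]

    def _row(self, i):
        # Regenerate row i of the Cauchy matrix from a per-coordinate seed so the
        # matrix is never stored (assumed stand-in for Indyk's pseudorandom generator).
        rng = random.Random(self.seed * 1_000_003 + i)
        return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(self.k)]

    def update(self, i, delta):
        # Turnstile update x_i += delta; delta may be negative.
        for j, c in enumerate(self._row(i)):
            self.acc[j] += delta * c

    def estimate_l1(self):
        # Each acc[j] is distributed as ||x||_1 * Cauchy(0, 1), and median(|Cauchy|) = 1.
        return statistics.median(abs(a) for a in self.acc)
```

Because updates may be negative, the same sketch handles insertions and deletions; the estimate's relative error shrinks roughly as 1/sqrt(k).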

Additional Links

Courses and reading lists on massive datasets generally, on streaming, and on sensor networks.

Copyright Notices

In most cases, papers are available for classroom use with no fee. Some copyright notices are found here.

Other Announcements

Class announcement

Stat 701-2, Large Datasets: Research and Applications in the Natural Sciences, Mondays, 4-5:30. The course includes many guest lectures from natural scientists.

Thanks

Thanks to Muthu Muthukrishnan of Rutgers and Phil Gibbons of Intel for suggesting some of the papers.