Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/83754
Title: New query and analytics over large sequence data : a study on streaks and stream
Authors: Bai, Ran
Degree: Ph.D.
Issue Date: 2019
Abstract: Sequence data consists of ordered items or elements. Query processing and analytics on large sequence data have many research challenges. This thesis studies two important problems in this domain. The first part of this thesis studies a new problem of finding "historic moments" from sequence data. Specifically, we introduce a new concept called "historic moments", which is motivated from real applications such as computational journalism. We present algorithms to efficiently compute historic moments from sequence data. The algorithm is incremental and space-optimal, meaning that when facing new data arrival, it is able to efficiently refresh the results by keeping minimal information. Case studies show that historic moments can significantly improve the insights offered by prominent streaks alone. The second part of this thesis studies another new problem of answering range-count query over data stream. Specifically, in applications such as network monitoring, telecommunication analysis, and sensor measurements, massive amounts of data arrive as a high-rate stream and real-time analytic over the stream data is required. Maintaining a succinct synopsis structure called sketch over the data stream has been a dominant approach to support analysis in those applications. Recent applications, however, demand more sophisticated types of queries and range-count query is our focus in this thesis. Unfortunately, state-of-the-art sketches perform poorly when facing range-counting as none of them was designed to support range-count queries at the outset. In this thesis, we aim to fill the gap and present LSH-Sketch, a sketch that supports range-counting over rapid data stream. As a sketch that supports range-count queries, LSH-Sketch can naturally support point-count queries as well. As its name suggests, LSH-Sketch is based on the use of locality sensitive hashing. Like the classic CM-Sketch, LSH-Sketch is also a core sketch that many sketch variants and applications can be built on top. Empirical results show that LSH-Sketch's insertion throughput is as good as CM-Sketch and it outperforms CM-Sketch in terms of accuracy and query throughput under all query ranges. LSH-Sketch thus has the potential to replace CM-Sketch to serve as the core sketch in multiple application domains.
Subjects: Hong Kong Polytechnic University -- Dissertations
Electronic data processing
Big data
Pages: xv, 123 pages : illustrations
Appears in Collections:Thesis

Show full item record

Page views

3
Citations as of May 22, 2022

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.