Anytime concurrent clustering of multiple streams with an indexing tree

Razavi Hesabi, Z, Sellis, T and Zhang, X 2015, 'Anytime concurrent clustering of multiple streams with an indexing tree', in Wei Fan, Albert Bifet, Qiang Yang, Philip S. Yu (eds.) Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, Sydney, Australia, 10 August 2015, pp. 19-32.


Document type: Conference Paper
Collection: Conference Papers

Attached file: n2006055735.pdf (Published Version, application/pdf, 699.62 KB)
Title: Anytime concurrent clustering of multiple streams with an indexing tree
Author(s): Razavi Hesabi, Z; Sellis, T; Zhang, X
Year: 2015
Conference name: BigMine 2015: Volume 41
Conference location: Sydney, Australia
Conference dates: 10 August 2015
Proceedings title: Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Editor(s): Wei Fan, Albert Bifet, Qiang Yang, Philip S. Yu
Publisher: Microtome Publishing
Place of publication: Cambridge, Massachusetts, United States
Start page: 19
End page: 32
Total pages: 14
Abstract: With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated. Clustering multiple data streams is challenging, as the requirement to cluster at any time becomes more critical. We aim to cluster multiple data streams concurrently; in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a single stream. It uses a hierarchical tree structure to index micro-clusters, which are summary statistics for streaming data objects. We design a dynamic, concurrent indexing tree structure that extends the ClusTree structure to achieve more granular micro-clusters (summaries) of multiple streams at any time. We devised algorithms to concurrently search, expand and update the hierarchical tree structure that stores the micro-clusters, along with an algorithm for anytime concurrent clustering of multiple streams. As this is work in progress, we plan to test our proposed algorithms on sensor data sets and evaluate the space and time complexity of creating and accessing micro-clusters. We will also evaluate the quality of clustering in terms of the number of created clusters and compare our technique with other approaches.
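To make "micro-clusters as summary statistics" concrete, the sketch below assumes the cluster-feature representation common in the stream-clustering literature (CluStream, BIRCH, and, as far as we understand, ClusTree): a count, a per-dimension linear sum and a per-dimension squared sum. ClusTree also decays these summaries over time; that is omitted here. Everything else is a hypothetical simplification, not the authors' method: a flat list of micro-clusters stands in for the indexing tree, each micro-cluster has its own lock, each stream is a thread, and points go to the nearest center. The names `MicroCluster`, `nearest` and `consume` are illustrative only.

```python
import math
import threading
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster: count n, linear sum ls, squared sum ss (no time decay)."""
    dim: int
    n: float = 0.0
    ls: list = field(default_factory=list)
    ss: list = field(default_factory=list)
    # One lock per micro-cluster so several stream threads can update safely (assumption).
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def __post_init__(self):
        if not self.ls:
            self.ls = [0.0] * self.dim
        if not self.ss:
            self.ss = [0.0] * self.dim

    def absorb(self, x):
        # Only the summary is updated; the raw point can be discarded afterwards.
        with self.lock:
            self.n += 1
            for i, v in enumerate(x):
                self.ls[i] += v
                self.ss[i] += v * v

    def snapshot(self):
        # Consistent copy of the summary taken under the lock.
        with self.lock:
            return self.n, list(self.ls), list(self.ss)

    def center(self):
        n, ls, _ = self.snapshot()
        return [s / n for s in ls] if n else ls

    def radius(self):
        # Root of the summed per-dimension variance, derived from n, ls and ss alone.
        n, ls, ss = self.snapshot()
        if not n:
            return 0.0
        return math.sqrt(sum(max(q / n - (s / n) ** 2, 0.0) for s, q in zip(ls, ss)))

    def merged_with(self, other):
        # Summaries are additive, so a parent entry can summarise its children by summing.
        n1, ls1, ss1 = self.snapshot()
        n2, ls2, ss2 = other.snapshot()
        out = MicroCluster(self.dim)
        out.n = n1 + n2
        out.ls = [a + b for a, b in zip(ls1, ls2)]
        out.ss = [a + b for a, b in zip(ss1, ss2)]
        return out


def nearest(clusters, x):
    # Flat nearest-center search; a tree index would replace this linear scan.
    return min(clusters, key=lambda mc: sum((c - v) ** 2 for c, v in zip(mc.center(), x)))


def consume(stream, clusters):
    # One thread per stream, all updating the shared set of micro-clusters.
    for x in stream:
        nearest(clusters, x).absorb(x)


clusters = [MicroCluster(dim=2), MicroCluster(dim=2)]
clusters[0].absorb([0.0, 0.0])    # seed points (illustrative data)
clusters[1].absorb([10.0, 10.0])
streams = [[[0.5, 0.1], [0.2, 0.4]], [[9.8, 10.1], [10.3, 9.9]]]
threads = [threading.Thread(target=consume, args=(s, clusters)) for s in streams]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([mc.n for mc in clusters])  # → [3.0, 3.0]
```

Because the summaries are additive, a parent node in a tree could hold `merged_with` of its children, which is the property a ClusTree-style index relies on when descending from coarse to fine summaries. The per-cluster lock is only one possible concurrency choice; the paper's own search, expand and update algorithms are not reproduced here.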
Subjects: Pattern Recognition and Data Mining
Keyword(s): Distributed data mining; Clustering; Stream mining; Parallel processing
Copyright notice: © 2015 Z.R. Hesabi, T. Sellis, X. Zhang
ISSN: 1938-7228
Created: Thu, 07 Jul 2016, 12:45:00 EST by Catalyst Administrator