Approximate order-sensitive k-NN queries over correlated high-dimensional data

Gu, Y, Guo, Y, Song, Y, Zhou, X and Yu, G 2018, 'Approximate order-sensitive k-NN queries over correlated high-dimensional data', IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 11, pp. 2037-2050.

Document type: Journal Article
Collection: Journal Articles

Title Approximate order-sensitive k-NN queries over correlated high-dimensional data
Author(s) Gu, Y
Guo, Y
Song, Y
Zhou, X
Yu, G
Year 2018
Journal name IEEE Transactions on Knowledge and Data Engineering
Volume number 30
Issue number 11
Start page 2037
End page 2050
Total pages 14
Publisher IEEE
Abstract The k Nearest Neighbor (k-NN) query has been gaining more importance in extensive applications involving information retrieval, data mining and databases. Specifically, in order to trade off accuracy for efficiency, approximate solutions for the k-NN query are extensively explored. However, the precision is usually order-insensitive, which is defined on the result set instead of the result sequence. In many situations, it cannot reasonably reflect the query result quality. In this paper, we focus on the approximate k-NN query problem with the order-sensitive precision requirement and propose a novel scheme based on the projection-filter-refinement framework. Basically, we adopt PCA to project the high-dimensional data objects into the low-dimensional space. Then, a filter condition is inferred to execute efficient pruning over the projected data. In addition, an index strategy named OR-tree is proposed to reduce the I/O cost. The extensive experiments based on several real-world data sets and a synthetic data set are conducted to verify the effectiveness and efficiency of the proposed solution. Compared to the state-of-the-art methods, our method can support order-sensitive k-NN queries with higher result precision while retaining satisfactory CPU and I/O efficiency.
Subject Database Management
Keyword(s) approximate query
Data mining
Data preprocessing
Dimensionality reduction
Gaussian distribution
high-dimensional data
k-nearest neighbor query
Principal component analysis
DOI - identifier 10.1109/TKDE.2018.2812153
Copyright notice © 2018 IEEE
ISSN 1041-4347
Version Filter Type
Citation counts: TR Web of Science Citation Count  Cited 0 times in Thomson Reuters Web of Science Article
Scopus Citation Count Cited 0 times in Scopus Article
Altmetric details:
Access Statistics: 26 Abstract Views  -  Detailed Statistics
Created: Thu, 21 Feb 2019, 12:10:00 EST by Catalyst Administrator
© 2014 RMIT Research Repository • Powered by Fez SoftwareContact us