Efficient frequent pattern mining on web logs

Sun, L and Zhang, X 2004, 'Efficient frequent pattern mining on web logs', in J. Yu et al. (ed.) Advanced Web Technologies and Applications: Sixth Asia-Pacific Web Conference, APWeb 2004, Hangzhou, China, 15 March 2004, pp. 533-542.


Document type: Conference Paper
Collection: Conference Papers

Attached Files
Name             Description          MIME type        Size
n2004000392.pdf  Accepted manuscript  application/pdf  202.76 KB
Title: Efficient frequent pattern mining on web logs
Author(s): Sun, L; Zhang, X
Year: 2004
Conference name: Asia-Pacific Web Conference on Advanced Web Technologies and Applications
Conference location: Hangzhou, China
Conference dates: 15 March 2004
Proceedings title: Advanced Web Technologies and Applications: Sixth Asia-Pacific Web Conference, APWeb 2004
Editor(s): J. Yu et al.
Publisher: Springer
Place of publication: Berlin, Germany
Start page: 533
End page: 542
Total pages: 10
Abstract: Mining frequent patterns from Web logs is an important data mining task. Candidate-generation-and-test and pattern-growth are two representative frequent pattern mining approaches. We have conducted extensive experiments on real world Web log data to analyse the characteristics of Web logs and the behaviours of these two approaches on Web logs. To improve the performance of current algorithms on mining Web logs, we propose a new algorithm - Combined Frequent Pattern Mining (CFPM) to cater for Web log data specifically. We use heuristics to prune search space and reduce costs in mining so that better efficiency is achieved. Experimental results show that CFPM significantly improves the performance of the pattern-growth approach by 1.2-7.8 times on mining frequent patterns from Web logs.
Subjects: Information and Computing Sciences not elsewhere classified
Keyword(s): data mining; efficiency; frequent patterns; pattern mining
DOI - identifier: 10.1007/b96838
Copyright notice: © Springer-Verlag Berlin Heidelberg 2004
ISBN: 978-3-540-21371-0
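
As a rough illustration of the candidate-generation-and-test approach named in the abstract, the sketch below mines frequent page sets from a handful of invented Web sessions. It is a textbook Apriori-style loop, not the CFPM algorithm proposed in the paper. The function name `apriori`, the toy `sessions` data, the support threshold, and the choice to treat a pattern as an unordered set of pages are all assumptions made for this example.

```python
from itertools import combinations


def apriori(sessions, min_support):
    """Return {frozenset(pages): support count} for every frequent page set.

    Illustrative candidate-generation-and-test loop; not the paper's CFPM.
    """
    transactions = [frozenset(s) for s in sessions]

    # Level 1: count how many sessions contain each single page.
    counts = {}
    for t in transactions:
        for page in t:
            key = frozenset([page])
            counts[key] = counts.get(key, 0) + 1
    frequent = {p: c for p, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Generate: join frequent (k-1)-sets whose union has exactly k pages.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Test: scan the sessions to count the support of each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical sessions, each a list of pages requested by one visitor.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
    ["/home", "/products", "/cart"],
]
result = apriori(sessions, min_support=3)
for pages, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pages), support)
```

With these toy sessions, the three-page candidate is discarded in the prune step because one of its two-page subsets is infrequent, so the sessions are never scanned for it. A pattern-growth method avoids generating candidates altogether by recursively projecting the data on each frequent page.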
Access Statistics: 295 Abstract Views, 1440 File Downloads
Created: Wed, 08 Apr 2009, 09:42:32 EST by Catalyst Administrator