CDS: Collaborative distant supervision for Twitter account classification

Cui, L, Zhang, X, Qin, A, Sellis, T and Wu, L 2017, 'CDS: Collaborative distant supervision for Twitter account classification', Expert Systems with Applications, vol. 83, pp. 94-103.

Document type: Journal Article
Collection: Journal Articles

Title CDS: Collaborative distant supervision for Twitter account classification
Author(s) Cui, L
Zhang, X
Qin, A
Sellis, T
Wu, L
Year 2017
Journal name Expert Systems with Applications
Volume number 83
Start page 94
End page 103
Total pages 10
Publisher Elsevier
Abstract Individuals use Twitter for personal communication, whereas businesses, politicians and celebrities use Twitter for branding purposes. Distinguishing Personal from Branding Twitter accounts is important for Twitter analytics. Existing studies of Twitter account classification apply classical supervised learning, which requires intensive manual annotation for training. In this paper, we propose CDS (Collaborative Distant Supervision), a novel learning scheme for Twitter account classification that does not require intensive manual labelling. Twitter accounts are automatically labelled using heuristics for distant supervision learning. To achieve effective learning from heuristic labels, active learning is applied to identify and correct false positive labels, and semi-supervised learning is applied to further use false negatives missed by labelling heuristics for learning. Extensive experiments on Twitter data showed that CDS achieved high classification accuracy.
Subject Pattern Recognition and Data Mining
Keyword(s) Active learning
Distant supervision
Semi-supervised learning
DOI - identifier 10.1016/j.eswa.2017.03.075
Copyright notice © 2017 Elsevier Ltd
ISSN 0957-4174
Citation counts: Cited 4 times in Thomson Reuters Web of Science
Cited 1 time in Scopus
Access Statistics: 130 Abstract Views
Created: Tue, 05 Sep 2017, 13:22:00 EST by Catalyst Administrator
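
The three stages named in the abstract (heuristic labelling, active learning to correct heuristic labels, and semi-supervised learning on accounts the heuristics missed) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cue words, the naive-Bayes-style scorer with uniform class priors, the uncertainty-based query rule, the `oracle` callback standing in for a human annotator, and the `budget` and `threshold` parameters are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical cue words; the abstract does not list the paper's actual heuristics.
BRAND_CUES = {"official", "store", "support"}
PERSONAL_CUES = {"dad", "mum", "student"}


def heuristic_label(tokens):
    """Distant supervision: label from cue words; None if no cue or both kinds fire."""
    brand = bool(BRAND_CUES & set(tokens))
    personal = bool(PERSONAL_CUES & set(tokens))
    if brand == personal:
        return None
    return "branding" if brand else "personal"


def train(examples):
    """Count words per class from (tokens, label) pairs."""
    counts = {"branding": Counter(), "personal": Counter()}
    for tokens, label in examples:
        counts[label].update(tokens)
    return counts


def prob_branding(counts, tokens):
    """Add-one-smoothed naive Bayes score with uniform class priors (an assumption)."""
    vocab_size = len(set(counts["branding"]) | set(counts["personal"])) or 1
    n_b = sum(counts["branding"].values())
    n_p = sum(counts["personal"].values())
    log_odds = 0.0
    for t in tokens:
        log_odds += math.log((counts["branding"][t] + 1) / (n_b + vocab_size))
        log_odds -= math.log((counts["personal"][t] + 1) / (n_p + vocab_size))
    return 1.0 / (1.0 + math.exp(-log_odds))


def cds_sketch(bios, oracle, budget=1, threshold=0.8):
    """Run the three stages once and return (labels, model)."""
    tokenised = [bio.lower().split() for bio in bios]

    # Stage 1: automatic labelling with heuristics.
    labels = [heuristic_label(t) for t in tokenised]

    # Stage 2: active learning. Ask the oracle to verify the heuristic labels
    # the current model is least sure about (closest to 0.5).
    model = train([(t, l) for t, l in zip(tokenised, labels) if l])
    labelled = [i for i, l in enumerate(labels) if l]
    labelled.sort(key=lambda i: abs(prob_branding(model, tokenised[i]) - 0.5))
    for i in labelled[:budget]:
        labels[i] = oracle(bios[i])

    # Stage 3: semi-supervised self-training. Pseudo-label accounts the
    # heuristics left unlabelled when the model is confident enough.
    model = train([(t, l) for t, l in zip(tokenised, labels) if l])
    for i, t in enumerate(tokenised):
        if labels[i] is None:
            p = prob_branding(model, t)
            if p >= threshold:
                labels[i] = "branding"
            elif p <= 1 - threshold:
                labels[i] = "personal"

    final_model = train([(t, l) for t, l in zip(tokenised, labels) if l])
    return labels, final_model


if __name__ == "__main__":
    demo_bios = [
        "official account of acme store",
        "acme support team here to help",
        "dad runner coffee lover",
        "student and coffee lover",
        "acme team news and offers",  # no cue word: missed by the heuristics
    ]
    # Toy oracle standing in for a human annotator.
    demo_oracle = lambda bio: "personal" if "dad" in bio or "student" in bio else "branding"
    print(cds_sketch(demo_bios, demo_oracle, threshold=0.6)[0])
```

Only `budget` accounts are ever sent to the oracle, which is the point of combining the stages: manual effort is spent on the labels most likely to be wrong, while the unlabelled remainder is used without any annotation.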