Discovering filter keywords for company name disambiguation in twitter

Spina, D, Gonzalo, J and Amigó, E 2013, 'Discovering filter keywords for company name disambiguation in twitter', Expert Systems with Applications, vol. 40, no. 12, pp. 4986-5003.


Document type: Journal Article
Collection: Journal Articles

Title Discovering filter keywords for company name disambiguation in twitter
Author(s) Spina, D
Gonzalo, J
Amigó, E
Year 2013
Journal name Expert Systems with Applications
Volume number 40
Issue number 12
Start page 4986
End page 5003
Total pages 18
Publisher Elsevier
Abstract A major problem in monitoring the online reputation of companies, brands, and other entities is that entity names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging services such as Twitter, where texts are very short and there is little context to disambiguate. In this paper we address the filtering task of determining, out of a set of tweets that contain a company name, which ones do refer to the company. Our approach relies on the identification of filter keywords: those whose presence in a tweet reliably confirms (positive keywords) or discards (negative keywords) that the tweet refers to the company. We describe an algorithm to extract filter keywords that does not use any previously annotated data about the target company. The algorithm classifies 58% of the tweets with 75% accuracy, and those can be used to feed a machine learning algorithm to obtain a complete classification of all tweets with an overall accuracy of 73%. In comparison, a 10-fold validation of the same machine learning algorithm provides an accuracy of 85%, i.e., our unsupervised algorithm has a 14% loss with respect to its supervised counterpart. Our study also shows that (i) filter keywords for Twitter do not directly derive from the public information about the company on the Web: a manual selection of keywords from relevant web sources only covers 15% of the tweets with 86% accuracy; (ii) filter keywords can indeed be a productive way of classifying tweets: the five best possible keywords cover, on average, 28% of the tweets for a company in our test collection.
Subject Information Systems not elsewhere classified
Information Retrieval and Web Search
Keyword(s) Filtering
Name disambiguation
Online reputation management
Twitter
DOI - identifier 10.1016/j.eswa.2013.03.001
Copyright notice © 2013 Elsevier Ltd. All rights reserved.
ISSN 0957-4174
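
The filter-keyword idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the keyword sets, the example tweets, and the function name label_tweet are all hypothetical, and the keywords are hand-picked here, whereas the paper extracts them automatically. The sketch only shows how positive and negative keywords label some tweets and leave the rest uncovered, which is what the coverage figures in the abstract measure.

```python
import re

def label_tweet(text, positive, negative):
    """Return True (related), False (unrelated), or None (not covered)."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    has_pos = bool(tokens & positive)
    has_neg = bool(tokens & negative)
    if has_pos == has_neg:
        # Either no filter keyword is present or the evidence conflicts.
        return None
    return has_pos

# Hypothetical filter keywords for the ambiguous name "apple" (assumption).
positive = {"iphone", "ipad", "store"}
negative = {"pie", "juice", "tree"}

# Hypothetical tweets (assumption), all containing the company name.
tweets = [
    "New apple iphone rumors today",
    "Grandma's apple pie is the best",
    "I love apple",
    "apple juice spilled on my ipad",
]

labels = [label_tweet(t, positive, negative) for t in tweets]

# Coverage: fraction of tweets that received a label from the keywords.
coverage = sum(label is not None for label in labels) / len(labels)
```

In this toy run two of the four tweets are labelled, so coverage is one half. In the approach the abstract describes, the tweets labelled this way would then serve as training data for a machine learning classifier that decides the remaining ones.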
Citation counts: Thomson Reuters Web of Science: cited 14 times
Scopus: cited 0 times
Created: Thu, 21 Feb 2019, 12:10:00 EST by Catalyst Administrator