Ting Hua, Feng Chen, Liang Zhao, Chang-Tien Lu, Naren Ramakrishnan
Twitter has become an important data source for detecting events, especially for tracking detailed information about events in a specific domain. Previous studies on targeted-domain Twitter information extraction have used supervised learning techniques to identify domain-related tweets; however, the need for extensive manual labeling makes these supervised systems extremely expensive to build and maintain. Moreover, most existing work fails to consider spatiotemporal factors, which are essential attributes of targeted-domain events. In this paper, we propose a semi-supervised method for Automatic Targeted-domain Spatiotemporal Event Detection (ATSED) in Twitter. Given a targeted domain, ATSED first learns tweet labels from historical data and then detects ongoing events from real-time Twitter data streams. Specifically, an efficient label generation algorithm is proposed to automatically recognize tweet labels from domain-related news articles, a customized classifier is created for Twitter data analysis by utilizing tweets’ distinguishing features, and a novel multinomial spatial-scan model is provided to identify the geographical locations of detected events. Experiments on 305 million tweets demonstrated the effectiveness of this new approach.
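The abstract does not spell out the multinomial spatial-scan model, so the following is only a minimal sketch of the general idea, assuming the generic multinomial scan likelihood-ratio statistic rather than the authors' exact formulation. The function names (`multinomial_llr`, `best_region`), the single-cell candidate regions, and the tweet categories are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def _count_log_share(count, total):
    """Return count * log(count / total), using the convention 0 * log(0) = 0."""
    return count * math.log(count / total) if count > 0 else 0.0


def multinomial_llr(inside, outside):
    """Log-likelihood ratio that category proportions differ inside vs. outside.

    `inside` and `outside` are Counters mapping a tweet category to its count.
    The result is 0 when both areas share the same category proportions and
    grows as the proportions diverge.
    """
    n_in = sum(inside.values())
    n_out = sum(outside.values())
    if n_in == 0 or n_out == 0:
        return 0.0
    n_all = n_in + n_out
    overall = inside + outside
    # Alternative hypothesis: separate category proportions inside and outside.
    alt = sum(_count_log_share(c, n_in) for c in inside.values())
    alt += sum(_count_log_share(c, n_out) for c in outside.values())
    # Null hypothesis: one set of category proportions everywhere.
    null = sum(_count_log_share(c, n_all) for c in overall.values())
    return alt - null


def best_region(cells):
    """Scan candidate regions and return (region_id, score) with the highest LLR.

    Assumption: every grid cell is its own candidate region. A fuller scan
    would also evaluate unions of neighboring cells.
    """
    overall = Counter()
    for counts in cells.values():
        overall.update(counts)
    best = None
    for region, inside in cells.items():
        outside = overall - inside  # counts of all tweets not in this region
        score = multinomial_llr(inside, outside)
        if best is None or score > best[1]:
            best = (region, score)
    return best
```

With per-cell counts such as `{"A": Counter(event=50, other=10), "B": Counter(event=5, other=60)}`, `best_region` picks the cell whose category mix departs most from the rest of the map. In practice the significance of the top score would still have to be assessed, for example by Monte Carlo permutation, which this sketch omits.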
- Date of publication: October 4, 2016