Check the preview of 2nd version of this platform being developed by the open MLCommons taskforce on automation and reproducibility as a free, open-source and technology-agnostic on-prem platform.

Hashtag Processing for Enhanced Clustering of Tweets

lib:4ff4b959abbade7a (v1.0.0)

Authors: Dagmar Gromann,Thierry Declerck
Where published: RANLP 2017 9
Document:  PDF  DOI 
Abstract URL: https://www.aclweb.org/anthology/R17-1038/


Rich data provided by tweets have beenanalyzed, clustered, and explored in a variety of studies. Typically those studies focus on named entity recognition, entity linking, and entity disambiguation or clustering. Tweets and hashtags are generally analyzed on sentential or word level but not on a compositional level of concatenated words. We propose an approach for a closer analysis of compounds in hashtags, and in the long run also of other types of text sequences in tweets, in order to enhance the clustering of such text documents. Hashtags have been used before as primary topic indicators to cluster tweets, however, their segmentation and its effect on clustering results have not been investigated to the best of our knowledge. Our results with a standard dataset from the Text REtrieval Conference (TREC) show that segmented and harmonized hashtags positively impact effective clustering.

Relevant initiatives  

Related knowledge about this paper Reproduced results (crowd-benchmarking and competitions) Artifact and reproducibility checklists Common formats for research projects and shared artifacts Reproducibility initiatives

Comments  

Please log in to add your comments!
If you notice any inapropriate content that should not be here, please report us as soon as possible and we will try to remove it within 48 hours!