Detecting Cyber-Related Discussions in Online Social Platforms

lib:321bb3b4493986e0 (v1.0.0)

Authors: Ruth Ikwu, Panos Louisvieris
ArXiv: 1907.02383
Abstract URL: https://arxiv.org/abs/1907.02383v1


As the use of social platforms continues to evolve in areas such as cyber-security and defence, it has become imperative to develop adaptive methods for tracking, identifying and investigating cyber-related activities on these platforms. This paper introduces a new approach for detecting cyber-related discussions in online social platforms using a candidate set of terms that are representative of the cyber domain. The objective is to create a lexicon of cyber-related terms that can be applied to the automatic detection of cyber activities across various online platforms. The method applies natural language processing techniques to representative data from multiple social platform types, such as Reddit, Stack Overflow, Twitter and cyberwar news, to extract candidate terms for a generic cyber lexicon. In selecting the candidate terms, we introduce the Aggregated Pointwise Mutual Information Score (APMIS) and compare it with the Term Frequency-Term Degree Ratio (FDR Score) and the Term Frequency-Inverse Document Frequency Score (TF-IDF Score). Together, these scoring mechanisms account for term frequency, term relevance and mutual dependence between terms. Finally, we evaluate the performance of the cyber lexicon by measuring its precision in classifying discussions as 'Cyber-Related' or 'Non-Cyber-Related'.
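To make two of the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the standard document-level pointwise mutual information between co-occurring terms, and it measures the precision of a simple lexicon-based classifier. The abstract does not define how APMIS aggregates PMI values, so no aggregation step is shown. The function names, the whitespace tokenizer and the `min_hits` threshold are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

PUNCT = ".,!?'\"()"


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, not the paper's pipeline)."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def pmi_scores(docs):
    """Standard document-level PMI for every term pair that co-occurs in docs.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), where probabilities are the
    fractions of documents containing the term(s). Keys are alphabetically
    ordered pairs.
    """
    n = len(docs)
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    return {
        (a, b): math.log2((c / n) / ((term_df[a] / n) * (term_df[b] / n)))
        for (a, b), c in pair_df.items()
    }


def is_cyber_related(text, lexicon, min_hits=1):
    """Label a text 'Cyber-Related' if it contains at least min_hits lexicon terms.

    The threshold rule is hypothetical; the abstract does not state one.
    """
    hits = sum(1 for t in set(tokenize(text)) if t in lexicon)
    return hits >= min_hits


def precision(predictions, labels):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

In use, one would call `pmi_scores` on a corpus of posts to rank candidate term pairs, build a lexicon from the highest-scoring terms, then apply `is_cyber_related` to labelled discussions and pass the predictions and ground-truth labels to `precision`.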
