Towards a semi-automatic classifier of malware through tweets for early warning threat detection

From Firenze University Press Journal: JLIS.it, an Italian journal of Library Science, Archival Science and Information Science

Oct 1, 2024

Claudia Lanza, University of Calabria

Lorenzo Lodi, Zanasi & Partners

This paper describes a preliminary study on a method to detect new data about malware and structure it in an ontology model. The ontology is a means through which it is possible to build a classifier that organizes, in a structured way, the knowledge behind malicious events within the cyber-sphere. The novelty of our approach lies in the source documentation used to construct the malware classifier. More specifically, the originality of this work lies not only in the set of documents considered, i.e., tweets, but also in the techniques applied to retrieve information about malware from them. In detail, we propose an approach that applies Natural Language Processing (NLP) tasks to a normalized set of tweets in order to systematize the resulting data into an ontology framework. The ontology represents the classifier, created in a semi-automatic way. It can help cyber analysts build a conceptual structure to infer knowledge about malicious events, and it can support malware triage operations from a semantic point of view.
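This summary does not detail the paper's actual NLP pipeline or ontology schema. As an illustration only, here is a minimal sketch of the general idea: normalize tweets, extract candidate malware mentions, and organize them as ontology triples. It assumes a simple keyword-based extraction step. The category vocabulary, the class names (`Ransomware`, `Trojan`, ...), the relations `subClassOf` and `instanceOf`, and the helper functions are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical seed vocabulary: category keyword -> ontology class name.
# Every class is assumed to be a subclass of a root "Malware" concept.
CATEGORY_CLASSES = {
    "ransomware": "Ransomware",
    "trojan": "Trojan",
    "worm": "Worm",
    "botnet": "Botnet",
}
# Tokens that can precede a category keyword without being a malware name.
STOPWORDS = {"a", "an", "the", "new", "this", "that", "of", "and", "or"}


def normalize(tweet: str) -> str:
    """Lower-case a tweet and remove URLs, @mentions and the '#' of hashtags."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = text.replace("#", "")                # keep hashtag words
    return re.sub(r"\s+", " ", text).strip()


def extract_instances(tweet: str) -> set[tuple[str, str]]:
    """Return (malware_name, ontology_class) pairs found in one tweet."""
    text = normalize(tweet)
    found = set()
    for keyword, cls in CATEGORY_CLASSES.items():
        # Treat the token right before a category keyword as a candidate name,
        # e.g. "wannacry ransomware" -> ("wannacry", "Ransomware").
        for match in re.finditer(rf"\b([a-z0-9][\w.-]*) {keyword}\b", text):
            name = match.group(1)
            if name not in STOPWORDS:
                found.add((name, cls))
    return found


def build_ontology(tweets: list[str]) -> list[tuple[str, str, str]]:
    """Aggregate tweets into sorted (subject, predicate, object) triples."""
    instances = defaultdict(set)
    for tweet in tweets:
        for name, cls in extract_instances(tweet):
            instances[cls].add(name)
    triples = []
    for cls in sorted(instances):
        triples.append((cls, "subClassOf", "Malware"))
        for name in sorted(instances[cls]):
            triples.append((name, "instanceOf", cls))
    return triples
```

For example, consider the tweets "New #LockBit ransomware variant spotted https://example.com/a @analyst", "Emotet trojan is back with a new spam campaign" and "Another lockbit ransomware sample in the wild". From these, `build_ontology` yields the classes `Ransomware` and `Trojan` under `Malware`, with `lockbit` and `emotet` as their instances. Repeated mentions of the same malware collapse into a single instance. A real semi-automatic pipeline would replace the keyword rule with proper NLP tasks (tokenization, entity recognition) and include analyst review.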

Related Works

The detection of new events from Twitter is a common research branch and usually focuses on interpreting the content of tweets from a topic-based perspective. For instance, the TwitterStand platform created at the University of Maryland (Sankaranarayanan 2009) captures late-breaking news from tweets that become popular topics in each country. Regarding cyber threat detection from Twitter, Gaglio (2015) proposed an extension of Soft Frequent Pattern Mining (SFPM) with an improved topic detection algorithm, presenting the Twitter Live Detection Framework (TLDF), which can handle newly incoming data from a topic detection perspective. Cordeiro (2012) presented work on inferring topic events from the platform using the Latent Dirichlet Allocation (LDA) topic model based on Gibbs sampling. Concone et al. (2017) also proposed a methodology to detect, and raise alerts on, new malware using data from reliable Twitter accounts by means of a naïve Bayes classifier. Specifically, they worked with the “Bayes classifier trained on a set of tweets containing an equal number of i) events related to security attacks, viruses, malware, and ii) generic messages”. In their approach, “groups of tweets discussing the same topic, e.g, a new malware infection, are summarized in order to produce an alert”. The authoritativeness of the selected users was based on an “influence metric” that relates users’ interaction with the community in terms of retweets, feelings, answers and number of likes. Another study covering cyber threat detection from Twitter is that of Sabottke (2015), which focuses specifically on exploit detection through a Twitter-based exploit detector.
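The exact features and training setup used by Concone et al. (2017) are not given here. The following is a minimal sketch of the kind of classifier the paragraph describes: a two-class naïve Bayes model trained on a balanced set of security-related and generic tweets. It assumes a multinomial model with add-one (Laplace) smoothing and simple alphanumeric tokenization, both choices of this sketch. The toy training tweets, the labels `security`/`generic`, and the class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)                  # documents per class
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))  # token counts per class
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, doc: str, c: str) -> float:
        """log P(c) + sum of log P(token | c); the log posterior up to a constant."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.total_words[c] + len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        """Return the class with the highest log score."""
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Hypothetical, balanced toy training set: security-related vs. generic tweets.
TRAIN_DOCS = [
    "new ransomware encrypts files",
    "malware campaign spreads trojan",
    "virus attack exploits vulnerability",
    "great football match tonight",
    "new coffee shop opened downtown",
    "enjoying the sunny weekend",
]
TRAIN_LABELS = ["security"] * 3 + ["generic"] * 3
```

Trained on this toy data, `NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)` classifies "ransomware attack spreads via phishing" as `security` and "football match this weekend" as `generic`. Tweets classified as security-related could then be grouped by topic and summarized into an alert. That step is not shown here.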
This system detects the use of exploits against known vulnerabilities on Twitter by examining tweets whose text mentions vulnerabilities, comparing them against CVE IDs and ExploitDB as ground truth, and classifying them with an SVM classifier.

Given the growing number of malware variants, a resource able to analyze their similarities and gather these features as informative elements in a classification structure becomes a valuable means of enhancing predictive cybersecurity actions. In the literature, malware classification is regarded as an urgent and evolving research area, and the scientific community has proposed a wide range of techniques. The most common way to identify increasingly complex malware is signature-based detection. Akhtar and Feng (2022) offer a literature review of newer machine-learning-based techniques, analyzing their efficacy in identifying polymorphic malware.
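The Twitter-based exploit detector of Sabottke (2015) compares vulnerability mentions in tweets against CVE IDs and ExploitDB as ground truth. Its exact features and SVM configuration are not described here. The sketch below covers only the preprocessing step: it pulls CVE IDs (format `CVE-YYYY-NNNN`, with four or more digits in the sequence part) out of tweet text and attaches a ground-truth label to each mentioned CVE. The `exploited` set is an assumption standing in for data compiled from ExploitDB. The CVE IDs in the example are placeholders, and the function names are hypothetical.

```python
import re
from collections import Counter

# CVE identifiers look like CVE-YYYY-NNNN, with four or more digits at the end.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cves(tweet: str) -> list[str]:
    """Return the CVE IDs mentioned in a tweet, upper-cased, de-duplicated, in order."""
    seen = []
    for match in CVE_PATTERN.findall(tweet):
        cve = match.upper()
        if cve not in seen:
            seen.append(cve)
    return seen


def label_cves(tweets: list[str], exploited: set[str]) -> dict[str, tuple[int, int]]:
    """Map each mentioned CVE to (mention_count, label).

    `exploited` is an assumed ground-truth set of CVE IDs with known public
    exploits; label is 1 if the CVE is in that set and 0 otherwise.
    """
    mentions = Counter()
    for tweet in tweets:
        mentions.update(extract_cves(tweet))
    return {cve: (n, int(cve in exploited)) for cve, n in sorted(mentions.items())}
```

For instance, take the placeholder tweets "PoC released for CVE-2099-0001", "Patch now: cve-2099-0001 is being exploited" and "Advisory for CVE-2099-0002 published", with `exploited = {"CVE-2099-0001"}`. Here `label_cves` returns `{"CVE-2099-0001": (2, 1), "CVE-2099-0002": (1, 0)}`. Labeled examples of this kind, enriched with further tweet features, could then be used to train an SVM (for example, scikit-learn's `LinearSVC`). That training step is not shown.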

DOI: https://doi.org/10.36253/jlis.it-591

Read Full Text: https://www.jlis.it/index.php/jlis/article/view/591

Written by University of Florence