IoC Stalker: Early detection of Indicators of Compromise
dc.contributor.author | Mischinger, Mariella | |
dc.contributor.author | Pastrana, Sergio | |
dc.contributor.author | Suarez-Tangil, Guillermo | |
dc.date.accessioned | 2025-01-13T16:18:27Z | |
dc.date.available | 2025-01-13T16:18:27Z | |
dc.date.issued | 2024-12 | |
dc.identifier.uri | https://hdl.handle.net/20.500.12761/1890 | |
dc.description.abstract | Cybercriminals use online underground forums to share information and knowledge related to malicious activities. In these discussions, participants exchange "Indicators of Compromise" (IoCs) such as hashes, domains, URLs, or IPs with potentially malicious intent. While Open Source Intelligence (OSINT) eventually identifies these malicious IoCs, it can take a long time, sometimes years, before they are flagged as threats. However, the context in which these IoCs appear, together with the information provided by the posts and their authors, can already offer valuable insights into their malicious nature. Unfortunately, the large amount of unstructured, noisy forum data presents a hurdle for automation. In this paper, we address the challenge of automatically distinguishing posts containing IoCs that pose a threat from those that are harmless. We design a learning pipeline that does not use features derived from the IoCs themselves, enabling timely identification of novel threats. We operate over a temporal representation of forum data and offer insights into the optimal time window for tracking concept drift. We also study which types of IoCs are harder to predict (e.g., IPs) and how transfer learning from other types can improve their identification. We conduct our analysis on a prominent hacking forum, spanning over 18 years of data, and find that our model can detect IoCs ≈490 days before they appear in OSINT. | es |
dc.language.iso | eng | es |
dc.title | IoC Stalker: Early detection of Indicators of Compromise | es |
dc.type | conference object | es |
dc.conference.date | 9-13 Dec 2024 | es |
dc.conference.title | Annual Computer Security Applications Conference | * |
dc.event.type | conference | es |
dc.pres.type | paper | es |
dc.type.hasVersion | AM | es |
dc.rights.accessRights | open access | es |
dc.acronym | ACSAC | * |
dc.rank | National: USA | * |
dc.description.refereed | TRUE | es |
dc.description.status | pub | es |