Towards Improving Code Stylometry Analysis in Underground Forums

Tereszkowski-Kaminski, Michal; Pastrana, Sergio; Blasco, Jorge; Suarez-Tangil, Guillermo

dc.contributor.author	Tereszkowski-Kaminski, Michal
dc.contributor.author	Pastrana, Sergio
dc.contributor.author	Blasco, Jorge
dc.contributor.author	Suarez-Tangil, Guillermo
dc.date.accessioned	2021-10-11T14:48:42Z
dc.date.available	2021-10-11T14:48:42Z
dc.date.issued	2022-07-18
dc.identifier.uri	http://hdl.handle.net/20.500.12761/1523
dc.description.abstract	Code Stylometry has emerged as a powerful mechanism to identify programmers. While there have been significant advances in the field, existing mechanisms underperform in challenging domains. One such domain is studying the provenance of code shared in underground forums, where code posts tend to have small or incomplete source code fragments. This paper proposes a method designed to deal with the idiosyncrasies of code snippets shared in these forums. A novelty of our system is that it fuses a forum-specific learning pipeline with Conformal Prediction to generate predictions with precise confidence levels. We see that identifying unreliable code snippets is paramount to generating high-accuracy predictions, and this is a task where traditional learning settings fail. Overall, our method performs twice as well as the state-of-the-art in a constrained setting with a large number of authors (i.e., 100). When dealing with a smaller number of authors (i.e., 20), it performs at high accuracy (89%). We also evaluate our work under an open-world assumption and see that our method is more effective at retaining samples.	es
dc.language.iso	eng	es
dc.title	Towards Improving Code Stylometry Analysis in Underground Forums	es
dc.type	conference object	es
dc.conference.date	18-23 July 2022
dc.conference.place	Sydney, Australia
dc.conference.title	Proceedings on Privacy Enhancing Technologies (PETS)	*
dc.event.type	conference	es
dc.pres.type	paper	es
dc.type.hasVersion	AM	es
dc.rights.accessRights	open access	es
dc.description.refereed	TRUE	es
dc.description.status	pub	es

Files in this item

Name: 2022pets-attribution-uf.pdf
Size: 1.033Mb
Format: PDF

This item appears in the following Collection(s)

IMDEA Networks
