Show simple item record

dc.contributor.author: Tereszkowski-Kaminski, Michal
dc.contributor.author: Pastrana, Sergio
dc.contributor.author: Blasco, Jorge
dc.contributor.author: Suarez-Tangil, Guillermo
dc.date.accessioned: 2021-10-11T14:48:42Z
dc.date.available: 2021-10-11T14:48:42Z
dc.date.issued: 2022-07-18
dc.identifier.uri: http://hdl.handle.net/20.500.12761/1523
dc.description.abstract: Code stylometry has emerged as a powerful mechanism to identify programmers. While there have been significant advances in the field, existing mechanisms underperform in challenging domains. One such domain is studying the provenance of code shared in underground forums, where code posts tend to have small or incomplete source code fragments. This paper proposes a method designed to deal with the idiosyncrasies of code snippets shared in these forums. As a novelty, our system fuses a forum-specific learning pipeline with Conformal Prediction to generate predictions with precise confidence levels. We see that identifying unreliable code snippets is paramount to generating high-accuracy predictions, and this is a task where traditional learning settings fail. Overall, our method performs twice as well as the state of the art in a constrained setting with a large number of authors (i.e., 100). When dealing with a smaller number of authors (i.e., 20), it achieves high accuracy (89%). We also evaluate our work under an open-world assumption and see that our method is more effective at retaining samples. [es]
dc.language.iso: eng [es]
dc.title: Towards Improving Code Stylometry Analysis in Underground Forums [es]
dc.type: conference object [es]
dc.conference.date: 18-23 July 2022
dc.conference.place: Sydney, Australia
dc.conference.title: Proceedings on Privacy Enhancing Technologies (PETS)*
dc.event.type: conference [es]
dc.pres.type: paper [es]
dc.type.hasVersion: AM [es]
dc.rights.accessRights: open access [es]
dc.description.refereed: TRUE [es]
dc.description.status: pub [es]

