dc.contributor.author: Tereszkowski-Kaminski, Michal
dc.contributor.author: Pastrana, Sergio
dc.contributor.author: Blasco, Jorge
dc.contributor.author: Suarez-Tangil, Guillermo
dc.date.accessioned: 2021-10-11T14:48:42Z
dc.date.available: 2021-10-11T14:48:42Z
dc.date.issued: 2022
dc.identifier.uri: http://hdl.handle.net/20.500.12761/1523
dc.description.abstract: Code Stylometry has emerged as a powerful mechanism to identify programmers. While there have been significant advances in the field, existing mechanisms underperform in challenging domains. One such domain is studying the provenance of code shared in underground forums, where code posts tend to have small or incomplete source code fragments. This paper proposes a method designed to deal with the idiosyncrasies of code snippets shared in these forums. As a novelty, our system fuses a forum-specific learning pipeline with Conformal Prediction to generate predictions with precise confidence levels. We see that identifying unreliable code snippets is paramount to generating high-accuracy predictions, and this is a task where traditional learning settings fail. Overall, our method performs twice as well as the state-of-the-art in a constrained setting with a large number of authors (i.e., 100). When dealing with a smaller number of authors (i.e., 20), it performs at high accuracy (89%). We also evaluate our work under an open-world assumption and see that our method is more effective at retaining samples. (lang: es)
dc.language.iso: eng (lang: es)
dc.title: Towards Improving Code Stylometry Analysis in Underground Forums (lang: es)
dc.type: conference object (lang: es)
dc.conference.title: Proceedings on Privacy Enhancing Technologies (PETS) (lang: *)
dc.event.type: conference (lang: es)
dc.pres.type: paper (lang: es)
dc.type.hasVersion: AM (lang: es)
dc.rights.accessRights: open access (lang: es)
dc.description.refereed: TRUE (lang: es)
dc.description.status: inpress (lang: es)