• español
    • English
  • Login
  • español 
    • español
    • English
  • Tipos de Publicaciones
    • bookbook partconference objectdoctoral thesisjournal articlemagazinemaster thesispatenttechnical documentationtechnical report
Ver ítem 
  •   IMDEA Networks Principal
  • Ver ítem
  •   IMDEA Networks Principal
  • Ver ítem
JavaScript is disabled for your browser. Some features of this site may not work without it.

Lost in Translation: Analyzing Non-English Cybercrime Forums

Compartir
Identificadores
URI: https://hdl.handle.net/20.500.12761/2008
Metadatos
Mostrar el registro completo del ítem
Autor(es)
Mischinger, Mariella; Hughes, Jack; Vitiugin, Fedor; Pastrana, Sergio; Hutchings, Alice; Suarez-Tangil, Guillermo
Fecha
2025-11
Resumen
Cybercrime analysis and Cyber Threat Intelligence are crucial for understanding and defending against cyber threats, with online underground communities serving as a key source of information. Classification tasks are popular but demand significant manual effort and language-specific expertise. Prior work focuses on English-language forums, as non-English languages require fluent domain experts. We evaluate machine translation tools for suitability in preserving contextual information in posts and find GPT-4 is most reliable. We leverage existing underground forum post classification pipelines to compare their performance on translated text and original language text. We find classification performed on translated underground forum data is as effective as on original language text, enabling researchers to reuse existing pipelines. Finally, we investigate a fully machine-generated few-shot and zero-shot classification to reduce reliance on manual labeling, followed by a two-step machine-based classification, combining machine-generated labels with the existing classification pipeline. We find machine-based labeling causes errors to propagate downstream. For tasks requiring high-quality label creation, human expertise remains essential. Finally, we provide a qualitative evaluation of disagreements in annotator labels of the original language and the translations, as well as disagreements between annotators and machine labeling.
Compartir
Identificadores
URI: https://hdl.handle.net/20.500.12761/2008
Metadatos
Mostrar el registro completo del ítem

Listar

Todo IMDEA NetworksPor fecha de publicaciónAutoresTítulosPalabras claveTipos de contenido

Mi cuenta

Acceder

Estadísticas

Ver Estadísticas de uso

Difusión

emailContacto person Directorio wifi Eduroam rss_feed Noticias
Iniciativa IMDEA Sobre IMDEA Networks Organización Memorias anuales Transparencia
Síguenos en:
Comunidad de Madrid

UNIÓN EUROPEA

Fondo Social Europeo

UNIÓN EUROPEA

Fondo Europeo de Desarrollo Regional

UNIÓN EUROPEA

Fondos Estructurales y de Inversión Europeos

© 2021 IMDEA Networks. | Declaración de accesibilidad | Política de Privacidad | Aviso legal | Política de Cookies - Valoramos su privacidad: ¡este sitio no utiliza cookies!