SkillVet: Automated Traceability Analysis of Amazon Alexa Skills

Edu, Jide; Ferrer-Aran, Xavier; Such, Jose; Suarez-Tangil, Guillermo

doi:10.1109/TDSC.2021.3129116

Files

2103.02637.pdf (936.8 KB)

Identifiers

URI: http://hdl.handle.net/20.500.12761/1574

ISSN: 1941-0018

Metadata

Date

2023

Abstract

Skills are essential components of Smart Personal Assistants (SPAs). Their numbers have grown rapidly in a changing environment that has no clear business model. Skills can access personal information, which may pose a risk to users. However, there is little information about how this ecosystem works, and even fewer tools to facilitate its study. In this paper, we present a systematic measurement of the Alexa skill ecosystem. We study developers' practices, including how they collect and justify the need for sensitive information, by designing a methodology to identify over-privileged skills with broken privacy policies. We collect 199,295 skills and find that around 43% of the skills (and 50% of the developers) that request permissions follow bad privacy practices, including (partially) broken data permissions traceability. To perform this kind of analysis at scale, we present SkillVet, a tool that leverages machine learning and natural language processing techniques and generates high-accuracy prediction sets. We report several concerning practices, including how developers can bypass Alexa's permission system through account linking and conversational skills, and we offer recommendations on how to improve transparency, privacy, and security. As a result of our responsible disclosure, 13% of the reported issues no longer posed a threat at the time of submission.
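To make the idea of permission traceability more concrete, here is a minimal sketch that labels a skill by comparing the permissions it requests against the text of its privacy policy. It assumes three outcome labels (complete, partial, broken) and uses naive keyword matching. The permission names, keyword lists, and the `classify_traceability` function are hypothetical illustrations. This is not SkillVet's method, which relies on machine learning and natural language processing.

```python
from enum import Enum
from typing import Optional, Sequence


class Traceability(Enum):
    COMPLETE = "complete"  # every requested permission is mentioned in the policy
    PARTIAL = "partial"    # only some requested permissions are mentioned
    BROKEN = "broken"      # no requested permission is mentioned (or no policy at all)


# Hypothetical permission names and keyword lists: a crude stand-in for the
# ML/NLP models SkillVet uses, chosen only for illustration.
PERMISSION_KEYWORDS = {
    "email": ["email", "e-mail"],
    "name": ["your name", "full name", "first name"],
    "device_address": ["postal address", "street address", "location"],
    "phone": ["phone number", "mobile number"],
}


def is_mentioned(permission: str, policy: str) -> bool:
    """Return True if any keyword for `permission` occurs in the (lowercased) policy."""
    keywords = PERMISSION_KEYWORDS.get(permission, [permission.replace("_", " ")])
    return any(keyword in policy for keyword in keywords)


def classify_traceability(requested: Sequence[str], policy_text: Optional[str]) -> Traceability:
    """Label a skill by how many of its requested permissions its policy mentions."""
    if not requested:
        raise ValueError("traceability only applies to skills that request permissions")
    policy = (policy_text or "").lower()  # a missing policy is treated as empty text
    covered = [p for p in requested if is_mentioned(p, policy)]
    if len(covered) == len(requested):
        return Traceability.COMPLETE
    if covered:
        return Traceability.PARTIAL
    return Traceability.BROKEN


if __name__ == "__main__":
    print(classify_traceability(["email", "phone"], "We store your email and phone number."))
    print(classify_traceability(["email", "phone"], "We store your email."))
    print(classify_traceability(["device_address"], None))
```

Keyword matching is brittle. For example, "we never email you" would count as a mention of the email permission, and paraphrases such as "contact details" are missed. This weakness is why a learned text classifier is a more plausible choice for analysing policies at the scale described above.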