• español
    • English
  • Login
  • español 
    • español
    • English
  • Tipos de Publicaciones
    • bookbook partconference objectdoctoral thesisjournal articlemagazinemaster thesispatenttechnical documentationtechnical report
Ver ítem 
  •   IMDEA Networks Principal
  • Ver ítem
  •   IMDEA Networks Principal
  • Ver ítem
JavaScript is disabled for your browser. Some features of this site may not work without it.

Performance and Explainability of Feature Selection-Boosted Tree-based Classifiers for COVID-19 Detection

Compartir
Ficheros
Manuscript (11.27Mb)
Identificadores
URI: https://hdl.handle.net/20.500.12761/1764
DOI: 10.1016/j.heliyon.2023.e23219
Metadatos
Mostrar el registro completo del ítem
Autor(es)
Rufino, Jesús; Ramirez, Juan Marcos; Aguilar, Jose; Baquero, Carlos; Champati, Jaya Prakash; Frey, Davide; Lillo, Rosa Elvira; Fernández Anta, Antonio
Fecha
2023-12-07
Resumen
In this paper, we evaluate the performance and analyze the explainability of machine learning models boosted by feature selection in predicting COVID-19-positive cases from self-reported information. In essence, this work describes a methodology to identify COVID-19 infections that considers the large amount of information collected by the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS). More precisely, this methodology performs a feature selection stage based on the recursive feature elimination (RFE) method to reduce the number of input variables without compromising detection accuracy. A tree-based supervised machine learning model is then optimized with the selected features to detect COVID-19-active cases. In contrast to previous approaches that use a limited set of selected symptoms, the proposed approach builds the detection engine considering a broad range of features including self-reported symptoms, local community information, vaccination acceptance, and isolation measures, among others. To implement the methodology, three different supervised classifiers were used: random forests (RF), light gradient boosting (LGB), and extreme gradient boosting (XGB). Based on data collected from the UMD-CTIS, we evaluated the detection performance of the methodology for four countries (Brazil, Canada, Japan, and South Africa) and two periods (2020 and 2021). The proposed approach was assessed in terms of various quality metrics: F1-score, sensitivity, specificity, precision, receiver operating characteristic (ROC), and area under the ROC curve (AUC). This work also shows the normalized daily incidence curves obtained by the proposed approach for the four countries. Finally, we perform an explainability analysis using Shapley values and feature importance to determine the relevance of each feature and the corresponding contribution for each country and each country/year.
Compartir
Ficheros
Manuscript (11.27Mb)
Identificadores
URI: https://hdl.handle.net/20.500.12761/1764
DOI: 10.1016/j.heliyon.2023.e23219
Metadatos
Mostrar el registro completo del ítem

Listar

Todo IMDEA NetworksPor fecha de publicaciónAutoresTítulosPalabras claveTipos de contenido

Mi cuenta

Acceder

Estadísticas

Ver Estadísticas de uso

Difusión

emailContacto person Directorio wifi Eduroam rss_feed Noticias
Iniciativa IMDEA Sobre IMDEA Networks Organización Memorias anuales Transparencia
Síguenos en:
Comunidad de Madrid

UNIÓN EUROPEA

Fondo Social Europeo

UNIÓN EUROPEA

Fondo Europeo de Desarrollo Regional

UNIÓN EUROPEA

Fondos Estructurales y de Inversión Europeos

© 2021 IMDEA Networks. | Declaración de accesibilidad | Política de Privacidad | Aviso legal | Política de Cookies - Valoramos su privacidad: ¡este sitio no utiliza cookies!