Performance and Explainability of Feature Selection-Boosted Tree-based Classifiers for COVID-19 Detection

Rufino, Jesús; Ramírez, Juan Marcos; Aguilar, Jose; Baquero, Carlos; Champati, Jaya Prakash; Frey, Davide; Lillo, Rosa Elvira; Fernández Anta, Antonio

doi:10.1016/j.heliyon.2023.e23219

dc.contributor.author	Rufino, Jesús
dc.contributor.author	Ramírez, Juan Marcos
dc.contributor.author	Aguilar, Jose
dc.contributor.author	Baquero, Carlos
dc.contributor.author	Champati, Jaya Prakash
dc.contributor.author	Frey, Davide
dc.contributor.author	Lillo, Rosa Elvira
dc.contributor.author	Fernández Anta, Antonio
dc.date.accessioned	2023-12-12T13:11:06Z
dc.date.available	2023-12-12T13:11:06Z
dc.date.issued	2023-12-07
dc.identifier.uri	https://hdl.handle.net/20.500.12761/1764
dc.description.abstract	In this paper, we evaluate the performance and analyze the explainability of machine learning models boosted by feature selection in predicting COVID-19-positive cases from self-reported information. In essence, this work describes a methodology to identify COVID-19 infections that considers the large amount of information collected by the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS). More precisely, this methodology performs a feature selection stage based on the recursive feature elimination (RFE) method to reduce the number of input variables without compromising detection accuracy. A tree-based supervised machine learning model is then optimized with the selected features to detect COVID-19-active cases. In contrast to previous approaches that use a limited set of selected symptoms, the proposed approach builds the detection engine considering a broad range of features including self-reported symptoms, local community information, vaccination acceptance, and isolation measures, among others. To implement the methodology, three different supervised classifiers were used: random forests (RF), light gradient boosting (LGB), and extreme gradient boosting (XGB). Based on data collected from the UMD-CTIS, we evaluated the detection performance of the methodology for four countries (Brazil, Canada, Japan, and South Africa) and two periods (2020 and 2021). The proposed approach was assessed in terms of various quality metrics: F1-score, sensitivity, specificity, precision, receiver operating characteristic (ROC), and area under the ROC curve (AUC). This work also shows the normalized daily incidence curves obtained by the proposed approach for the four countries. Finally, we perform an explainability analysis using Shapley values and feature importance to determine the relevance of each feature and the corresponding contribution for each country and each country/year.	es
dc.description.sponsorship	Comunidad de Madrid	es
dc.description.sponsorship	Spanish Ministry of Science and Innovation	es
dc.description.sponsorship	European Union "Next Generation EU"	es
dc.subject.keyword	Shapley Values	es
dc.language.iso	eng	es
dc.publisher	Elsevier	es
dc.title	Performance and Explainability of Feature Selection-Boosted Tree-based Classifiers for COVID-19 Detection	es
dc.type	journal article	es
dc.journal.title	Heliyon	es
dc.rights.accessRights	open access	es
dc.identifier.doi	10.1016/j.heliyon.2023.e23219	es
dc.relation.projectID	SocialProbing (TED2021-131264B-I00)	es
dc.relation.projectID	COMODIN-CM (REACT-COMODIN-CM-23459)	es
dc.relation.projectName	CoronaSurveys-CM	es
dc.relation.projectName	COMODIN-CM	es
dc.relation.projectName	SocialProbing (Scalable and affordable data collection and analysis techniques for social probing)	es
dc.relation.projectName	PredCov-CM	es
dc.subject.keyword	COVID-19 Detection	es
dc.subject.keyword	Explainability Analysis	es
dc.subject.keyword	Gradient Boosting Classifiers	es
dc.subject.keyword	Random Forest	es
dc.subject.keyword	Recursive Feature Elimination	es
dc.description.refereed	TRUE	es
dc.description.status	pub	es
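The abstract describes a two-stage pipeline: recursive feature elimination (RFE) to shrink the UMD-CTIS input variables, followed by a tree-based classifier (RF, LGB or XGB) scored with sensitivity, specificity, precision, F1 and AUC. The sketch below illustrates those steps in plain Python and is not the authors' code. It assumes a hypothetical `importance_fn` callback that stands in for refitting a tree ensemble on the current feature subset and returning its feature importances; the feature names and weights in the usage section are invented toy values, not survey data. In practice, scikit-learn's `sklearn.feature_selection.RFE` and `sklearn.metrics.roc_auc_score` cover the same ground.

```python
from typing import Callable, Dict, List, Sequence


def recursive_feature_elimination(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
    step: int = 1,
) -> List[str]:
    """Generic RFE loop: score the surviving features, drop the `step`
    least important ones, and repeat until `n_keep` features remain.

    `importance_fn` is a hypothetical callback (an assumption of this sketch)
    that would retrain a model, e.g. a random forest, on the given subset and
    return one importance score per feature in that subset.
    """
    if not 1 <= n_keep <= len(features):
        raise ValueError("n_keep must be between 1 and the number of features")
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)  # re-scored on the current subset
        n_drop = min(step, len(remaining) - n_keep)  # never overshoot n_keep
        ranked = sorted(remaining, key=lambda f: scores[f])
        dropped = set(ranked[:n_drop])
        remaining = [f for f in remaining if f not in dropped]
    return remaining


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, precision and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true-positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative one (ties count 1/2). O(P*N) pairwise
    comparison, adequate for illustration only."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical features with fixed toy weights, so RFE simply drops
    # the lowest-weighted feature at each round.
    weights = {"fever": 0.5, "anosmia": 0.9, "cough": 0.4,
               "isolation": 0.1, "vaccine": 0.2}
    keep = recursive_feature_elimination(
        list(weights), lambda fs: {f: weights[f] for f in fs}, n_keep=2)
    print(keep)  # ['fever', 'anosmia']
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.4]))  # 0.875
```

With a real model, `importance_fn` would be recomputed after every elimination round, which is what distinguishes RFE from a single one-shot ranking of feature importances.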

Files in this item

Name: main.pdf
Size: 11.27 MB
Format: PDF
Description: Manuscript

This item appears in the following collection(s)

IMDEA Networks
