Show simple item record

dc.contributor.author  Behera, Adarsh Prasad
dc.contributor.author  Daubaris, Paulius
dc.contributor.author  Bravo Aramburu, Iñaki
dc.contributor.author  Gallego Delgado, José
dc.contributor.author  Morabito, Roberto
dc.contributor.author  Widmer, Joerg
dc.contributor.author  Champati, Jaya Prakash
dc.date.accessioned  2025-09-04T15:52:25Z
dc.date.available  2025-09-04T15:52:25Z
dc.date.issued  2025-07
dc.identifier.uri  https://hdl.handle.net/20.500.12761/1958
dc.description.abstract  On-device inference offers significant benefits in edge ML systems, such as improved energy efficiency, responsiveness, and privacy, compared to traditional centralized approaches. However, the resource constraints of embedded devices limit their use to simple inference tasks, creating a trade-off between efficiency and capability. In this context, the Hierarchical Inference (HI) system has emerged as a promising solution that augments the capabilities of the local ML model by offloading selected samples to an edge server/cloud for remote ML inference. Existing works, primarily based on simulations, demonstrate that HI improves accuracy. However, they fail to account for the latency and energy consumption of real-world deployments, nor do they consider the three key heterogeneous components that characterize ML-enabled IoT systems: hardware, network connectivity, and models. To bridge this gap, this paper systematically evaluates HI against standalone on-device inference by analyzing accuracy, latency, and energy trade-offs across five devices and three image classification datasets. Our findings show that, for a given accuracy requirement, the HI approach we designed achieved up to 73% lower latency and up to 77% lower device energy consumption than an on-device inference system. Despite these gains, HI introduces a fixed energy and latency overhead, since every sample is first processed by the on-device model. To address this, we propose a hybrid system, Early Exit with HI (EE-HI), and demonstrate that, compared to HI, EE-HI reduces latency by up to 59.7% and lowers the device's energy consumption by up to 60.4%. These findings demonstrate the potential of HI and EE-HI to enable more efficient ML in IoT systems.
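The offloading idea in the abstract can be made concrete with a small sketch. The following Python is a minimal, hypothetical illustration of an HI loop with an optional early-exit branch, not the authors' exact design: tiny_model, early_head, offload_to_edge, and the confidence threshold are all assumed placeholders.

    import numpy as np

    # Hypothetical threshold: the paper tunes its offloading rule, but the
    # abstract does not specify the exact criterion or value.
    CONF_THRESHOLD = 0.8

    def softmax(logits):
        """Numerically stable softmax over a vector of class logits."""
        z = np.asarray(logits, dtype=float)
        e = np.exp(z - z.max())
        return e / e.sum()

    def hi_inference(sample, tiny_model, offload_to_edge):
        """Hierarchical Inference (HI): keep the on-device prediction when
        it is confident enough, otherwise offload to the edge/cloud model.

        tiny_model(sample) -> on-device class logits (placeholder callable)
        offload_to_edge(sample) -> remote model's label (placeholder callable)
        """
        probs = softmax(tiny_model(sample))
        if probs.max() >= CONF_THRESHOLD:
            return int(probs.argmax())      # accept local prediction
        return offload_to_edge(sample)      # low confidence: offload

    def ee_hi_inference(sample, early_head, tiny_model, offload_to_edge):
        """EE-HI: an early-exit head lets easy samples skip the rest of the
        on-device model, removing part of HI's fixed per-sample overhead.

        early_head(sample) -> logits from an intermediate exit (placeholder)
        """
        probs = softmax(early_head(sample))
        if probs.max() >= CONF_THRESHOLD:
            return int(probs.argmax())      # early exit: cheapest path
        return hi_inference(sample, tiny_model, offload_to_edge)

In a real early-exit network the exit head would share the tiny model's backbone, so a fall-through would continue from the intermediate activations rather than recomputing from the raw input; the sketch keeps the two callables separate only for clarity.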
dc.description.sponsorship  Ministerio de Asuntos Económicos y Transformación Digital
dc.description.sponsorship  European Union
dc.description.sponsorship  Ministerio de Trabajo y Economía Social
dc.language.iso  eng
dc.publisher  IEEE
dc.title  Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical
dc.type  journal article
dc.journal.title  IEEE Internet of Things Journal
dc.rights.accessRights  open access
dc.identifier.doi  10.1109/JIOT.2025.3583477
dc.relation.projectID  info:eu-repo/grantAgreement/EC/H2020-MSCA-2021-PF-01/101062011
dc.relation.projectName  RISC-6G (Reconfigurable Intelligent Surfaces and Low-power Technologies for Communication and Sensing in 6G Mobile Networks)
dc.relation.projectName  MAP-6G (Machine Learning-based Privacy Preserving Analytics for 6G Mobile Networks)
dc.relation.projectName  DIME (Distributed Inference for Energy-efficient Monitoring at the Network Edge)
dc.relation.projectName  Programa Investigo
dc.description.refereed  TRUE
dc.description.status  pub


Files in this item

This item appears in the following collection(s)
