dc.description.abstract | Hierarchical Inference (HI) has emerged as a promising approach for efficient distributed inference between end devices equipped with small pre-trained Deep Learning (DL) models and edge/cloud servers running large DL models. Under HI, a device uses its local DL model to perform inference on the data samples it collects, and only the samples on which this local inference is likely to be incorrect are offloaded to a remote DL model running on the server. Gauging the likelihood of incorrect local inference is therefore key to implementing HI. A natural approach is to compute a confidence metric for the local DL inference and then apply a threshold to this metric to decide whether or not to offload. Recently, the HI online learning problem was studied with the goal of learning an optimal threshold for the confidence metric over a sequence of data samples collected over time. However, existing algorithms have computational complexity that grows with the number of rounds and lack sub-linear regret guarantees. In this work, we propose the Hedge-HI algorithm and prove that it achieves $O\left(T^{2/3}\,\mathbb{E}_{Z}[N_T]^{1/3}\right)$ regret, where $T$ is the number of rounds and $N_T$ is the number of distinct confidence metric values observed up to round $T$. Further, under a mild assumption, we propose Hedge-HI-Restart, which achieves an $O\left(T^{2/3}\log^{1/3}\left(\mathbb{E}_{Z}[N_T]\right)\right)$ regret bound with high probability and has a much lower computational complexity, growing sub-linearly in the number of rounds. Using runtime measurements on a Raspberry Pi, we demonstrate that Hedge-HI-Restart has a runtime lower by an order of magnitude and achieves cumulative loss close to that of the alternatives. | en
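
To make the threshold-learning idea in the abstract concrete, below is a minimal Python sketch of standard Hedge (exponential weights) run over a fixed grid of candidate confidence thresholds. It is illustrative only, not the paper's method: the function name, the loss model (a fixed cost for offloading, a unit cost for an incorrect local prediction), and the full-feedback assumption (the correctness of the local prediction is revealed every round) are all assumptions introduced here. The paper's Hedge-HI instead builds its expert set from the distinct confidence values observed online, and Hedge-HI-Restart adds restarts to reduce computation.

    import numpy as np

    # Illustrative sketch only: standard Hedge over a fixed threshold grid.
    # Assumed loss model: offloading costs `offload_cost`; keeping a sample
    # local costs 1 if the local prediction was wrong, 0 if it was right.
    def hedge_threshold_sketch(confidences, local_correct, offload_cost=0.5,
                               n_grid=20, eta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        thresholds = np.linspace(0.0, 1.0, n_grid)  # experts = candidate thresholds
        weights = np.ones(n_grid)
        total_loss = 0.0
        for c, correct in zip(confidences, local_correct):
            probs = weights / weights.sum()
            k = rng.choice(n_grid, p=probs)         # sample a threshold expert
            offload = c < thresholds[k]             # low confidence -> offload
            total_loss += offload_cost if offload else 1.0 - correct
            # Full-information update: the loss every candidate threshold
            # would have incurred on this sample.
            losses = np.where(c < thresholds, offload_cost, 1.0 - correct)
            weights *= np.exp(-eta * losses)
            weights /= weights.max()                # rescale for numerical stability
        return thresholds[np.argmax(weights)], total_loss

    # Toy usage with synthetic data in which higher confidence
    # correlates with correct local inference.
    rng = np.random.default_rng(1)
    conf = rng.uniform(size=1000)
    correct = (rng.uniform(size=1000) < conf).astype(int)
    best_t, loss = hedge_threshold_sketch(conf, correct)
    print(f"best threshold ~ {best_t:.2f}, cumulative loss = {loss:.1f}")

A fixed grid keeps the sketch simple but is exactly what the paper avoids: tracking the distinct observed confidence values ($N_T$ of them) lets the expert set adapt to the data, which is where the $\mathbb{E}_{Z}[N_T]$ terms in the regret bounds come from.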