Sharing Heterogeneous Computing Resources in Virtualized Open Radio Access Networks
Author(s): Lo Schiavo, Leonardo
Date: 2025-01-28
Abstract
Virtualization has recently become a fundamental paradigm in the implementation of 5G networks, specifically in the context of Radio Access Networks (RANs). RAN virtualization enables baseband processing on general-purpose computing platforms, thus overcoming the coupling of RAN functions with the dedicated hardware of traditional hardwired RANs. This approach allows RAN operators to break free of traditional hardware vendor lock-in and to share and multiplex the available computing resources, as the RAN Base Station (BS) is disaggregated into minimal Radio Unit (RU) hardware connected to cloud-oriented computing platforms that run in software the virtualized signal processing tasks of the Distributed Unit (DU) and the Centralized Unit (CU). However, executing such tasks in a timely manner with high probability (i.e., reliably) is challenging due to their computationally intensive nature, especially at the DU level. For this reason, carrier-grade virtualized RANs (vRANs) today rely on general-purpose computing platforms equipped with Hardware Accelerators (HAs), which are typically energy-hungry and monetarily expensive but can guarantee reliable DU processing. Traditional HAs include Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs). Recently, Graphics Processing Units (GPUs) have also been considered as HAs, owing to their unique capability of being easily programmed in software and of efficiently processing Machine Learning (ML) workloads, whose algorithms can be used to automate and optimize DU operations. At the same time, HAs significantly increase the energy and monetary costs of vRANs, which jeopardizes the environmental and economic sustainability of next-generation mobile networks. Deploying reliable hardware-accelerated vRANs with the lowest possible energy toll while reducing deployment costs has therefore become a challenging problem for operators. Indeed, current industrial solutions fail to provide energy and cost efficiency in vRANs, since (i) more energy-efficient processors are shunned for DU task processing and (ii) dedicated HAs are assigned to individual DUs following an overprovisioning approach.

This thesis investigates the deployment of energy- and cost-effective yet reliable vRANs in order to close the gap left by the aforementioned standard solutions. The vision set forth by this thesis is twofold: (i) increasing the energy efficiency of traditional hardware-accelerated vRANs by opportunistically complementing HAs with less energy-hungry processors, such as Central Processing Units (CPUs), for DU task processing; and (ii) improving cost efficiency by means of DU centralization, which amortizes the cost of each expensive HA by sharing it across multiple DUs.

In line with this vision, the first solution proposed in this thesis is ECORAN, an efficient multi-agent contextual bandit ML algorithm operating in the O-RAN Near-Real-Time RAN Intelligent Controller (Near-RT RIC). ECORAN configures policies to opportunistically offload DU workloads to either a GPU-based HA or CPUs in the O-Cloud, saving energy while preserving the reliability of the vRAN. To address cost efficiency, ECORAN applies concepts from mean-field theory to be fully scalable and thus handle an arbitrarily large and dynamic number of DUs centralized in the same shared, HA-powered computing platform. Using traffic traces from a production mobile network, ECORAN provides up to 40% energy savings and up to roughly 60x cost gains with respect to the standard approach used by the industry today.
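For illustration only, the following minimal Python sketch mimics the kind of contextual-bandit offloading decision described above. The two-arm action set, the coarse load context, the epsilon-greedy exploration, and the energy/deadline reward shaping are assumptions made for this sketch and are not taken from ECORAN itself.

```python
# Illustrative sketch (not from the thesis): an epsilon-greedy contextual bandit
# that decides where to run DU workloads -- on a GPU-based hardware accelerator
# (HA) or on CPUs -- given a coarse traffic-load context.
import random
from collections import defaultdict

ARMS = ("gpu_ha", "cpu")     # hypothetical offloading targets
EPSILON = 0.1                # exploration probability

value = defaultdict(float)   # running average reward per (context, arm)
count = defaultdict(int)

def choose_arm(context):
    """Epsilon-greedy choice of offloading target for the observed context."""
    if random.random() < EPSILON:
        return random.choice(ARMS)
    return max(ARMS, key=lambda arm: value[(context, arm)])

def update(context, arm, reward):
    """Incrementally update the average reward for (context, arm)."""
    key = (context, arm)
    count[key] += 1
    value[key] += (reward - value[key]) / count[key]

def reward_fn(energy, deadline_met):
    """Hypothetical reward: save energy, heavily penalize missed DU deadlines."""
    return -energy - (0.0 if deadline_met else 100.0)

# Toy interaction loop against a synthetic environment (illustration only):
# CPUs are cheaper but risk deadline misses under high load; the HA is reliable.
random.seed(0)
for _ in range(1000):
    load = random.choice(("low", "mid", "high"))   # observed per-slot context
    arm = choose_arm(load)
    if arm == "cpu":
        energy, met = 1.0, (load != "high" or random.random() < 0.5)
    else:
        energy, met = 3.0, True
    update(load, arm, reward_fn(energy, met))

print({ctx: max(ARMS, key=lambda a: value[(ctx, a)]) for ctx in ("low", "mid", "high")})
```

In this toy loop the agent tends to learn to keep low- and mid-load contexts on CPUs and to offload high-load contexts to the GPU-based HA, which is the qualitative behavior the abstract describes.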
As with many other studies on RAN control in the literature, the offloading policy used in ECORAN is determined by an ML model that requires GPU resources for efficient training and execution in production. However, dedicating a GPU to each ML model that automates a specific RAN control function, and reserving a GPU-based HA solely for DU processing, is not a cost-efficient approach. Conversely, indiscriminately co-locating DU workloads and multiple ML services can compromise the processing reliability of the former and the throughput performance of the latter. From this perspective, this thesis explores reliable multiplexing of the resources of a single GPU to further improve cost efficiency in vRANs. To this end, this thesis proposes YinYangRAN, an innovative system operating in the Non-Real-Time RIC (Non-RT RIC) that supervises the multiplexing of the computing resources of a GPU-based HA so as to ensure reliable DU processing while maximizing the throughput of a concurrent ML service running on the same GPU. Experiments performed with workloads collected in real RANs show that YinYangRAN can potentially reduce deployment cost by a factor of N compared to a solution using N dedicated GPUs per process, and improve vRAN reliability by over 50% compared to hardware-accelerated vRANs using conventional GPU multiplexing methods, with minimal impact on co-located ML workloads.
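Purely as an illustration of the supervision problem described above, the sketch below selects the smallest GPU share for DU processing that keeps an estimated deadline-meeting probability above a reliability target, handing the remaining share to a co-located ML service. The M/M/1-style latency model, the candidate shares, and all numerical parameters are assumptions of this sketch rather than mechanisms taken from YinYangRAN.

```python
# Illustrative sketch (not from the thesis): pick the smallest GPU share for DU
# processing whose estimated reliability meets a target, and hand the remaining
# share to a co-located ML service. The latency model and numbers are made up.
import math

SHARES = [k / 10 for k in range(1, 11)]   # candidate GPU fractions for the DU
DEADLINE_MS = 0.5                         # assumed per-TTI DU processing budget
TARGET = 0.999                            # assumed DU reliability target

def p_deadline_met(gpu_share, load_jobs_per_ms):
    """Hypothetical M/M/1-style estimate of P(DU processing time <= deadline)."""
    mu = 40.0 * gpu_share                 # service rate grows with the GPU share
    if mu <= load_jobs_per_ms:
        return 0.0                        # overloaded slice: deadline hopeless
    return 1.0 - math.exp(-(mu - load_jobs_per_ms) * DEADLINE_MS)

def pick_partition(load_jobs_per_ms):
    """Smallest DU share meeting the target; the remainder goes to the ML service."""
    for share in SHARES:                  # SHARES is sorted ascending
        if p_deadline_met(share, load_jobs_per_ms) >= TARGET:
            return {"du_share": share, "ml_share": round(1.0 - share, 1)}
    return {"du_share": 1.0, "ml_share": 0.0}   # fall back to dedicating the GPU

print(pick_partition(load_jobs_per_ms=2.0))   # light load: most of the GPU to ML
print(pick_partition(load_jobs_per_ms=6.5))   # heavier load: DU needs a larger share
```

Choosing the smallest feasible DU share is what lets the co-located ML service keep as much GPU capacity as possible without violating the DU reliability target.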
Insights from tracking workload dynamics in real-world cells show that traffic is bursty at the Transmission Time Interval (TTI) level, i.e., on a timescale of 1 ms. However, YinYangRAN operates in non-real time (timescale ≥ 1 s) and ECORAN operates in near-real time (timescale of roughly 10-100 ms). Consequently, to ensure vRAN reliability, both solutions adopt a conservative approach and configure resources for the highest expected peak over the entire decision period, which is significantly longer than 1 ms. This leads to wasted resources for most of the decision period due to overprovisioning. Additional gains in energy and cost efficiency can therefore be achieved by exploiting a heterogeneous O-Cloud infrastructure with both HAs and CPUs through a real-time controller capable of responding to TTI-level traffic fluctuations.

To address this, this thesis introduces CloudRIC, a real-time brokering system powered by lightweight data-driven models that coordinates centralized access by multiple DUs to a heterogeneous pool of computing processors, including HAs and CPUs, and assists DUs with compute-aware radio policies while meeting vRAN-specific reliability targets (a simplified, illustrative sketch of such a per-TTI dispatch decision is given below). Extensive experimental evaluations on GPU-accelerated vRANs demonstrate that CloudRIC achieves 3x and 15x average gains in energy and cost efficiency, respectively, under real and even dense RAN workloads, compared to the industry-standard solution that assigns dedicated HAs to individual DUs, while maintaining the same 99.999% target reliability.

At the time of writing this thesis, the proposed solutions are, to the best of our knowledge, the only approaches that reliably deploy energy- and cost-efficient hardware-accelerated vRANs through both DU centralization and the combined use of HAs and CPUs for DU processing.
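The sketch below is a simplified, hypothetical illustration of the per-TTI brokering idea described above: at each TTI the broker dispatches the DU workload to the least energy-hungry processor whose predicted latency quantile still fits the deadline, instead of reserving the HA for the peak load of a long decision period. The processor figures, the latency predictor, and the synthetic workload are all assumptions of this sketch, not components of CloudRIC.

```python
# Illustrative sketch (not from the thesis): per-TTI dispatch over a heterogeneous
# processor pool, choosing the cheapest processor that still meets the deadline.
from dataclasses import dataclass
import random

DEADLINE_MS = 1.0   # assumed per-TTI DU processing budget

@dataclass
class Processor:
    name: str
    energy_per_tti: float    # relative energy cost of handling one TTI
    base_ms: float           # toy latency per unit of load
    jitter_ms: float         # pessimistic margin standing in for a high quantile

    def predicted_latency_quantile_ms(self, load):
        """Toy high-quantile latency predictor (illustration only)."""
        return self.base_ms * load + self.jitter_ms

POOL = [
    Processor("cpu",    energy_per_tti=1.0, base_ms=0.30, jitter_ms=0.15),
    Processor("gpu_ha", energy_per_tti=3.0, base_ms=0.05, jitter_ms=0.05),
]

def dispatch(load):
    """Cheapest processor whose predicted latency quantile fits the deadline."""
    feasible = [p for p in POOL
                if p.predicted_latency_quantile_ms(load) <= DEADLINE_MS]
    if not feasible:   # nothing fits: fall back to the fastest processor
        return min(POOL, key=lambda p: p.predicted_latency_quantile_ms(load))
    return min(feasible, key=lambda p: p.energy_per_tti)

# Toy bursty workload at TTI granularity: mostly light TTIs with occasional bursts.
random.seed(0)
loads = [random.choice((1, 1, 1, 4)) for _ in range(1000)]

per_tti_energy = sum(dispatch(load).energy_per_tti for load in loads)
peak_provisioned_energy = POOL[1].energy_per_tti * len(loads)   # HA handles every TTI

print(f"per-TTI dispatch energy:      {per_tti_energy:.0f}")
print(f"peak-provisioned (HA) energy: {peak_provisioned_energy:.0f}")
```

On this toy bursty trace, per-TTI dispatching serves most TTIs on the cheaper CPU and falls back to the GPU-based HA only during bursts, which is why it consumes noticeably less energy than statically provisioning the HA for the peak.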