| dc.description.abstract | Over the past decade, network complexity has grown exponentially to support the emergence of new and innovative applications. This increased sophistication has rendered most human-in-the-loop approaches to network management tasks obsolete, calling for more automation and flexibility in the network. The advent of Software Defined Networking (SDN) with a programmable control plane was a huge step in the right direction, giving rise to a variety of network automation applications running in the SDN control plane. With the popularization of Machine Learning (ML) in recent years, many network automation applications use ML techniques to address network problems like intrusion detection, routing optimization, quality of service prioritization, and fault detection. Yet, as these applications run in the control plane, many of them cannot respond to network issues in real time and incur a response delay on the order of milliseconds to seconds, which is undesirable for the ultra-low-latency applications that will abound in 6G networks.
Recent advances in user-plane programmability have led to the availability of off-the-shelf programmable user-plane equipment like Intel Tofino switches, alongside compatible network programming languages like P4. This has sparked a strong interest in in-network computation, with efforts to offload ML models from the control plane to the user plane to reduce their response time and enable inference at line rate with low latency and high throughput. Most work on user-plane inference has focused on programmable switches due to their ubiquitous presence in the network and the availability of multiple high-speed ports. However, switches are highly constrained in terms of available memory, support for mathematical operations, and the number of allowed operations per packet. This makes it impossible to train ML models in the switch and shifts the focus to deploying trained models into user-plane switches. The constraints above also make complex models like Neural Networks (NN) less feasible for in-switch deployment. Instead, most prior works have deployed tree-based models like Decision Trees (DT) and Random Forests (RF) for in-switch inference, because their simple logical structure and the few operations they require at inference time make them well suited to constrained environments. Yet, these works have several limitations, such as limited scalability and adaptability, which translate into performance barriers when handling complex inference tasks.
This thesis proposes efficient solutions for embedding ML models into production-grade programmable switches, thereby addressing the above limitations and advancing the state of the art in ML-based user-plane inference. To illustrate the evolution across the solutions presented in the thesis, a practical application of user-plane inference is considered to show how in-switch inference can enable rapid detection of cyberattacks on SDN-based Smart Grid (SG) networks. Current power grids are smart, with millions of electronic devices interconnected by data networks. This exposes them to many cyberattacks which could lead to power outages and data breaches with far-reaching impacts. Thus, the timely detection of cyberattacks is critical. ML models are widely used for cyberattack detection in SDN-based SGs, where the models run either in external servers or in-network, in the latter case either fully in the control plane or distributed between the control and user planes. In these cases, the models do not run at line rate and incur millisecond-level delays in attack detection. The application developed in this thesis explores how ML inference in programmable switches at packet level (PL) can enable accelerated attack detection and mitigation in SGs at line rate with sub-microsecond delay. The proposed workflow brings the concept of user-plane inference to SDN-based SGs for the first time, and deploys a trained DT model into the switch pipeline for real-time inference on live traffic. Results produced in this thesis show how a pure user-plane solution achieves up to 99% accuracy in attack detection and classification, while operating up to four orders of magnitude faster than solutions running entirely in the control plane.
The above solution and all earlier solutions for PL inference in the user plane focus on flat classification, and have significant structural limitations that prevent them from scaling when handling complex inference tasks. To tackle these limitations, this thesis proposes Henna, the first implementation of an in-switch multi-stage hierarchical classification system. The concept upon which Henna hinges is that of splitting a difficult classification task into easier cascaded inference tasks, which can then be addressed with separate resource-efficient tree-based classifiers. The design of Henna aligns with the internal organization of the Protocol Independent Switch Architecture (PISA), and integrates state-of-the-art strategies for mapping decision trees to switch hardware. Henna is then implemented in a real-world testbed with off-the-shelf Intel Tofino programmable switches using the P4 language. Experiments with a complex 21-category classification task based on measurement data show how Henna improves the F1-score of an advanced single-stage model by 21%, while maintaining usage of switch resources at 8% on average.
Despite the improvements brought about by Henna, existing hardware-compatible in-switch inference solutions still are limited to PL operation, lack support for rich statistical features, or are not scalable, hitting performance barriers in complex tasks involving large decision spaces. To address these limitations, Flowrest is presented as a first complete RF model implementation that operates at the level of individual flows in commercial switches. The proposed solution builds on (i) novel guidelines for tailoring RF models to operation in programmable switches right from the design phase, (ii) an original framework to embed flow-level (FL) machine learning models into programmable switch ASICs, and (iii) efficient strategies for maintaining state within switches to compute, store and employ FL features for inference. Flowrest is implemented in a hardware switch as open-source software using the P4 language.
Flowrest sets a new standard for FL inference in the user plane. To validate this claim, a thorough evaluation of the proposed solution is conducted in an experimental platform based on Intel Tofino switches in two steps: (i) Flowrest is evaluated on unencrypted traffic, comparing it to major existing proposals for in-switch inference, which all target unencrypted traffic, and (ii) it is then evaluated on encrypted traffic classification. Results from the evaluation with tasks of unprecedented complexity show how Flowrest achieves accuracy gains in the 10% to 39% range over previous approaches to implement DT and RF models in real-world equipment.
Despite the improved performance resulting from FL classification, a major dichotomy still exists between works for in-switch inference, based on whether they operate at PL or FL. The former rely on packet-header features that are easy to implement but limit accuracy in challenging use cases; the latter exploit richer flow-based statistical features to improve accuracy, but leave early packets in each flow unclassified. To close this gap, this thesis presents Jewel, an in-switch ML solution based on a fully joint PL and FL design, which offers the best of both worlds by classifying early flow packets individually at PL and shifting to FL inference as soon as possible. The proposed solution involves (i) a single RF model trained to classify both packets and flows, and (ii) hardware-aware model selection and training techniques for resource footprint minimization. Jewel is implemented in P4 and deployed in a testbed with Intel Tofino switches, where extensive experiments are conducted with a variety of real-world use cases. Results from experiments conducted in this thesis reveal how Jewel outperforms four state-of-the-art benchmarks, with absolute accuracy gains in the 2.0% to 5.3% range, while consuming a modest amount of switch resources.
In summary, this thesis proposes novel solutions for inference in programmable network user planes. Technical details on the design and implementation of the proposed solutions are described first, followed by thorough experimental evaluations that shed light on the merits of each solution in comparison to prior work. Through these contributions, this thesis sets new standards in user-plane ML inference and, by making all the solutions open-source, takes steps towards enabling and encouraging the pervasive adoption of user-plane inference in programmable networks. | es |