Interpreting Anticipatory Deep Reinforcement Learning for Proactive Mobile Network Control
Date
2026-05

Abstract
Deep Reinforcement Learning (DRL) is widely used
for adaptive control in mobile networks, yet most agents remain
reactive. This limitation is particularly problematic for exogenous
Key Performance Indicators (KPIs), whose dynamics evolve
independently and cannot be directly controlled by the agent's
actions. Anticipatory DRL addresses this issue by augmenting the
state with short-horizon KPI forecasts, but it remains unclear
whether such information truly influences decisions. We use SIA,
a symbolic interpretability tool, to explain whether and how
anticipatory information is actually exploited by the policy,
enabling principled redesign of forecast inputs and performance
improvements. Using policy graphs and Mutual Information (MI)
over symbolic temporal features, SIA distinguishes proactive
from reactive behaviors. For a standard Pensieve ABR agent
augmented with throughput forecasts, experiments on real-world
5G traces show a 3% average reward improvement, with
anticipatory policies spending more time at high bitrates while
reducing unnecessary oscillations.
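The abstract does not detail how SIA computes MI over symbolic features. As a minimal sketch of the underlying idea (not SIA's actual implementation), one can estimate the empirical mutual information between a discretized forecast symbol and the policy's chosen action: a proactive policy whose actions track the forecast yields high MI, while a reactive policy that ignores it yields MI near zero. The feature and action labels below are illustrative placeholders.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two
    aligned sequences of categorical symbols."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of the feature symbols
    py = Counter(ys)            # marginal counts of the actions
    pxy = Counter(zip(xs, ys))  # joint counts of (feature, action) pairs
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) p(y)) )
        mi += p_joint * log2(p_joint * n * n / (px[x] * py[y]))
    return mi

# Toy traces: the forecast symbol perfectly predicts the bitrate choice,
# so MI equals the feature entropy (1 bit for a balanced binary symbol).
forecast = ["low", "low", "high", "high", "low", "high"]
action   = ["360p", "360p", "1080p", "1080p", "360p", "1080p"]
print(mutual_information(forecast, action))  # → 1.0
```

A near-zero MI for the forecast feature would indicate the anticipatory input is being ignored, motivating the redesign of forecast inputs mentioned above.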


