Contributions to centralized dynamic channel allocation reinforcement learning agents
Abstract
We introduce a domain-specific policy improvement operator for reassigning channels during call hand-offs, with the intent of reducing hand-off blocking probability. We construct an RL agent for maximizing average grid utilization, which uses a linear neural network as state value-function approximator and afterstates for action selection. A variant of TD(0) with gradient correction (TDC) is proposed for average-reward MDPs, which, in conjunction with the policy improvement operator, decreases hand-off call blocking probability in a simulated centralized caller environment without any penalty to the previously shown (Singh & Bertsekas, 1997) state-of-the-art new-call blocking probability. The policy improvement operator is also applied to the table-lookup-based SARSA agent of Lilith (2004), where it achieves state-of-the-art performance in terms of hand-off blocking probability for an all-admission agent.
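To make the named algorithm concrete, the following is a minimal sketch of what one TDC-style update adapted to the average-reward setting with linear value-function approximation could look like. The function name, step sizes, and this exact formulation are illustrative assumptions, not the paper's verified algorithm: `theta` are the value weights, `w` the gradient-correction weights, and `rho` a running estimate of the average reward used in the differential TD error.

```python
import numpy as np

def avg_reward_tdc_step(theta, w, rho, phi, phi_next, reward,
                        alpha=0.01, beta=0.05, eta=0.001):
    """One hypothetical TDC update for an average-reward MDP.

    phi, phi_next: feature vectors of the current and next (after)state.
    alpha, beta, eta: step sizes for theta, w, and rho (illustrative).
    """
    # Differential TD error: reward measured relative to the running
    # average-reward estimate rho (no discount factor in this setting).
    delta = reward - rho + theta @ phi_next - theta @ phi
    # Main weight update with the TDC gradient-correction term (w @ phi).
    theta = theta + alpha * (delta * phi - phi_next * (w @ phi))
    # Secondary weights track the expected TD error per feature.
    w = w + beta * (delta - w @ phi) * phi
    # Update the running average-reward estimate.
    rho = rho + eta * delta
    return theta, w, rho
```

The afterstate formulation mentioned above would supply `phi_next` as the features of the grid configuration resulting from a candidate channel assignment, letting action selection reduce to evaluating a state value function.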
While this work considers centralized systems, the policy improvement operator is applicable to distributed agents so long as the channel usages of the interfering neighbors of the hand-off arrival BS are known to the hand-off departure BS.