Contributions to centralized dynamic channel allocation reinforcement learning agents
Abstract
We introduce a domain-specific policy improvement operator for reassigning channels during call hand-offs, with the intent of reducing hand-off blocking probability. We construct an RL agent for maximizing average grid utilization, which uses a linear neural network as state value-function approximator and afterstates for action selection. A variant of TD(0) with gradient correction (TDC) is proposed for average-reward MDPs, which, in conjunction with the policy improvement operator, decreases hand-off call blocking probability in a simulated centralized caller environment without any penalty to the previously shown (Singh & Bertsekas, 1997) state-of-the-art new-call blocking probability. The policy improvement operator is also applied to the table-lookup-based SARSA agent of Lilith (2004), where it achieves state-of-the-art performance in terms of hand-off blocking probability for an all-admission agent.
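To make the named algorithm concrete, the following is a minimal sketch of what one TDC-style update adapted to the average-reward setting with linear value-function approximation could look like. The function name, step sizes, and this exact formulation are illustrative assumptions, not the paper's verified algorithm: `theta` are the value weights, `w` the gradient-correction weights, and `rho` a running estimate of the average reward used in the differential TD error.

```python
import numpy as np

def avg_reward_tdc_step(theta, w, rho, phi, phi_next, reward,
                        alpha=0.01, beta=0.05, eta=0.001):
    """One hypothetical TDC update for an average-reward MDP.

    phi, phi_next: feature vectors of the current and next (after)state.
    alpha, beta, eta: step sizes for theta, w, and rho (illustrative).
    """
    # Differential TD error: reward measured relative to the running
    # average-reward estimate rho (no discount factor in this setting).
    delta = reward - rho + theta @ phi_next - theta @ phi
    # Main weight update with the TDC gradient-correction term (w @ phi).
    theta = theta + alpha * (delta * phi - phi_next * (w @ phi))
    # Secondary weights track the expected TD error per feature.
    w = w + beta * (delta - w @ phi) * phi
    # Update the running average-reward estimate.
    rho = rho + eta * delta
    return theta, w, rho
```

The afterstate formulation mentioned above would supply `phi_next` as the features of the grid configuration resulting from a candidate channel assignment, letting action selection reduce to evaluating a state value function.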
While this work considers centralized systems, the policy improvement operator is applicable to distributed agents so long as the channel usages of the interfering neighbors of the hand-off arrival BS are known to the hand-off departure BS.