Multi-Armed Restless Bandits, Index Policies and Dynamic Priority Allocation Articles uri icon

publication date

  • January 2010

start page

  • 124

end page

  • 133

issue

  • 2

volume

  • 26

international standard serial number (ISSN)

  • 1699-8871

abstract

  • This paper presents a brief introduction to the emerging research field of multi-armed restless bandits (MARBs), which substantially extend the modeling power of classic multi-armed bandits. MARBs are Markov decision
    process models for optimal dynamic priority allocation to a collection
    of stochastic binary-action (active/passive) projects evolving over
    time. Interest in MARBs has grown steadily, spurred by the breadth of
    their possible applications. Although MARBs are generally intractable, a
    Lagrangian relaxation and decomposition approach yields a unifying
    design principle for heuristic priority-index policies, which are often
    tractable and nearly optimal, along with an upper bound on the optimal
    reward.