Probabilistic Policy Reuse for Inter-Task Transfer Learning

publication date

  • July 2010

start page

  • 866

end page

  • 871

issue

  • 7

volume

  • 58

International Standard Serial Number (ISSN)

  • 0921-8890

Electronic International Standard Serial Number (EISSN)

  • 1872-793X

abstract

  • Policy Reuse is a reinforcement learning technique that learns a new policy efficiently by reusing similar, previously learned policies. The Policy Reuse learner improves its exploration by probabilistically exploiting those past policies. Policy Reuse was introduced, and its effectiveness previously demonstrated, in problems with different reward functions defined over the same state and action spaces. In this article, we contribute Policy Reuse as a mechanism for transfer learning across different domains. We extend Markov Decision Processes (MDPs) to include domains and tasks, where domains have different state and action spaces, and tasks are problems with different reward functions within a domain. We show how Policy Reuse can be applied across domains by defining and using a mapping between their state and action spaces. Using several domains, defined as versions of a simulated RoboCup Keepaway problem, we show that Policy Reuse acts as a transfer learning mechanism that significantly outperforms a basic policy learner.
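
  A minimal sketch (Python) of the probabilistic action selection described in the abstract. This is an illustrative assumption rather than the paper's implementation: the function and parameter names (pi_reuse_action, psi, epsilon) are hypothetical, and the mixing rule follows the general idea of exploiting a past policy with probability psi and otherwise acting epsilon-greedily on the current value estimates.

      import random

      def pi_reuse_action(q_values, past_policy, state, psi, epsilon):
          """Pick an action by probabilistically reusing a past policy.

          q_values:    dict mapping (state, action) -> estimated Q-value
          past_policy: callable state -> action, a previously learned policy
          psi:         probability of reusing the past policy at this step
          epsilon:     exploration rate for the current policy
          """
          actions = [a for (s, a) in q_values if s == state]
          if random.random() < psi:
              # Bias exploration toward the past policy's suggestion.
              return past_policy(state)
          if random.random() < epsilon:
              # Ordinary random exploration of the current task.
              return random.choice(actions)
          # Otherwise act greedily on the current value estimates.
          return max(actions, key=lambda a: q_values[(state, a)])

  In such a scheme the reuse probability is typically decayed within an episode (e.g., psi * nu**h at step h), so early steps lean on the past policy while later steps increasingly trust the newly learned one.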