Learning adversarial attack policies through multi-objective reinforcement learning

publication date

  • November 2020

start page

  • 1

end page

  • 11


volume

  • 96

International Standard Serial Number (ISSN)

  • 0952-1976

Electronic International Standard Serial Number (EISSN)

  • 1873-6769


abstract

  • Deep Reinforcement Learning has shown promising results in learning policies for complex sequential decision-making tasks. However, different adversarial attack strategies have revealed the weakness of these policies to perturbations of their observations. Most of these attacks have been built on existing adversarial example crafting techniques used to fool classifiers, where an adversarial attack is considered a success if it makes the classifier output any wrong class. The major drawback of these approaches when applied to decision-making tasks is that they are blind to long-term goals. In contrast, this paper suggests that it is more appropriate to view the attack process as a sequential optimization problem, with the aim of learning a sequence of attacks, where the attacker must consider the long-term effects of each attack. We propose that such an attack policy must be learned with two objectives in view. On the one hand, the attack must pursue the maximum performance loss of the attacked policy. On the other hand, it should also minimize the cost of the attacks. Therefore, we propose a novel formulation of the process of learning an attack policy as a Multi-objective Markov Decision Process with two objectives: maximizing the performance loss of the attacked policy and minimizing the cost of the attacks. We also reveal the conflicting nature of these two objectives and use a Multi-objective Reinforcement Learning algorithm to draw the Pareto fronts for four well-known tasks: GridWorld, Cartpole, Mountain Car, and Breakout.
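
The trade-off between the two objectives named in the abstract can be illustrated with a minimal sketch of Pareto-front extraction. This is not the paper's algorithm: the function names and the candidate values below are hypothetical, chosen only to show what it means for one attack policy to dominate another when performance loss is maximized and attack cost is minimized.

```python
from typing import List, Tuple

# Each candidate attack policy is summarized as (performance_loss, attack_cost).
# These names and the tuple layout are assumptions made for illustration.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one.

    Objective 1 (performance loss) is maximized; objective 2 (attack cost) is minimized.
    """
    loss_a, cost_a = a
    loss_b, cost_b = b
    no_worse = loss_a >= loss_b and cost_a <= cost_b
    strictly_better = loss_a > loss_b or cost_a < cost_b
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the non-dominated points, sorted by increasing attack cost."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front), key=lambda p: p[1])


# Hypothetical outcomes, not results from the paper.
candidates = [(0.0, 0.0), (10.0, 2.0), (8.0, 3.0), (25.0, 5.0), (25.0, 7.0), (40.0, 9.0)]
front = pareto_front(candidates)
print(front)
```

In this toy data, (8.0, 3.0) is dropped because (10.0, 2.0) causes more loss at lower cost, and (25.0, 7.0) is dropped because (25.0, 5.0) achieves the same loss more cheaply. The remaining points show the conflict between the objectives: more performance loss is only available at higher attack cost.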


keywords

  • adversarial reinforcement learning; multi-objective reinforcement learning