Automating the evaluation of planning systems

Research in automated planning is getting more and more focused on empirical evaluation. Likewise the need for methodologies and benchmarks to build solid evaluations of planners is increasing. In 1998 the planning community made a move to address this need and initiated the International Planning Competition - or IPC for short. This competition has typically been conducted every two years in the context of the International Conference on Automated Planning and Scheduling (ICAPS) and tries to define standard metrics and benchmarks to reliably evaluate planners. In the sixth edition of the competition, IPC 2008, there was an attempt to automate the evaluation of all entries in the competition which was imitated to a large extent and extended in several ways in the seventh edition, IPC 2011. As a result, a software for automatically running planning experiments and inspecting the results is available, encouraging researchers to use it for their own research interests. The software allows researchers to reproduce and inspect the results of IPC 2011, but also to generate and analyze new experiments with private sets of planners and problems. In this paper we provide a gentle introduction to this software and examine the main difficulties, both from a scientific and engineering point of view, in assessing the performance of automated planners.

Automating the evaluation of planning systems Articles