This paper presents a distributed architecture for automating data mining (DM) processes using standard languages. DM is a difficult task that relies on exploratory, analytic processing of large quantities of data to discover meaningful patterns. The increasing heterogeneity and complexity of available data require expert knowledge of how to combine the many alternative DM tasks that can process the data. Here, we describe DM tasks in terms of automated planning, which allows us to automate the construction of the DM knowledge flow. The work builds on standards defined by both the DM and automated-planning communities. We use PMML (Predictive Model Markup Language) to describe DM tasks. From the PMML description, a problem description in PDDL (Planning Domain Definition Language) is generated, so that any current planning system can be used to produce a plan. This plan is in turn translated into a DM workflow description in the Knowledge Flow for Machine Learning format, the Knowledge Flow file format of the WEKA (Waikato Environment for Knowledge Analysis) tool, so that the resulting DM workflow can be executed in WEKA.
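To make the first translation step concrete, the following is a minimal sketch of how a PMML task description might be turned into a PDDL problem. It is an illustration only, not the translator used in this work: the PMML fragment is simplified (the XML namespace is omitted), and the function `pmml_to_pddl`, the domain name `data-mining`, the types `field` and `mining-task`, and the predicates `target`, `mining-function`, and `model-built` are all hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Simplified PMML-like fragment (assumption: namespace omitted for brevity).
PMML_SNIPPET = """
<PMML version="4.4">
  <DataDictionary>
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="species" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="sepal_length"/>
      <MiningField name="species" usageType="target"/>
    </MiningSchema>
  </TreeModel>
</PMML>
"""


def pmml_to_pddl(pmml_text, problem_name="dm-task"):
    """Build a PDDL problem string from a simplified PMML document.

    All predicate, type, and domain names below are hypothetical.
    """
    root = ET.fromstring(pmml_text)

    # Each data field becomes a PDDL object; its optype becomes an init fact.
    fields = [(f.get("name"), f.get("optype")) for f in root.iter("DataField")]

    # The model element carries the mining function (e.g. classification).
    model = root.find("TreeModel")
    function = model.get("functionName")

    # The target attribute is the MiningField flagged usageType="target".
    target = next(
        m.get("name")
        for m in model.iter("MiningField")
        if m.get("usageType") == "target"
    )

    objects = " ".join(name for name, _ in fields)
    init = [f"({optype} {name})" for name, optype in fields]
    init.append(f"(target {target})")
    init.append(f"(mining-function {function})")

    return "\n".join([
        f"(define (problem {problem_name})",
        "  (:domain data-mining)",
        f"  (:objects {objects} - field {function} - mining-task)",
        f"  (:init {' '.join(init)})",
        f"  (:goal (model-built {target})))",
    ])


if __name__ == "__main__":
    print(pmml_to_pddl(PMML_SNIPPET))
```

A planner given this problem together with a matching domain file would return a sequence of DM actions, which a second translator could then serialize as a WEKA Knowledge Flow file. That second step is not sketched here because it depends on the details of the Knowledge Flow format.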