The evaluation of data sources using multivariate entropy tools

publication date

  • July 2017

start page

  • 145

end page

  • 157

volume

  • 78

International Standard Serial Number (ISSN)

  • 0957-4174

Electronic International Standard Serial Number (EISSN)

  • 1873-6793

abstract

  • We introduce from first principles an analysis of the information content of multivariate distributions as information sources. Specifically, we generalize a balance equation and a visualization device, the Entropy Triangle, for multivariate distributions and find notable differences with similar analyses done on joint distributions as models of information channels. As an example application, we extend a framework for the analysis of classifiers to also encompass the analysis of data sets. With such tools we analyze a handful of UCI machine learning tasks to start addressing the question of how well datasets convey the information they are supposed to capture about the phenomena they stand for.

subjects

  • Telecommunications

keywords

  • machine learning evaluation; dataset entropy; multivariate entropy; entropic measures; exploratory analysis; entropy ternary diagram; entropy balance equation