Data transformation, e.g., feature transformation and selection, is an integral part of any machine learning procedure. In this paper, we introduce an information-theoretic model and tools to assess the quality of data transformations in machine learning tasks. In an unsupervised fashion, we analyze the transformation of a discrete, multivariate source of information $\overline{X}$ into a discrete, multivariate sink of information $\overline{Y}$ related by a distribution $P_{\overline{X}\,\overline{Y}}$. The first contribution is a decomposition of the maximal potential entropy of $(\overline{X}, \overline{Y})$, which we call a balance equation, into its (a) non-transferable, (b) transferable but not transferred, and (c) transferred parts. Such balance equations can be represented in (de Finetti) entropy diagrams, our second set of contributions. The most important of these, the aggregate channel multivariate entropy triangle, is a visual exploratory tool to assess the effectiveness of multivariate data transformations in transferring information from input to output variables. We also show how the decomposition and balance equations apply to the entropies of $\overline{X}$ and $\overline{Y}$ individually, and we generate entropy triangles for them as well. As an example, we apply these tools to assess the information transfer efficiency of Principal Component Analysis and Independent Component Analysis as unsupervised feature transformation and selection procedures in supervised classification tasks.
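To make the decomposition concrete, the following is a minimal sketch of the bivariate case, assuming the form of the authors' earlier two-variable channel entropy triangle; the aggregate multivariate balance equation in the paper generalizes it to $\overline{X}$ and $\overline{Y}$.

```latex
% Bivariate sketch of a balance equation (assumed form; requires amsmath).
% H_{U_X \cdot U_Y} is the maximal potential (uniform) joint entropy; the three
% summands are (a) non-transferable, (c) transferred, and (b) transferable but
% not transferred information, respectively.
\begin{align*}
  H_{U_X \cdot U_Y} &= \Delta H_{P_X \cdot P_Y} \;+\; 2\, MI_{P_{XY}} \;+\; VI_{P_{XY}},\\
  \Delta H_{P_X \cdot P_Y} &= H_{U_X \cdot U_Y} - H_{P_X} - H_{P_Y},\\
  2\, MI_{P_{XY}} &= 2\bigl(H_{P_X} + H_{P_Y} - H_{P_{XY}}\bigr),\\
  VI_{P_{XY}} &= H_{P_{X \mid Y}} + H_{P_{Y \mid X}}.
\end{align*}
```

Normalizing the three summands by $H_{U_X \cdot U_Y}$ yields three coordinates that sum to one, which is what allows each transformation to be plotted as a point in a (de Finetti) entropy triangle.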