This paper proposes a framework in which a multivariate analysis (MVA) method guides the selection of input variables, leading to sparse feature extraction. This framework, called parsimonious MVA, is especially suited to high-dimensional data such as gene arrays and digital pictures. The feature selection relies on analyzing how consistently each input variable behaves across the elements of an ensemble of MVA projection matrices. The ensemble is constructed with a bootstrap procedure that builds on an efficient, generalized MVA formulation covering PCA, CCA and OPLS. The framework also allows the relative relevance of each selected input variable to be estimated. Experimental results show that the features extracted by parsimonious MVA have excellent discrimination power, comparing favorably with state-of-the-art methods, and are potentially useful for building interpretable features. In addition, the parsimonious feature extractor is shown to be robust to parameter selection as well as computationally efficient.
feature selection; dimensionality reduction; multivariate analysis; principal component analysis; canonical correlation analysis; orthonormalized partial least squares
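
To make the idea of bootstrap-based consistency analysis concrete, the following minimal sketch illustrates one possible reading of it. It is not the paper's algorithm: it assumes plain PCA computed by SVD as the MVA method, and it assumes that "consistency" means the sign of a variable's loading agrees across bootstrap replicates. The function names, the toy data, and the 0.95 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def bootstrap_pca_loadings(X, n_components, n_boot, rng):
    """Return PCA loadings for n_boot bootstrap resamples of X.

    Output shape: (n_boot, n_components, n_variables).
    Assumption: PCA stands in for the generalized MVA formulation.
    """
    n, d = X.shape
    # Reference loadings from the full data, used only to fix the
    # arbitrary sign of each component across resamples.
    Xc = X - X.mean(axis=0)
    _, _, Vt_ref = np.linalg.svd(Xc, full_matrices=False)
    ref = Vt_ref[:n_components]

    loadings = np.empty((n_boot, n_components, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        Xb = X[idx]
        Xb = Xb - Xb.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xb, full_matrices=False)
        V = Vt[:n_components]
        # Flip each component so it points the same way as the reference.
        signs = np.sign(np.sum(V * ref, axis=1, keepdims=True))
        signs[signs == 0] = 1.0
        loadings[b] = V * signs
    return loadings


def select_consistent(loadings, threshold=0.95):
    """Keep variables whose loading sign is stable across the ensemble.

    Assumption: sign agreement is the consistency criterion; the
    threshold is a hypothetical parameter.
    """
    pos = (loadings > 0).mean(axis=0)        # fraction of positive loadings
    consistency = np.maximum(pos, 1.0 - pos)  # agreement with majority sign
    score = consistency.max(axis=0)           # best component per variable
    return score >= threshold, score


# Toy data (hypothetical): variables 0 and 1 share a strong latent factor,
# the remaining 18 variables are independent unit-variance noise.
rng = np.random.default_rng(0)
n, d = 200, 20
z = rng.normal(size=n)
X = rng.normal(size=(n, d))
X[:, 0] += 3.0 * z
X[:, 1] += 3.0 * z

loadings = bootstrap_pca_loadings(X, n_components=1, n_boot=200, rng=rng)
selected, score = select_consistent(loadings)
print("selected variables:", np.flatnonzero(selected))
```

In this toy setting the two variables driven by the latent factor should be retained, while noise variables are kept only if their loading sign happens to be stable by chance. The per-variable `score` is just a crude stability measure here and should not be read as the relevance estimate mentioned in the abstract.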