Handling Ill-Conditioned Omics Data with Deep Probabilistic Models
Articles
Overview
published in
publication date
- September 2023
issue
- 9
volume
- 27
Digital Object Identifier (DOI)
International Standard Serial Number (ISSN)
- 2168-2194
Electronic International Standard Serial Number (EISSN)
- 2168-2208
abstract
- The advent of high-throughput technologies has increased the dimensionality of omics datasets, which limits the application of machine learning methods because of the severe imbalance between the number of observations and the number of features. In this scenario, dimensionality reduction is essential to extract the relevant information from these datasets and project it onto a low-dimensional space, and probabilistic latent space models are becoming popular given their ability to capture both the underlying structure of the data and the uncertainty in the information. This article aims to provide a general classification and dimensionality reduction method based on deep latent space models that tackles two of the main problems that arise in omics datasets: the presence of missing data and the small number of observations relative to the number of features. We propose a semi-supervised Bayesian latent space model that infers a low-dimensional embedding driven by the target label: the Deep Bayesian Logistic Regression (DBLR) model. During inference, the model also learns a global vector of weights that allows it to make predictions from the low-dimensional embedding of the observations. Since this kind of dataset is prone to overfitting, we introduce an additional probabilistic regularization method based on the semi-supervised nature of the model. We compared the performance of the DBLR against several state-of-the-art methods for dimensionality reduction, on both synthetic and real datasets with different data types. The proposed model provides more informative low-dimensional representations, outperforms the baseline methods in classification, and can naturally handle missing entries.
Classification
keywords
- Bayesian; classification; deep generative model; dimensionality reduction; latent space model; missing data; semi-supervised; VAE