Handling Ill-Conditioned Omics Data with Deep Probabilistic Models

authors

Martinez Garcia, Maria
Martinez Olmos, Pablo

published in

IEEE Journal of Biomedical and Health Informatics

publication date

September 2023

issue

9

volume

27

Digital Object Identifier (DOI)

https://doi.org/10.1109/jbhi.2023.3279493

International Standard Serial Number (ISSN)

2168-2194

Electronic International Standard Serial Number (EISSN)

2168-2208

abstract

The advent of high-throughput technologies has increased the dimensionality of omics datasets, which limits the application of machine learning methods because of the large imbalance between the number of observations and the number of features. In this scenario, dimensionality reduction is essential to extract the relevant information within these datasets and project it into a low-dimensional space, and probabilistic latent space models are becoming popular given their capability to capture both the underlying structure of the data and the uncertainty in the information. This article aims to provide a general classification and dimensionality reduction method based on deep latent space models that tackles two of the main problems that arise in omics datasets: the presence of missing data and the limited number of observations relative to the number of features. We propose a semi-supervised Bayesian latent space model that infers a low-dimensional embedding driven by the target label: the Deep Bayesian Logistic Regression (DBLR) model. During inference, the model also learns a global vector of weights that allows it to make predictions given the low-dimensional embedding of the observations. Since models trained on this kind of dataset are prone to overfitting, we introduce an additional probabilistic regularization method based on the semi-supervised nature of the model. We compared the performance of the DBLR against several state-of-the-art methods for dimensionality reduction, on both synthetic and real datasets with different data types. The proposed model provides more informative low-dimensional representations, outperforms the baseline methods in classification, and can naturally handle missing entries.
