Iterative variable selection for high-dimensional data: Prediction of pathological response in triple-negative breast cancer

authors

  • LARIA DE LA CRUZ, JUAN CARLOS
  • AGUILERA MORILLO, MARIA DEL CARMEN
  • ÁLVAREZ CASTILLO, ENRIQUE LUIS
  • LILLO RODRIGUEZ, ROSA ELVIRA
  • LÓPEZ TARUELLA, SARA
  • DEL MONTE MILLÁN, MARÍA
  • Picornell, Antonio C
  • MARTIN, MIGUEL
  • ROMO URROZ, JUAN

publication date

  • January 2021

start page

  • 1

end page

  • 14

issue

  • 3

volume

  • 9

International Standard Serial Number (ISSN)

  • 2227-7390

abstract

  • Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can use an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.

keywords

  • variable selection; high-dimension; regularization; classification; sparse-group lasso