Electronic International Standard Serial Number (EISSN)
1873-765X
abstract
Feature selection is a recurrent research topic in modern regression analysis, which strives to build interpretable models, using sparsity as a proxy, without sacrificing predictive power. The best subset selection problem is central to this statistical task: its goal is to identify the subset of covariates of a given size that provides the best fit in terms of an empirical loss function. In this work, we address the problem of feature and functional form selection in additive regression models under a mathematical optimization lens. Penalized splines (P-splines) are used to estimate the smooth functions involved in the regression equation, which allows us to state the feature selection problem as a cardinality-constrained mixed-integer quadratic program (MIQP) in terms of both linear and non-linear covariates. To strengthen this MIQP formulation, we develop tight bounds for the regression coefficients. A matheuristic approach, which encompasses a preprocessing step, the construction of a warm-start solution, the MIQP formulation, and the large neighborhood search metaheuristic paradigm, is proposed to handle larger instances of the feature and functional form selection problem. The performance of the exact and the matheuristic approaches is compared on simulated data. Furthermore, our matheuristic is compared to other methodologies in the literature that have publicly available implementations, using both simulated and real-world data. We show that the stated approach is competitive in terms of predictive power and in the selection of the correct subset of covariates with the appropriate functional form. A public Python library containing implementations of all the methodologies developed in this paper is available.
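As an illustration of the best subset selection problem the abstract refers to, the sketch below enumerates all covariate subsets of a given cardinality and keeps the one with the smallest residual sum of squares. This brute-force enumeration is only a didactic stand-in for the paper's MIQP formulation (which a MIP solver would handle at scale); the function name and the simulated data are hypothetical, not taken from the paper's library.

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Brute-force best subset selection: among all column subsets of
    size k, return the one minimizing the residual sum of squares.
    Stands in for the cardinality-constrained MIQP described above."""
    n, p = X.shape
    best_rss, best_S = np.inf, None
    for S in itertools.combinations(range(p), k):
        Xs = X[:, S]
        # Least-squares fit restricted to the columns in S
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        r = y - Xs @ beta
        rss = float(r @ r)
        if rss < best_rss:
            best_rss, best_S = rss, S
    return best_S, best_rss

# Simulated example: only covariates 1 and 4 carry signal
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = 2 * X[:, 1] - 3 * X[:, 4] + 0.1 * rng.standard_normal(100)
S, rss = best_subset(X, y, 2)
print(S)  # the true support (1, 4) is recovered
```

The exhaustive search is combinatorial in the number of covariates, which is precisely why the paper resorts to an MIQP formulation with tightened coefficient bounds and a matheuristic for larger instances.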
Classification
subjects
Computer Science
Statistics
keywords
penalized splines; feature selection; functional form selection; matheuristic; mixed-integer optimization