A mathematical optimization approach to shape-constrained generalized additive models

publication date

  • December 2024

start page

  • 124654-1

end page

  • 124654-16

issue

  • C

volume

  • 255

International Standard Serial Number (ISSN)

  • 0957-4174

Electronic International Standard Serial Number (EISSN)

  • 1873-6793

abstract

  • The vast amount of data generated nowadays demands innovative and flexible techniques that accommodate expert knowledge and support decision-making. In this work, we address the problem of estimating a generalized additive regression model whose component functions must satisfy conditions on sign, monotonicity, or curvature. The univariate functions and the multivariate functions (i.e., interaction terms) in the regression model are defined through a B-spline basis and fitted using a penalized splines (P-splines) approach. In the multivariate case, the shape constraints are imposed on a finite set of curves belonging to the hypersurface defined by the function, thus defining a skeleton on which the required conditions have to be verified. To do so, we propose new conic optimization models that can accommodate different conditions along each covariate involved in the regression model. Furthermore, our approach can be used for a continuous response variable, as well as for Poisson and logistic regression. The result is a new mathematical optimization framework for shape-constrained regression that copes with different model specifications, involving main and/or interaction effects, and with different types of response variables. We show that our methodology is competitive with other state-of-the-art approaches in terms of accuracy and computational time, the latter improved in some cases by more than two orders of magnitude, on both simulated and real data sets with applications in economics, social sciences, and medicine.

subjects

  • Statistics

keywords

  • shape-constrained regression; generalized additive models; penalized splines; conic optimization; smooth-ANOVA decomposition; data envelopment analysis