Based on a large, recently developed database of 1-min irradiance and ancillary observations at 54 stations worldwide, this study uses the gradient boosting machine learning (ML) technique to improve component separation, the process through which the direct and diffuse solar radiation components are estimated from 1-min global horizontal irradiance data. The XGBoost implementation of gradient boosting is used with ensembles of both linear and non-linear weak prediction models. The predictions of 140 separation models from the literature are combined using XGBoost, which reduces the random errors relative to those of the individual separation models at any of the validation sites. The minimum prediction error is essentially achieved by combining 26 of the original 140 models; combining more models yields no meaningful further reduction in error. Most of these 26 models use at least three inputs in addition to the clearness index. In parallel, XGBoost is also used to separate the components directly from the inputs to the separation models. Of the 24 possible inputs used in the original 140 separation models, only 14 are found to be relevant. These 14 inputs could be used, with an appropriate formalism, to subsequently develop a better separation model. When the training and validation datasets are not collocated, the RMSD of the predictions increases by 2% on average relative to the case of collocated datasets. Overall, the present results indicate that a data-driven ML approach combining a limited number of existing models can considerably decrease the currently large random errors associated with such models when they are used separately at high temporal frequency. © 2017 Elsevier Ltd. All rights reserved.
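The idea of combining the predictions of several separation models with gradient boosting can be illustrated with a minimal sketch. It is not the study's implementation: a small pure-Python booster built from one-split regression stumps stands in for XGBoost, the data are synthetic, and only three hypothetical separation models (each a biased, noisy estimate of the diffuse fraction) replace the 140 models from the literature. All function names, coefficients, and noise levels below are assumptions made for illustration.

```python
import math
import random


def rmsd(pred, obs):
    """Root-mean-square difference between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def fit_stump(X, residuals):
    """Find the one-split regression stump that best fits the residuals."""
    n = len(X)
    total = sum(residuals)
    mean = total / n
    best_gain, best = -1.0, (0, math.inf, mean, mean)  # fallback: constant
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            prev, cur = order[pos - 1], order[pos]
            left_sum += residuals[prev]
            if X[prev][j] == X[cur][j]:
                continue  # cannot split between equal feature values
            right_sum = total - left_sum
            # Maximizing this quantity minimizes the squared error of the split.
            gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos)
            if gain > best_gain:
                threshold = 0.5 * (X[prev][j] + X[cur][j])
                best_gain = gain
                best = (j, threshold, left_sum / pos, right_sum / (n - pos))
    return best


def fit_boosting(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, left, right = fit_stump(X, residuals)
        stumps.append((j, thr, left, right))
        pred = [p + learning_rate * (left if row[j] <= thr else right)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, X):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, lr, stumps = model
    return [base + lr * sum(left if row[j] <= thr else right
                            for j, thr, left, right in stumps)
            for row in X]


def make_data(n, rng):
    """Synthetic diffuse fraction plus three hypothetical model predictions."""
    X, y = [], []
    for _ in range(n):
        kt = rng.uniform(0.05, 0.95)  # clearness index
        # Made-up logistic curve used as the synthetic "true" diffuse fraction.
        k_true = 1.0 / (1.0 + math.exp(8.0 * (kt - 0.6)))
        X.append([
            k_true + 0.10 + rng.gauss(0.0, 0.10),   # hypothetical model 1
            0.80 * k_true + rng.gauss(0.0, 0.10),   # hypothetical model 2
            k_true - 0.15 + rng.gauss(0.0, 0.10),   # hypothetical model 3
        ])
        y.append(k_true)
    return X, y


def run_demo(seed=0):
    """Train on one synthetic sample, report RMSDs on an independent one."""
    rng = random.Random(seed)
    X_train, y_train = make_data(400, rng)
    X_test, y_test = make_data(200, rng)
    model = fit_boosting(X_train, y_train)
    stacked = rmsd(predict(model, X_test), y_test)
    singles = [rmsd([row[j] for row in X_test], y_test) for j in range(3)]
    return stacked, singles


if __name__ == "__main__":
    stacked, singles = run_demo()
    print("individual model RMSDs:", [round(s, 3) for s in singles])
    print("boosted combination RMSD:", round(stacked, 3))
```

In this toy setting the boosted combination is evaluated on data it was not trained on, which loosely mirrors the distinction between training and validation datasets; the numbers it prints say nothing about the magnitudes reported in the study.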