SYnthetic Samples GENerator (SYSGEN): an approach to increase the size of incidence samples in coffee leaf rust modelling

authors

Girón, Edwar Javier
Corrales Muñoz, David Camilo
Sesmero Lorente, Maria Paz
Iglesias Martinez, Jose Antonio
Corrales, Juan Carlos

published in

Evolving Systems Journal

publication date

August 2021

start page

625

end page

636

volume

13

Digital Object Identifier (DOI)

https://doi.org/10.1007/s12530-021-09395-0

full text

https://hdl.handle.net/10016/45233

International Standard Serial Number (ISSN)

1868-6478

Electronic International Standard Serial Number (EISSN)

1868-6486

abstract

Rust is considered a major problem for coffee farmers. Several rust outbreaks have occurred in Latin American countries such as Colombia, Mexico, Peru, Ecuador and El Salvador. Because of the damage caused by coffee rust, several regression models have been proposed to estimate rust incidence from weather variables. However, these models lack real rust samples because collecting samples is costly in both money and time. To address this issue, we propose a mechanism called SYnthetic Samples GENerator (SYSGEN). The proposal is based on cubic spline interpolation to increase the size of rust incidence samples (RIS) and on expert knowledge to adjust the rust progress curve in Colombian coffee crops. To demonstrate the reliability of SYSGEN, we built 132 regression models from synthetic incidence samples (dependent variable) and weather observations (independent variables). To do this, we considered three Colombian coffee regions, five experiments and four regression models. In addition, we used Recursive Feature Elimination (RFE) to select the relevant weather variables. The analysis of these models and of RFE is promising, since it reveals several aspects and effects related to rust development. One of these aspects is that the regression models frequently used temperature (maximum, minimum and average) and relative humidity variables. It is important to highlight that experts consider these meteorological variables key drivers of the germination, penetration, colonization and sporulation phases. In terms of performance, our experiments allow us to conclude that random forest (RF) and bagging trees (BT) reached the lowest Root Mean Square Error (RMSE). Finally, different datasets produce different performance. For example, in the experiments that involve flowering-period datasets, the lowest RMSE was reached by RF, whereas in coffee harvest-period datasets BT reached the lowest RMSE.
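
The interpolation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a natural cubic spline (the abstract does not state the boundary conditions), the function names `natural_cubic_spline` and `generate_synthetic_samples` are hypothetical, the sample values are made up for illustration, and clipping to the 0-100 % range is an added assumption. The expert-knowledge adjustment of the rust progress curve is not reproduced here.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    Assumption: natural boundary conditions (zero second derivative at both
    ends). xs must be strictly increasing.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(xs[i + 1] <= xs[i] for i in range(len(xs) - 1)):
        raise ValueError("xs must be strictly increasing")

    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Right-hand side of the tridiagonal system for the c coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3.0 / h[i]) * (ys[i + 1] - ys[i]) \
            - (3.0 / h[i - 1]) * (ys[i] - ys[i - 1])

    # Forward sweep of the tridiagonal solve.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    # Back substitution gives the polynomial coefficients per interval.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Locate the interval [xs[j], xs[j + 1]] that contains x.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t ** 2 + d[j] * t ** 3

    return evaluate


def generate_synthetic_samples(days, incidence, step=1):
    """Densify sparse incidence samples into one value every `step` days.

    Values are clipped to [0, 100] because incidence is a percentage
    (an assumption of this sketch, not a step taken from the paper).
    """
    spline = natural_cubic_spline(days, incidence)
    return [
        (day, min(100.0, max(0.0, spline(day))))
        for day in range(int(days[0]), int(days[-1]) + 1, step)
    ]


# Hypothetical monthly field samples: day of measurement, incidence in %.
days = [0, 30, 60, 90, 120]
incidence = [2.0, 5.5, 12.0, 9.0, 4.0]

synthetic = generate_synthetic_samples(days, incidence)
```

In this sketch, five sparse measurements become 121 daily values that pass through the original points, which is the sense in which interpolation increases the sample size available to a regression model.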
