Electronic International Standard Serial Number (EISSN)
1872-8286
abstract
Stream processing has recently become a prominent commercial approach to handling huge amounts of data. However, these techniques normally require specific infrastructure and substantial resources in terms of memory and computing nodes. This paper shows how mini-batch techniques and topology extraction methods can make gigabytes of data manageable on a single server, even with computationally costly machine learning techniques such as Support Vector Machines. The iGNGSVM algorithm is proposed to improve the performance of Support Vector Machines on datasets where data arrives continuously. It is benchmarked against a mini-batch version of LibSVM, achieving good accuracy rates while running faster.
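As an illustration of the idea summarised in the abstract, the following minimal Python sketch compresses each incoming mini-batch into a few prototypes per class and retrains a classifier on the accumulated prototypes only, so the raw data never has to be kept in memory. It is not the iGNGSVM algorithm: plain k-means stands in for growing neural gas topology extraction, a linear SVM trained by stochastic subgradient descent stands in for LibSVM, and the data is synthetic. All function names and parameters (compress, train_linear_svm, run_stream, k, eta, lam) are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compress(points, k, iters=5, seed=0):
    """Summarise a mini-batch by at most k centroids.

    Assumption: plain k-means is used here as a simple stand-in for
    growing-neural-gas topology extraction.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, min(k, len(points)))]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids


def train_linear_svm(xs, ys, epochs=200, eta=0.01, lam=0.01, seed=0):
    """Linear SVM via stochastic subgradient descent on the hinge loss.

    Assumption: a stand-in for LibSVM, chosen to keep the sketch dependency-free.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(xs[0]), 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = ys[i] * (sum(wj * xj for wj, xj in zip(w, xs[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]  # L2 regularisation step
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
                b += eta * ys[i]
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def make_batch(rng, n):
    """Synthetic two-class data: Gaussian blobs around (-2, -2) and (2, 2)."""
    batch = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        batch.append(([rng.gauss(2 * y, 0.5), rng.gauss(2 * y, 0.5)], y))
    return batch


def run_stream(n_batches=5, batch_size=200, k=5, seed=42):
    rng = random.Random(seed)
    proto_x, proto_y = [], []
    model = None
    for _ in range(n_batches):
        batch = make_batch(rng, batch_size)
        # Keep only per-class prototypes; the raw batch is then discarded.
        for label in (-1, 1):
            pts = [x for x, y in batch if y == label]
            if pts:
                for c in compress(pts, k):
                    proto_x.append(c)
                    proto_y.append(label)
        model = train_linear_svm(proto_x, proto_y)
    test = make_batch(rng, 200)
    accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
    return len(proto_x), accuracy


if __name__ == "__main__":
    n_protos, acc = run_stream()
    print("prototypes kept:", n_protos, "held-out accuracy:", acc)
```

With the default settings, 1000 streamed points are reduced to at most 50 prototypes before training, which is the kind of data reduction the abstract relies on to keep a costly classifier tractable on one machine.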
Classification
subjects
Computer Science
keywords
data classification; topology extraction; online learning; large datasets; growing neural gas; support vector machines