An online classification algorithm for large scale data streams: iGNGSVM

authors

  • Suárez-Cetrulo, Andrés L.
  • Cervantes Rovira, Alejandro

publication date

  • November 2017

start page

  • 67

end page

  • 76

volume

  • 262

International Standard Serial Number (ISSN)

  • 0925-2312

Electronic International Standard Serial Number (EISSN)

  • 1872-8286

abstract

  • Stream processing has recently become a major commercial trend for handling huge amounts of data. However, these techniques normally require dedicated infrastructure and substantial resources in terms of memory and computing nodes. This paper shows how mini-batch techniques and topology extraction methods can make gigabytes of data manageable on a single server, even with computationally costly machine learning techniques such as Support Vector Machines. The iGNGSVM algorithm is proposed to improve the performance of Support Vector Machines on datasets where data arrives continuously. It is benchmarked against a mini-batch version of LibSVM, achieving good accuracy while running faster than that baseline.
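
The general idea in the abstract (compress each mini-batch into a few representative points, then train the SVM only on those) can be illustrated with a minimal sketch. This is not the iGNGSVM algorithm from the paper: as assumptions made purely for illustration, Growing Neural Gas is replaced by a fixed-size competitive-learning step, LibSVM is replaced by a linear SVM trained with stochastic gradient descent on the hinge loss, and the stream is synthetic two-class Gaussian data. All function names and parameters are hypothetical.

```python
import random

def nearest(protos, p):
    # Return the prototype closest to point p (squared Euclidean distance).
    return min(protos, key=lambda w: sum((a - b) ** 2 for a, b in zip(w, p)))

def extract_prototypes(points, k, rng, epochs=3, lr=0.1):
    # Assumption: fixed-size competitive learning stands in for Growing Neural Gas.
    protos = [list(p) for p in rng.sample(points, min(k, len(points)))]
    for _ in range(epochs):
        for p in points:
            win = nearest(protos, p)
            for i, x in enumerate(p):
                win[i] += lr * (x - win[i])  # move the winner toward the sample
    return protos

def train_linear_svm(xs, ys, rng, lam=0.01, lr=0.01, epochs=50):
    # Assumption: linear SVM via SGD on the L2-regularised hinge loss, not LibSVM.
    w, b = [0.0] * len(xs[0]), 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            w = [(1 - lr * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1:  # sample violates the margin: hinge-loss step
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def make_batch(rng, n):
    # Synthetic stream: class -1 centred at (-2, -2), class +1 at (2, 2).
    batch = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        batch.append(([rng.gauss(2 * y, 0.5), rng.gauss(2 * y, 0.5)], y))
    return batch

def run_stream(n_batches=10, batch_size=200, k=5, seed=0):
    rng = random.Random(seed)
    memory_x, memory_y = [], []  # only prototypes are kept, never raw batches
    model = None
    for _ in range(n_batches):
        batch = make_batch(rng, batch_size)
        for label in (-1, 1):
            pts = [x for x, y in batch if y == label]
            if pts:
                protos = extract_prototypes(pts, k, rng)
                memory_x.extend(protos)
                memory_y.extend([label] * len(protos))
        model = train_linear_svm(memory_x, memory_y, rng)  # retrain on prototypes
    test = make_batch(rng, 500)
    acc = sum(predict(model, x) == y for x, y in test) / len(test)
    return model, len(memory_x), acc

if __name__ == "__main__":
    _, n_protos, acc = run_stream()
    print("prototypes kept:", n_protos, "held-out accuracy:", acc)
```

In this sketch the classifier sees at most `k` prototypes per class per batch, so its training set grows far more slowly than the stream itself; that is the resource saving the abstract points to, shown here only on toy data.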

subjects

  • Computer Science

keywords

  • data classification; topology extraction; online learning; large datasets; growing neural gas; support vector machines