Evaluating and accelerating vision transformers on GPU-based embedded edge AI systems

publication date

  • December 2024

start page

  • 1

end page

  • 21

issue

  • 349

volume

  • 81

International Standard Serial Number (ISSN)

  • 0920-8542

Electronic International Standard Serial Number (EISSN)

  • 1573-0484

abstract

  • Many current embedded systems comprise heterogeneous computing components, including fairly powerful GPUs, enabling their application across diverse sectors. This study demonstrates the efficient execution of a medium-sized self-supervised audio spectrogram transformer (SSAST) model on a low-power system-on-chip (SoC). Through a comprehensive evaluation, including real-time inference scenarios, we show that GPUs outperform multi-core CPUs for inference. Optimization techniques such as adjusting the batch size, compiling the model with TensorRT, and reducing data precision significantly improve inference time, energy consumption, and memory usage. Accuracy degradation is negligible: post-training quantization to 8-bit integers incurs less than 1% loss. This research underscores the feasibility of deploying transformer neural networks on low-power embedded devices with efficiency in time, energy, and memory, while maintaining the accuracy of the results.
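  The post-training quantization the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation (the study uses TensorRT on an SoC); it is a generic symmetric per-tensor int8 scheme, with all function names hypothetical, showing why the precision loss stays small: each float is mapped to an 8-bit integer with a single scale factor, so the reconstruction error is bounded by one quantization step.

    ```python
    import numpy as np

    def quantize_int8(x):
        # Symmetric per-tensor quantization: map the float range
        # [-max|x|, +max|x|] onto the int8 range [-127, 127].
        scale = np.abs(x).max() / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover an approximation of the original floats.
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    weights = rng.standard_normal((4, 4)).astype(np.float32)
    q, scale = quantize_int8(weights)
    recovered = dequantize(q, scale)
    max_err = float(np.abs(weights - recovered).max())
    ```

  In practice, frameworks such as TensorRT additionally calibrate activation ranges on representative data, which is what keeps end-to-end accuracy loss below 1% as reported in the abstract.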

subjects

  • Computer Science
  • Electronics
  • Telecommunications

keywords

  • vision transformer; gpu; low-power; system-on-chip