Evaluating and accelerating vision transformers on GPU-based embedded edge AI systems

publication date

  • December 2024

start page

  • 1

end page

  • 21

issue

  • 349

volume

  • 81

International Standard Serial Number (ISSN)

  • 0920-8542

Electronic International Standard Serial Number (EISSN)

  • 1573-0484

abstract

  • Many current embedded systems comprise heterogeneous computing components, including fairly powerful GPUs, enabling their application across diverse sectors. This study demonstrates the efficient execution of a medium-sized self-supervised audio spectrogram transformer (SSAST) model on a low-power system-on-chip (SoC). Through a comprehensive evaluation, including real-time inference scenarios, we show that GPUs outperform multi-core CPUs for inference. Optimization techniques such as adjusting the batch size, compiling the model with TensorRT, and reducing data precision significantly improve inference time, energy consumption, and memory usage. Accuracy degradation is negligible: post-training quantization to 8-bit integers incurs less than 1% loss. This research underscores the feasibility of deploying transformer neural networks on low-power embedded devices with efficiency in time, energy, and memory, while maintaining the accuracy of the results.
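  The post-training quantization the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation (the study uses TensorRT on an SoC); it is a generic symmetric per-tensor int8 scheme, with all function names hypothetical, showing why the precision loss stays small: each float is mapped to an 8-bit integer with a single scale factor, so the reconstruction error is bounded by one quantization step.

    ```python
    import numpy as np

    def quantize_int8(x):
        # Symmetric per-tensor quantization: map the float range
        # [-max|x|, +max|x|] onto the int8 range [-127, 127].
        scale = np.abs(x).max() / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover an approximation of the original floats.
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    weights = rng.standard_normal((4, 4)).astype(np.float32)
    q, scale = quantize_int8(weights)
    recovered = dequantize(q, scale)
    max_err = float(np.abs(weights - recovered).max())
    ```

  In practice, frameworks such as TensorRT additionally calibrate activation ranges on representative data, which is what keeps end-to-end accuracy loss below 1% as reported in the abstract.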

subjects

  • Computer Science
  • Electronics
  • Telecommunications

keywords

  • vision transformer; gpu; low-power; system-on-chip