Agglomerative clustering and Residual-VLAD encoding for human action recognition

authors

  • BUTT, AMMAR MOSHIN
  • Yousaf, Muhammad Haroon
  • Murtaza, Fiza
  • Nazir, Saima
  • VIRIRI, SERESTINA
  • VELASTIN CARROZA, SERGIO ALEJANDRO

publication date

  • June 2020

start page

  • 1

end page

  • 14

issue

  • 12

volume

  • 10

International Standard Serial Number (ISSN)

  • 2076-3417

abstract

  • Human action recognition has attracted significant attention in recent years due to high demand in various application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for the classification of action videos. The proposed scheme develops a discriminative codebook and a hybrid feature vector by encoding features extracted from convolutional neural networks (CNNs). We explore different CNN architectures for extracting spatio-temporal features. We employ an agglomerative clustering approach for codebook generation, which is intended to combine the advantages of global and class-specific codebooks. We propose a Residual Vector of Locally Aggregated Descriptors (R-VLAD) and fuse it with locality-based coding to form a hybrid feature vector, which provides a compact representation along with high-order statistics. We evaluated our work on two publicly available standard benchmark datasets, HMDB-51 and UCF-101. The proposed method achieves 72.6% and 96.2% on HMDB-51 and UCF-101, respectively. We conclude that the proposed scheme is able to boost recognition accuracy for human action recognition.
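
For readers unfamiliar with the encoding family the abstract refers to, the following is a minimal sketch of standard VLAD encoding, not the R-VLAD variant or the hybrid scheme proposed by the authors, whose details are not given in this record. The function name `vlad_encode`, the toy codebook, and the toy descriptors are all assumptions made for illustration. In practice the codebook would come from a clustering step and the descriptors from a CNN.

```python
import math

def vlad_encode(descriptors, codebook):
    """Standard VLAD (illustrative): sum the residuals between each descriptor
    and its nearest codeword, concatenate per-codeword sums, L2-normalise."""
    k, d = len(codebook), len(codebook[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest codeword
        # (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])),
        )
        # Accumulate the residual x - c for that codeword.
        for j in range(d):
            residual_sums[nearest][j] += x[j] - codebook[nearest][j]
    # Flatten the k x d residual matrix into one vector of length k * d.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Hypothetical toy data: two 2-D codewords and three local descriptors.
toy_codebook = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
encoding = vlad_encode(toy_descriptors, toy_codebook)
```

The resulting vector has length equal to the number of codewords times the descriptor dimension, which is why VLAD-style encodings stay compact even with small codebooks.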

subjects

  • Computer Science

keywords

  • action recognition; bag-of-words; deep residual networks; clustering; feature encoding; classification