Electronic International Standard Serial Number (EISSN)
The detection of the spatial-temporal interest points has a key role in human action recognition algorithms. This research work aims to exploit the existing strength of bag-of-visual features and presents a method for automatic action recognition in realistic and complex scenarios. This paper provides a better feature representation by combining the benefit of both a well-known feature detector and descriptor i.e. the 3D Harris space-time interest point detector and the 3D Scale-Invariant Feature Transform descriptor. Finally, action videos are represented using a histogram of visual features by following the traditional bag-of-visual feature approach. Apart from video representation, a support vector machine (SVM) classifier is used for training and testing. A large number of experiments show the effectiveness of our method on existing benchmark datasets and shows state-of-the-art performance. This article reports 68.1% mean Average Precision (mAP), 94% and 91.8% average accuracy for Hollywood-2, UCF Sports and KTH datasets respectively. (C) 2018 Elsevier Ltd. All rights reserved.
human action recognition; local spatio-temporal features; bag-of-visual features; hollywood-2 dataset