Automatic aesthetics prediction of multimedia content promises to be a powerful tool for artificial intelligence because of its wide range of potential applications. In this paper, we contribute to research on video aesthetics assessment through a comparative study of (1) the performance of eight families of visual descriptors in accounting for the general aesthetics perception of videos and (2) the suitability of different YouTube metadata as a basis for automatically annotating a data set. Regarding the descriptors, some families, tested on their own, achieve significant classification rates (62.3% with only two features), and the rate increases to 65% accuracy when the best families are combined. With respect to the YouTube metadata, we devised strategies for automatic annotation and found that the number of likes and dislikes (quality-based metadata) provides a successful basis for annotating the corpus, whereas the number of views (quantity-based metadata) is not useful for deriving a metric related to aesthetics perception.
automatic aesthetics prediction; image descriptors; video descriptors; YouTube; automatic annotation