Deep Multi-Shot Network for modelling Appearance Similarity in Multi-Person Tracking applications

authors

  • GOMEZ SILVA, MARIA JOSE

publication date

  • January 2021

start page

  • 23701

end page

  • 23721

issue

  • 80

International Standard Serial Number (ISSN)

  • 1380-7501

Electronic International Standard Serial Number (EISSN)

  • 1573-7721

abstract

  • Automating Multi-Object Tracking is a demanding task in real unconstrained scenarios, where the algorithms have to deal with crowds, people crossing paths, occlusions, disappearances and the presence of visually similar individuals. In those circumstances, the data association between the incoming detections and their corresponding identities can miss some tracks or produce identity switches. To reduce these tracking errors and their propagation to subsequent frames, this article presents a Deep Multi-Shot neural model for measuring the Degree of Appearance Similarity (MS-DoAS) between person observations. The model gives temporal consistency to the representation of each individual's appearance and provides an affinity metric for frame-by-frame data association, which allows online tracking. It has been deliberately trained to handle tracks that already contain identity switches and missed observations. For that purpose, a novel data generation tool has been designed to create training tracklets that simulate such situations. The model has demonstrated a high capacity to discern whether a new observation corresponds to a given track, achieving a classification accuracy of 97% in a hard test that simulates tracks with previous mistakes. Moreover, the tracking efficiency of the model in a surveillance application has been demonstrated by integrating it into the frame-by-frame association of a Tracking-by-Detection algorithm.