山东大学视觉感知与智能系统实验室

A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification

Abstract
Pedestrian re-identification is a difficult problem due to the large variations in a person's appearance caused by different poses and viewpoints, illumination changes, and occlusions. Spatial alignment is commonly used to address these issues by treating the appearance of different body parts independently. However, a body part can also appear differently during different phases of an action. In this paper we consider the temporal alignment problem, in addition to the spatial one, and propose a new approach that takes the video of a walking person as input and builds a spatio-temporal appearance representation for pedestrian re-identification. Particularly, given a video sequence we exploit the periodicity exhibited by a walking person to generate a spatio-temporal body-action model, which consists of a series of body-action units corresponding to certain action primitives of certain body parts. Fisher vectors are learned and extracted from individual body-action units and concatenated into the final representation of the walking person. Unlike previous spatio-temporal features that only take into account local dynamic appearance information, our representation aligns the spatio-temporal appearance of a pedestrian globally. Extensive experiments on public datasets show the effectiveness of our approach compared with the state of the art.

The benefits of our representation are:

1) It describes a person's appearance during a walking cycle, hence covers almost the entire variety of poses and shapes;

2) It aligns the appearance of different people both spatially and temporally;

3) The formation of each body-action unit can be very flexible and different for each person, while Fisher vectors can work with any volume topologies, so the final representation is a consistent feature vector.

Reference

[1]	K. Liu, B. Ma, W. Zhang, and R. Huang, "A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification", ICCV, 2015. [Full Text]
[2]	B. Ma, Y. Su, and F. Jurie, "Local descriptors encoded by Fisher vectors for person re-identification", ECCV Workshops, 2012. [Full Text]
[3]	T. Wang, S. Gong, X. Zhu, and S. Wang, "Person reidentification by video ranking", ECCV, 2014. [Full Text]