Towards Robust and Fast Visual Object Tracking: Scale Estimation and Channel Reliability Estimation
Visual object tracking (VOT) is a critical task in numerous applications ranging from video surveillance to autonomous vehicle guidance. Given the state of a target object in an initial frame, the goal of VOT is to estimate the state of that object in the following frames. The wealth, ubiquity and volume of collected video data demand robust and fast tracking algorithms for searching and analyzing video databases. With the advancement of discriminative modeling and feature representations, many important works have been carried out, such as discriminative correlation filter-based tracking and Siamese network-based tracking. However, current tracking frameworks have drawbacks that hinder optimal tracking performance.
On the one hand, current tracking methods weight every feature channel equally, which prevents them from fully exploiting the discriminative power of the multi-channel features they employ. Each feature channel often corresponds to a certain type of visual pattern. Therefore, some feature channels are more significant than others in certain circumstances and hence yield much stronger and less noisy response maps. To obtain the best tracking performance from the features already in use, we propose to first model the importance of each feature channel as channel reliability, and then compute the final response map adaptively using the channel reliability measure.
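As a rough illustration of reliability-weighted fusion, the sketch below replaces the equal-weight sum of per-channel response maps with a weighted sum. The abstract does not define the reliability measure, so the choice made here (reliability proportional to each channel's peak response) is only an assumption, and the function names `channel_reliability` and `fuse_responses` are hypothetical.

```python
import numpy as np


def channel_reliability(responses: np.ndarray) -> np.ndarray:
    """Return one weight per channel for response maps of shape (C, H, W).

    Assumption for this sketch: a channel is more reliable when its
    response map has a higher peak. Weights are normalized to sum to 1.
    """
    num_channels = responses.shape[0]
    peaks = responses.reshape(num_channels, -1).max(axis=1)
    peaks = np.clip(peaks, 0.0, None)  # a negative peak carries no evidence
    total = peaks.sum()
    if total <= 0.0:
        # No channel responds: fall back to equal weighting.
        return np.full(num_channels, 1.0 / num_channels)
    return peaks / total


def fuse_responses(responses: np.ndarray) -> np.ndarray:
    """Combine per-channel maps (C, H, W) into one reliability-weighted map (H, W)."""
    weights = channel_reliability(responses)
    # Contract the channel axis: sum_c weights[c] * responses[c].
    return np.tensordot(weights, responses, axes=1)
```

With equal weighting, a weak and noisy channel contributes as much as a strong one; here its contribution shrinks in proportion to its peak, and the target position is still read off as the argmax of the fused map.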
On the other hand, most tracking methods employ a fixed-size template and hence yield inferior performance when the target undergoes scale variation. We address the scale estimation problem in three successive steps. First, a novel criterion called the average peak-to-correlation energy is incorporated into the exhaustive scale searching framework to obtain robust and accurate scale estimation. Second, to address the heavy computation associated with an exhaustive scale searching scheme, we investigate strategies to reduce the computational cost. Third, a coarse-to-fine scale estimation approach is proposed to enable aspect ratio adaptability by integrating a class-agnostic detection proposal method. Our scale estimation strategies are shown to achieve state-of-the-art tracking performance on the online tracking benchmark (OTB) datasets.
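The first step can be sketched as follows. The average peak-to-correlation energy (APCE) of a response map F is commonly defined as |F_max - F_min|^2 divided by the mean of (F - F_min)^2 over all positions, so a single sharp peak scores high and a flat or multi-peaked map scores low. How the criterion is combined with the scale search is not specified in this abstract; the sketch assumes the simplest option, picking the candidate scale whose response map has the highest APCE, and the names `apce` and `select_scale` are hypothetical.

```python
import numpy as np


def apce(response: np.ndarray) -> float:
    """Average peak-to-correlation energy of a 2-D response map."""
    f_max = response.max()
    f_min = response.min()
    energy = np.mean((response - f_min) ** 2)
    if energy == 0.0:
        # A perfectly flat map has no peak at all.
        return 0.0
    return float((f_max - f_min) ** 2 / energy)


def select_scale(responses_by_scale, scales):
    """Exhaustive scale search: return (best_scale, best_apce).

    Assumption for this sketch: the candidate scale whose response map
    has the highest APCE is taken as the scale estimate.
    """
    scores = [apce(r) for r in responses_by_scale]
    best = int(np.argmax(scores))
    return scales[best], scores[best]
```

In a tracker, `responses_by_scale` would hold the correlation response computed on the search patch resized to each candidate scale factor in `scales`. Evaluating every candidate is what makes the exhaustive scheme costly, which motivates the cost-reduction step described above.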