Basic information about the novel GIT task.
We first propose the Global Instance Tracking (GIT) task, a new task of searching for an arbitrary user-specified instance in a video without any assumptions about camera or motion consistency, in order to accurately model human visual tracking ability. An ideal GIT algorithm should work across varied video conditions, such as rapid viewpoint changes, frequent camera switches, and long-term target absence.
This figure compares GIT with other video-related vision tasks. VID (a) and MOT (b) can only locate instances from a limited set of categories, while SOT (c) and GIT (d) do not constrain the target category. Furthermore, GIT extends the SOT task by removing the motion continuity assumption, allowing the target to move in broader and more complex environments. A detailed comparison of GIT with the above vision tasks is given in (e). GIT is thus a new visual task with no restrictions on target categories or scenarios.