A general benchmark for evaluating visual tracking and visual language tracking intelligence
VideoCube is a high-quality, large-scale benchmark that creates a challenging real-world
experimental environment for Global Instance Tracking (GIT). MGIT is a high-quality, multi-modal
benchmark based on VideoCube-Tiny that fully represents the complex spatio-temporal and causal relationships
coupled in longer narrative content. DTVLT is a new visual language tracking benchmark with diverse texts,
built on five prominent VLT and SOT benchmarks and covering three sub-tasks: short-term tracking,
long-term tracking, and global instance tracking.
Task
The Global Instance Tracking (GIT) task aims to model the fundamental human visual function of motion
perception, without any assumptions about camera or motion consistency.
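To make the distinction concrete, the sketch below contrasts short-term SOT (local search around the previous prediction) with GIT (global per-frame search). It is an illustrative Python sketch: search_around, encode, and detect_anywhere are hypothetical placeholders for a tracker's components, not any toolkit's actual API.

def track_sot(frames, init_box, model):
    # Short-term SOT: assumes motion continuity, so each frame is
    # searched only in a local window around the previous prediction.
    boxes = [init_box]
    for frame in frames[1:]:
        boxes.append(model.search_around(frame, boxes[-1]))
    return boxes

def track_git(frames, init_box, model):
    # GIT: no camera/motion-consistency prior. The instance may
    # disappear, re-enter, or jump across shot cuts, so every frame
    # is searched globally for the instance given in the first frame.
    template = model.encode(frames[0], init_box)
    return [model.detect_anywhere(frame, template) for frame in frames]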
Key Features
Large-Scale
VideoCube contains 500 video segments of real-world moving objects and over 7.4 million labeled bounding
boxes. Each video contains at least 4,008 frames, and the average video length in
VideoCube is around 14,920 frames.
Multiple Collection Dimensions
VideoCube is collected along six dimensions that describe the spatio-temporal and causal
relationships of film narrative, providing an extensive dataset for the novel GIT task.
Comprehensive Attribute Selection
VideoCube provides 12 per-frame attributes to reflect challenging situations in real
applications and to support more fine-grained performance analysis.
Scientific Evaluation
VideoCube provides both classical and novel metrics for evaluating algorithms. Besides, this
benchmark also provides a human baseline to measure the intelligence level of existing methods.
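As an illustration, the snippet below computes one classical tracking metric, the success score (area under the IoU success curve), as used under the OPE protocol. It is a minimal sketch assuming axis-aligned (x, y, w, h) boxes; it is not the videocube-toolkit's actual API.

import numpy as np

def iou(pred, gt):
    """Intersection-over-union of paired (x, y, w, h) boxes, one row per frame."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def success_curve(pred, gt, thresholds=np.linspace(0, 1, 51)):
    """Fraction of frames whose IoU exceeds each threshold (the success plot)."""
    overlaps = iou(pred, gt)
    return np.array([(overlaps > t).mean() for t in thresholds])

# Example: per-frame predictions vs. ground truth for one short sequence.
pred = np.array([[10, 10, 50, 50], [12, 11, 48, 52]], dtype=float)
gt   = np.array([[11, 10, 50, 50], [30, 30, 48, 52]], dtype=float)
curve = success_curve(pred, gt)
# The mean over evenly spaced thresholds approximates the AUC success score.
print("success AUC:", curve.mean())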
Multi-granularity Semantic Annotation
MGIT designs a hierarchical multi-granularity semantic annotation strategy to provide scientific natural
language information. Video content is annotated at three granularities (i.e., action, activity, and
story).
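For intuition, the sketch below shows how such a three-level hierarchy might be represented in code; the class names, field names, and example texts are hypothetical and do not reflect MGIT's actual annotation schema.

from dataclasses import dataclass
from typing import List

@dataclass
class Action:          # finest granularity: a short description of one motion
    frames: range
    text: str

@dataclass
class Activity:        # middle granularity: a group of related actions
    actions: List[Action]
    text: str

@dataclass
class Story:           # coarsest granularity: the narrative of the whole video
    activities: List[Activity]
    text: str

story = Story(
    activities=[Activity(
        actions=[Action(frames=range(0, 300), text="the man crosses the street")],
        text="the man in black walks through traffic toward the station",
    )],
    text="a man travels across the city while being followed",
)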
Evaluation Mechanism for Multi-modal Tracking
MGIT expands the evaluation mechanism by conducting experiments under both the traditional evaluation
mechanisms (multi-modal single-granularity, single visual modality) and an evaluation mechanism adapted to
MGIT (multi-modal multi-granularity).
Latest News
[2024.06.10] Recently, we have proposed a new visual language tracking benchmark with diverse texts, named
DTVLT, based on five prominent VLT and
SOT benchmarks and covering three sub-tasks: short-term
tracking, long-term tracking, and global instance tracking, aiming to support further research in VLT and
video understanding. You can now download the dataset from the download page via the URLs of DTVLT. We
have updated the semantic labels on OneDrive and Baiduyun Disk. The paper, experimental results, and
toolkit for DTVLT will be updated gradually.
[2023.06.13] Recently, we have proposed a new multi-modal global instance tracking benchmark named
MGIT. It consists of 150 long video sequences with a total of 2.03 million frames, aiming to fully
represent the complex spatio-temporal and causal relationships coupled in longer narrative content. You
can now download the dataset from the download
page via the URLs of VideoCube-Tiny (the 150 sequences in MGIT are the same as those in
VideoCube-Tiny, but with additional semantic information). We have updated the semantic labels on
OneDrive and Baiduyun Disk. The paper, experimental results, and toolkit for MGIT will be updated
gradually.
[2023.02.10] To make it easier for users to conduct research with VideoCube, we have selected 150 representative
sequences from the original version (500 sequences) to form VideoCube-Tiny. The original full version
includes 500 sequences (1.4 TB), while the tiny version includes 150 sequences (344 GB; 105 sequences
in the train set, 15 in the val set, and 30 in the test set). The download page, the leaderboards under the OPE and R-OPE mechanisms, and the toolkit have
also been updated. You are encouraged to report results on VideoCube-Tiny in your research work.
[2022.04.18] We have released SOTVerse, a user-defined task space for single object tracking, and the related paper
has been released on arXiv.
[2022.03.04] We have updated the download
page, the submission instructions, and
the leaderboards under the OPE and R-OPE mechanisms.
[2021.11.21] The videocube-toolkit has been released!
[2021.11.17] The submission system and real-time leaderboard have been released!
[2021.06.07] The download links and the registration system have been released!
[2021.05.24] The home page and the instructions page have been released!
Publications
Global Instance Tracking: Locating Target More Like Humans. S. Hu, X. Zhao*, L. Huang, and K. Huang (*corresponding author). IEEE Transactions on Pattern Analysis and Machine Intelligence.
[PDF] [BibTex]
Please cite our IEEE TPAMI paper if VideoCube helps your research.
A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship. S. Hu, D. Zhang, M. Wu, X. Feng, X. Li, X. Zhao, and K. Huang. Advances in Neural Information Processing Systems.
[PDF] [BibTex]
Please cite our NeurIPS paper if MGIT helps your research.