BCOT Benchmark

Abstract

Template-based 3D object tracking still lacks a high-precision benchmark of real scenes, because it is difficult to annotate accurate 3D poses of real moving objects in video without using markers. In this paper, we present a multi-view approach to estimate the accurate 3D poses of real moving objects, and then use binocular data to construct a new benchmark for monocular textureless 3D object tracking. The proposed method requires no markers; the cameras only need to be synchronized, fixed relative to one another, and calibrated. Based on our object-centered model, we jointly optimize the object pose by minimizing shape re-projection constraints in all views, which greatly improves accuracy compared with the single-view approach and is even more accurate than the depth-based method. Our new benchmark dataset contains 20 textureless objects, 22 scenes, 404 video sequences and 126K images captured in real scenes. The annotation error is guaranteed to be less than 2 mm, according to both theoretical analysis and validation experiments. We re-evaluate the state-of-the-art 3D object tracking methods with our dataset, reporting their performance ranking in real scenes.

Download Dataset

OneDrive

BaiduYun (Extraction Code: xgkm)

Code

The implementation of the proposed joint optimization framework based on multi-view data.

Monocular tracking: A large translation error appears when the result is observed from another view.

Multi-view tracking: The result is precise when observed from any view.

Download code

Demo

Models

Datasets

Sequence naming

      easy : easy scene
      complex : complex scene
      light : dynamic light
      static : static camera set
      movable : movable camera set
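
As a rough illustration of how the sequence names in the captions below could be split into their parts, here is a minimal Python sketch. It assumes that every name has the form scene, then camera set, then object motion, joined by underscores, with the camera token being either `static` or `movable`. The function name `parse_sequence_name` and the `SequenceName` type are hypothetical and are not part of the released code. Only the tokens described in the list above carry a description; other tokens (for example `occlusion` or `handheld`) are passed through unchanged.

```python
from typing import NamedTuple

# Descriptions copied from the naming list above. Tokens that the list
# does not define are kept as raw strings without a description.
SCENE_DESCRIPTIONS = {
    "easy": "easy scene",
    "complex": "complex scene",
    "light": "dynamic light",
}
CAMERA_DESCRIPTIONS = {
    "static": "static camera set",
    "movable": "movable camera set",
}


class SequenceName(NamedTuple):
    scene: str   # everything before the camera token
    camera: str  # "static" or "movable"
    motion: str  # everything after the camera token


def parse_sequence_name(name: str) -> SequenceName:
    """Split a sequence name at its camera token (assumed layout)."""
    tokens = name.split("_")
    for i, token in enumerate(tokens):
        if token in CAMERA_DESCRIPTIONS:
            return SequenceName(
                scene="_".join(tokens[:i]),
                camera=token,
                motion="_".join(tokens[i + 1:]),
            )
    raise ValueError(f"no camera token found in {name!r}")
```

For example, `parse_sequence_name("complex_movable_handheld")` yields the scene `complex`, the camera set `movable`, and the motion token `handheld`. Splitting at the camera token rather than at a fixed position keeps multi-part scene names such as `outdoor_scene1` intact.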

Deadpool model, easy_static_suspension

Teapot model, easy_static_handheld

Ape model, easy_static_trans

Cat model, complex_movable_handheld

Lamp Clamp model, complex_movable_suspension

Squirrel model, complex_static_handheld

Lamp Clamp model, complex_static_suspension

RTI Arm model, complex_static_trans

Tube model, light_movable_handheld

Stitch model, light_movable_suspension

3D Touch model, light_static_handheld

Driller model, light_static_suspension

Wall Shelf model, light_static_trans

RJ45 Clip model, occlusion_movable_suspension

Stitch model, outdoor_scene1_movable_handheld

RJ45 Clip model, outdoor_scene2_movable_suspension

Video

FlashLight model, complex_static_suspension

Squirrel model, light_movable_handheld

Wall Shelf model, light_static_suspension

Jack model, light_static_handheld

Ape model, light_movable_suspension

Deadpool model, easy_static_handheld

Jack model, complex_static_trans

Deadpool model, occlusion_movable_suspension

Stitch model, outdoor_movable_scene2_handheld

Evaluation

Comparison of monocular 3D tracking methods

Comparison of monocular 3D tracking methods in indoor scenes

Comparison of monocular 3D tracking methods in outdoor scenes

Overall tracking accuracy under various ADD error tolerance thresholds

Indoor scene tracking accuracy under various ADD error tolerance thresholds

Outdoor scene tracking accuracy under various ADD error tolerance thresholds
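
For readers unfamiliar with the metric behind these curves, the following Python sketch shows the standard definition of the ADD error: the mean distance between model points transformed by the ground-truth pose and by the estimated pose. It also shows how an accuracy value at a given tolerance can be computed as the fraction of frames below that tolerance. This is a sketch under stated assumptions, not the evaluation code used for the benchmark. The function names are hypothetical, and this page does not state which threshold values were used, so the threshold is left as a parameter.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> tuple:
    """Apply the rigid transform R * p + t to one 3D point."""
    return tuple(
        sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
    )


def add_error(R_gt: Mat3, t_gt: Vec3, R_est: Mat3, t_est: Vec3,
              points: Sequence[Vec3]) -> float:
    """Mean distance between model points under the two poses."""
    total = 0.0
    for p in points:
        total += math.dist(transform(R_gt, t_gt, p),
                           transform(R_est, t_est, p))
    return total / len(points)


def accuracy_under_threshold(errors: Sequence[float],
                             threshold: float) -> float:
    """Fraction of frames whose ADD error is below the threshold."""
    return sum(e < threshold for e in errors) / len(errors)
```

Sweeping `threshold` over a range of values and plotting `accuracy_under_threshold` at each one gives a curve of the kind described in the captions above. The threshold must be in the same unit as the model points and translations.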

Paper

BCOT: A Markerless High-Precision 3D Object Tracking Benchmark. Jiachen Li, Bin Wang, Shiqiang Zhu, Xin Cao, Fan Zhong, Wenxuan Chen, Te Li*, Jason Gu, Xueying Qin*. CVPR 2022.

Project Page

For more information, please visit our project page.