Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

Wujian Peng^1,2, Sicheng Xie^1,2, Zuyao You^1,2, Shiyi Lan^3, Zuxuan Wu^1,2

^1 Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University; ^2 Shanghai Collaborative Innovation Center of Intelligent Visual Computing; ^3 NVIDIA

Introduction

The initial version of the SPEC dataset was published at CVPR 2024, and both the data and code have been released here. We are now building a new version of SPEC with a larger data scale, more object categories, and higher-quality images and text. A demo is available on this website, and the full version will be released soon. In the meantime, you can start with the initial version of SPEC; please follow our GitHub project for news about updates.

Demo categories: Absolute Size, Relative Size, Absolute Spatial, Relative Spatial, Existence, Count

BibTeX


  @inproceedings{peng2024spec,
    title={Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding},
    author={Wujian Peng and Sicheng Xie and Zuyao You and Shiyi Lan and Zuxuan Wu},
    booktitle={CVPR},
    year={2024}
  }