@teortaxesTex: DS-Vision does not use visual grounding at all, and cannot replicate the cases from the paper. It's ...