Paper · arXiv · Published 7d ago

COCOLogic-V2: Identifying Logical Inconsistencies via Truly Hard-Negatives

While interpretable models such as concept bottleneck models (CBMs) and program synthesis methods enable verification of model decisions, their evaluation is typically limited to simple tasks, leaving complex reasoning on real-world images largely unexplored. We introduce COCOLogic-V2, an object-centric dataset for visual inductive reasoning on real-world images covering a broad subset of first-order logic. By categorizing samples into positive variants, near-boundary (NB), and far-from-boundary (FB) negatives, COCOLogic-V2 enables fine-grained diagnosis of model accountability. Our evaluation
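Splitting the data into positives, near-boundary (NB) negatives, and far-from-boundary (FB) negatives makes the diagnosis fine-grained because a single aggregate accuracy can hide a model that fails only near the decision boundary. The following is a minimal, hypothetical sketch, not the authors' code or data format. It assumes each image is reduced to the set of COCO class names detected in it, uses an invented first-order-style rule with toy samples, and treats an NB negative as a scene one object away from a positive. The names `rule`, `shortcut_model`, and `per_category_accuracy` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Tuple

# Assumption: an image is reduced to the set of COCO object classes in it
# (an object-centric view). COCOLogic-V2's real sample format may differ.
Scene = FrozenSet[str]

# Hypothetical rule in first-order style:
#   (exists x. person(x)) AND NOT (exists y. car(y))
def rule(scene: Scene) -> bool:
    return "person" in scene and "car" not in scene

# Toy samples: (scene, category), category in {"positive", "NB", "FB"}.
# Assumption: NB negatives differ from a positive by one object; FB by many.
samples: List[Tuple[Scene, str]] = [
    (frozenset({"person", "dog"}), "positive"),
    (frozenset({"person", "bicycle"}), "positive"),
    (frozenset({"person", "dog", "car"}), "NB"),  # adding one car breaks the rule
    (frozenset({"dog"}), "NB"),                   # removing the person breaks the rule
    (frozenset({"car", "truck", "bus"}), "FB"),
    (frozenset({"pizza", "cup"}), "FB"),
]


def per_category_accuracy(
    model: Callable[[Scene], bool], data: List[Tuple[Scene, str]]
) -> Dict[str, float]:
    """Accuracy computed separately for positives, NB and FB negatives."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for scene, category in data:
        label = category == "positive"  # only positives satisfy the rule
        total[category] += 1
        correct[category] += int(model(scene) == label)
    return {c: correct[c] / total[c] for c in total}


# A shortcut model that checks for a person but ignores the "no car" clause.
def shortcut_model(scene: Scene) -> bool:
    return "person" in scene


print(per_category_accuracy(shortcut_model, samples))
# {'positive': 1.0, 'NB': 0.5, 'FB': 1.0}
```

On these toy samples the shortcut model is perfect on positives and FB negatives but scores only 0.5 on NB negatives, a gap that its aggregate accuracy (5 of 6 correct) blurs.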

Related to

Company: Northwind AI

Covers

News: New benchmark exposes reasoning gaps in top models

Topics

cs.LG