person profile

Samiha A. Ismail

Samiha A. Ismail — researcher or builder tracked in the Angestrom contributor network.

3 Connections · 1 Papers · 0 Models · 0 Repos · 0 News

Papers · 1

A rubric-based controlled comparison of frontier language models on expert-authored clinical reasoning tasks

Multiple-choice medical benchmarks are increasingly saturated, and recent rubric-based evaluations such as HealthBench have shown that open-ended clinical performance is far from solved: the top score on its "Hard" subset remains 32%. We present a small, deliberately difficult evaluation dataset of five clinician-authored clinical scenarios spanning four specialties (anaesthesia, internal/family medicine, emergency medicine, and obstetrics). Each scenario is accompanied by an atomic, weighted, mutually exclusive and collectively exhaustive (MECE) rubric (25-62 criteria per task; 184 criteria total) authored from a clinician-drafted golden answer. We evaluate