
news · OpenAI · Trust 88 · Lab · Published 3d ago · Live · 3d ago

Introducing GeneBench-Pro

GeneBench-Pro is a new benchmark that tests AI performance in genomics, biology, and scientific research using complex, real-world datasets.

Research OpenAI verified

Covers

repo: Fabiojvv/ai-cortex-hub
repo: Hayredin950/SYNAPSE
paper: FARS: A Fully Automated Research System Deployed at Scale
paper: AGC-Bench: Measuring Artificial General Creativity

Covers (incoming)

paper: Clinician-Level Agreement Without Clinical Caution: LLM Evaluator Limits in Medical AI Benchmarking
repo: qdrant/qdrant
repo: spiceai/spiceai
repo: MemPalace/mempalace
repo: run-llama/ParseBench
paper: TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution
paper: Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials

Related across the graph

repo: Fabiojvv/ai-cortex-hub
paper: Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials
paper: Clinician-Level Agreement Without Clinical Caution: LLM Evaluator Limits in Medical AI Benchmarking
repo: qdrant/qdrant
repo: spiceai/spiceai
repo: run-llama/ParseBench
paper: AGC-Bench: Measuring Artificial General Creativity
paper: TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution
repo: Hayredin950/SYNAPSE
repo: MemPalace/mempalace
paper: FARS: A Fully Automated Research System Deployed at Scale