news · OpenAI · Trust 88 · Lab · Published 3d ago · Live · 3d ago
Introducing GeneBench-Pro
GeneBench-Pro is a new benchmark that tests AI performance in genomics, biology, and scientific research using complex, real-world datasets.
Covers
Covers (incoming)
paper: Clinician-Level Agreement Without Clinical Caution: LLM Evaluator Limits in Medical AI Benchmarking
repo: qdrant/qdrant
repo: spiceai/spiceai
repo: MemPalace/mempalace
repo: run-llama/ParseBench
paper: TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution
paper: Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials
Related across the graph
repo: Fabiojvv/ai-cortex-hub
paper: Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials
paper: Clinician-Level Agreement Without Clinical Caution: LLM Evaluator Limits in Medical AI Benchmarking
repo: qdrant/qdrant
repo: spiceai/spiceai
repo: run-llama/ParseBench
paper: AGC-Bench: Measuring Artificial General Creativity
paper: TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution
repo: Hayredin950/SYNAPSE
repo: MemPalace/mempalace
paper: FARS: A Fully Automated Research System Deployed at Scale
