news · Reddit r/MachineLearning

DeepSWE: new benchmark looking at how well today's frontier models can actually write code [R]

Model: Helix-7B

Paper: NuclearQAv2: A Structured Benchmark for Evaluating Domain-Science Competence in Large Language Models