swebench

4.1.0
60.15M

The official SWE-bench package - a benchmark for evaluating language models on software engineering tasks