swebench

4.1.0
20.52M

The official SWE-bench package, a benchmark for evaluating language models (LMs) on software engineering tasks