A framework for evaluating overthinking and basic reasoning capabilities of Large Language Models
Install with pip:

pip install llmthinkbench
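
After installing, one way to confirm that the package is visible to the current Python environment is to query its installed distribution metadata. The following is a minimal sketch that uses only the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical and is not part of llmthinkbench; the sketch assumes nothing about the package's own API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it only reads pip's install metadata
    and does not import the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("llmthinkbench")
    if found is None:
        print("llmthinkbench is not installed in this environment")
    else:
        print(f"llmthinkbench {found} is installed")
```

Checking the metadata instead of importing the package avoids guessing at module names and keeps the check free of side effects.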