TutorBench
AI Tutoring
Overview
Large Language Models serve as on-demand tutors for learners worldwide, yet a critical evaluation gap exists. While most benchmarks assess an LLM's ability to solve problems, this capability alone does not make a model an effective tutor. Effective tutoring requires nuanced, human-centered skills essential for student learning, such as providing adaptive explanations, offering guiding feedback, and adjusting to a learner's specific needs.
To address this gap, we introduce TutorBench, a comprehensive benchmark designed to rigorously evaluate the core tutoring skills of LLMs. TutorBench moves beyond simple answer correctness to measure how well models perform three common and critical tutoring tasks: