Scaling Expertise with Human-AI Collaboration

Date: Mar 4, 2026
Time: 04:00 PM (Local Time Germany)
Speaker: Susanna Loeb (Stanford University)
Room: Ground Floor

Pandemic learning losses created demand for high-quality tutoring at scale, yet many programs rely on novice tutors who often lack the pedagogical expertise to respond effectively to student mistakes. This paper develops and tests Tutor CoPilot, a human-AI system grounded in ReMath, a framework that deconstructs expert teacher decision-making into three stages (error diagnosis, strategy selection, and response generation) derived from think-aloud protocols with experienced teachers. Tutor CoPilot embeds this expert reasoning into GPT-4 prompts and offers tutors three suggested responses, which they can select, edit, or ignore. In a seven-week randomized controlled trial with 783 tutors and approximately 1,000 students in schools serving students from low-income families, the system improved math learning: students assigned to treatment tutors were 4 percentage points more likely to pass session exit tickets (intent-to-treat, ITT), rising to 14 percentage points when CoPilot was actually used (treatment-on-the-treated, ToT). Effects were largest for lower-rated and less experienced tutors, and NLP analysis of over 241,000 tutor messages confirms a shift toward higher-quality pedagogical strategies. At roughly $20 per tutor per year, the approach offers a promising, scalable, cost-effective model for augmenting human expertise with AI while preserving human agency and judgment.