A three-arm randomized controlled trial isolating the effect of expert human tutoring dosage on standardized math achievement.
Bloom's 1984 finding is one of the most cited results in education: one-on-one tutoring moved students from the 50th to the 98th percentile. Forty years later, no one has properly replicated it.
The tutors got worse. Replication attempts operated under severe cost constraints. Saga Education, the replication most often cited for tutor quality, paid tutors $25,500 a year, trained them primarily through self-serve digital modules, and had each tutor work with two students at a time.
The assessments changed. Bloom introduced content novel to students. Later studies moved to standardized measures like MAP, requiring much longer interventions to show meaningful impact.
The designs got muddled. Subsequent replications introduced confounds — varying dosage and tutor quality simultaneously, swapping individuals for small groups, or using less-trained tutors with a fixed curriculum.
We have mapped the floor of tutoring effectiveness. The ceiling remains unknown.
A three-arm RCT that isolates dosage while holding tutor quality constant. All tutoring is virtual, one-on-one, delivered by expert instructors trained on the highest-effect interventions in the research literature.
| | Daily | Weekly | Control |
|---|---|---|---|
| Tutor | Expert instructor | Expert instructor | None |
| Dosage | Daily 1-on-1 (45 min) | Weekly 1-on-1 (45 min) | Standard instruction |
| Duration | Two semesters | Two semesters | Two semesters |
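Students are assigned to the three arms by lottery. As a sketch only (the proposal does not specify the randomization procedure, and a real implementation would likely stratify by grade), a shuffle-and-deal assignment yields exactly balanced arms:

```python
import random

def assign_arms(student_ids, arms=("daily", "weekly", "control"), seed=0):
    """Randomly assign students to arms (illustrative, not the study's protocol).

    Shuffles the roster, then deals students round-robin so that arm
    sizes differ by at most one.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible assignment
    ids = list(student_ids)
    rng.shuffle(ids)
    return {sid: arms[i % len(arms)] for i, sid in enumerate(ids)}
```

With N=150 and three arms, this produces exactly 50 students per arm.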
| Parameter | Detail |
|---|---|
| Outcome | MAP Growth (nationally normed, 11M+ students) |
| Sample | Grades 2–4, N=150, enrolled via lottery |
| Power | 82% to detect 4 RIT points (~0.5 SD) |
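The stated power figure can be roughly reproduced with a normal-approximation calculation, under assumptions the proposal does not spell out: 50 students per arm, a two-sided 5% test, and analysis adjusting for baseline MAP scores (ANCOVA) with an assumed pre/post correlation of about 0.5. Both the correlation and the analytic model are assumptions here, not facts from the proposal:

```python
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_arm(d, n_per_arm, baseline_r=0.5, alpha_z=1.959964):
    """Approximate power for a two-arm comparison with a baseline covariate.

    Adjusting for a baseline covariate correlated r with the outcome
    shrinks the residual SD by sqrt(1 - r^2), inflating the detectable
    standardized effect accordingly. All parameter values are assumptions.
    """
    d_adj = d / sqrt(1.0 - baseline_r ** 2)
    ncp = d_adj * sqrt(n_per_arm / 2.0)   # noncentrality of the two-sample z test
    return normal_cdf(ncp - alpha_z)

# d = 0.5 SD (~4 RIT points), 50 per arm, assumed baseline r = 0.5
print(round(power_two_arm(0.5, 50), 2))  # → 0.82
```

Without the covariate adjustment (baseline_r=0), the same design yields roughly 70% power, so the 82% figure is consistent with an analysis that leverages baseline scores.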
A typical session might proceed: review prior work (5 min), extend a concept at the student's frontier (15 min), guided practice with feedback (15 min), assessment or reflection (10 min). But this is illustrative, not prescriptive. A tutor who decides a student needs 45 minutes on a single problem type is free to do that.
AI tutoring is the most active frontier in education technology. But the quality of an AI tutor can only be measured against a clear picture of what great human tutoring looks like. That picture does not exist. This study produces it: the first rigorous benchmark for what elite human tutoring achieves on a nationally normed assessment.
Tutors are experienced educators with at least three years of classroom or intensive tutoring experience in elementary mathematics and demonstrated records of student growth. We expect to hire 12–14 tutors for the daily arm, each serving 4–5 students on staggered schedules, and 3–4 for the weekly arm.
All tutors complete a two-week (40-hour) training program designed with leading education researchers, covering: the evidence base for each intervention strategy; diagnostic use of MAP data; the structured decision-capture protocol; and supervised practice sessions with feedback.
Three streams:

1. MAP Growth RIT scores. Nationally normed, computer-adaptive assessments administered at baseline, mid-year, and endline.
2. Session recordings and interaction logs. Browser-level data capturing every problem attempted, resource used, and tool opened.
3. Tutor decision journals. Structured entries completed after each session: the student's state, what the tutor chose to focus on, what worked, and what they plan to adjust.
Together these reconstruct not just whether expert tutoring works, but how — the micro-decisions that constitute expert instruction.
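The proposal names the journal categories but not a schema. One plausible shape for a structured entry, with hypothetical field names chosen for illustration:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DecisionJournalEntry:
    """One entry per session. Field names are illustrative,
    not the study's actual decision-capture protocol."""
    session_id: str
    student_state: str                # how the student presented today
    focus: str                        # what the tutor chose to work on, and why
    what_worked: List[str] = field(default_factory=list)
    planned_adjustments: List[str] = field(default_factory=list)
```

A structured record like this, joined against session logs and RIT scores, is what would let analysts trace tutor decisions to measured growth.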
Phase 1: Virtual study. This pilot is self-contained and answers its own research question — does expert tutoring produce large effects on standardized measures?
Phase 2: In-school RCT. If Phase 1 confirms large effects, the second phase compares tutored students directly against classroom instruction in physical schools.
Independent value. The pilot also produces tutor training protocols, data infrastructure, and a detailed corpus of expert tutoring behavior — a behavioral benchmark for training and evaluating AI tutoring systems.
| Phase | Duration | Activities |
|---|---|---|
| Preparation | Months 1–3 | IRB, researcher partnerships, tutor recruitment, training design |
| Training | Month 4 | Tutor training, lottery, consent, baseline MAP, randomization |
| Semester 1 | Months 5–10 | ~15 weeks tutoring; mid-year MAP |
| Semester 2 | Months 11–16 | ~15 weeks tutoring |
| Analysis | Months 16–18 | Endline MAP, data cleaning, analysis, manuscript |
We're looking for experienced math educators (grades 2–4) with at least 3 years of classroom or intensive tutoring experience and demonstrated records of student growth.