A quasi-experimental study of a digital reading platform across two independent standardized assessments.
The Effect of Rally Reader on Student Reading
Measuring the effects of a new education tool in a California public school district, 2024–25
Snowpack Data partnered with Rally Reader to evaluate the effect of its digital reading platform on elementary students across a California public school district during the 2024–25 school year. We're sharing this paper both to present our findings, which support the case for Rally Reader as a positive educational tool with fantastic promise, and to showcase the standard of analytical rigor that the Snowpack team put into this study. This is just one of many mission-aligned passion projects that our team has had the privilege to support.
References to the partner district and its schools have been redacted at the district's request. A PDF of the full paper is available here.
Abstract
Rally Reader usage at protocol (at or above the grade-level 75th percentile of usage hours) has a significant positive effect on student reading achievement, consistent with ESSA Tier 3 (Promising Evidence) criteria. This finding is supported by multiple controlled analytical methods across two independent standardized assessments.
Difference-in-difference (DiD) analyses conducted via ordinary least-squares regression (OLS) and propensity score matching (PSM) across iReady and CAASPP assessments indicate that students using Rally Reader show meaningful improvement in reading achievement relative to non-users. All four iReady analyses — the primary and more finely-measured assessment panel — show positive treatment effects, with three reaching statistical significance. CAASPP results are directionally positive for protocol-adherent students at the grade level, with Grade 5 reaching significance. No analysis produced a statistically significant negative result.

The effect is strongest for protocol-adherent students in Grade 3, who showed 132% greater growth than their control peers on iReady, with positive signals in other grades as well. On the CAASPP, California's state summative assessment, Grade 5 protocol students scored 21.6 points higher than control peers (p=0.048). These findings reflect a single-district quasi-experimental design; multi-district replication would further strengthen the evidence base.

Introduction
Rally Reader is an eReader app that combines popular books with a digital reading coach. The coach gives students real-time feedback on pronunciation and vocabulary and helps motivate them to read. When a student reads aloud, the application tracks their progress along the page, assists when they get stuck, and provides visual indicators of their progress. Alternatively, the student may read silently, as in a more traditional e-book interface. The application is designed to produce accurate diagnostics of a student's reading progress while providing a wide selection of reading options, motivation during reading sessions, and encouragement about their post-session performance.
Background
Rally Reader was introduced to the partner school district at the end of the 2023–2024 school year in a pilot program before rolling out more broadly across the district in the 2024–2025 school year. Because no randomized controlled trial was planned ahead of the rollout, a post-hoc analysis was conducted to estimate the causal effect of Rally Reader on reading assessment outcomes for students who used it during the 2024–25 school year.
The relative randomization and breadth of Rally Reader adoption provide a reasonable dataset for drawing conclusions about the efficacy of Rally Reader usage on student reading achievement. This analysis reviews two sets of standardized tests as measures of student reading performance. The first is the state-administered English Language Arts assessment (CAASPP ELA), which is administered annually in the Spring. The second is iReady, an adaptive diagnostic that is administered multiple times throughout the school year and offers a more precise measure of performance than the blunter state standardized test.
District Context
The partner district is a California public school district serving approximately 5,400 elementary students (grades K–6) across seven elementary schools. Rally Reader was introduced as a supplemental reading tool in May 2024 for use on the Apple iPad, with adoption expanding across classrooms throughout the 2024–2025 school year. Adoption was teacher-driven and voluntary; no classrooms were randomly assigned to use or not use the product.
By March 2026, 1,501 students had used Rally Reader at least once. Adoption was concentrated at six core elementary schools, with per-school user counts ranging from roughly 100 to 260 students. Adoption was not uniform — it rolled out in waves, with 227 students starting in May 2024 (pilot), a second wave in September–December 2024, and continued onboarding through the 2025–2026 school year.
Rally Reader adoption also varied by grade and was highest in grades 3–5. Grade is controlled for in all regression models. Because expected scale-score growth differs by grade, this imbalance affects the pooled DiD estimates, which is why per-grade results are reported alongside pooled estimates.

Usage Patterns
Among students who used Rally Reader during the '24–25 school year, the median total usage was 1.0 hours (average 4.5 hours). The 75th percentile was 3.5 hours and the 90th percentile was 12.1 hours.

The protocol threshold in this study is defined as the grade-level 75th percentile (P75) of total usage hours among students with any Rally Reader usage. This approach accounts for differences in typical usage levels across grades and identifies students whose engagement is meaningfully above their grade peers. The P75 thresholds by grade are: Grade 2 (0.9h), Grade 3 (3.5h), Grade 4 (8.3h), Grade 5 (2.9h), Grade 6 (9.7h). Students at or above their grade's P75 threshold are classified as protocol-adherent.
Additionally, a minimum usage floor of 2 minutes is applied to distinguish genuine engagement from incidental app-opens. Students with less than 2 minutes of total usage are reclassified as non-users for the binary "Any RR" analysis.
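The threshold definition above can be sketched in a few lines of Python. This is an illustration only, not the study's code: the function name `p75_thresholds`, the input shape, and the `"inclusive"` interpolation method are assumptions, since the paper does not say how the percentile was interpolated.

```python
from collections import defaultdict
from statistics import quantiles

MIN_USAGE_HOURS = 2 / 60  # the 2-minute usage floor described in the text


def p75_thresholds(usage):
    """Compute a per-grade 75th percentile of total usage hours.

    usage: iterable of (grade, total_hours) pairs, one per student
           (hypothetical input shape).
    Returns {grade: p75_hours}.
    """
    by_grade = defaultdict(list)
    for grade, hours in usage:
        if hours >= MIN_USAGE_HOURS:  # drop incidental app-opens
            by_grade[grade].append(hours)
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is P75.
    # The interpolation method is an assumption, not taken from the paper.
    return {
        grade: quantiles(hours, n=4, method="inclusive")[2]
        for grade, hours in by_grade.items()
        if len(hours) >= 2  # quantiles() needs at least two data points
    }
```

A student would then count as protocol-adherent when their total hours are at or above the value returned for their grade.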
Reading Type
Rally Reader supports two modes of reading: oral (read-aloud, validated by the app) and silent. Among users in the iReady panel:
- 366 students (58%) were classified as mostly oral
- 28 students (4%) as mixed
- 240 students (38%) as mostly silent
Oral reading use decreased with age: 85% of 2nd graders primarily read aloud, compared to 22% of 6th graders. Mostly silent readers logged more total hours on average (9.6h) than mostly oral readers (1.2h).

Demographics
The sample of students for iReady (grades 2–6) includes 630 Rally Reader users and 1,021 non-users. The groups are broadly comparable but not identical.
Rally Reader users are less likely to be economically disadvantaged (32.4% vs 42.4%), less likely to be English language learners (5.7% vs 7.3%), more likely to be female (52.2% vs 48.6%), and have higher baseline scores (545 vs 503). The difference in baseline scores suggests that adoption was selective: teachers may have directed the tool towards particular students, and students may have self-selected into a reading tool that offers benefits like a broad library of reading material.
These differences are controlled for in all regression analyses. Propensity score matching provides an additional check by explicitly balancing these characteristics before estimating treatment effects.
Methodology
This study uses a quasi-experimental difference-in-differences (DiD) design to estimate the causal effect of Rally Reader on student reading achievement. Outcomes show the score improvement of treatment groups in standardized tests over the school year relative to control. Because adoption was not randomized, we compare outcomes between students who used Rally Reader and those who did not, controlling for pre-existing differences.
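For reference, the unadjusted difference-in-differences estimator that this design builds on can be written as follows. The notation is ours rather than the paper's: $T$ and $C$ denote the treatment and control groups, and $\bar{Y}$ is a group mean score.

$$
\widehat{\delta}_{\text{DiD}} = \left(\bar{Y}^{\text{post}}_{T} - \bar{Y}^{\text{pre}}_{T}\right) - \left(\bar{Y}^{\text{post}}_{C} - \bar{Y}^{\text{pre}}_{C}\right)
$$

As a purely hypothetical illustration, if treated students moved from 500 to 530 and control students from 500 to 520, the estimate would be $(530 - 500) - (520 - 500) = 10$ points. The estimates reported in this paper additionally adjust for covariates through regression, so they are not simple differences of group means.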
The study population includes students in grades 2–6 across the district. Of these, 1,501 students used Rally Reader at any point during the study period.
Assessments
Student achievement is measured using two independent standardized assessments:
CAASPP ELA – California's annual summative assessment, administered each Spring. The CAASPP panel (n=1,312) includes students with scale scores in both Spring 2024 (baseline) and Spring 2025 (outcomes), grades 3–5.
iReady Diagnostic – Administered up to three times per year (Fall, Winter, Spring). The iReady panel (n=1,651) uses a flexible pre/post design: for each Rally Reader user, the baseline is their last iReady score before their Rally Reader start date, and the outcome is their most recent score after a minimum 180-day gap. Control students use the median Rally Reader start date (February 12, 2025) as their split point. This flexible design maximizes statistical power by leveraging each student's full exposure window rather than restricting to a fixed testing period.
Each assessment uses a different scale-scoring mechanism and scores are never compared directly.
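The flexible iReady pre/post selection described above might look like the sketch below. It rests on assumptions the paper leaves open: the 180-day minimum is measured from the baseline test to the outcome test, "before" is read as strictly before the split date, and the function name `pre_post` and the input shape are hypothetical.

```python
from datetime import date, timedelta

MIN_GAP = timedelta(days=180)
CONTROL_SPLIT = date(2025, 2, 12)  # median Rally Reader start date from the text


def pre_post(scores, start_date=None):
    """Pick a baseline and outcome iReady score for one student.

    scores: list of (test_date, scale_score) tuples.
    start_date: the student's Rally Reader start date, or None for controls.
    Returns (baseline, outcome) tuples, or None if either is missing.
    """
    split = start_date or CONTROL_SPLIT
    before = [s for s in scores if s[0] < split]
    if not before:
        return None
    baseline = max(before)  # last test before the split date
    # Assumption: the gap is measured between the baseline and outcome tests.
    after = [s for s in scores if s[0] > split and s[0] - baseline[0] >= MIN_GAP]
    if not after:
        return None
    return baseline, max(after)  # most recent qualifying test
```

The score gain used as the outcome would then be `outcome[1] - baseline[1]`.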
Treatment Assignment
Students are classified into three treatment groups based on total Rally Reader usage hours relative to their grade's 75th percentile threshold:
| Group | Criteria | n |
|---|---|---|
| Control | No Rally Reader usage (< 2 min) | 7,891 |
| Exposure | Used Rally Reader (≥ 2 min, below grade P75) | 1,222 |
| Protocol | Used Rally Reader (at or above grade P75) | 379 |
The protocol threshold is computed per grade as the 75th percentile of total usage hours among students with any Rally Reader usage (≥ 2 minutes). This within-grade approach accounts for the fact that typical usage levels vary by grade. P75 thresholds range from 0.9 hours (Grade 2) to 9.7 hours (Grade 6). A student is only classified as treated for a given assessment if their Rally Reader usage began before the assessment date.
In addition to the 3-arm model above, a binary "Any RR" model is estimated that pools all Rally Reader users (with at least 2 minutes of total usage) against all non-users, providing a simple intent-to-treat estimate.
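A minimal sketch of the assignment rules in this section is shown below. The P75 values are the per-grade thresholds listed under Usage Patterns. Treating a student whose usage began on or after the assessment date as a control (rather than excluding them) is our assumption, and `assign_arm` is a hypothetical name.

```python
from datetime import date

MIN_USAGE_HOURS = 2 / 60  # 2-minute floor
P75_BY_GRADE = {2: 0.9, 3: 3.5, 4: 8.3, 5: 2.9, 6: 9.7}  # hours, from Usage Patterns


def assign_arm(total_hours, grade, start_date, assessment_date, p75=P75_BY_GRADE):
    """Return 'control', 'exposure', or 'protocol' for one student and assessment."""
    # Assumption: usage starting on or after the assessment date counts as control.
    if start_date is None or start_date >= assessment_date:
        return "control"
    if total_hours < MIN_USAGE_HOURS:
        return "control"
    if total_hours >= p75[grade]:
        return "protocol"
    return "exposure"


def any_rr(arm):
    """Binary 'Any RR' indicator: pools the exposure and protocol arms."""
    return arm != "control"
```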
Statistical Methods
Two primary analytical methods are used, each applied independently to both CAASPP and iReady panels:
- OLS Difference-in-Differences (DiD). Regresses score gain on treatment status, controlling for baseline score (centered within grade), grade level, gender, English learner status, economic disadvantage, and Hispanic ethnicity. For iReady models, the number of days between baseline and outcome tests and an outcome window fixed effect are also included as controls to account for variation in the testing window and seasonal assessment differences. All standard errors are HC3-robust. Two OLS models are estimated: a 3-arm model (Exposure vs Protocol vs Control) for dose-response analysis, and a binary model (Any RR vs Control) for the pooled intent-to-treat estimate.
- Propensity Score Matching + DiD. A logistic regression model estimates the probability of Rally Reader usage based on the same covariates. Each treated student is matched 1:1 to the control student with the nearest propensity score (caliper = 0.2 SD), and DiD is then computed on the matched pairs within the same assessment panel. PSM is run separately for two comparisons: Any RR user vs Control, and Protocol-only vs Control.
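The matching step can be illustrated with a small stdlib-only sketch. It assumes propensity scores have already been estimated, matches greedily without replacement, and takes the caliper as 0.2 times the pooled standard deviation of the propensity scores. The paper does not specify these details, so they are assumptions, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def match_pairs(treated, control, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts of {student_id: propensity_score}.
    Returns a list of (treated_id, control_id) pairs within the caliper.
    """
    all_scores = list(treated.values()) + list(control.values())
    caliper = caliper_sd * pstdev(all_scores)  # assumed caliper definition
    available = dict(control)
    pairs = []
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


def matched_did(pairs, gain):
    """Mean within-pair difference in score gain (post minus pre)."""
    return mean(gain[t] - gain[c] for t, c in pairs)
```

Treated students with no control inside the caliper are dropped, which is why the matched-pair counts reported in the results are smaller than the full treatment groups.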
All p-values reported are from these regression models. Point estimates in tables reflect controlled regression estimates with full covariates.
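One plausible way to write the 3-arm OLS model described above is the following. The notation is ours: the paper describes the specification in words only.

$$
\Delta Y_i = \beta_0 + \beta_1\,\text{Exposure}_i + \beta_2\,\text{Protocol}_i + \gamma\left(Y^{\text{pre}}_i - \bar{Y}^{\text{pre}}_{g(i)}\right) + \alpha_{g(i)} + X_i^{\top}\theta + \varepsilon_i
$$

Here $\Delta Y_i$ is student $i$'s score gain, $g(i)$ is the student's grade, $\alpha_{g(i)}$ is a grade fixed effect (with one grade as the reference category), and $X_i$ collects gender, English learner status, economic disadvantage, and Hispanic ethnicity. For iReady models, $X_i$ also includes days between tests and outcome window indicators. Under this reading, $\beta_1$ and $\beta_2$ are the exposure and protocol effects relative to control, and the binary model replaces the two indicators with a single $\text{AnyRR}_i$ term.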
Reading Type Classification
Rally Reader tracks whether students use oral (validated reading aloud) or silent reading modes. Students are classified by their share of oral reading time into three categories:
| Classification | Criteria |
|---|---|
| Mostly Oral | > 60% oral reading time |
| Mixed | 40–60% oral reading time |
| Mostly Silent | < 40% oral reading time |
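The classification rule in the table can be expressed directly. In this sketch, the function name `reading_type` and the choice to return `None` for students with no recorded reading time are assumptions.

```python
def reading_type(oral_hours, silent_hours):
    """Classify a student by their share of oral reading time."""
    total = oral_hours + silent_hours
    if total <= 0:
        return None  # no reading time recorded (assumed handling)
    share = oral_hours / total
    if share > 0.60:
        return "mostly_oral"
    if share < 0.40:
        return "mostly_silent"
    return "mixed"  # 40-60% oral, boundaries included
```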
Demographic Controls
The following covariates are included in all regression models to control for observable differences between treatment and control groups:
- Gender (50.0% male across iReady sample)
- Hispanic ethnicity (47.4% of the sample)
- English Learner status (6.7% of the sample)
- Economic disadvantage (38.5% of the sample)
- Baseline score (centered within grade to control for starting achievement level)
- Grade level (categorical fixed effect)
- Days between tests (iReady models only — controls for variation in testing window)
- Outcome window (iReady models only — fixed effect controlling for seasonal assessment differences)
The following variables were omitted because of insufficient sample size, limited availability, or the large number of additional parameters they would introduce: race, special education status, migrant status, school, classroom.
Propensity score matching provides an additional check against OLS by explicitly balancing these covariates between treatment and control before estimating effects.
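Covariate balance before and after matching is commonly summarized with a standardized mean difference (SMD). The sketch below shows one standard form of that statistic. It is not necessarily the balance metric used in this study, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """SMD for one covariate: difference in means over the pooled SD.

    treated, control: lists of covariate values (0/1 for binary covariates).
    Each list needs at least two values.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0  # covariate is constant in both groups
    return (mean(treated) - mean(control)) / pooled_sd
```

A common rule of thumb treats an absolute SMD below 0.1 as adequately balanced, though conventions vary.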
Results
All four iReady analyses — the more finely-measured assessment panel — indicate positive treatment effects for Rally Reader users, with three reaching statistical significance. Across the four CAASPP ELA analyses, results are directionally positive for protocol-adherent students, with Grade 5 reaching statistical significance. No evaluation produced a statistically significant negative result.

The strongest and most consistent evidence comes from the iReady diagnostics. The binary OLS estimate for all Rally Reader users is +8.3 points (p<0.001), and the protocol-adherent OLS estimate is +11.4 points (p=0.002). Propensity score matching on all users confirms the positive direction at +4.2 points (p=0.038). CAASPP results are directionally positive for protocol students (+4.7 OLS) but carry wide confidence intervals due to smaller per-protocol sample sizes and are not statistically significant.

iReady Diagnostic Results
Pooled Effects
Rally Reader users showed a significant improvement on iReady diagnostics relative to non-users. The Any RR binary OLS estimate of +8.3 points (p<0.001, n=630 users vs 1,021 control) represents the intent-to-treat effect across all usage levels. Protocol-adherent students show a larger effect of +11.4 points (p=0.002, n=163), indicating a dose-response relationship. PSM confirms the positive direction for all users at +8.6 points (p<0.001, 457 matched pairs). PSM on protocol-only students yields +4.9 points (p=0.288, 63 pairs) — directionally positive but not significant given the smaller matched sample.


Per-Grade Results
The treatment effect is strongest in grades 3 and 5. Protocol-adherent students in grade 5 show +20.0 points (p=0.010) and exposure students in grade 5 show +11.9 points (p=0.001). Grade 3 exposure students show +11.8 points (p=0.043), with protocol at +16.2 points (p=0.087). Grade 4 shows positive but non-significant effects.

Subskills Analysis
iReady subskill scores indicate that the treatment effect is significant in comprehension overall (+10.9 points), comprehension of literature (+15.4 points), and vocabulary (+9.5 points). We also see non-significant but directionally positive gains in phonics (+15.9 points), high-frequency words (+10.2 points), and informational comprehension (+6.0 points), though this score is only available for lower grades. The concentration of significant gains in literature comprehension and vocabulary is consistent with Rally Reader's content, which is primarily fiction-based material catering to student interests.

CAASPP ELA Results
Pooled Effects
Pooled CAASPP results for protocol-adherent students do not reach statistical significance, and the two methods disagree in sign. The OLS protocol estimate is +4.7 points (p=0.40, n=120), directionally consistent with iReady, while PSM protocol yields −5.6 points (p=0.49, 120 matched pairs). The binary "Any RR" OLS estimate is near zero (+0.2 points, p=0.95), indicating no pooled intent-to-treat effect on CAASPP. The CAASPP results carry wider confidence intervals due to the smaller per-protocol sample and the blunter nature of the annual assessment compared to iReady's adaptive and more frequent administration.


Per-Grade Results
The grade-level pattern for CAASPP shows a notable positive result for Grade 5 protocol students (+21.6 points, p=0.048). Other grade-level CAASPP results do not reach statistical significance.

The divergence between CAASPP and iReady results may reflect assessment timing and frequency. CAASPP is administered only once per year in the Spring, while the iReady panel uses a flexible pre/post design based on each student's Rally Reader start date. Students who began Rally Reader mid-year may not have accumulated sufficient usage before the spring CAASPP assessment, or may have had a single poor test performance that would have been washed out in iReady's multiple testing windows.
Discussion
Interpretation
The results of this analysis provide reasonable converging evidence that Rally Reader usage has a positive effect on student reading achievement, particularly as measured by iReady diagnostics. The effect is robust across multiple analytical methods including OLS DiD, PSM DiD, and subskill decomposition. The indication of literary comprehension and vocabulary as the highest-impact subskills lends credence to our hypothesis that Rally Reader is encouraging students to engage in reading through access to their preferred books and leading to improvements over their peers. There is also weak but directional evidence that deeper engagement in oral reading, the primary reinforcement mechanism of Rally Reader, is leading to improved outcomes for students.
Dosage and Compliance
The protocol threshold in this analysis is defined as the within-grade 75th percentile (P75) of total usage hours, rather than a fixed hour count. This data-driven approach identifies students whose engagement is meaningfully above their grade peers and accounts for natural differences in usage patterns across grades. Protocol thresholds range from 0.9 hours in Grade 2 to 9.7 hours in Grade 6.
Additional analysis is warranted to examine the relationship between total reading time and academic improvement. We did not observe a linear correlation between increased dosage and additional performance gains. This pattern is consistent with a ceiling effect rather than diminishing returns from the product itself: students who read the most tend to be higher-performing to begin with, and high-achieving students have less room to gain on assessments calibrated to grade-level performance.
Defining dosage compliance as aggregate usage over the school year was a judgment call made for this analysis and is open to scrutiny. It was chosen over alternatives (average time per week, active weeks, active days, and average time per session) after assessing the sample sizes and baseline demographics of students in the exposure group. A simple aggregate number of reading hours is a straightforward measure of dosage compliance and retains a reasonable number of students in the sample, whereas a stricter protocol such as 45 minutes each week during the school year would not. Refining the protocol definition and the measurement of dosage compliance in a future study would allow much more accurate measurement.
Limitations
There are several key limitations of this study that must be addressed.
- Single district. All data comes from a single California public school district. Results may not generalize to districts with different demographics, adoption patterns, or instructional contexts.
- Small per-protocol samples. Protocol sample sizes vary by assessment panel (154 in iReady, 120 in CAASPP). Per-grade protocol samples are smaller still, which limits the precision of per-grade estimates and contributes to wide confidence intervals.
- Non-random assignment. Adoption was teacher- and student-driven and largely voluntary. While this study attempts to control for differences in exposure vs control, unobserved confounders such as teacher quality, school, or classroom-level effects cannot be ruled out.
- Flexible panel design. The iReady panel uses each student's Rally Reader start date to define the pre/post split, which maximizes statistical power but introduces variation in the testing window. Because the flexible design produces outcomes spanning multiple seasonal iReady windows (Fall, Winter, Spring), an outcome window fixed effect is included to control for seasonal re-norming differences. Days between tests are also controlled for in the regression. A same-window robustness check restricted to Fall 2025 outcomes confirms the positive result (Any RR +7.5, p=0.005; PSM +5.7, p=0.022). A Spring-to-Spring sensitivity analysis (fixed window) shows attenuated, non-significant effects (Appendix B).
Next Steps
One of the key goals of this post-hoc analysis is to identify what a future randomized controlled trial would require to measure the effects of Rally Reader on student reading achievement more clearly. Two questions stand out: how many students are needed in the treatment group for a sufficient number to reach dosage compliance, and how many are needed to detect a potentially smaller treatment effect.
With a conservative treatment effect of +3 to +4 points, we would need a total of 4,036 students to measure a statistically significant effect in CAASPP scores. A reasonable RCT would likely need somewhere between 1,500 and 4,000 students, appropriately randomized and controlled. Protocol adherence would also need to be monitored and enforced as part of the study rather than defined after the fact.
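Sample-size figures of this kind typically come from a two-sample power calculation. The sketch below uses the standard normal-approximation formula for comparing two means. It is not the calculation behind the 4,036 figure: the paper does not report the score-gain standard deviation, power, or significance level it assumed, so `sd`, `alpha`, and `power` are hypothetical inputs.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Students needed per arm to detect a mean difference of `effect`.

    Normal-approximation formula: n = 2 * ((z_alpha + z_power) * sd / effect)^2.
    Assumes equal arm sizes, equal variances, and a two-sided test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)
```

Because the required sample scales with the inverse square of the effect size, moving from a 4-point to a 3-point detectable effect raises the requirement by roughly 78%. This sensitivity is one reason the plausible range for a future RCT is so wide.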
Appendix A — Regression Coefficients
Appendix B — Sensitivity

A Spring-to-Spring robustness panel restricts the iReady analysis to students with Spring 2024 and Spring 2025 scores (n=1,562). Under this fixed-window design, the protocol estimate is −0.2 points and the exposure estimate is −2.0 points (p=0.43). Neither reaches significance. Attenuated effects are expected in this sensitivity analysis relative to the flexible panel, because a fixed Spring-to-Spring window compresses the effective treatment period: it includes students who began Rally Reader after their baseline Spring test but before the outcome Spring test.
Appendix C — PSM Covariate Balance
