Blog Posts and News Stories

Updated Research on the Impact of Alabama’s Math, Science, and Technology Initiative (AMSTI) on Student Achievement

We are excited to release the findings from a new round of work continuing our investigation of AMSTI. Alabama’s specialized training program for math and science teachers began over 20 years ago and now reaches over 900 schools across the state. Because the program is constantly evolving to meet the demands of new standards and new assessment systems, the AMSTI team and the Alabama State Department of Education continue to support research evaluating the program’s impact. Our new report builds on the work undertaken last year to answer three new research questions.

  1. What is the impact of AMSTI on reading achievement? We found a positive impact of AMSTI on the ACT Aspire reading assessment equivalent to 2 percentile points, replicating a finding from our earlier 2012 study. This analysis focused on students of AMSTI-trained science teachers, because the training purposely integrates reading and writing practices into the science modules.
  2. What is the impact of AMSTI on early-career teachers? We found positive impacts of AMSTI for partially trained math teachers and for fully trained science teachers. The sample for this analysis consisted of teachers in their first three years of teaching, with varying levels of AMSTI training.
  3. How can AMSTI continue program development to better serve ELL students? Our earlier work found a negative impact of AMSTI training for ELL students in science. Building on these results, we identified a small subset of “model ELL AMSTI schools” where AMSTI had a positive impact on ELL students and where that impact was larger than any school-level effect on ELL students relative to the entire sample. By looking at the site-specific practices these schools use to support ELL students in science and across the board, the AMSTI team can begin to incorporate these strategies into the program at large.

All research Empirical Education has conducted on AMSTI can be found on our AMSTI webpage.

2020-04-06

Research on AMSTI Presented to the Alabama State Board of Education

On January 13, Dr. Eric Mackey, Alabama’s new State Superintendent of Education, presented our rapid-cycle evaluation of the Alabama Math, Science, and Technology Initiative (AMSTI) to the State Board of Education. The study is based on results for the 2016-17 school year, for which outcome data were available at the time the Alabama State Department of Education (ALSDE) contracted with Empirical in July 2018.

AMSTI is ALSDE’s initiative to improve math and science teaching statewide; the program, which started over 20 years ago, now operates in over 900 schools across the state.

Our current project, led by Val Lazarev, compares classes taught by teachers who were fully trained in AMSTI with matched classes taught by teachers with no AMSTI training. The overall results, shown in the graph above, were similar in magnitude to those of Empirical’s 2012 study, which was directed by Denis Newman and designed by Empirical’s Chief Scientist, Andrew Jaciw. That cluster-randomized trial, which involved 82 schools and roughly 700 teachers, showed that AMSTI had a small overall positive effect. The earlier study also showed that AMSTI may be exacerbating the achievement gap between black and white students. Since ALSDE was also interested in information that could improve AMSTI, the current study examined a number of subgroup impacts. In this project we did not find a difference in the value of AMSTI for black and white students. We did find a strong benefit for females in science. For English learners, however, there was a negative effect of being in the science class of an AMSTI-trained teacher. The state board expressed concern and a commitment to using the results to guide improvement of the program.

Download both of the reports here.

2019-01-22

New Project with ALSDE to Study AMSTI

Empirical Education is excited to announce a new study of the Alabama Math, Science, and Technology Initiative (AMSTI), commissioned by the Alabama legislature. AMSTI is the Alabama State Department of Education’s initiative to improve math and science teaching statewide. The program, which started over 20 years ago, operates in over 900 schools across the state and has been validated by many external evaluators.

Researchers here at Empirical Education, directed by Chief Scientist Andrew Jaciw, published a study of AMSTI in 2012. That cluster-randomized trial involved 82 schools and roughly 700 teachers. It assessed the efficacy of AMSTI over a three-year period and showed an overall positive effect (Newman et al., 2012).

The new study that we are embarking on will use a quasi-experimental matched comparison group design, taking advantage of existing data available from the Alabama State Department of Education and the AMSTI program. By comparing schools using AMSTI to matched schools not using AMSTI, we can estimate the impact of the program on math and science achievement for students in grades 3 through 8. Our report will also examine differential impacts of the program on important student subgroups. Using Improvement Science principles, we will also look for school conditions associated with greater or reduced program impact.
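To give a concrete sense of what a matched comparison group design involves, here is a minimal sketch of school-level propensity-score matching. The file name, covariate names, and the logistic-regression propensity model are illustrative assumptions only, not the actual data or procedure used in this study.

```python
# Minimal sketch of a matched comparison group design (illustration only).
# The file, column names, and model choices are hypothetical, not the study's own.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

schools = pd.read_csv("schools.csv")  # hypothetical file: one row per school
covariates = ["pct_free_lunch", "pct_ell", "prior_math_score", "enrollment"]

# 1. Estimate each school's probability of being an AMSTI school (the propensity score).
model = LogisticRegression(max_iter=1000)
model.fit(schools[covariates], schools["amsti"])
schools["pscore"] = model.predict_proba(schools[covariates])[:, 1]

# 2. Pair each AMSTI school with the non-AMSTI school that has the closest propensity score.
treated = schools[schools["amsti"] == 1]
control = schools[schools["amsti"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[idx.ravel()]

# 3. Estimate the impact as the difference in mean outcomes across the matched groups.
impact = treated["math_score"].mean() - matched_control["math_score"].mean()
print(f"Estimated impact on math achievement: {impact:.3f}")
```

A real analysis would also check covariate balance after matching and use models that account for students nested within schools, but the pairing step above is the heart of the design.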

At the conclusion of the study, we will distribute the report to select committees of the Alabama state legislature, the Governor, the Alabama State Board of Education, and the Alabama State Department of Education. Empirical Education researchers will then travel to Montgomery, AL to present the study findings and recommendations for improvement to the Alabama legislature.

2018-07-13

The Value of Looking at Local Results

The report we released today has an interesting history that shows the value of looking beyond the initial results of an experiment. Later this week, we are presenting a paper at AERA entitled “In School Settings, Are All RCTs Exploratory?” The findings we report from our experiment with an iPad application were part of the inspiration for this. If Riverside Unified had not looked at its own data, we would not, in the normal course of data analysis, have broken the results out by individual districts, and our conclusion would have been that there was no discernible impact of the app. We can cite many other cases where looking at subgroups leads us to conclusions different from the conclusion based on the result averaged across the whole sample. Our report on AMSTI is another case we will cite in our AERA paper.

We agree with the Institute of Education Sciences (IES) in taking a disciplined approach that requires researchers to “call their shots” by naming the small number of outcomes considered most important in any experiment. All other questions are fine to look at but fall into the category of exploratory work. What we want to guard against, however, is the implication that answers to primary questions, which often concern average impacts for the study sample as a whole, must apply to the various subgroups within the sample and can therefore be broadly generalized by practitioners, developers, and policy makers.

If we find an average impact but, in exploratory analysis, discover plausible, policy-relevant, and statistically strong differential effects for subgroups, then some doubt is cast on the completeness of the confirmatory finding. We may not be certain of the moderator effect, but once it comes to light, the average impact can come to be seen as incomplete or misleading for practical purposes. If an additional experiment is needed to verify a differential subgroup impact, that same experiment may also verify that the average impact is not what practitioners, developers, and policy makers should be concerned with.

In our paper at AERA, we are proposing that any result from a school-based experiment should be treated as provisional by practitioners, developers, and policy makers. The results of RCTs can be very useful, but the challenges of generalizability of the results from even the most stringently designed experiment mean that the results should be considered the basis for a hypothesis that the intervention may work under similar conditions. For a developer considering how to improve an intervention, the specific conditions under which it appeared to work or not work are the critical information to have. For a school system decision maker, the most useful pieces of information are insight into subpopulations that appear to benefit and conditions that are favorable for implementation. For those concerned with educational policy, it is often the case that conditions and interventions change and develop more rapidly than research studies can be conducted. Using available evidence may mean digging through studies that have confirmatory results in contexts similar to or different from their own and examining exploratory analyses that provide useful hints as to the most productive steps to take next. The practitioner in this case is in a similar position to the researcher considering the design of the next experiment. The practitioner also has to come to a hypothesis about how things work as the basis for action.

2012-04-01

Study of Alabama STEM Initiative Finds Positive Impacts

On February 21, 2012, the U.S. Department of Education released the final report of an experiment that Empirical Education has been working on for the last six years. The report, titled Evaluation of the Effectiveness of the Alabama Math, Science, and Technology Initiative (AMSTI), is now available on the Institute of Education Sciences website. The Alabama State Department of Education held a press conference to announce the findings, attended by State Superintendent of Education Bice, AMSTI staff, educators, students, and the study’s co-principal investigator, Denis Newman, CEO of Empirical Education.

AMSTI was developed by the state of Alabama and introduced in 2002 with the goal of improving mathematics and science achievement in the state’s K-12 schools. Empirical Education was primarily responsible for conducting the study—including the design, data collection, analysis, and reporting—under its subcontract with the Regional Education Lab, Southeast (the study was initiated through a research grant to Empirical). Researchers from the Academy for Educational Development, Abt Associates, and ANALYTICA made important contributions to design, analysis, and data collection.

The findings show that after one year, students in the 41 AMSTI schools experienced an impact on mathematics achievement equivalent to 28 days of additional student progress over students receiving conventional mathematics instruction. After one year, the study found no difference for science achievement. It also found that AMSTI had an impact on teachers’ active learning classroom practices in math and science, which, according to the theory of action posited by AMSTI, should have an impact on achievement. Further exploratory analysis found effects on student achievement in both mathematics and science after two years. The study also explored reading achievement, where it found significant differences between the AMSTI and control groups after one year. Exploration of differential effects across student demographic categories found consistent results by gender, socio-economic status, and pretest achievement level for math and science. For reading, however, the breakdown by student ethnicity suggests a differential benefit.

Just about everybody at Empirical worked on this project at one point or another. Besides the three of us (Newman, Jaciw and Zacamy) who are listed among the authors, we want to acknowledge past and current employees whose efforts made the project possible: Jessica Cabalo, Ruthie Chang, Zach Chin, Huan Cung, Dan Ho, Akiko Lipton, Boya Ma, Robin Means, Gloria Miller, Bob Smith, Laurel Sterling, Qingfeng Zhao, Xiaohui Zheng, and Margit Zsolnay.

With the solid cooperation of the state’s Department of Education and the AMSTI team, approximately 780 teachers and 30,000 upper-elementary and middle school students in 82 schools from five regions in Alabama participated in the study. The schools were randomized into one of two groups: 1) those that received AMSTI starting in the first year, or 2) those that received “business as usual” the first year and began participating in AMSTI the second year. With only a one-year delay before the control group entered treatment, the two-year impact was estimated using statistical techniques developed by, and with the assistance of, our colleagues at Abt Associates. The Academy for Educational Development assisted with data collection and analysis of training and program implementation.

Findings of the AMSTI study will also be presented at the Society for Research on Educational Effectiveness (SREE) Spring Conference taking place in Washington, D.C. from March 8-10, 2012. Join Denis Newman, Andrew Jaciw, and Boya Ma on Friday, March 9, 2012 from 3:00pm-4:30pm, when they will present findings of their study titled “Locating Differential Effectiveness of a STEM Initiative through Exploration of Moderators.” A symposium on the study, including the major study collaborators, will be presented at the annual conference of the American Educational Research Association (AERA) on April 15, 2012 from 2:15pm-3:45pm at the Marriott Pinnacle / Pinnacle III in Vancouver, Canada. This session will be chaired by Ludy van Broekhuizen (director of REL-SE) and will include presentations by Steve Ricks (director of AMSTI); Jean Scott (SERVE Center at UNCG); Denis Newman, Andrew Jaciw, Boya Ma, and Jenna Zacamy (Empirical Education); Steve Bell (Abt Associates); and Laura Gould (formerly of AED). Sean Reardon (Stanford) will serve as the discussant. A synopsis of the study will also be included in the Common Guidelines for Education Research and Development.

2012-02-21

Exploration in the World of Experimental Evaluation

Our 300+ page report makes a good start. But IES, faced with limited time and resources to complete the many experiments being conducted within the Regional Education Lab system, put strict limits on the number of exploratory analyses researchers could conduct. We usually think of exploratory work as following up on puzzling or unanticipated results. In the case of the REL experiments, however, IES asked researchers to focus on a narrow set of “confirmatory” results, and anything else was considered “exploratory,” even if the question was included in the original research design.

The strict IES criteria were based on the principle that when a researcher is using tests of statistical significance, the probability of erroneously concluding that there is an impact when there isn’t one increases with the number of tests. In our evaluation of AMSTI, we limited ourselves to only four such “confirmatory” (i.e., not exploratory) tests of statistical significance. These were used to assess whether there was an effect on student outcomes for math problem solving and for science, and on the amount of time teachers spent on “active learning” practices in math and in science. (Technically, IES considered this two sets of two, since two were the primary student outcomes and two were the intermediate teacher outcomes.) The threshold for significance was made more stringent to keep the probability of falsely concluding that there was a difference for any of the outcomes at 5% (often expressed as p < .05).
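The logic of that more stringent threshold can be illustrated with a simple multiple-comparison adjustment. The sketch below uses a Bonferroni correction and made-up p-values purely for illustration; the study’s actual adjustment procedure and results may have differed.

```python
# Minimal sketch of a familywise error-rate adjustment (Bonferroni correction).
# The p-values below are made up for illustration; they are not the study's results.
alpha_family = 0.05   # desired probability of any false positive across all confirmatory tests
p_values = {
    "student math problem solving": 0.012,
    "student science": 0.200,
    "teacher active learning (math)": 0.030,
    "teacher active learning (science)": 0.040,
}

# Each individual test is judged against a stricter threshold: 0.05 / 4 = 0.0125.
adjusted_alpha = alpha_family / len(p_values)
for outcome, p in p_values.items():
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"{outcome}: p = {p:.3f} -> {verdict} (adjusted alpha = {adjusted_alpha:.4f})")
```

Treating the four tests as two families of two, as IES did, would change the divisor but not the underlying logic.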

While the logic for limiting the number of confirmatory outcomes is based on technical arguments about adjustments for multiple comparisons, the limit on the amount of exploratory work was based more on resource constraints. Researchers are notorious (and we don’t exempt ourselves) for finding more questions in any study than were originally asked. Curiosity-based exploration can indeed go on forever. In the case of our evaluation of AMSTI, however, there were a number of fundamental policy questions that were not answered either by the confirmatory or by the exploratory questions in our report. More research is needed.

Take the confirmatory finding that the program resulted in the equivalent of 28 days of additional math instruction (or technically an impact of 5% of a standard deviation). This is a testament to the hard work and ingenuity of the AMSTI team and the commitment of the school systems. From a state policy perspective, it gives a green light to continuing the initiative’s organic growth. But since all the schools in the experiment applied to join AMSTI, we don’t know what would happen if AMSTI were adopted as the state curriculum requiring schools with less interest to implement it. Our results do not generalize to that situation. Likewise, if another state with different levels of achievement or resources were to consider adopting it, we would say that our study gives good reason to try it but, to quote Lee Cronbach, a methodologist whose ideas increasingly resonate as we translate research into practice: “…positive results obtained with a new procedure for early education in one community warrant another community trying it. But instead of trusting that those results generalize, the next community needs its own local evaluation” (Cronbach, 1975, p. 125).
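For readers curious how an effect size gets translated into “days of instruction,” here is the basic arithmetic. The annual-growth figure and 180-day school year below are assumptions chosen only to illustrate the conversion; the report’s own benchmarks may differ.

```python
# Illustrative conversion of an effect size into equivalent days of instruction.
# ASSUMPTIONS: typical annual growth of 0.32 SD and a 180-day school year are
# hypothetical values for this example, not figures taken from the report.
effect_size_sd = 0.05     # reported impact: 5% of a standard deviation
annual_growth_sd = 0.32   # assumed growth over one school year, in SD units
school_year_days = 180    # assumed instructional days per year

days_equivalent = effect_size_sd / annual_growth_sd * school_year_days
print(f"{effect_size_sd} SD is roughly {days_equivalent:.0f} extra days of instruction")
```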

The explorations we conducted as part of the AMSTI evaluation did not take the usual form of deeper examinations of interesting or unexpected findings uncovered during the planned evaluation. All of the reported explorations were questions posed in the original study plan. They were defined as exploratory either because they were considered of secondary interest, such as the outcome for reading, or because they were not a direct causal result of the randomization, such as the results for subgroups of students defined by different demographic categories. Nevertheless, exploration of such differences is important for understanding how and for whom AMSTI works. The overall effect, averaging across subgroups, may mask differences that are of critical importance for policy.

Readers interested in the issue of subgroup differences can refer to Table 6.11. Once differences are found in groups defined in terms of individual student characteristics, our real exploration is just beginning. For example, can the difference be accounted for by other characteristics or combinations of characteristics? Is there something that differentiates the classes or schools that different students attend? Such questions begin to probe additional factors that can potentially be addressed in the program or its implementation. In any case, the report just released is not the “final report.” There is still a lot of work necessary to understand how any program of this sort can continue to be improved.

2012-02-14

Empirical Presents at AERA 2012

We will again be presenting at the annual meeting of the American Educational Research Association (AERA). Join the Empirical Education team in Vancouver, Canada from April 13 – 17, 2012. Our presentations will span two divisions: 1) Measurement and Research Methodology and 2) Research, Evaluation and Assessment in Schools.

Research Topics will include:

Current Studies in Program Evaluation to Improve Student Achievement Outcomes

Evaluating Alabama’s Math, Science and Technology Initiative: Results of a Three-Year, State-Wide Randomized Experiment

Accommodating Data From Quasi-Experimental Design

Quantitative Approaches to the Evaluation of Literacy Programs and Instruction for Elementary and Secondary Students

We look forward to seeing you at our sessions to discuss our research. You can also download our presentation schedule here. As has become tradition, we plan to host yet another of our popular AERA receptions. Details about the reception will follow in the months to come.

2011-11-18

Join Empirical Education at ALAS, AEA, and NSDC

This year, the Association of Latino Administrators & Superintendents (ALAS) will hold its 8th annual summit on Hispanic Education in San Francisco. Participants will have the opportunity to attend speaker sessions and roundtable discussions and to network with fellow attendees. Denis Newman, CEO of Empirical Education, together with John Sipe, Senior Vice President and National Sales Manager at Houghton Mifflin Harcourt, and Jeannetta Mitchell, eighth-grade teacher at Presidio Middle School and a participant in the pilot study, will take part in a 30-minute discussion reviewing the design of and experiences from a one-year study of Algebra on the iPad. The session takes place on October 13th in Salon 8 of the Marriott Marquis in San Francisco from 10:30am to 12:00pm.

Also this year, the American Evaluation Association (AEA) will be hosting its 25th annual conference from November 2–5 in Anaheim, CA. Approximately 2,500 evaluation practitioners, academics, and students from around the globe are expected to gather at the conference. This year’s theme revolves around the challenges of values and valuing in evaluation.

We are excited to be part of AEA again this year and would like to invite you to join us at two presentations. First, Denis Newman will host a roundtable session, “Returning to the Causal Explanatory Tradition: Lessons for Increasing the External Validity of Results from Randomized Trials.” We examine how the causal explanatory tradition—originating in the writing of Lee Cronbach—can inform the planning, conduct, and analysis of randomized trials to increase the external validity of findings. Find us in the Balboa A/B room on Friday, November 4th, from 10:45am to 11:30am.

Second, Valeriy Lazarev and Denis Newman will present a paper entitled “From Program Effect to Cost Savings: Valuing the Benefits of Educational Innovation Using Vertically Scaled Test Scores and Instructional Expenditure Data.”

Be sure to stop by on Saturday, November 5th from 9:50am to 11:20am in room Avila A.

Furthermore, Jenna Zacamy, Senior Research Manager at Empirical Education, will be presenting on two topics at the National Staff Development Council (NSDC) annual conference taking place in Anaheim, CA from December 3rd to 7th. Join her on Monday, December 5th, from 2:30pm to 4:30pm, when she will talk about the impact of the Alabama Math, Science, and Technology Initiative on student achievement in grades 4 through 8, together with Pamela Finney and Jean Scott from the SERVE Center at UNCG.

On Tuesday, December 6th, from 10:00am to 12:00pm, Jenna will discuss prior and current research on the effectiveness of a large-scale high school literacy reform, together with Cathleen Kral from WestEd and William Loyd from the Washtenaw Intermediate School District.

2011-10-10

Methods for Local Experimental Evaluation of STEM Initiatives Presented to State Legislators

The National Conference of State Legislatures (NCSL) presented a seminar for education committee chairs January 9-11 in Huntsville, Alabama, home of NASA’s Marshall Space Flight Center. The topic was “Linking Research and Policy to Improve Science, Technology, Engineering and Math Education.” Empirical Education’s president, Denis Newman, presented the company’s research on the Alabama Math, Science, and Technology Initiative, an 80-school randomized experiment being conducted as part of its contract with the Regional Education Laboratory for the Southeast. The presentation also drew on findings from experiments the company has conducted to evaluate STEM initiatives elsewhere in the country to illustrate the importance of local research goals and characteristics in evaluation design. The seminar was part of a series on research funded by the Institute of Education Sciences. (Click here for a copy of the presentation.)

2009-02-01

Empirical at 2008 AERA Conference

Acceptance notifications are in! Empirical Education will have a strong showing at the 2008 American Educational Research Association conference, which will be held in New York on March 24–28, 2008. Empirical staff will be presenting their research and findings in several divisions and special interest groups, including School Evaluation and Program Development, Learning and Instruction, Survey Research in Education, and Measurement and Research Methodology: Quantitative Methods and Statistical Theory. Our six presentations will cover a variety of topics.


View pictures of Empirical Education’s reception at The Chocolate Bar in NYC.

2007-12-15