blog posts and news stories

Study of Alabama STEM Initiative Finds Positive Impacts

On February 21, 2012, the U.S. Department of Education released the final report of an experiment that Empirical Education has been working on for the last six years. The report, titled Evaluation of the Effectiveness of the Alabama Math, Science, and Technology Initiative (AMSTI), is now available on the Institute of Education Sciences website. The Alabama State Department of Education held a press conference to announce the findings, attended by Superintendent of Education Bice, AMSTI staff, educators, students, and the study’s co-principal investigator, Denis Newman, CEO of Empirical Education. The press release issued by the Alabama State Department of Education and a WebEx presentation provide more detail on the study’s findings.

AMSTI was developed by the state of Alabama and introduced in 2002 with the goal of improving mathematics and science achievement in the state’s K-12 schools. Empirical Education was primarily responsible for conducting the study (including the design, data collection, analysis, and reporting) under its subcontract with the Regional Education Lab, Southeast; the study was initiated through a research grant to Empirical. Researchers from the Academy for Educational Development, Abt Associates, and ANALYTICA made important contributions to the design, analysis, and data collection.

The findings show that after one year, students in the 41 AMSTI schools showed a gain in mathematics achievement equivalent to 28 days of additional student progress relative to students receiving conventional mathematics instruction. After one year, the study found no difference in science achievement. It also found that AMSTI had an impact on teachers’ active learning classroom practices in math and science that, according to the theory of action posited by AMSTI, should have an impact on achievement. Further exploratory analysis found effects on student achievement in both mathematics and science after two years. The study also explored reading achievement, where it found significant differences between the AMSTI and control groups after one year. Exploration of differential effects across student demographic categories found consistent results for gender, socio-economic status, and pretest achievement level in math and science. For reading, however, the breakdown by student ethnicity suggests a differential benefit.

Just about everybody at Empirical worked on this project at one point or another. Besides the three of us (Newman, Jaciw and Zacamy) who are listed among the authors, we want to acknowledge past and current employees whose efforts made the project possible: Jessica Cabalo, Ruthie Chang, Zach Chin, Huan Cung, Dan Ho, Akiko Lipton, Boya Ma, Robin Means, Gloria Miller, Bob Smith, Laurel Sterling, Qingfeng Zhao, Xiaohui Zheng, and Margit Zsolnay.

With the solid cooperation of the state’s Department of Education and the AMSTI team, approximately 780 teachers and 30,000 upper-elementary and middle school students in 82 schools from five regions in Alabama participated in the study. The schools were randomly assigned to one of two conditions: 1) those that received AMSTI starting in the first year, or 2) those that continued with “business as usual” in the first year and began participating in AMSTI in the second year. Because the control group entered treatment after only a one-year delay, there was no untreated comparison group in the second year, so the two-year impact was estimated using statistical techniques developed by, and with the assistance of, our colleagues at Abt Associates. The Academy for Educational Development assisted with data collection and with analysis of training and program implementation.

Findings of the AMSTI study will also be presented at the Society for Research on Educational Effectiveness (SREE) Spring Conference, taking place in Washington, D.C., March 8-10, 2012. Join Denis Newman, Andrew Jaciw, and Boya Ma on Friday, March 9, 2012, from 3:00 to 4:30 pm, when they will present findings from their study, titled “Locating Differential Effectiveness of a STEM Initiative through Exploration of Moderators.” A symposium on the study, including the major study collaborators, will be presented at the annual conference of the American Educational Research Association (AERA) on April 15, 2012, from 2:15 to 3:45 pm at the Marriott Pinnacle / Pinnacle III in Vancouver, Canada. This session will be chaired by Ludy van Broekhuizen (director of REL-SE) and will include presentations by Steve Ricks (director of AMSTI); Jean Scott (SERVE Center at UNCG); Denis Newman, Andrew Jaciw, Boya Ma, and Jenna Zacamy (Empirical Education); Steve Bell (Abt Associates); and Laura Gould (formerly of AED). Sean Reardon (Stanford) will serve as the discussant. A synopsis of the study will also be included in the Common Guidelines for Education Research and Development.

2012-02-21

Empirical is participating in recently awarded five-year REL contracts

The Institute of Education Sciences (IES) at the U.S. Department of Education recently announced the recipients of five-year contracts for each of the 10 Regional Education Laboratories (RELs). We are excited to be part of four strong teams of practitioners and researchers that received the awards.

The original request for proposals, issued in May 2011, called for the new RELs to work closely with alliances of state and local education agencies and other practitioner organizations to build local capacity for research. Considering the close ties between this agenda and Empirical’s core mission, we joined the proposal efforts and are now part of winning teams in the West (led by WestEd), Northwest (led by Education Northwest), Midwest (led by the American Institutes for Research, or AIR), and Southwest (led by SEDL). The REL Southwest is currently under a stop-work order while the Department of Education (ED) addresses a dispute concerning its review process. Empirical Education’s history of conducting randomized controlled trials (RCTs) and providing technical assistance to education agencies provides a strong foundation for the next five years.

2012-02-16

Exploration in the World of Experimental Evaluation

Our 300+ page report on AMSTI makes a good start. But IES, faced with limited time and resources to complete the many experiments being conducted within the Regional Education Lab system, put strict limits on the number of exploratory analyses researchers could conduct. We usually think of exploratory work as following up on puzzling or unanticipated results. In the REL experiments, however, IES asked researchers to focus on a narrow set of “confirmatory” results, and anything else was considered “exploratory,” even if the question was included in the original research design.

The strict IES criteria were based on the principle that, when a researcher uses tests of statistical significance, the probability of erroneously concluding that there is an impact when there isn’t one increases with the number of tests. In our evaluation of AMSTI, we limited ourselves to only four such “confirmatory” (i.e., not exploratory) tests of statistical significance. These were used to assess whether there was an effect on student outcomes in math problem-solving and in science, and on the amount of time teachers spent on “active learning” practices in math and in science. (Technically, IES considered these two sets of two, since two were the primary student outcomes and two were the intermediate teacher outcomes.) The threshold for significance was made more stringent to keep the probability of falsely concluding that there was a difference for any of the outcomes at 5% (often expressed as p < .05).
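To make the multiple-comparisons logic concrete, here is a minimal Python sketch. It assumes the tests are independent and uses a simple Bonferroni adjustment (dividing the 5% target by the number of tests) as an illustrative stand-in; the adjustment procedure actually used in the AMSTI report may differ, and the function names are hypothetical.

```python
# Illustrative sketch only: assumes independent tests and uses a Bonferroni
# adjustment as a stand-in; the report's own procedure may differ.

ALPHA = 0.05  # target probability of any false positive across all tests


def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests,
    each run at level alpha, when there are no true effects."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test threshold that keeps the familywise error at or below alpha."""
    return alpha / k


if __name__ == "__main__":
    for k in (1, 2, 4, 20):
        unadjusted = familywise_error(ALPHA, k)
        per_test = bonferroni_threshold(ALPHA, k)
        adjusted = familywise_error(per_test, k)
        print(f"{k:>2} tests: unadjusted {unadjusted:.3f}, "
              f"per-test threshold {per_test:.4f}, adjusted {adjusted:.3f}")
```

With four unadjusted tests at 5% each, the chance of at least one false positive is about 18.5%; dividing the threshold by four brings it back just under 5%. Under this simple scheme, treating the outcomes as two sets of two, as IES did, would give a per-test threshold of 2.5% within each set.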

While the logic for limiting the number of confirmatory outcomes rests on technical arguments about adjusting for multiple comparisons, the limit on the amount of exploratory work was driven more by resource constraints. Researchers are notorious (and we don’t exempt ourselves) for finding more questions in any study than were originally asked, and curiosity-based exploration can indeed go on forever. In the case of our evaluation of AMSTI, however, a number of fundamental policy questions were not answered by either the confirmatory or the exploratory analyses in our report. More research is needed.

Take the confirmatory finding that the program resulted in the equivalent of 28 days of additional math instruction (technically, an effect size of 0.05 standard deviations). This is a testament to the hard work and ingenuity of the AMSTI team and the commitment of the school systems. From a state policy perspective, it gives a green light to continuing the initiative’s organic growth. But since all the schools in the experiment applied to join AMSTI, we don’t know what would happen if AMSTI were adopted as the state curriculum, requiring less interested schools to implement it. Our results do not generalize to that situation. Likewise, if another state with different levels of achievement or resources were to consider adopting it, we would say that our study gives good reason to try it but, to quote Lee Cronbach, a methodologist whose ideas increasingly resonate as we translate research into practice: “…positive results obtained with a new procedure for early education in one community warrant another community trying it. But instead of trusting that those results generalize, the next community needs its own local evaluation” (Cronbach, 1975, p. 125).
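The translation from a standardized effect size to “days of additional progress” is simple arithmetic once one assumes a typical annual gain (in standard-deviation units) and a school-year length. This post does not state the benchmark the report used, so the 180-day school year below is an assumption, and the sketch simply solves for the annual gain that would make 0.05 SD equal 28 days. The function names are hypothetical.

```python
# Illustrative sketch only: the 180-day school year and this conversion rule
# are assumptions, not the report's documented method.

SCHOOL_YEAR_DAYS = 180  # assumed length of a school year


def effect_to_days(effect_sd: float, annual_gain_sd: float,
                   year_days: int = SCHOOL_YEAR_DAYS) -> float:
    """Express an effect (SD units) as a fraction of one year's typical
    growth, scaled to school days."""
    return effect_sd / annual_gain_sd * year_days


def implied_annual_gain(effect_sd: float, days: float,
                        year_days: int = SCHOOL_YEAR_DAYS) -> float:
    """Invert the conversion: the annual growth (SD units) at which
    effect_sd corresponds to `days` of progress."""
    return effect_sd * year_days / days


if __name__ == "__main__":
    gain = implied_annual_gain(0.05, 28)
    print(f"implied annual growth: {gain:.2f} SD")
    print(f"round trip: {effect_to_days(0.05, gain):.1f} days")
```

Under these assumptions, 0.05 SD corresponds to 28 days when a full year of typical growth is about 0.32 SD; a different benchmark for annual growth would yield a different number of days for the same effect size.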

The explorations we conducted as part of the AMSTI evaluation did not take the usual form of deeper examinations of interesting or unexpected findings uncovered during the planned evaluation. All the reported explorations were questions posed in the original study plan. They were defined as exploratory either because they were considered of secondary interest, such as the outcome for reading, or because they did not follow directly from the randomization, such as the results for subgroups of students defined by different demographic categories. Nevertheless, exploring such differences is important for understanding how and for whom AMSTI works. The overall effect, averaged across subgroups, may mask differences that are of critical importance for policy.

Readers interested in subgroup differences can refer to Table 6.11 of the report. Once differences are found in groups defined by individual student characteristics, our real exploration is just beginning. For example, can the difference be accounted for by other characteristics or combinations of characteristics? Is there something that differentiates the classes or schools that different students attend? Such questions begin to probe additional factors that could potentially be addressed in the program or its implementation. In any case, the report just released is not the “final report.” There is still a lot of work necessary to understand how any program of this sort can continue to be improved.

2012-02-14

Final Report Released on The Efficacy of PCI’s Reading Program

Empirical has released the final report of a three-year longitudinal study on the efficacy of the PCI Reading Program, which can be found on our reports page. This study, the first formal assessment of the PCI Reading Program, evaluated the program among a sample of third- through eighth-grade students with supported-level disabilities in Florida’s Brevard Public Schools and Miami-Dade County Public Schools. The primary goal of the study was to determine whether the program could achieve its intended purpose of teaching specific sight words. The study was completed in three “phases,” or school years. The results from Phases 1 and 2 showed a significant positive effect on student sight word achievement, and Phase 2 supported the initial expectation that two years of growth would be greater than one (read more on the results of Phase 1 and Phase 2).

“Working with Empirical Education was a win for us on many fronts. Their research was of the highest quality and has really helped us communicate with our customers through their several reports and conference presentations. They went beyond just outcomes to show how teachers put our reading program to use in classrooms. In all their dealings with PCI and with the school systems they were highly professional and we look forward to future research partnership opportunities.” - Lee Wilson, President & CEO, PCI Educational Publishing

In Phase 3, the remaining sample of students was too small for impact analyses, so researchers investigated patterns in students’ progress through the program. The general findings were positive: the exploration confirmed that students continue to learn more sight words with a second year of exposure to PCI, although at a slower pace than the developers expected. Furthermore, findings across all three phases show high levels of teacher satisfaction with the program, and teacher-reported student engagement levels were also high.

2011-12-09

Research Guidelines Re-released to Broader Audience

The updated guidelines for evaluation research were unveiled at the SIIA Ed Tech Business Forum, held in New York City on November 28-29. Authored by Empirical’s CEO, Denis Newman, and issued by the Software and Information Industry Association (SIIA), the guidelines seek to establish best practices for conducting and reporting evaluation studies of educational technologies, in order to enhance the quality, credibility, and utility of those studies for education decision makers.

Denis introduced the guidelines during the “Meet the authors of SIIA Publications” session on November 29. Non-members will be able to purchase the guidelines from Selling to Schools starting Thursday, December 1, 2011 (with continued free access to SIIA members). UPDATE: Denis was interviewed by Glen McCandless of Selling to Schools on December 15, 2011 to discuss key aspects of the guidelines. Listen to the full interview here.

2011-12-05

District Data Study: Empirical’s Newest Research Product

Empirical Education introduces its newest offering: District Data Study™. Aimed at providing evidence of effectiveness, District Data Study helps vendors conduct quantitative case studies using historical data from schools and districts currently using a specific educational program.

There are two basic questions that can be cost-effectively answered given the available data.

  1. Are the outcomes (behavioral or academic) for students in schools that use the program better than outcomes of comparable students in schools not (or before) using the program?

  2. Is the amount of program usage associated with differences in outcomes?

The data studies produce concise reports on measurable academic and behavioral outcomes, using appropriate statistical analyses of customer data from implementations of the educational product or program. District Data Study is built on efficient procedures and engineering infrastructure that can be applied to individual districts already piloting a program or to veteran clients with longstanding implementations.

2011-11-20

Empirical Presents at AERA 2012

We will again be presenting at the annual meeting of the American Educational Research Association (AERA). Join the Empirical Education team in Vancouver, Canada, April 13-17, 2012. Our presentations will span two divisions: 1) Measurement and Research Methodology and 2) Research, Evaluation and Assessment in Schools.

Research topics will include:

  • Current Studies in Program Evaluation to Improve Student Achievement Outcomes

  • Evaluating Alabama’s Math, Science and Technology Initiative: Results of a Three-Year, State-Wide Randomized Experiment

  • Accommodating Data From Quasi-Experimental Design

  • Quantitative Approaches to the Evaluation of Literacy Programs and Instruction for Elementary and Secondary Students

We look forward to seeing you at our sessions to discuss our research. You can also download our presentation schedule here. As has become tradition, we plan to host yet another of our popular AERA receptions. Details about the reception will follow in the months to come.

2011-11-18

Need for Product Evaluations Continues to Grow

There is a growing need for evidence of the effectiveness of products and services being sold to schools. A new release of SIIA’s product evaluation guidelines is now available at the Selling to Schools website (with continued free access to SIIA members), to help guide publishers in measuring the effectiveness of the tools they are selling to schools.

It’s been almost a decade since NCLB made its call for “scientifically-based research,” but the calls for research haven’t faded away. This is because resources available to schools have diminished over that time, heightening the importance of cost benefit trade-offs in spending.

NCLB has focused attention on test score achievement, and this metric is becoming more pervasive; e.g., through a tie to teacher evaluation and through linkages to dropout risk. While NCLB fostered a compliance mentality (product specs had to have a check mark next to SBR, or scientifically based research), the need to ensure that funds are not wasted is now leading to greater interest in research results. Decision-makers are now very interested in whether specific products will be effective, or how well they have been working, in their districts.

Fortunately, the data available for evaluations of all kinds is getting better and easier to access. The U.S. Department of Education has poured hundreds of millions of dollars into state data systems. These investments make data available to states and drive the cleaning and standardizing of data from districts. At the same time, districts continue to invest in data systems and warehouses. While still not a trivial task, the ability of school district researchers to get the data needed to determine if an investment paid off—in terms of increased student achievement or attendance—has become much easier over the last decade.

The reauthorization of ESEA (i.e., NCLB) is maintaining the pressure to evaluate education products. We are still a long way from the draft reauthorization introduced in Congress becoming law, but the initial indications are quite favorable to the continued production of product effectiveness evidence. The language has changed somewhat: look for the phrase “evidence-based.” Along with the term “scientifically-valid,” this new language is actually more sophisticated and potentially more effective than the old SBR neologism. Bob Slavin, one of the reviewers of the SIIA guidelines, says in his Ed Week blog that “This is not the squishy ‘based on scientifically-based evidence’ of NCLB. This is the real McCoy.” It is notable that the definition of “evidence-based” goes beyond setting rules for the design of research, such as the SBR focus on the single dimension of “internal validity,” for which randomization gets the top rating. It now also asks how generalizable the research is (its “external validity”); i.e., does it have any relevance for decision-makers?

One of the important goals of the SIIA guidelines for product effectiveness research is to improve the credibility of publisher-sponsored research. It is important that educators see it as more than just “market research” producing biased results. In this era of reduced budgets, schools need to have tangible evidence of the value of products they buy. By following the SIIA’s guidelines, publishers will find it easier to achieve that credibility.

2011-11-12

Empirical's Chief Scientist co-authored a recently released NCEE Reference Report

Together with researchers from Abt Associates, Andrew Jaciw, Chief Scientist of Empirical Education, co-authored a recently released report entitled “Estimating the Impacts of Educational Interventions Using State Tests or Study-Administered Tests.” The full report, released by the National Center for Education Evaluation and Regional Assistance (NCEE), can be found on the Institute of Education Sciences (IES) website. The NCEE Reference Report examines and identifies factors that could affect the precision of program evaluations when they are based on state assessments instead of study-administered tests. The authors found that three choices yielded more precise impact estimates: using the same test for both the pre- and post-test; using two pre-test covariates, one from each type of test (state assessment and study-administered standardized test); and using as the dependent variable the simple average of the post-test scores from the two types of tests, which also required smaller samples than using post-test scores from only one of the two types of tests.

2011-11-02

Expertise Provided for New York Times Front Page Story

Empirical’s CEO, Denis Newman, was one of the experts consulted by New York Times reporter Trip Gabriel for his front-page Sunday Times story, “Inflating the Software Report Card.” Newman’s commentary on the first article in this series can be seen here. The article also refers to the guidelines for evaluation research issued by the Software and Information Industry Association (SIIA), which can be found on the SIIA site. In addition, the report referred to in the article (which was authored not by Newman but by a team of company researchers) can be found on our reports and papers page. (Some readers were confused by the misspelling of Newman’s first name as “Dennis.”)

2011-10-11