blog posts and news stories

Exploration in the World of Experimental Evaluation

Our 300+ page report makes a good start. But IES, faced with limited time and resources to complete the many experiments being conducted within the Regional Education Lab system, put strict limits on the number of exploratory analyses researchers could conduct. We usually think of exploratory work as questions to follow up on puzzling or unanticipated results. However, in the case of the REL experiments, IES asked researchers to focus on a narrow set of “confirmatory” results and anything else was considered “exploratory,” even if the question was included in the original research design.

The strict IES criteria were based on the principle that when a researcher is using tests of statistical significance, the probability of erroneously concluding that there is an impact when there isn’t one increases with the frequency of the tests. In our evaluation of AMSTI, we limited ourselves to only four such “confirmatory” (i.e., not exploratory) tests of statistical significance. These were used to assess whether there was an effect on student outcomes for math problem-solving and for science, and the amount of time teachers spent on “active learning” practices in math and in science. (Technically, IES considered this two sets of two, since two were the primary student outcomes and two were the intermediate teacher outcomes.) The threshold for significance was made more stringent to keep the probability of falsely concluding that there was a difference for any of the outcomes at 5% (often expressed as p < .05).

While the logic for limiting the number of confirmatory outcomes is based on technical arguments about adjustments for multiple comparisons, the limit on the amount of exploratory work was based more on resource constraints. Researchers are notorious (and we don’t exempt ourselves) for finding more questions in any study than were originally asked. Curiosity-based exploration can indeed go on forever. In the case of our evaluation of AMSTI, however, there were a number of fundamental policy questions that were not answered either by the confirmatory or by the exploratory questions in our report. More research is needed.

Take the confirmatory finding that the program resulted in the equivalent of 28 days of additional math instruction (or technically an impact of 5% of a standard deviation). This is a testament to the hard work and ingenuity of the AMSTI team and the commitment of the school systems. From a state policy perspective, it gives a green light to continuing the initiative’s organic growth. But since all the schools in the experiment applied to join AMSTI, we don’t know what would happen if AMSTI were adopted as the state curriculum requiring schools with less interest to implement it. Our results do not generalize to that situation. Likewise, if another state with different levels of achievement or resources were to consider adopting it, we would say that our study gives good reason to try it but, to quote Lee Cronbach, a methodologist whose ideas increasingly resonate as we translate research into practice: “…positive results obtained with a new procedure for early education in one community warrant another community trying it. But instead of trusting that those results generalize, the next community needs its own local evaluation” (Cronbach, 1975, p. 125).

The explorations we conducted as part of the AMSTI evaluation did not take the usual form of deeper examinations of interesting or unexpected findings uncovered during the planned evaluation. All the reported explorations were questions posed in the original study plan. They were defined as exploratory either because they were considered of secondary interest, such as the outcome for reading, or because they were not a direct causal result of the randomization, such as the results for subgroups of students defined by different demographic categories. Nevertheless, exploration of such differences is important for understanding how and for whom AMSTI works. The overall effect, averaging across subgroups, may mask differences that are of critical importance for policy

Readers interested in the issue of subgroup differences can refer to Table 6.11. Once differences are found in groups defined in terms of individual student characteristics, our real exploration is just beginning. For example, can the difference be accounted for by other characteristics or combinations of characteristics? Is there something that differentiates the classes or schools that different students attend? Such questions begin to probe additional factors that can potentially be addressed in the program or its implementation. In any case, the report just released is not the “final report.” There is still a lot of work necessary to understand how any program of this sort can continue to be improved.

2012-02-14

Final Report Released on The Efficacy of PCI’s Reading Program

Empirical has released the final report of a three-year longitudinal study on the efficacy of the PCI Reading Program, which can be found on our reports page. This study, the first formal assessment of the PCI Reading Program, evaluated the program among a sample of third- through eighth-grade students with supported-level disabilities in Florida’s Brevard Public Schools and Miami-Dade County Public Schools. The primary goal of the study was to identify whether the program could achieve its intended purpose of teaching specific sight words. The study was completed in three “phases,” or school years. The results from Phase 1 and 2 showed a significant positive effect on student sight word achievement and Phase 2 supported the initial expectation that two years of growth would be greater than one year (read more on results of Phase 1 and Phase 2).

“Working with Empirical Education was a win for us on many fronts. Their research was of the highest quality and has really helped us communicate with our customers through their several reports and conference presentations. They went beyond just outcomes to show how teachers put our reading program to use in classrooms. In all their dealings with PCI and with the school systems they were highly professional and we look forward to future research partnership opportunities.” - Lee Wilson, President & CEO, PCI Educational Publishing

In Phase 3, the remaining sample of students was too small to conduct any impact analyses, so researchers investigated patterns in students’ progress through the program. The general findings were positive in that the exploration confirmed that students continue to learn more sight words with a second year of exposure to PCI although at a slower pace than expected by the developers. Furthermore, findings across all three phases show high levels of teacher satisfaction with the program. Along with this positive outcome, teacher-reported student engagement levels were also high.

2011-12-09

Research Guidelines Re-released to Broader Audience

The updated guidelines for evaluation research were unveiled at the SIIA Ed Tech Business Forum, held in New York City on November 28 - 29. Authored by Empirical’s CEO, Denis Newman, and issued by the Software and Information Industry Association (SIIA), the guidelines seek to provide a standard of best practices for conducting and reporting evaluation studies for educational technologies in order to enhance the quality, credibility, and utility to education decision makers.

Denis introduced the guidelines during the “Meet the authors of SIIA Publications” session on November 29. Non-members will be able to purchase the guidelines from Selling to Schools starting Thursday, December 1, 2011 (with continued free access to SIIA members). UPDATE: Denis was interviewed by Glen McCandless of Selling to Schools on December 15, 2011 to discuss key aspects of the guidelines. Listen to the full interview here.

2011-12-05

District Data Study: Empirical’s Newest Research Product

Empirical Education introduces its newest offer: District Data StudyTM. Aimed at providing evidence of effectiveness, District Data Study assists vendors in conducting quantitative case studies using historical data from schools and districts currently engaged in a specific educational program.

There are two basic questions that can be cost-effectively answered given the available data.

  1. Are the outcomes (behavioral or academic) for students in schools that use the program better than outcomes of comparable students in schools not (or before) using the program?

  2. Is the amount of program usage associated with differences in outcomes?

The data studies result in concise reports on measurable academic and behavioral outcomes using appropriate statistical analyses of customer data from implementation of the educational product or program. District Data Study is built on efficient procedures and engineering infrastructure that can be applied to individual districts already piloting a program or veteran clients with longstanding implementation.

2011-11-20

Empirical Presents at AERA 2012

We will again be presenting at the annual meeting of the American Educational Research Association (AERA). Join the Empirical Education team in Vancouver, Canada from April 13 – 17, 2012. Our presentations will span two divisions: 1) Measurement and Research Methodology and 2) Research, Evaluation and Assessment in Schools.

Research Topics will include:

  • Current Studies in Program Evaluation to Improve Student Achievement Outcomes

  • Evaluating Alabama’s Math, Science and Technology Initiative: Results of a Three-Year, State-Wide Randomized Experiment

  • Accommodating Data From Quasi–Experimental Design

  • Quantitative Approaches to the Evaluation of Literacy Programs and Instruction for Elementary and Secondary Students

We look forward to seeing you at our sessions to discuss our research. You can also download our presentation schedule here. As has become tradition, we plan to host yet another of our popular AERA receptions. Details about the reception will follow in the months to come.

2011-11-18

Need for Product Evaluations Continues to Grow

There is a growing need for evidence of the effectiveness of products and services being sold to schools. A new release of SIIA’s product evaluation guidelines is now available at the Selling to Schools website (with continued free access to SIIA members), to help guide publishers in measuring the effectiveness of the tools they are selling to schools.

It’s been almost a decade since NCLB made its call for “scientifically-based research,” but the calls for research haven’t faded away. This is because resources available to schools have diminished over that time, heightening the importance of cost benefit trade-offs in spending.

NCLB has focused attention on test score achievement, and this metric is becoming more pervasive; e.g., through a tie to teacher evaluation and through linkages to dropout risk. While NCLB fostered a compliance mentality—product specs had to have a check mark next to SBR—the need to assure that funds are not wasted is now leading to a greater interest in research results. Decision-makers are now very interested in whether specific products will be effective, or how well they have been working, in their districts.

Fortunately, the data available for evaluations of all kinds is getting better and easier to access. The U.S. Department of Education has poured hundreds of millions of dollars into state data systems. These investments make data available to states and drive the cleaning and standardizing of data from districts. At the same time, districts continue to invest in data systems and warehouses. While still not a trivial task, the ability of school district researchers to get the data needed to determine if an investment paid off—in terms of increased student achievement or attendance—has become much easier over the last decade.

The reauthorization of ESEA (i.e., NCLB) is maintaining the pressure to evaluate education products. We are still a long way from the draft reauthorization introduced in Congress becoming a law, but the initial indications are quite favorable to the continued production of product effectiveness evidence. The language has changed somewhat. Look for the phrase “evidence based”. Along with the term “scientifically-valid”, this new language is actually more sophisticated and potentially more effective than the old SBR neologism. Bob Slavin, one of the reviewers of the SIIA guidelines, says in his Ed Week blog that “This is not the squishy ‘based on scientifically-based evidence’ of NCLB. This is the real McCoy.” It is notable that the definition of “evidence-based” goes beyond just setting rules for the design of research, such as the SBR focus on the single dimension of “internal validity” for which randomization gets the top rating. It now asks how generalizable the research is or its “external validity”; i.e., does it have any relevance for decision-makers?

One of the important goals of the SIIA guidelines for product effectiveness research is to improve the credibility of publisher-sponsored research. It is important that educators see it as more than just “market research” producing biased results. In this era of reduced budgets, schools need to have tangible evidence of the value of products they buy. By following the SIIA’s guidelines, publishers will find it easier to achieve that credibility.

2011-11-12

Empirical's Chief Scientist co-authored a recently released NCEE Reference Report

Together with researchers from Abt Associates, Andrew Jaciw, Chief Scientist of Empirical Education, co–authored a recently released report entitled, “Estimating the Impacts of Educational Interventions Using State Tests or Study-Administered Tests”. The full report released by the The National Center for Education Evaluation and Regional Assistance (NCEE) can be found on the Institute of Education Sciences (IES) website.The NCEE Reference Report examines and identifies factors that could affect the precision of program evaluations when they are based on state assessments instead of study-administered tests. The authors found that using the same test for both the pre- and post-test yielded more precise impact estimates; using two pre-test covariates, one from each type of test (state assessment and study- administered standardized test), yielded more precise impact estimates; using as the dependent variable the simple average of the post-test scores from the two types of tests yielded more precise impact estimates and smaller sample size requirements than using post-test scores from only one of the two types of tests.

2011-11-02

Expertise Provided for New York Times Front Page Story

Empirical’s CEO, Denis Newman, was one of the experts consulted by New York Times reporter Trip Gabriel in his Sunday Times, front page story, “Inflating the Software Report Card.” Newman’s commentary on the first article in this series can be seen here. The article also refers to the guidelines for evaluation research issued by the Software and Information Industry Association (SIIA), which can be found on the SIIA site. In addition, the report referred to in the article—which was not authored by Newman but a team of company researchers—can be found on our reports and papers page. (Some readers were confused by the misspelling of Newman’s first name as “Dennis”.)

2011-10-11

Join Empirical Education at ALAS, AEA, and NSDC

This year, the Association of Latino Administrators & Superintendents (ALAS) will be holding its 8th annual summit on Hispanic Education in San Francisco. Participants will have the opportunity to attend speaker sessions, roundtable discussions, and network with fellow attendees. Denis Newman, CEO of Empirical Education, together with John Sipe, Senior Vice President and National Sales Manager at Houghton Mifflin Harcourt and Jeannetta Mitchell, eight-grade teacher at Presidio Middle school and a participant in the pilot study, will take part in a 30-minute discussion reviewing the study design and experiences gathered around a one-year study of Algebra on the iPad. The session takes place on October 13th at the Salon 8 of the Marriott Marquis in San Francisco from 10:30am to 12:00pm.

Also this year, the American Evaluation Association (AEA) will be hosting its 25th annual conference from November 2–5 in Anaheim, CA. Approximately 2,500 evaluation practitioners, academics, and students from around the globe are expected to gather at the conference. This year’s theme revolves around the challenges of values and valuing in evaluation.

We are excited to be part of AEA again this year and would like to invite you to join us at two presentations. First, Denis Newman will be hosting the roundtable session on Returning to the Causal Explanatory Tradition: Lessons for Increasing the External Validity of Results from Randomized Trials. We examine how the causal explanatory tradition—originating in the writing of Lee Cronbach—can inform the planning, conduct and analysis of randomized trials to increase external validity of findings. Find us in the Balboa A/B room on Friday, November 4th from 10:45am to 11:30am.

Second, Valeriy Lazarev and Denis Newman will present a paper entitled, “From Program Effect to Cost Savings: Valuing the Benefits of Educational Innovation Using Vertically Scaled Test Scores And Instructional Expenditure Data.” Be sure to stop by on Saturday, November 5th from 9:50am to 11:20am in room Avila A.

Furthermore, Jenna Zacamy, Senior Research Manager at Empirical Education, will be presenting on two topics at the National Staff Development Council (NSDC) annual conference taking place in Anaheim, CA from December 3rd to 7th. Join her on Monday, December 5th at 2:30pm to 4:30pm when she will talk about the impact on student achievement for grades 4 through 8 of the Alabama Math, Science, and Technology Initiative, together with Pamela Finney and Jean Scott from SERVE Center at UNCG.

On Tuesday, December 6th at 10:00am to 12:00pm Jenna will discuss prior and current research on the effectiveness of a large-scale high school literacy reform together with Cathleen Kral from WestEd and William Loyd from Washtenaw Intermediate School District.

2011-10-10

New Reports Show Positive Results for Elementary Reading Program

Two studies of the Treasures reading program from McGraw-Hill are now posted on our reports page. Treasures is a basal reading program for students in grades K–6. Although the first study was a multi-site study while the second was conducted in the Osceola school district, both found positive impacts on reading achievement in grades 3–5.

The primary data for the first study were scores supplied with district permission by Northwest Evaluation Association from their MAP reading test. The study uses a quasi-experimental comparison group design based on 35 Treasures and 48 comparison schools primarily in the midwest. The study found that Treasures had a positive impact on overall elementary student reading scores, the strongest effect being observed for grade 5.

The second study’s data were provided by the Osceola school district and consist of demographic information, FCAT test scores, and information on student transfers during the year (between schools within the districts and from other districts). The dataset for this time series design covered five consecutive school years from 2005–06 to 2009–10, including two years prior to introduction of the intervention and three years after the introduction. The study included exploration of moderators that demonstrated a stronger positive effect for students with disabilities and English learners than the rest of the student population. We also found a stronger positive impact on girls than on boys.

Check back for results from follow-up studies, which are currently underway in other states and districts.

2011-09-21
Archive