blog posts and news stories

Conference Season 2019

Are you staying warm this winter? Can’t wait for the spring? Neither can we, with spring conference season right around the corner! Find our Empirical team traveling bicoastally in these upcoming months.

We’re starting the season right in our backyard at the Bay Area Learning Analytics (BayLAN) Conference at Stanford University on March 2, 2019! CEO Denis Newman will be presenting on a panel on the importance of efficacy with Jeremy Roschelle of Digital Promise. Senior Research Scientist Valeriy Lazarev will also be attending the conference.

The next day, the team will be off to SXSW EDU in Austin, Texas! Our goal is to talk to people about the new venture, Evidentally.

Then we’re headed to Washington, D.C. to attend the annual Society for Research on Educational Effectiveness (SREE) Conference! Andrew Jaciw will be presenting “A Study of the Impact of the CREATE Residency Program on Teacher Socio-Emotional and Self-Regulatory Outcomes”. We will be presenting on Friday, March 8, 2:30 PM - 4:00 PM during the “Social and Emotional Learning in Education Settings” sessions in Ballroom 1. Denis will also be attending and, together with Andrew, meeting with many research colleagues. If you can’t catch us in D.C., you can find Andrew back in the Bay Area at the sixth annual Carnegie Foundation Summit.

For the last leg of spring conferences, we’ll be back at the American Educational Research Association’s (AERA) Annual Meeting in Toronto, Canada from April 6th to 9th. There you’ll be able to hear more about the CREATE Teacher Residency Research Study presented by Andrew Jaciw, joined by Vice President of Research Operations Jenna Zacamy along with our new Research Manager, Audra Wingard. And for the first time in 10 years, you won’t be finding Denis at AERA… Instead he’ll be at the ASU GSV Summit in San Diego, California!

2019-02-12

How Efficacy Studies Can Help Decision-makers Decide if a Product is Likely to Work in Their Schools

We and our colleagues have been working on translating the results of rigorous studies of the impact of educational products, programs, and policies for people in school districts who are making the decisions whether to purchase or even just try out—pilot—the product. We are influenced by Stanford University Methodologist Lee Cronbach, especially his seminal book (1982) and article (1975) where he concludes “When we give proper weight to local conditions, any generalization is a working hypothesis, not a conclusion…positive results obtained with a new procedure for early education in one community warrant another community trying it. But instead of trusting that those results generalize, the next community needs its own local evaluation” (p. 125). In other words, we consider even the best designed experiment to be like a case study, as much about the local and moderating role of context, as about the treatment when interpreting the causal effect of the program.

Following the focus on context, we can consider characteristics of the people and of the institution where the experiment was conducted to be co-causes of the result that deserve full attention, even though, technically, only the treatment, which was randomly assigned, was controlled. Here we argue that any generalization from a rigorous study, where the question is whether the product is likely to be worth trying in a new district, must consider the full context of the study.

Technically, in the language of evaluation research, these differences in who or where the product or “treatment” works are called “interaction effects” between the treatment and the characteristic of interest (e.g., subgroups of students by demographic category or achievement level, teachers with different skills, or bandwidth available in the building). The characteristic of interest can be called a “moderator”, since it changes, or moderates, the impact of the treatment. An interaction reveals if there is differential impact and whether a group with a particular characteristic is advantaged, disadvantaged, or unaffected by the product.
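To make the idea of an interaction concrete, here is a minimal sketch in Python. The scores, the group labels, and the `subgroup_impact` helper are all invented for illustration; they do not come from any study we discuss. The sketch computes the impact separately for two subgroups and takes the difference between those impacts, which is the interaction between the treatment and the moderator.

```python
from statistics import mean

# Hypothetical outcome scores, invented purely for illustration.
# Keys are (moderator group, randomly assigned condition).
scores = {
    ("english_learner", "treatment"): [52, 55, 50, 53],
    ("english_learner", "control"): [51, 54, 50, 53],
    ("proficient", "treatment"): [68, 71, 66, 69],
    ("proficient", "control"): [60, 63, 58, 61],
}


def subgroup_impact(group):
    """Treatment-minus-control difference in mean outcome for one subgroup."""
    return mean(scores[(group, "treatment")]) - mean(scores[(group, "control")])


impact_el = subgroup_impact("english_learner")
impact_prof = subgroup_impact("proficient")

# The interaction effect is the difference between the two subgroup impacts.
interaction = impact_prof - impact_el

# With equal subgroup sizes, the average impact is the simple mean of the two.
average_impact = (impact_el + impact_prof) / 2

print(f"Impact for English learners: {impact_el}")
print(f"Impact for proficient students: {impact_prof}")
print(f"Interaction (differential impact): {interaction}")
print(f"Average impact: {average_impact}")
```

In this made-up example the average impact is clearly positive, yet almost all of it comes from the proficient students. A real analysis would also need standard errors for each quantity, which this sketch leaves out.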

The rules set out by The Department of Education’s What Works Clearinghouse (WWC) focus on the validity of the experimental conclusion: Did the program work on average compared to a control group? Whether it works better for poor kids than for middle class kids, works better for uncertified teachers than for veteran teachers, or increases or closes a gap between English learners and those who are proficient is not part of the information provided in their reviews. But these differences are exactly what buyers need in order to understand whether the product is a good candidate for a population like theirs. If a program works substantially better for English proficient students than for English learners, and the purchasing school has largely the latter type of student, it is important that the school administrator know the context for the research and the result.

The accuracy of an experimental finding depends on it not being moderated by conditions. This is recognized with recent methods of generalization (Tipton, 2013) that essentially apply non-experimental adjustments to experimental results to make them more accurate and more relevant to specific local contexts.

Work by Jaciw (2016a, 2016b) takes this one step further.

First, he confirms the result that if the impact of the program is moderated, and if moderators are distributed differently between sites, then an experimental result from one site will yield a biased inference for another site. This would be the case, for example, if the impact of a program depends on individual socioeconomic status, and there is a difference between the study and inference sites in the proportion of individuals with low socioeconomic status. Conditions for this “external validity bias” are well understood, but the consequences are addressed much less often than the usual selection bias. Experiments can yield accurate results about the efficacy of a program for the sample studied, but that average may not apply either to a subgroup within the sample or to a population outside the study.
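The arithmetic behind this bias can be sketched in a few lines of Python. All numbers and names below (`site_average_impact`, the subgroup impacts, the site proportions) are hypothetical, and the sketch assumes that the subgroup impacts themselves are the same at both sites, so that only the mix of students differs. The last step reweights the study's subgroup impacts by the inference site's proportions; this is simple post-stratification, shown only in the spirit of the generalization methods mentioned above, not as an implementation of any of them.

```python
def site_average_impact(share_low_ses, impact_low, impact_high):
    """Average impact at a site, as a mix of the two subgroup impacts."""
    return share_low_ses * impact_low + (1 - share_low_ses) * impact_high


# Hypothetical subgroup impacts, assumed identical across sites.
impact_low_ses = 2.0
impact_high_ses = 10.0

# Hypothetical proportions of low-SES individuals at each site.
study_share_low = 0.25
target_share_low = 0.75

study_average = site_average_impact(study_share_low, impact_low_ses, impact_high_ses)
target_average = site_average_impact(target_share_low, impact_low_ses, impact_high_ses)

# Bias from carrying the study average over to the inference site unchanged.
external_validity_bias = study_average - target_average

# The same bias written as (difference in shares) x (difference in impacts):
# it is zero if the impact is not moderated or if the sites have the same mix.
bias_from_identity = (study_share_low - target_share_low) * (
    impact_low_ses - impact_high_ses
)

# Reweighting the study's subgroup impacts by the target site's proportions.
reweighted_estimate = (
    target_share_low * impact_low_ses + (1 - target_share_low) * impact_high_ses
)

print(f"Study site average impact: {study_average}")
print(f"Inference site average impact: {target_average}")
print(f"External validity bias: {external_validity_bias}")
print(f"Reweighted estimate for inference site: {reweighted_estimate}")
```

Under these assumed values the study average overstates the impact at the inference site by a factor of two, and the reweighted estimate recovers it only because the subgroup impacts were assumed to transfer unchanged.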

Second, he uses results from a multisite trial to show empirically that there is potential for significant bias when inferring experimental results from one subset of sites to other inference sites within the study; however, moderators can account for much of the variation in impact across sites. Average impact findings from experiments provide a summary of whether a program works, but leave the consumer guessing about the boundary conditions for that effect, that is, the limits beyond which the average effect ceases to apply. Cronbach was highly aware of this, titling a chapter in his 1982 book “The Limited Reach of Internal Validity”. Using terms like “unbiased” to describe impact findings from experiments is correct in a technical sense (i.e., the point estimate, on hypothetical repeated sampling, is centered on the true average effect for the sample studied), but it can impart an incorrect sense of the external validity of the result: that it applies beyond the instance of the study.

Implications of the work cited are, first, that it is possible to unpack marginal impact estimates through subgroup and moderator analyses to arrive at more accurate inferences for individuals. Second, that we should do so: why obscure differences by paying attention to only the grand mean impact estimate for the sample? And third, that we should be planful in deciding which subgroups to assess impacts for in the context of individual experiments.

Local decision-makers’ primary concern should be whether a program will work with their specific population, and they should ask for causal evidence that considers local conditions through the moderating role of student, teacher, and school attributes. Looking at finer differences in impact may elicit criticism that it introduces another type of uncertainty, specifically from random sampling error, which may be minimal with gross impacts and large samples, but influential when looking at differences in impact with more and smaller samples. This is a fair criticism, but differential effects may be less susceptible to random perturbations (low power) than assumed, especially if subgroups are identified at individual levels in the context of cluster randomized trials (e.g., individual student-level SES, as opposed to school average SES) (Bloom, 2005; Jaciw, Lin, & Ma, 2016).

References:
Bloom, H. S. (2005). Randomizing groups to evaluate place-based programs. In H. S. Bloom (Ed.), Learning more from social experiments. New York: Russell Sage Foundation.

Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30(2), 116-127.

Cronbach, L. J. (1982). Designing evaluations of educational and social programs. San Francisco, CA: Jossey-Bass.

Jaciw, A. P. (2016a). Applications of a within-study comparison approach for evaluating bias in generalized causal inferences from comparison group studies. Evaluation Review, 40(3), 241-276. Retrieved from http://erx.sagepub.com/content/40/3/241.abstract

Jaciw, A. P. (2016b). Assessing the accuracy of generalized inferences from comparison group studies using a within-study comparison approach: The methodology. Evaluation Review, 40(3), 199-240. Retrieved from http://erx.sagepub.com/content/40/3/199.abstract

Jaciw, A., Lin, L., & Ma, B. (2016). An empirical study of design parameters for assessing differential impacts for students in group randomized trials. Evaluation Review. Retrieved from http://erx.sagepub.com/content/early/2016/10/14/0193841X16659600.abstract

Tipton, E. (2013). Improving generalizations from experiments using propensity score subclassification: Assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics, 38, 239-266.

2018-01-16

Spring 2018 Conference Season is Taking Shape


We’ll be on the road again this spring.

SREE

Andrew Jaciw and Denis Newman will be in Washington DC for the annual spring conference of the Society for Research on Educational Effectiveness (SREE), the premier conference on rigorous research. Andrew Jaciw will present his paper: Leveraging Fidelity Data to Making Sense of Impact Results: Informing Practice through Research. His presentation will be a part of Session 2I: Research Methods - Post-Random Assignment Models: Fidelity, Attrition, Mediation & More from 8-10am on Thursday, March 1.

SXSW EDU

In March, Denis Newman will be attending SXSW EDU Conference & Festival in Austin, TX and presenting on a panel along with Malvika Bhagwat, Jason Palmer, and Karen Billings titled Can Evidence Even Keep Up with EdTech? This will address how researchers and companies can produce evidence that products work—in time for educators and administrators to make a knowledgeable buying decision under accelerating timelines.

AERA

Empirical staff will be presenting in 4 different sessions at the annual conference of the American Educational Research Association (AERA) in NYC in April, all under Division H (Research, Evaluation, and Assessment in Schools).

  1. For Quasi-experiments on Edtech Products, What Counts as Being Treated?
  2. Teacher evaluation rubric properties and associations with school characteristics: Evidence from the Texas evaluation system
  3. Indicators of Successful Teacher Recruitment and Retention in Oklahoma Rural Schools
  4. The Challenges and Successes of Conducting Large-scale Educational Research

In addition to these presentations, we are planning another of our celebrated receptions in NYC so stay tuned for details.

ISTE

A panel on our Research Guidelines has been accepted at this major convention, considered the epicenter of edtech with thousands of users and hundreds of companies, held this year in Chicago from June 24–27.

2017-12-18

ETIN Releases Guidelines for Research on Educational Technologies in K-12 Schools

The press release (below) was originally published on the SIIA website. It has since inspired stories in the Huffington Post, edscoop, EdWeek’s Market Brief, and the EdSurge newsletter.



ETIN Releases Guidelines for Research on Educational Technologies in K-12 Schools

Changes in education technology and policy spur updated approach to industry research

Washington, DC (July 25, 2017) - The Education Technology Industry Network, a division of The Software & Information Industry Association, released an important new report today: “Guidelines for Conducting and Reporting EdTech Impact Research in U.S. K-12 Schools.” Authored by Dr. Denis Newman and the research team at Empirical Education Inc., the Guidelines provide 16 best practice standards of research for publishers and developers of educational technologies.

The Guidelines are a response to the changing research methods and policies driven by the accelerating pace of development and passage of the Every Student Succeeds Act (ESSA), which has challenged the static notion of evidence defined in NCLB. Recognizing the need for consensus among edtech providers, customers in the K-12 school market, and policy makers at all levels, SIIA is making these Guidelines freely available.

“SIIA members recognize that changes in technology and policy have made evidence of impact an increasingly critical differentiator in the marketplace,” said Bridget Foster, senior VP and managing director of ETIN. “The Guidelines show how research can be conducted and reported within a short timeframe and still contribute to continuous product improvement.”

“The Guidelines for research on edtech products is consistent with our approach to efficacy: that evidence of impact can lead to product improvement,” said Amar Kumar, senior vice president of Efficacy & Research at Pearson. “We appreciate ETIN’s leadership and Empirical Education’s efforts in putting together this clear presentation of how to use rigorous and relevant research to drive growth in the market.”

The Guidelines draw on over a decade of experience in conducting research in the context of the U.S. Department of Education’s Institute of Education Sciences, and its Investing in Innovation program.

“The current technology and policy environment provides an opportunity to transform how research is done,” said Dr. Newman, CEO of Empirical Education Inc. and lead author of the Guidelines. “Our goal in developing the new guidelines was to clarify current requirements in a way that will help edtech companies provide school districts with the evidence they need to consistently quantify the value of software tools. My thanks go to SIIA and the highly esteemed panel of reviewers whose contribution helped us provide the roadmap for the change that is needed.”

“In light of the ESSA evidence standards and the larger movement toward evidence-based reform, publishers and software developers are increasingly being called upon to show evidence that their products make a difference with children,” said Guidelines peer reviewer Dr. Robert Slavin, director of the Center for Research and Reform in Education, Johns Hopkins University. “The ETIN Guidelines provide practical, sensible guidance to those who are ready to meet these demands.”

ETIN’s goal is to improve the market for edtech products by advocating for greater transparency in reporting research findings. For that reason, it is actively working with government, policy organizations, foundations, and universities to gain the needed consensus for change.

“As digital instructional materials flood the market place, state and local leaders need access to evidence-based research regarding the effectiveness of products and services. This guide is a great step in supporting both the public and private sector to help ensure students and teachers have access to the most effective resources for learning,” stated Christine Fox, Deputy Executive Director, SETDA. The Guidelines can be downloaded here: https://www.empiricaleducation.com/research-guidelines.

2017-07-25

SIIA ETIN EIS Conference Presentations 2017


We are playing a major role in the Education Impact Symposium (EIS), organized by the Education Technology Industry Network (ETIN), a division of The Software & Information Industry Association (SIIA).

  1. ETIN is releasing a set of edtech research guidelines that CEO Denis Newman wrote this year
  2. Denis is speaking on 2 panels this year

The edtech research guidelines that Denis authored and ETIN is releasing on Tuesday, July 25 are called “Guidelines for Conducting and Reporting EdTech Impact Research in U.S. K-12 Schools” and can be downloaded from this webpage. The Guidelines are a much-needed response to a rapidly-changing environment of cloud-based technology and important policy changes brought about by the Every Student Succeeds Act (ESSA).

The panels Denis will be presenting on are both on Tuesday, July 25, 2017.

12:30 - 1:15pm
ETIN’s New Guidelines for Product Research in the ESSA Era
With the recent release of ETIN’s updated Guidelines for EdTech Impact Research, developers and publishers can ride the wave of change from NCLB’s sluggish concept of “scientifically-based” to ESSA’s dynamic view of “evidence” for continuous improvement. The Guidelines are being made publicly available at the Symposium, with a discussion and Q&A led by the lead author and some of the contributing reviewers.
Moderator:
Myron Cizdyn, Chief Executive Officer, The BLPS Group
Panelists:
Malvika Bhagwat, Research & Efficacy, Newsela
Amar Kumar, Sr. Vice President, Pearson
Denis Newman, CEO, Empirical Education Inc.
John Richards, President, Consulting Services for Education

2:30 - 3:30pm
The Many Faces of Impact
Key stakeholders in the EdTech Community will each review, in TED Talk style, what they are doing to increase impact of digital products, programs and services. Our line-up of presenters includes:
- K-12 and HE content providers using impact data to better understand their customers, improve their products, and support their marketing and sales teams
- an investor seeking impact on both disadvantaged populations and their financial return in order to make funding decisions for portfolio companies
- an education organization helping institutions decide what research is useful to them and how to grapple with new ESSA requirements
- a researcher working with product developers to produce evidence of the impact of their digital products

After the set of presenters have finished, we’ll have time for your questions on these multidimensional aspects of IMPACT and how technology can help.
Moderator:
Karen Billings, Principal, BillingsConnects
Panelists:
Jennifer Carolan, General Partner, Reach Capital
Christopher Cummings, VP, Institutional Product and Solution Design, Cengage
Melissa Greene, Director, Strategic Partnerships, SETDA
Denis Newman, CEO, Empirical Education Inc.
Kari Stubbs, PhD, Vice President, Learning & Innovation, BrainPOP

Jennifer Carolan, Denis Newman, and Chris Cummings on a panel at ETIN EIS

If you get a chance to check out the Guidelines before EIS, Denis would love to hear your thoughts about them at the conference.

2017-07-21

Report of the Evaluation of iRAISE Released

Empirical Education Inc. has completed its evaluation (read the report here) of an online professional development program for Reading Apprenticeship. WestEd’s Strategic Literacy Initiative (SLI) was awarded a development grant under the Investing in Innovation (i3) program in 2012. iRAISE (internet-based Reading Apprenticeship Improving Science Education) is an online professional development program for high school science teachers. iRAISE trained more than 100 teachers in Michigan and Pennsylvania over the three years of the grant. Empirical’s randomized control trial measured the impact of the program on students with special attention to differences in their incoming reading achievement levels.

The goal of iRAISE was to improve student achievement by training teachers in the use of Reading Apprenticeship, an instructional framework that describes the classroom in four interacting dimensions of learning: social, personal, cognitive, and knowledge-building. The inquiry-based professional development (PD) model included a week-long Foundations training in the summer; monthly synchronous group sessions and smaller personal learning communities; and asynchronous discussion groups designed to change teachers’ understanding of their role in adolescent literacy development and to build capacity for literacy instruction in the academic disciplines. iRAISE adapted an earlier face-to-face version of Reading Apprenticeship professional development, which was studied under an earlier i3 grant, Reading Apprenticeship Improving Secondary Education (RAISE), into a completely online course, creating a flexible, accessible platform.

To evaluate iRAISE, Empirical Education conducted an experiment in which 82 teachers across 27 schools were randomly assigned to either receive the iRAISE Professional Development during the 2014-15 school year or continue with business as usual and receive the program one year later. Data collection included monthly teacher surveys that measured their use of several classroom instructional practices and a spring administration of an online literacy assessment, developed by Educational Testing Service, to measure student achievement in literacy. We found significant positive impacts of iRAISE on several of the classroom practice outcomes, including teachers providing explicit instruction on comprehension strategies, their use of metacognitive inquiry strategies, and their levels of confidence in literacy instruction. These results were consistent with the prior RAISE research study and are an important replication of the previous findings, as they substantiate the success of SLI’s development of a more accessible online version of their teacher PD. After a one-year implementation with iRAISE, we did not find an overall effect of the program on student literacy achievement. However, we did find that levels of incoming reading achievement moderate the impact of iRAISE on general reading literacy such that lower scoring students benefit more. The success of iRAISE in adapting immersive, high-quality professional development to an online platform is promising for the field.

You can access the report and research summary from the study using the links below.
iRAISE research report
iRAISE research summary

2016-07-01

Five-year evaluation of Reading Apprenticeship i3 implementation reported at SREE

Empirical Education has released two research reports on the scale-up and impact of Reading Apprenticeship, as implemented under one of the first cohorts of Investing in Innovation (i3) grants. The Reading Apprenticeship Improving Secondary Education (RAISE) project reached approximately 2,800 teachers in five states with a program providing teacher professional development in content literacy in three disciplines: science, history, and English language arts. RAISE supported Empirical Education and our partner, IMPAQ International, in evaluating the innovation through both a randomized control trial encompassing 42 schools and a systematic study of the scale-up of 239 schools. The RCT found significant impact on student achievement in science classes consistent with prior studies. Mean impact across subjects, while positive, did not reach the .05 level of significance. The scale-up study found evidence that the strategy of building cross-disciplinary teacher teams within the school is associated with growth and sustainability of the program. Both sides of the evaluation were presented at the annual conference of the Society for Research on Educational Effectiveness, March 6-8, 2016 in Washington DC. Cheri Fancsali (formerly of IMPAQ, now at Research Alliance for NYC Schools) presented results of the RCT. Denis Newman (Empirical) presented a comparison of RAISE as instantiated in the RCT and scale-up contexts.

You can access the reports and research summaries from the studies using the links below.
RAISE RCT research report
RAISE RCT research summary
RAISE Scale-up research report
RAISE Scale-up research summary

2016-03-09

Conference Season 2015

Empirical researchers are traveling all over the country this conference season. Come meet our researchers as we discuss our work at the following events. If you plan to attend any of these, please get in touch so we can schedule a time to speak with you, or come by to see us at our presentations.

AEFP

We are pleased to announce that we will have our fifth appearance at the 40th annual conference of the Association for Education Finance and Policy (AEFP). Join us in the afternoon on Friday, February 27th at the Marriott Wardman Park, Washington DC as Empirical’s Senior Research Scientist Valeriy Lazarev and CEO Denis Newman present on Methods of Teacher Evaluation in Concurrent Session 7. Denis will also be the acting discussant and chair on Friday morning at 8am in Session 4.07 titled Preparation/Certification and Evaluation of Leaders/Teachers.

SREE

Attendees of this spring’s Society for Research on Educational Effectiveness (SREE) Conference, held in Washington, DC March 5-7, will have the opportunity to discuss instructional strategies and programs to improve mathematics with Empirical Education’s Chief Scientist Andrew P. Jaciw. The presentation, Assessing Impacts of Math in Focus, a ‘Singapore Math’ Program for American Schools, will take place on Friday, March 6 at 1pm in the Park Hyatt Hotel, Ballroom Level Gallery 3.

ASCD

This year’s 70th annual conference for ASCD will take place in Houston, TX on March 21-23. We invite you to schedule a meeting with CEO Denis Newman while he’s there.

AERA

We will again be presenting at the annual meeting of the American Educational Research Association (AERA). Join the Empirical Education team in Chicago, Illinois from April 16-20, 2015. Our presentations will cover research under the Division H (Research, Evaluation, and Assessment in Schools) Section 2 symposium: Program Evaluation in Schools.

  1. Formative Evaluation on the Process of Scaling Up Reading Apprenticeship Authors: Jenna Lynn Zacamy, Megan Toby, Andrew P. Jaciw, and Denis Newman
  2. The Evaluation of Internet-based Reading Apprenticeship Improving Science Education (iRAISE) Authors: Megan Toby, Jenna Lynn Zacamy, Andrew P. Jaciw, and Denis Newman

We look forward to seeing you at our sessions to discuss our research. As soon as we have the schedule for these presentations, we will post it here. As has become tradition, we plan to host yet another of our popular AERA receptions. Details about the reception will follow in the months to come.

2015-02-26

Understanding Logic Models Workshop Series

On July 17, Empirical Education facilitated the first of two workshops for practitioners in New Mexico on the development of program logic models, one of the first steps in developing a research agenda. The workshop, entitled “Identifying Essential Logic Model Components, Definitions, and Formats”, introduced the general concepts, purposes, and uses of program logic models to members of the Regional Education Lab (REL) Southwest’s New Mexico Achievement Gap Research Alliance. Throughout the workshop, participants collaborated with facilitators to build a logic model for a program or policy that participants are working on or that is of interest.

Empirical Education is part of the REL Southwest team, which assists Arkansas, Louisiana, New Mexico, Oklahoma, and Texas in using data and research evidence to address high-priority regional needs, including charter school effectiveness, early childhood education, Hispanic achievement in STEM, rural school performance, and closing the achievement gap, through six research alliances. The logic model workshops aim to strengthen the technical capacity of New Mexico Achievement Gap Research Alliance members to understand and visually represent their programs’ theories of change, identify key program components and outcomes, and use logic models to develop research questions. Both workshops are being held in Albuquerque, New Mexico.

2014-06-17

Importance is Important for Rules of Evidence Proposed for ED Grant Programs

The U.S. Department of Education recently proposed new rules for including serious evaluations as part of its grant programs. The approach is modeled on how evaluations are used in the Investing in Innovation (i3) program where the proposal must show there’s some evidence that the proposed innovation has a chance of working and scaling and must include an evaluation that will add to a growing body of evidence about the innovation. We like this approach because it treats previous research as a hypothesis that the innovation may work in the new context. And each new grant is an opportunity to try the innovation in a new context, with improved approaches that warrant another check on effectiveness. But the proposed rules definitely had some weak points that were pointed out in the public comments, which are available online. We hope ED heeds these suggestions.

Mark Schneiderman representing the Software and Information Industry Association (SIIA) recommends that outcomes used in effectiveness studies should not be limited to achievement scores.

SIIA notes that grant program resources could appropriately address a range of purposes from instructional to administrative, from assessment to professional development, and from data warehousing to systems productivity. The measures could therefore include such outcomes as student test scores, teacher retention rates, changes in classroom practice or efficiency, availability and use of data or other student/teacher/school outcomes, and cost effectiveness and efficiency that can be observed and measured. Many of these outcome measures can also be viewed as intermediate outcomes—changes in practice that, as demonstrated by other research, are likely to affect other final outcomes.

He also points out that quality of implementation and the nature of the comparison group can be the deciding factors in whether or not a program is found to be effective.

SIIA notes that in education there is seldom a pure control condition such as can be achieved in a medical trial with a placebo or sugar pill. Evaluations of education products and services resemble comparative effectiveness trials in which a new medication is tested against a currently approved one to determine whether it is significantly better. The same product may therefore prove effective in one district that currently has a weak program but relatively less effective in another where a strong program is in place. As a result, significant effects can often be difficult to discern.

This point gets to the heart of the contextual issues in any experimental evaluation. Without understanding the local conditions of the experiment, the size of the impact for any other context cannot be anticipated. Some experimentalists would argue that a massive multi-site trial would allow averaging across many contextual variations. But such “on average” results won’t necessarily help the decision-maker working in specific local conditions. Thus, taking previous results as a rough indication that an innovation is worth trying is the first step before conducting the grant-funded evaluation of a new variation of the innovation under new conditions.

Jon Baron, writing for the Coalition for Evidence Based Policy, expresses a fundamental concern about what counts as evidence. Jon, who is a former Chair of the National Board for Education Sciences and has been a prominent advocate for basing policy on rigorous research, suggests that

“the definition of ‘strong evidence of effectiveness’ in §77.1 incorporate the Investing in Innovation Fund’s (i3) requirement for effects that are ‘substantial and important’ and not just statistically significant.”

He cites examples where researchers have reported statistically significant results, which were based on trivial outcomes or had impacts so small as to have no practical value. Including “substantial and important” as additional criteria also captures the SIIA’s point that it is not sufficient to consider the internal validity of the study—policy makers must consider whether the measure used is an important one or whether the treatment-control contrast allows for detecting a substantial impact.
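A rough calculation shows how a trivially small impact can still be statistically significant. The Python sketch below is our own illustration, not taken from the comments cited; the function names, effect size, and sample sizes are hypothetical. It uses a simple two-group z-test and assumes individual-level random assignment with equal group sizes and outcomes measured in standard deviation units. It ignores the clustering of students within schools, which would widen the standard errors.

```python
import math


def two_sample_z(effect_size_sd, n_per_group):
    """z statistic for a standardized mean difference with equal group sizes."""
    standard_error = math.sqrt(2.0 / n_per_group)
    return effect_size_sd / standard_error


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical impact of 0.02 standard deviations, small by most standards.
tiny_effect = 0.02

# The same impact evaluated with a very large and a modest sample.
z_large = two_sample_z(tiny_effect, n_per_group=50_000)
z_small = two_sample_z(tiny_effect, n_per_group=500)

p_large = two_sided_p_value(z_large)
p_small = two_sided_p_value(z_small)

print(f"n = 50,000 per group: z = {z_large:.2f}, p = {p_large:.4f}")
print(f"n = 500 per group:    z = {z_small:.2f}, p = {p_small:.4f}")
```

Under these assumptions the identical impact passes the .05 threshold with the large sample and falls far short of it with the small one, so the significance test by itself says nothing about whether the impact is substantial or important.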

Addressing the substance and importance of the results gets us appropriately into questions of external validity, and leads us to questions about subgroup impact, where, for example, an innovation has a positive impact “on average” and works well for high scoring students but provides no value for low scoring students. We would argue that a positive average impact is not the most important part of the picture if the end result is an increase in a policy-relevant achievement gap. Should ED be providing grants for innovations where there has been a substantial indication that a gap is worsened? Probably yes, but only if the proposed development is aimed at fixing the malfunctioning innovation and if the program evaluation can address this differential impact.

2013-03-17