blog posts and news stories

Comment on the NY Times: In Classroom of Future, Stagnant Scores

The New York Times is running a series of front-page articles on “Grading the Digital School.” The first one ran Labor Day weekend and raised the question as to whether there’s any evidence that would persuade a school board or community to allocate extra funds for technology. With the demise of the Enhancing Education Through Technology (EETT) program, federal funds dedicated to technology will no longer be flowing into states and districts. Technology will have to be measured against any other discretionary purchase. The resulting internal debates within schools and their communities about the expense vs. value of technology promise to have interesting implications and are worth following closely.

The first article by Matt Richtel revisits a debate that has been going on for decades between those who see technology as the key to “21st Century learning” and those who point to the dearth of evidence that technology makes any measurable difference to learning. It’s time to try to reframe this discussion in terms of what can be measured. And in considering what to measure, and in honor of Labor Day, we raise a question that is often ignored: what role do teachers play in generating the measurable value of technology?

Let’s start with the most common argument in favor of technology, even in the absence of test score gains. The idea is that technology teaches skills “needed in a modern economy,” and these are not measured by the test scores used by state and federal accountability systems. Karen Cator, director of the U.S. Department of Education office of educational technology, is quoted as saying (in reference to the lack of improvement in test scores), “…look at all the other things students are doing: learning to use the Internet to research, learning to organize their work, learning to use professional writing tools, learning to collaborate with others.” Presumably, none of these things directly impact test scores. The problem with this perennial argument is that many other things that schools keep track of should provide indicators of improvement. If as a result of technology, students are more excited about learning or more engaged in collaborating, we could look for an improvement in attendance, a decrease in drop-outs, or students signing up for more challenging courses.

Information on student behavioral indicators is becoming easier to obtain since the standardization of state data systems. There are some basic study designs that use comparisons among students within the district or between those in the district and those elsewhere in the state. This approach uses statistical modeling to identify trends and control for demographic differences, but is not beyond the capabilities of many school district research departments [1] or the resources available to the technology vendors. (Empirical has conducted research for many of the major technology providers, often focusing on results for a single district interested in obtaining evidence to support local decisions.) Using behavioral or other indicators, a district such as that in the Times article can answer its own questions. Data from the technology systems themselves can be used to identify users and non-users and to confirm the extent of usage and implementation. It is also valuable to examine whether some students (those in most need or those already doing okay) or some teachers (veterans or novices) receive greater benefit from the technology. This information may help the district focus resources where they do the most good.
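
As a rough illustration of the kind of comparison described above, here is a minimal Python sketch. It is not a method taken from the Times article or from any Empirical report. It assumes a hypothetical list of student records with made-up field names (`used_tech`, `low_income`, `attendance`) and uses simple stratification as a stand-in for the statistical modeling a research department would more likely use.

```python
from collections import defaultdict
from statistics import mean


def naive_difference(records, outcome, group):
    """Unadjusted gap: mean outcome of users minus mean outcome of non-users."""
    users = [r[outcome] for r in records if r[group]]
    non_users = [r[outcome] for r in records if not r[group]]
    return mean(users) - mean(non_users)


def stratified_difference(records, outcome, group, stratum):
    """User-minus-non-user gap computed within each stratum (for example,
    a demographic category) and averaged with weights equal to stratum size."""
    cells = defaultdict(lambda: {True: [], False: []})
    for r in records:
        cells[r[stratum]][bool(r[group])].append(r[outcome])

    total_n = 0
    weighted_sum = 0.0
    for groups in cells.values():
        users, non_users = groups[True], groups[False]
        if not users or not non_users:
            continue  # no within-stratum comparison is possible here
        n = len(users) + len(non_users)
        weighted_sum += n * (mean(users) - mean(non_users))
        total_n += n
    if total_n == 0:
        raise ValueError("no stratum contains both users and non-users")
    return weighted_sum / total_n


# Hypothetical records; field names and values are invented for illustration.
students = [
    {"used_tech": True, "low_income": True, "attendance": 0.90},
    {"used_tech": False, "low_income": True, "attendance": 0.88},
    {"used_tech": False, "low_income": True, "attendance": 0.86},
    {"used_tech": True, "low_income": False, "attendance": 0.96},
    {"used_tech": True, "low_income": False, "attendance": 0.98},
    {"used_tech": False, "low_income": False, "attendance": 0.95},
]

naive = naive_difference(students, "attendance", "used_tech")
adjusted = stratified_difference(students, "attendance", "used_tech", "low_income")
```

With these invented records, the unadjusted attendance gap is about 0.05, while the stratified estimate is about 0.025, because users are concentrated in the group that attends more regardless of the technology. A real analysis would control for more characteristics at once, typically with a regression model.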

A final thought about where to look for impacts of technologies comes from a graph of the school district’s budget. While spending on technology and salaries have both declined over the last three years, spending on salaries is still about 25 times as great as on technologies. Any discussion of where to find an impact of technology must consider labor costs, which are the district’s primary investment. We might ask whether a small investment in technology would allow the district to reduce the number of teachers by, for example, allowing a small increase in the number of students each teacher can productively handle. Alternatively, we might ask whether technology can make a teacher more effective, by whatever measures of effective teaching the district chooses to use, with their current students. We might ask whether technologies result in keeping young teachers on the job longer or encouraging them to take the initiative on more challenging assignments.

It may be a mistake to look for a direct impact of technology on test scores (aside from technologies aimed specifically at that goal), but it is also a mistake to assume the impact is, in principle, not measurable. We need a clear picture of how various technologies are expected to work and where we can look for the direct and indirect effects. An important role of technology in the modern economy is providing people with actionable evidence. It would be ironic if education technology were inherently opaque to educational decision makers.

[1] Or, we would hope, the New York Times. Sadly, the article provides a graph of trends in math and reading for the district highlighted in the story compared to trends for the state. The graphic is meant to show that the district is doing worse than the state average. But the article never suggests that we should consider the population of the particular district and whether it is doing better or worse than one would expect, controlling for demographics, available resources, and other characteristics.


New RFP calls for Building Regional Research Capacity

The US Department of Education (ED) has just released the eagerly anticipated RFP for the next round of the Regional Education Laboratories (RELs). This RFP contains some very interesting departures from how the RELs have been working, which may be of interest especially to state and local educators.

For those unfamiliar with federal government organizations, the RELs are part of the National Center for Education Evaluation and Regional Assistance (abbreviated NCEE), which is within the Institute of Education Sciences (IES), part of ED. The country is divided up into ten regions, each one served by a REL—so the RFP announced today is really a call for proposals in ten different competitions. The RELs have been in existence for decades but their mission has evolved over time. For example, the previous RFP (about 6 years ago) put a strong emphasis on rigorous research, particularly randomized controlled trials (RCTs), leading the contractors in each of the 10 regions to greatly expand their capacity, in part by bringing in subcontractors with the requisite technical skills. (Empirical conducted or assisted with RCTs in four of the 10 regions.) The new RFP changes the focus in two essential ways.

First, one of the major tasks is building capacity for research among practitioners. Educators at the state and local levels told ED that they needed more capacity to make use of the longitudinal data systems that ED has invested in through grants to the states. It is one thing to build the data systems. It is another thing to use the data to generate evidence that can inform decisions about policies and programs. Last month at the conference of the Society for Research on Educational Effectiveness, Rebecca Maynard, Commissioner of NCEE, talked about building a “culture of experimentation” among practitioners and building their capacity for simpler experiments that don’t take so long and are not as expensive as those NCEE has typically contracted for. Her point was that the resulting evidence is more likely to be used if the practitioners are “up close and immediate.”

The second idea found in the RFP for the RELs is that each regional lab should work through “alliances” of state and local agencies. These alliances would cross state boundaries (at least within the region) and would provide an important part of the REL’s research agenda. The idea goes beyond having an advisory panel for the REL that requests answers to questions. The alliances are also expected to build their own capacity to answer these questions using rigorous research methods but applying them cost-effectively and opportunistically. The capacity of the alliances should outlive the support provided by the RELs. If your organization is part of an existing alliance and would like to get better at using and conducting research, there are teams being formed to go after the REL contracts that would be happy to hear from you. (If you’re not sure who to call, let us know and we’ll put you in touch with an appropriate team.)


A Conversation About Building State and Local Research Capacity

John Q. Easton, director of the Institute of Education Sciences (IES), came to New Orleans recently to participate in the annual meeting of the American Educational Research Association. At one of his stops, he was the featured speaker at a meeting of the Directors of Research and Evaluation (DRE), an organization composed of school district research directors. (DRE is affiliated with AERA and was recently incorporated as a 501(c)(3).) John started his remarks by pointing out that for much of his career he was a school district research director and felt great affinity for the group. He introduced the directions that IES was taking, especially how it was approaching working with school systems. He spent most of the hour fielding questions and engaging in discussion with the participants. Several interesting points came out of the conversation about roles for the researchers who work for education agencies.

Historically, most IES research grant programs have been aimed at university or other academic researchers. It is noteworthy that even in a program for “Evaluation of State and Local Education Programs and Policies,” grants have been awarded only to universities and large research firms. There is no expectation that researchers working for the state or local agency would be involved in the research beyond the implementation of the program. The RFP for the next generation of Regional Education Labs (REL) contracts may help to change that. The new RFP expects the RELs to work closely with education agencies to define their research questions and to assist alliances of state and local agencies in developing their own research capacity.

Members of the audience noted that, as district directors of research, they often spend more time reviewing research proposals from students and professors at local colleges who want to conduct research in their schools than actually answering questions initiated by the district. Funded researchers treat the districts as the “human subjects,” paying incentives to participants and sometimes paying for data services. But the districts seldom participate in defining the research topic, conducting the studies, or benefiting directly from the reported findings. The new mission of the RELs to build local capacity will be a major shift.

Some in the audience pointed out reasons to be skeptical that this REL agenda would be possible. How can we build capacity if research and evaluation departments across the country are being cut? In fact, very little is known about the number of state or district practitioners whose capacity for research and evaluation could be built by applying the REL resources. (Perhaps, a good first research task for the RELs would be to conduct a national survey to measure the existing capacity.)

John made a good point in reply: IES and the RELs have to work with the district leadership—not just the R&E departments—to make this work. The leadership has to have a more analytic view. They need to see the value of having an R&E department that goes beyond test administration, and is able to obtain evidence to support local decisions. By cultivating a research culture in the district, evaluation could be routinely built in to new program implementations from the beginning. The value of the research would be demonstrated in the improvements resulting from informed decisions. Without a district leadership team that values research to find out what works for the district, internal R&E departments will not be seen as an important capacity.

Some in the audience pointed out that in parallel to building a research culture in districts, it will be necessary to build a practitioner culture among researchers. It would be straightforward for IES to require that research grantees and contractors engage the district R&E staff in the actual work, not just review the research plan and sign the FERPA agreement. Practitioners ultimately hold the expertise in how the programs and research can be implemented successfully in the district, thus improving the overall quality and relevance of the research.


Quasi-experimental Design Used to Build Evidence for Adolescent Reading Intervention

A study of Jamestown Reading Navigator (JRN) from McGraw-Hill (now posted on our reports page), conducted in Miami-Dade County Public Schools, found positive results on the Florida state reading test (FCAT) for high school students in their intensive reading classes. JRN is an online application, with internal record keeping making it possible to identify the treatment group for a comparison design. While the full student, teacher and roster data for 9th and 10th grade intensive reading classes were provided by the district, JRN—as an online application—provided the identification of the student and teacher users through the computer logs. The quasi-experimental design was strengthened by using schools with both JRN and non-JRN students. Of the 70 schools that had JRN logs, 23 had JRN and non-JRN intensive reading classes and sufficient data for analysis.
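
The sample-selection step described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code used in the study: the field names (`student_id`, `school_id`) and the tiny roster are hypothetical, and the real analysis also required sufficient outcome data per school, which is not modeled here.

```python
from collections import defaultdict


def flag_users(roster, logged_student_ids):
    """Mark each roster row as treated if the student appears in the usage logs."""
    logged = set(logged_student_ids)
    return [dict(row, treated=row["student_id"] in logged) for row in roster]


def schools_with_both_groups(flagged_roster):
    """Return the schools that have at least one treated and one untreated student."""
    groups_seen = defaultdict(set)
    for row in flagged_roster:
        groups_seen[row["school_id"]].add(row["treated"])
    return sorted(s for s, seen in groups_seen.items() if seen == {True, False})


# Hypothetical district roster and application log, invented for illustration.
roster = [
    {"student_id": 1, "school_id": "A"},
    {"student_id": 2, "school_id": "A"},
    {"student_id": 3, "school_id": "B"},
    {"student_id": 4, "school_id": "B"},
    {"student_id": 5, "school_id": "C"},
]
log_ids = [1, 3, 4]  # students who appear in the application's usage logs

flagged = flag_users(roster, log_ids)
eligible_schools = schools_with_both_groups(flagged)
```

In this toy roster only school A has both logged and unlogged students, so it is the only school kept. Restricting the comparison to such schools means treated and comparison students share the same school context.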


Looking Back 35 Years to Learn about Local Experiments

With the growing interest among federal agencies in building local capacity for research, we took another look at an article by Lee Cronbach published in 1975. We found it has a lot to say about conducting local experiments and implications for generalizability. Cronbach worked for much of his career at Empirical’s neighbor, Stanford University, and his work has had a direct and indirect influence on our thinking. Some may interpret Cronbach’s work as stating that randomized trials of educational interventions have no value because of the complexity of interactions between subjects, contexts, and the experimental treatment. In any particular context, these interactions are infinitely complex, forming a “hall of mirrors” (as he famously put it, p. 119), making experimental results—which at most can address a small number of lower-order interactions—irrelevant. We don’t read it that way. Rather, we see powerful insights as well as cautions for conducting the kinds of field experiments that are beginning to show promise for providing educators with useful evidence.

We presented these ideas at the Society for Research on Educational Effectiveness conference in March, building the presentation around a set of memorable quotes from the 1975 article. Here we highlight some of the main ideas.

Quote #1: “When we give proper weight to local conditions, any generalization is a working hypothesis, not a conclusion…positive results obtained with a new procedure for early education in one community warrant another community trying it. But instead of trusting that those results generalize, the next community needs its own local evaluation” (p. 125).

Practitioners are making decisions for their local jurisdiction. An experiment conducted elsewhere (including over many locales, where the results are averaged) provides a useful starting point, but not “proof” that it will or will not work in the same way locally. Experiments give us a working hypothesis concerning an effect, but it has to be tested against local conditions at the appropriate scale of implementation. This brings to mind California’s experience with class size reduction following the famous experiment in Tennessee, and how the working hypothesis corroborated through the experiment did not transfer to a different context. We also see applicability of Cronbach’s ideas in the Investing in Innovation (i3) program, where initial evidence is being taken as a warrant to scale up interventions, but where the grants included funding for research under new conditions where implementation may head in unanticipated directions, leading to new effects.

Quote #2: “Instead of making generalization the ruling consideration in our research, I suggest that we reverse our priorities. An observer collecting data in one particular situation…will give attention to whatever variables were controlled, but he will give equally careful attention to uncontrolled conditions…. As results accumulate, a person who seeks understanding will do his best to trace how the uncontrolled factors could have caused local departures from the modal effect. That is, generalization comes late, and the exception is taken as seriously as the rule” (pp. 124-125).

Finding or even seeking out conditions that lead to variation in the treatment effect facilitates external validity, as we build an account of the variation. This should not be seen as a threat to generalizability because an estimate of average impact is not robust across conditions. We should spend some time looking at the ways that the intervention interacts differently with local characteristics, in order to determine which factors account for heterogeneity in the impact and which ones do not. Though this activity is exploratory and not necessarily anticipated in the design, it provides the basis for understanding how the treatment plays out, and why its effect may not be constant across settings. Over time, generalizations can emerge, as we compile an account of the different ways in which the treatment is realized and the conditions that suppress or accentuate its effects.
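
To make the idea of looking past the average impact concrete, here is a small hedged sketch in Python. It is not drawn from Cronbach or from our presentation. It assumes hypothetical records of the form (site, treated, outcome) and simply contrasts a pooled difference in means with the difference computed separately for each site.

```python
from collections import defaultdict
from statistics import mean


def pooled_effect(records):
    """Treated-minus-control difference in mean outcome, ignoring site."""
    treated = [y for _, t, y in records if t]
    control = [y for _, t, y in records if not t]
    return mean(treated) - mean(control)


def effects_by_site(records):
    """Treated-minus-control difference in mean outcome within each site."""
    cells = defaultdict(lambda: {True: [], False: []})
    for site, is_treated, outcome in records:
        cells[site][bool(is_treated)].append(outcome)
    return {
        site: mean(groups[True]) - mean(groups[False])
        for site, groups in cells.items()
        if groups[True] and groups[False]  # need both arms to compare
    }


# Hypothetical outcomes, invented for illustration: (site, treated, score).
records = [
    ("urban", True, 52.0), ("urban", True, 54.0),
    ("urban", False, 50.0), ("urban", False, 50.0),
    ("rural", True, 61.0),
    ("rural", False, 60.0), ("rural", False, 62.0),
]

site_effects = effects_by_site(records)
overall = pooled_effect(records)
```

In this invented data the treatment shows a 3-point difference in the urban site and none in the rural site, a contrast the single pooled number hides. Which local characteristics account for such differences is the exploratory question the paragraph above describes.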

Quote #3: “Generalizations decay” (p. 122).

In the social policy arena, and especially with the rapid development of technologies, we can’t expect interventions to stay constant. And we certainly can’t expect the contexts of implementation to be the same over many years. The call for quicker turn-around in our studies is therefore necessary, not just because decision-makers need to act, but because any finding may have a short shelf life.

Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 116-127.


Conference Season 2011

Empirical researchers will again be on the road this conference season, and we’ve included a few new conference stops. Come meet our researchers as we discuss our work at the following events. If you will be present at any of these, please get in touch so we can schedule a time to speak with you, or come by to see us at our presentations.


This year, the NCES-MIS “Deep in the Heart of Data” Conference will offer more than 80 presentations, demonstrations, and workshops conducted by information system practitioners from federal, state, and local K-12 agencies.

Come by and say hello to one of our research managers, Joseph Townsend, who will be running Empirical Education’s table display at the Hilton Hotel in Austin, Texas from February 23-25th. Joe will be presenting interactive demonstrations of MeasureResults, which allows school district staff to conduct complete program evaluations online.


Attendees of this spring’s Society for Research on Educational Effectiveness (SREE) Conference, held in Washington, DC March 3-5, will have the opportunity to discuss questions of generalizability with Empirical Education’s Chief Scientist, Andrew Jaciw, and President, Denis Newman, at two poster sessions. The first poster, entitled External Validity in the Context of RCTs: Lessons from the Causal Explanatory Tradition, applies insights from Lee Cronbach to current RCT practices. In the second poster, The Use of Moderator Effects for Drawing Generalized Causal Inferences, Jaciw addresses issues in multi-site experiments. They look forward to discussing these posters both online at the conference website and in person.


We are pleased to announce that we will have our first showing this year at the Association for Education Finance and Policy (AEFP) Annual Conference. Join us in the afternoon on Friday, March 25th at the Grand Hyatt in Seattle, WA as Empirical’s research scientist, Valeriy Lazarev, presents a poster on Cost-benefit analysis of educational innovation using growth measures of student achievement.


We will again have a strong showing at the 2011 American Educational Research Association (AERA) Conference. Join us in festive New Orleans, April 8-12 for the final results on the efficacy of the PCI Reading Program, our qualitative findings from the first year of formative research on our MeasureResults online program evaluation tool, and more.

View our AERA presentation schedule for more details and a complete list of our participants.


This year’s SIIA Ed Tech Industry Summit will take place in gorgeous San Francisco, just 45 minutes north of Empirical Education’s headquarters in the Silicon Valley. We invite you to schedule a meeting with us at the Palace Hotel from May 22-24.


Empirical Education Partners with Carnegie Learning on New Student Performance Guarantee

Schools looking to improve student Algebra and Geometry achievement have signed up for a guarantee from Carnegie Learning® that states that students using the company’s Cognitive Tutor programs will pass their math courses. Empirical Education is tasked to monitor student performance in participating schools. Starting this school year, Carnegie Learning guarantees that students who take three complete and consecutive years of Carnegie Learning’s math courses will pass their math class in the third year. The guarantee applies to middle and high school students taking the Carnegie Learning Bridge to Algebra, Algebra, Algebra II, and Geometry courses.

In the coming weeks and months, Empirical will collect roster data, course grades, and assessment scores from schools as well as usage data from Carnegie Learning’s math teaching software. These data will be combined to generate biannual reports that will provide schools with evidence they can use to effectively improve implementation of the courses and raise student achievement.

Carnegie Learning’s guarantee is part of their School Improvement Grant support efforts. “Partnering with Empirical Education will allow us to get mid- and end-of-year research reports into the hands of our school partners,” says Steve Ritter, Co-Founder and Chief Scientist at Carnegie Learning. “It’s part of our continuous improvement cycle; we’re excited to see the progress districts committed to the turnaround and transformation process can make with these new, powerful tools.”


Recognizing Success

When the Obama-Duncan administration approaches teacher evaluation, the emphasis is on recognizing success. We heard that clearly in Arne Duncan’s comments on the release of teacher value-added modeling (VAM) data for LA Unified by the LA Times. He’s quoted as saying, “What’s there to hide? In education, we’ve been scared to talk about success.” Since VAM is often thought of as a method for weeding out low performing teachers, Duncan’s statement referencing success casts the use of VAM in a more positive light. Therefore we want to raise the issue here: how do you know when you’ve found success? The general belief is that you’ll recognize it when you see it. But sorting through a multitude of variables is not a straightforward process, and that’s where research methods and statistical techniques can be useful. Below we illustrate how this plays out in teacher and in program evaluation.

As we report in our news story, Empirical is participating in the Gates Foundation project called Measures of Effective Teaching (MET). This project is known for its focus on value-added modeling (VAM) of teacher effectiveness. It is also known for having collected over 10,000 videos from over 2,500 teachers’ classrooms—an astounding accomplishment. Research partners from many top institutions hope to be able to identify the observable correlates for teachers whose students perform at high levels as well as for teachers whose students do not. (The MET project tested all the students with an “alternative assessment” in addition to using the conventional state achievement tests.) With this massive sample that includes both data about the students and videos of teachers, researchers can identify classroom practices that are consistently associated with student success. Empirical’s role in MET is to build a web-based tool that enables school system decision-makers to make use of the data to improve their own teacher evaluation processes. Thus they will be able to build on what’s been learned when conducting their own mini-studies aimed at improving their local observational evaluation methods.

When the MET project recently had its “leads” meeting in Washington DC, the assembled group of researchers, developers, school administrators, and union leaders were treated to an after-dinner speech and Q&A by Joanne Weiss. Joanne is now Arne Duncan’s chief of staff, after having directed the Race to the Top program (and before that was involved in many Silicon Valley educational innovations). The approach of the current administration to teacher evaluation—emphasizing that it is about recognizing success—carries over into program evaluation. This attitude was clear in Joanne’s presentation, in which she declared an intention to “shine a light on what is working.” The approach is part of their thinking about the reauthorization of ESEA, where more flexibility is given to local decision-makers to develop solutions, while the federal legislation is more about establishing achievement goals such as being the leader in college graduation.

Hand in hand with providing flexibility to find solutions, Joanne also spoke of the need to build “local capacity to identify and scale up effective programs.” We welcome the idea that school districts will be free to try out good ideas and identify those that work. This kind of cycle of continuous improvement is very different from the idea, incorporated in NCLB, that researchers will determine what works and disseminate these facts to the practitioners. Joanne spoke about continuous improvement, in the context of teachers and principals, where on a small scale it may be possible to recognize successful teachers and programs without research methodologies. While a teacher’s perception of student progress in the classroom may be aided by regular assessments, the determination of success seldom calls for research design. We advocate for a broader scope, and maintain that a cycle of continuous improvement is just as much needed at the district and state levels. At those levels, we are talking about identifying successful schools or successful programs where research and statistical techniques are needed to direct the light onto what is working. Building research capacity at the district and state level will be a necessary accompaniment to any plan to highlight successes. And, of course, research can’t be motivated purely by the desire to document the success of a program. We have to be equally willing to recognize failure. The administration will have to take seriously the local capacity building to achieve the hoped-for identification and scaling up of successful programs.


Empirical Education Develops Web-Based Tool to Improve Teacher Evaluation

For school districts looking for ways to improve teacher observation methods, Empirical Education has begun development of a web-delivered tool that will provide a convenient way to validate their observational protocols and rubrics against measures of the teacher’s contribution to student academic growth.

Empirical Education is charged with developing a “validation engine” as part of the Measures of Effective Teaching (MET) project, funded by the Bill and Melinda Gates Foundation. As described on the project’s website, the tool will allow users to “view classroom observation videos, rate those videos and then receive a report that evaluates the predictive validity and rater consistency for the protocol.” The MET project has collected thousands of hours of video of classrooms as well as records of the characteristics and academic performance associated with the students in the class.

By watching and coding videos of a range of teachers, users will be able to verify whether or not their current teacher rating systems are identifying teaching behavior associated with higher achievement. The tool will allow users to review their own rating systems against a variety of MET project measures, and will give real-time feedback through an automated report generator.
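
For readers unfamiliar with the two statistics named in the project description, the following Python sketch shows one simple way each could be computed. It is an assumption-laden illustration, not a description of how the validation engine works: here predictive validity is taken to be the Pearson correlation between teachers' observation ratings and a value-added score, rater consistency is taken to be the share of videos on which two raters give the same score, and all data are invented.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / sqrt(sxx * syy)


def exact_agreement(rater_a, rater_b):
    """Share of videos on which two raters assigned the same rubric score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


# Hypothetical rubric scores (1-4) from two raters for four teachers' videos,
# and a hypothetical value-added score for each of those teachers.
rater_a = [1, 2, 3, 4]
rater_b = [1, 2, 4, 4]
value_added = [-0.2, 0.0, 0.1, 0.3]

mean_ratings = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
validity = pearson(mean_ratings, value_added)
consistency = exact_agreement(rater_a, rater_b)
```

In this toy example the two raters agree on three of four videos, and the averaged ratings track the value-added scores closely. A production tool would likely use more robust statistics and far larger samples.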

Development of the validation engine builds on two years of MET Project research, which included data from six school districts across the country and over 3,000 teachers. Researchers will now use the data to identify aspects of teacher practice that are leading indicators of student achievement. The engine is expected to undergo beta testing over the next few months, beginning with the National Math and Science Initiative.

Announcement of the new tool comes as interest in alternative ways to measure the effectiveness of teachers is becoming a major issue in education and as federal, state and local officials and teacher organizations look for research-based ways to identify effective teachers and improve student outcomes.

“At a time when schools are experiencing budget cuts, it is vital that school districts have ready access to research tools, so that they can make the most informed decisions,” says Denis Newman, President of Empirical Education. The validation engine will be part of a suite of web-based technology tools developed by the company, including MeasureResults, an online tool that allows districts to evaluate the effectiveness of the products and programs they use.


Empirical Education at AERA 2011

Empirical is excited to announce that we will again have a strong showing at the 2011 American Educational Research Association (AERA) Conference. Join us in festive New Orleans, LA, April 8-12 for the final results on the efficacy of the PCI Reading Program, our findings from the first year of formative research on our MeasureResults program evaluation tool, and more. Visit our website in the coming months to view our AERA presentation schedule and details about our annual reception—we hope to see you there!