blog posts and news stories

Determining the Impact of CREATE on Math and ELA Achievement

Empirical Education is conducting the evaluation of Collaboration and Reflection to Enhance Atlanta Teacher Effectiveness (CREATE) under an Investing in Innovation (i3) development grant awarded in 2014. The CREATE evaluation takes place in schools throughout the state of Georgia.

Approximately 40 residents from the Georgia State University (GSU) College of Education (COE) are participating in the CREATE teacher residency program. Using a quasi-experimental design, we will compare outcomes for these teachers and their students with those of a matched comparison group of close to 100 teachers who enrolled in GSU COE at the same time but did not participate in CREATE. Implementation started in 2015 for cohort 1 and in 2016 for cohort 2. Confirmatory outcomes will be assessed in years 2 and 3 of both cohorts (2017-2019).

Confirmatory research questions we will be answering include:

What is the impact of one year of exposure of students to a novice teacher in their second year of teacher residency in the CREATE program, compared to the business-as-usual GSU teacher credential program, on mathematics and ELA achievement of students in grades 4-8, as measured by the Georgia Milestones Assessment System?

What is the impact of CREATE on the quality of instructional strategies used by teachers, as measured by Teacher Assessment of Performance Standards (TAPS) scores, at the end of the third year of residency, relative to the business-as-usual condition?

What is the impact of CREATE on the quality of the learning environment created by teachers, as measured by Teacher Assessment of Performance Standards (TAPS) scores, at the end of the third year of residency, relative to the business-as-usual condition?

Exploratory research questions will address additional teacher-level outcomes including retention, effectiveness, satisfaction, collaboration, and levels of stress in relationships with students and colleagues.

We plan to publish the results of this study in the fall of 2019. Please check back to read the research summary and report.


2010-2011: The Year of the VAM

If you haven’t heard about Value-Added Modeling (VAM) in relation to the controversial teacher ratings in Los Angeles and the subsequent brouhaha in the world of education, chances are that you’ll hear about it in the coming year.

VAM is a family of statistical techniques for estimating the contribution of a teacher or of a school to the academic growth of students. Recently, the LA Times obtained the longitudinal test score records for all the elementary school teachers and students in LA Unified and had a RAND economist (working as an independent consultant) run the calculations. The result was a “score” for all LAUSD elementary school teachers. Note that the economist who did the calculations wrote up a technical report on how it was done and the specific questions his research was aimed at answering.
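To make the idea concrete, here is a deliberately simplified sketch of the most basic member of the VAM family, a gain-score calculation. It is not the method used in the LA Times analysis, which we have not reproduced here; real value-added models typically adjust for student background and use regression rather than raw gains. The function name, record layout, and scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gain_score_value_added(records):
    """Return each teacher's mean student gain relative to the overall mean gain.

    records: iterable of (teacher_id, prior_score, current_score) tuples.
    This is an illustrative gain-score model only; it ignores covariates,
    measurement error, and shared instruction.
    """
    gains_by_teacher = defaultdict(list)
    for teacher_id, prior_score, current_score in records:
        gains_by_teacher[teacher_id].append(current_score - prior_score)

    # Average gain across all students, used as the reference point.
    all_gains = [g for gains in gains_by_teacher.values() for g in gains]
    overall_mean_gain = mean(all_gains)

    return {
        teacher_id: mean(gains) - overall_mean_gain
        for teacher_id, gains in gains_by_teacher.items()
    }


# Hypothetical scores for two teachers with two students each.
records = [
    ("T1", 300, 320), ("T1", 310, 335),
    ("T2", 305, 310), ("T2", 295, 305),
]
scores = gain_score_value_added(records)
print(scores)
```

A positive number means a teacher's students gained more than the average student in the data set; a negative number means they gained less. Whether that difference can fairly be credited to the teacher is exactly what the debate is about.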

Reactions to the idea that a teacher could be evaluated using a set of test scores—in this case from the California Standards Test—were swift and divisive. The concept was denounced by the teachers’ union, with the local leader calling for a boycott. Meanwhile, the US Secretary of Education, Arne Duncan, made headlines by commenting favorably on the idea. The LA Times quotes him as saying “What’s there to hide? In education, we’ve been scared to talk about success.”

There is a tangle of issues here, along with exaggerations, misunderstandings, and confusion between research techniques and policy decisions. This column will address some of the issues over the coming year. We also plan to announce some of our own contributions to the VAM field in the form of project news.

The major hot-button issues include appropriate usage (e.g., for part or all of the input to merit pay decisions) and technical failings (e.g., biases in the calculations). Of course, these two issues are often linked; for example, many argue that biases may make VAM unfair for individual merit pay. The recent brief from the Economic Policy Institute, authored by an impressive team of researchers (several of them our friends and mentors from neighboring Stanford), makes a well-reasoned case for not using VAM as the only input to high-stakes decisions. While their arguments are persuasive with respect to VAM as the lone criterion for awarding merit pay or firing individual teachers, we still see a broad range of uses for the technique, along with considerable challenges.

For today, let’s look at one issue that we find particularly interesting: How to handle teacher collaboration in a VAM framework. In a recent Education Week commentary, Kim Marshall argues that any use of test scores for merit pay is a losing proposition. One of the many reasons he cites is its potentially negative impact on collaboration.

A problem with an exercise like that conducted by the LA Times is that there are organizational arrangements that do not come into the calculations. For example, we find that team teaching within a grade at a school is very common. A teacher with an aptitude for teaching math may take another teacher’s students for a math period, while sending her own kids to the other teacher for reading. These informal arrangements are not part of the official school district roster. They can be recorded (with some effort) during the current year but are lost for prior years. Mentoring is a similar situation, wherein the value provided to the kids is distributed among members of their team of teachers. We don’t know how much difference collaborative or mentoring arrangements make to individual VAM scores, but one fear in using VAM in setting teacher salaries is that it will militate against productive collaborations and reduce overall achievement.

Some argue that, because VAM calculations do not properly measure or include important elements, VAM should be disqualified from playing any role in evaluation. We would argue that, although they are imperfect, VAM calculations can still be used as a component of an evaluation process. Moreover, continued improvements can be made in testing, in professional development, and in the VAM calculations themselves. In the case of collaboration, what is needed are ways for a principal to record and evaluate collaborations and mentoring so that the information can be worked into the overall evaluation and even into the VAM calculation. In such an instance, it would be the principal at the school, not an administrator at the district central office, who could make the most productive use of the VAM calculations. With knowledge of the local conditions and potential for bias, the building leader may be in the best position to make personnel decisions.

VAM can also be an important research tool—using consistently high and/or low scores as a guide for observing classroom practices that are likely to be worth promoting through professional development or program implementations. We’ve seen VAM used this way, for example, by the research team at Wake County Public Schools in North Carolina in identifying strong and weak practices in several content areas. This is clearly a rich area for continued research.

The LA Times has helped to catapult the issue of VAM onto the national radar. It has also sparked a discussion of how school data can be used to support local decisions, which can’t be a bad thing.


Research: From NCLB to Obama’s Blueprint for ESEA

We can finally put “Scientifically Based Research” to rest. The term that appeared more than 100 times in NCLB appears zero times in the Obama administration’s Blueprint for Reform, which is the document outlining its approach to the reauthorization of ESEA. The term was always an awkward neologism, coined presumably to avoid simply saying “scientific research.” It also allowed NCLB to contain an explicit definition to be enforced—a definition stipulating not just any scientific activities, but research aimed at coming to causal conclusions about the effectiveness of some product, policy, or laboratory procedure.

A side effect of the SBR focus has been the growth of a compliance mentality among both school systems and publishers. Schools needed some assurance that a product was backed by SBR before they would spend money, while textbooks were ranked in terms of the number of SBR-proven elements they contained.

Some have wondered if the scarcity of the word “research” in the new Blueprint might signal a retreat from scientific rigor and the use of research in educational decisions (see, for example, Debra Viadero’s blog). Although the approach is indeed different, the new focus makes a stronger case for research and extends its scope into decisions at all levels.

The Blueprint shifts the focus to effectiveness. The terms “effective” or “effectiveness” appear about 95 times in the document. “Evidence” appears 18 times. And the compliance mentality is specifically called out as something to eliminate.

“We will ask policymakers and educators at all levels to carefully analyze the impact of their policies, practices, and systems on student outcomes. … And across programs, we will focus less on compliance and more on enabling effective local strategies to flourish.” (p. 35)

Instead of the stiff definition of SBR, we now have a call to “policymakers and educators at all levels to carefully analyze the impact of their policies, practices, and systems on student outcomes.” Thus we have a new definition for what’s expected: carefully analyzing impact. The call does not go out to researchers per se, but to policymakers and educators at all levels. This is not a directive from the federal government to comply with the conclusions of scientists funded to conduct SBR. Instead, scientific research is everybody’s business now.

Carefully analyzing the impact of practices on student outcomes is scientific research. For example, conducting research carefully requires making sure the right comparisons are made. A study that is biased by comparing two groups with very different motivations or resources is not a careful analysis of impact. A study that simply compares the averages of two groups without any statistical calculations can mistakenly identify a difference when there is none, or vice versa. A study that takes no measure of how schools or teachers used a new practice—or that uses tests of student outcomes that don’t measure what is important—can’t be considered a careful analysis of impact. Building the capacity to use adequate study design and statistical analysis will have to be on the agenda of the ESEA if the Blueprint is followed.
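As a small illustration of why comparing raw averages is not enough, the sketch below runs an exact permutation test on two tiny groups of made-up scores. It asks how often a gap at least as large as the observed one would appear if group labels were assigned at random. The permutation test is just one of many possible checks, chosen here because it needs nothing beyond the Python standard library; the function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means.

    Enumerates every way of splitting the pooled scores into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed_gap = abs(mean(group_a) - mean(group_b))

    as_extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        gap = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point ties.
        if gap >= observed_gap - 1e-12:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits


# Hypothetical test scores: the averages differ, but is the gap meaningful?
program_group = [72, 85, 90, 68]
comparison_group = [70, 80, 88, 66]
print(mean(program_group) - mean(comparison_group))
print(exact_permutation_p_value(program_group, comparison_group))
```

If a large share of random relabelings produces a gap as big as the one observed, the difference in averages on its own is weak evidence of impact. A test like this addresses only chance variation; it does nothing about the other problems named above, such as groups that differ in motivation or resources.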

Far from reducing the role of research in the U.S. education system, the Blueprint for ESEA actually advocates a radical expansion. The word “research” is used only a few times, and “science” is used only in the context of STEM education. Nonetheless, the call for widespread careful analysis of the evidence of effective practices that impact student achievement broadens the scope of research, turning all policymakers and educators into practitioners of science.


At NCES Conference, Empirical Education Explains a Difficulty in Linking Student and Teacher Records

The San Francisco Bay Area, our “hometown,” was the site of the 2008 National Center for Education Statistics (NCES) conference. From February 25 to 29, educators and researchers from all over the country came to discuss data collection and analysis. Laurel Sterling and Robert Smith of Empirical Education presented “Tracking Teachers of Instruction for Data Accuracy and Improving Educational Outcomes.” Their topic was the need to differentiate between the teachers who actually instruct students and the teachers with whom those students are officially registered. They explained that in our research we keep track of the “teacher of instruction” vs. the “teacher of registration.” Without this distinction, we are unable to properly identify the student clusters or associate student growth with the right teacher. Sharing instructional responsibilities within a grade-level team is common enough to be of concern. In a large experiment involving teachers in grades 4 through 8, 17% reported teaching students who were not assigned to them on the official class roster. The audience was lively and contributed to the discussion during the question period. One district IT manager indicated that there is movement in this direction even at the state level. For a copy of the presentation, send an email to Robin Means.
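The distinction between the two teacher roles can be sketched as a simple lookup. This is only an illustration of the idea, not the data model from the presentation: it assumes an official roster that maps each student to a teacher of registration, plus a separately collected table of exceptions that records who actually teaches a student in a given subject. All names and identifiers are hypothetical.

```python
def teacher_of_instruction(student_id, subject, registration, instruction_overrides):
    """Return the teacher who actually teaches this student in this subject.

    registration: {student_id: teacher_id} from the official roster.
    instruction_overrides: {(student_id, subject): teacher_id} recording
    arrangements, such as team teaching, that the roster does not capture.
    Falls back to the teacher of registration when no override exists.
    """
    return instruction_overrides.get((student_id, subject), registration[student_id])


def teachers_with_unrostered_students(registration, instruction_overrides):
    """Return the set of teachers who instruct at least one student not on their roster."""
    return {
        teacher_id
        for (student_id, _subject), teacher_id in instruction_overrides.items()
        if registration[student_id] != teacher_id
    }


# Hypothetical grade-level team that swaps students for math and reading.
registration = {"s1": "teacher_a", "s2": "teacher_a", "s3": "teacher_b"}
instruction_overrides = {
    ("s1", "math"): "teacher_b",
    ("s2", "math"): "teacher_b",
    ("s3", "reading"): "teacher_a",
}

print(teacher_of_instruction("s1", "math", registration, instruction_overrides))
print(teacher_of_instruction("s1", "reading", registration, instruction_overrides))
print(teachers_with_unrostered_students(registration, instruction_overrides))
```

Keeping the exceptions in their own table leaves the official roster untouched while still allowing student growth in each subject to be linked to the teacher who provided the instruction.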