blog posts and news stories

Empirical Education Develops Web-Based Tool to Improve Teacher Evaluation

For school districts looking for ways to improve teacher observation methods, Empirical Education has begun development of a web-delivered tool that will provide a convenient way to validate their observational protocols and rubrics against measures of the teacher’s contribution to student academic growth.

Empirical Education is charged with developing a “validation engine” as part of the Measures of Teacher Effectiveness (MET) project, funded by the Bill and Melinda Gates Foundation. As described on the project’s website, the tool will allow users to “view classroom observation videos, rate those videos and then receive a report that evaluates the predictive validity and rater consistency for the protocol.” The MET project has collected thousands of hours of video of classrooms as well as records of the characteristics and academic performance associated with the students in the class.

By watching and coding videos of a range of teachers, users will be able to verify whether or not their current teacher rating systems are identifying teaching behavior associated with higher achievement. The tool will allow users to review their own rating systems against a variety of MET project measures, and will give real-time feedback through an automated report generator.

Development of the validation engine builds on two years of MET Project research, which included data from six school districts across the country and over 3,000 teachers. Researchers will now use the data to identify leading indicators of teacher practice on student achievement. The engine is expected to undergo beta testing over the next few months, beginning with the National Math and Science Initiative.

Announcement of the new tool comes as interest in alternative ways to measure the effectiveness of teachers is becoming a major issue in education and as federal, state and local officials and teacher organizations look for researched-based ways to identify effective teachers and improve student outcomes.

“At a time when schools are experiencing budget cuts, it is vital that school districts have ready access to research tools, so that they can make the most informed decisions,” says Denis Newman, President of Empirical Education. The validation engine will be part of a suite of web-based technology tools developed by the company, including [MeasureResults, an online tool that allows districts to evaluate the effectiveness of the products and programs they use.

2010-11-17

Empirical Education at AERA 2011

Empirical is excited to announce that we will again have a strong showing at the 2011 American Educational Research Association (AERA) Conference. Join us in festive New Orleans, LA, April 8-12 for the final results on the efficacy of the PCI Reading Program, our findings from the first year of formative research on our MeasureResults program evaluation tool, and more. Visit our website in the coming months to view our AERA presentation schedule and details about our annual reception—we hope to see you there!

2010-11-15

2010-2011: The Year of the VAM

If you haven’t heard about Value-Added Modeling (VAM) in relation to the controversial teacher ratings in Los Angeles and subsequent brouhaha in the world of education, chances are that you’ll hear about it in the coming year.

VAM is a family of statistical techniques for estimating the contribution of a teacher or of a school to the academic growth of students. Recently, the LA Times obtained the longitudinal test score records for all the elementary school teachers and students in LA Unified and had a RAND economist (working as an independent consultant) run the calculations. The result was a “score” for all LAUSD elementary school teachers. Note that the economist who did the calculations wrote up a technical report on how it was done and the specific questions his research was aimed at answering.

Reactions to the idea that a teacher could be evaluated using a set of test scores—in this case from the California Standards Test—were swift and divisive. The concept was denounced by the teachers’ union, with the local leader calling for a boycott. Meanwhile, the US Secretary of Education, Arne Duncan, made headlines by commenting favorably on the idea. The LA Times quotes him as saying “What’s there to hide? In education, we’ve been scared to talk about success.”

There is a tangle of issues here, along with exaggerations, misunderstandings, and confusion between research techniques and policy decisions. This column will address some of the issues over the coming year. We also plan to announce some of our own contributions to the VAM field in the form of project news.

The major hot-button issues include appropriate usage (e.g., for part or all of the input to merit pay decisions) and technical failings (e.g., biases in the calculations). Of course, these two issues are often linked; for example, many argue that biases may make VAM unfair for individual merit pay. The recent Brief from the Economic Policy Institute, authored by an impressive team of researchers (several our friends/mentors from neighboring Stanford), makes a well reasoned case for not using VAM as the only input to high-stakes decisions. While their arguments are persuasive with respect to VAM as the lone criterion for awarding merit pay or firing individual teachers, we still see a broad range of uses for the technique, along with the considerable challenges.

For today, let’s look at one issue that we find particularly interesting: How to handle teacher collaboration in a VAM framework. In a recent Education Week commentary, Kim Marshall argues that any use of test scores for merit pay is a losing proposition. One of the many reasons he cites is its potentially negative impact on collaboration.

A problem with an exercise like that conducted by the LA Times is that there are organizational arrangements that do not come into the calculations. For example, we find that team teaching within a grade at a school is very common. A teacher with an aptitude for teaching math may take another teacher’s students for a math period, while sending her own kids to the other teacher for reading. These informal arrangements are not part of the official school district roster. They can be recorded (with some effort) during the current year but are lost for prior years. Mentoring is a similar situation, wherein the value provided to the kids is distributed among members of their team of teachers. We don’t know how much difference collaborative or mentoring arrangements make to individual VAM scores, but one fear in using VAM in setting teacher salaries is that it will militate against productive collaborations and reduce overall achievement.

Some argue that, because VAM calculations do not properly measure or include important elements, VAM should be disqualified from playing any role in evaluation. We would argue that, although they are imperfect, VAM calculations can still be used as a component of an evaluation process. Moreover, continued improvements can be made in testing, in professional development, and in the VAM calculations themselves. In the case of collaboration, what is needed are ways that a principal can record and evaluate the collaborations and mentoring so that the information can be worked into the overall evaluation and even into the VAM calculation. In such an instance, it would be the principal at the school, not an administrator at the district central office, who can make the most productive use of the VAM calculations. With knowledge of the local conditions and potential for bias, the building leader may be in the best position to make personnel decisions.

VAM can also be an important research tool—using consistently high and/or low scores as a guide for observing classroom practices that are likely to be worth promoting through professional development or program implementations. We’ve seen VAM used this way, for example, by the research team at Wake County Public Schools in North Carolina in identifying strong and weak practices in several content areas. This is clearly a rich area for continued research.

The LA Times has helped to catapult the issue of VAM onto the national radar. It has also sparked a discussion of how school data can be used to support local decisions, which can’t be a bad thing.

2010-09-18

New Education Pilot Brings Apple’s iPad Into the Classroom

Above: Empirical Education President Denis Newman converses with Secretary Bonnie Reiss and author, Dr. Edward Burger

They’re not contest winners, but today, dozens of lucky 8th grade Algebra 1 students enthusiastically received new iPad devices, as part of a pilot of the new technology.

California Secretary of Education Bonnie Reiss joined local officials, publishers, and researchers at Washington Middle School in Long Beach for the kick-off. Built around this pilot is a scientific study designed to test the effectiveness of a new iPad-delivered Algebra textbook. Over the course of the new school year, Empirical Education researchers will compare the effect of the interactive iPad-delivered textbook to that of its conventional paper counterpart.

The new Algebra I iPad Application is published by Houghton Mifflin Harcourt and features interactive lessons, videos, quizzes, problem solving, and more. While students have to flip pages in a traditional textbook to reveal answers and explanations, students using the iPad version will be able to view interactive explanations and study guides instantly by tapping on the screen. Researchers will be able to study data collected from usage logs to enhance their understanding of usage patterns.

Empirical Education is charged with conducting the study, which will incorporate the performance of over twelve hundred students from four school districts throughout California, including Long Beach, San Francisco, Riverside, and Fresno. Researchers will combine measures of math achievement and program implementation to estimate the new program’s advantage while accounting for the effects of teacher differences and other influences on implementation and student achievement. Each participating teacher has one randomly selected class using the iPads while the other classes continue with the text version of the same material.

Though the researchers haven’t come up with a way of dealing with jealousy from students who will not receive an iPad, they did come up with a fair way to choose the groups who would use the new high tech program. Classes who received iPads were determined by a random number generator.

2010-09-08

Empirical Education is Part of Winning i3 Team

Of the almost 1700 grant applications submitted to the federal Investing in Innovation (i3) fund, the U.S. Department of Education chose only 49 proposals for this round of funding. A proposal submitted by our colleagues at WestEd was the third highest rated. Empirical Education assisted in developing the evaluation plan for the project. The project (officially named “Scaling Up Content-Area Academic Literacy in High School English Language Arts, Science and History Classes for High Needs Students”) is based on the Reading Apprenticeship model of academic literacy instruction. The grant will span five years and total $22.6 million, including 20 percent in matching funds from the private sector. This collaborative effort is expected to include 2,800 teachers and more than 400,000 students in 300 schools across four states. The evaluation component, on which we will collaborate with researchers from Academy for Educational Development, will combine a large scale randomized control trial with extensive formative research for continuous improvement of the innovation as it scales up.

2010-08-16

REL West Releases Report of RCT on Problem-Based Economics Conducted with Empirical Ed Help

Three years ago, Empirical Education began assisting the Regional Educational Laboratory West (REL West) housed at WestEd in conducting a large-scale randomized experiment on the effectiveness of the Problem-Based Economics (PBE) curriculum.

Today, the Institute of Education Sciences released the final report indicating a significant impact of the program for students in 12th grade as measured by the Test of Economic Literacy. In addition to the primary focus on student achievement outcomes, the study examined changes in teachers’ content knowledge in economics, their pedagogical practices, and satisfaction with the curriculum. The report, Effects of Problem Based Economics on High School Economics Instruction is found on the IES website.

Eighty Arizona and California school districts participated in the study, which encompassed 84 teachers and over 8,000 students. Empirical Education was responsible for major aspects of research operations, which involved collecting, tracking, scoring, and warehousing all data including rosters and student records from the districts, as well as the distribution of the PBE curricular materials, assessments, and student and teacher surveys. To handle the high volume and multiple administrations of surveys and assessments, we created a detail-oriented operation including schedules for following up with survey responses where we achieved response rates of over 95% for both teacher and student surveys. The experienced team of research managers, RAs and data warehouse engineers maintained a rigorous 3-day turnaround for gathering end-of-unit exams and sending score reports to each teacher. The complete, documented dataset was delivered to the researchers at WestEd as our contribution to this REL West achievement.

2010-07-30

Making Vendor Research More Credible

The latest evidence that research can be both rigorous and relevant was the subject of an announcement that the Software and Information Industry Association (SIIA) made last month about their new guidelines for conducting effectiveness research. The document is aimed at SIIA members, most of whom are executives of education software and technology companies and not necessarily schooled in research methodology. The main goal in publishing the guidelines is to improve the quality—and therefore the credibility—of research sponsored by the industry. The document provides SIIA members with things to keep in mind when contracting for research or using research in marketing materials. The document also has value for educators, especially those responsible for purchasing decisions. That’s an important point that I’ll get back to.

One thing to make clear in this blog entry is that while your humble blogger (DN) is given credit as the author, the Guidelines actually came from a working group of SIIA members who put in many months of brainstorming, discussion, and review. DN’s primary contribution was just to organize the ideas, ensure they were technically accurate, and put them into easy to understand language.

Here’s a taste of some of the ideas contained in the 22 guidelines:

  • With a few exceptions, all research should be reported regardless of the result. Cherry picking just the studies with strong positive results distorts the facts and in the long run hurts credibility. One lesson that might be taken from this is that conducting several small studies may be preferable to trying to prove a product effective (or not) in a single study.

  • Always provide a link to the full report. Too often in marketing materials (including those of advocacy groups, not just publishers) a fact such as “8th grade math achievement increased from 31% in 2004 to 63% in 2005,” is offered with no citation. In this specific case, the fact was widely cited but after considerable digging could be traced back to a report described by the project director as “anecdotal”.

  • Be sure to take implementation into account. In education, all instructional programs require setting up complex systems of teacher-student interaction, which can vary in numerous ways. Issues of how research can support the process and what to do with inadequate or outright failed implementation must be understood by researchers and consumers of research.

  • Watch out for the control condition. In education there are no placebos. In almost all cases we are comparing a new program to whatever is in place. Depending on how well the existing program works, the program being evaluated may appear to have an impact or not. This calls for careful consideration of where to test a product and understandable concern by educators as to how well a particular product tested in another district will perform against what is already in place in their district.

The Guidelines are not just aimed at industry. SIIA believes that as decision-makers at schools begin to see a commitment to providing stronger research, their trust in the results will increase. It is also in the educators’ interest to review the guidelines because they provide a reference point for what actionable research should look like. Ultimately, the Guidelines provide educators with help in conducting their own research, whether it is on their own or in partnership with the education technology providers.

2010-06-01

AERA 2010 Recap

Empirical Education had a strong showing at the American Educational Research Association annual conference this year in Denver, Colorado. Copies of our poster and paper presentations are available for download by clicking the links below. We also enjoyed seeing so many of you at our reception at Cru Wine Bar.


View the pictures from our event!

Formative and Summative Evaluations of Math Interventions, Paper Discussion
Division: Division H - Research, Evaluation and Assessment in Schools
Section 2: Program Evaluation in School Settings
Chair: Dale Whittington (Shaker Heights City School District)
Measuring the Impact of a Math Program as It Is Rolled Out Over Several Years
Reading, Written Expression, and Language Arts, Poster Session
Division: Division C - Learning and Instruction
Section 1: Reading, Writing, and Language Arts
Examining the Efficacy of a Sight-Word Reading Program for Students With Significant Cognitive Disabilities: Phase 2
Statistical Theory and Quantitative Methods, Poster Session
Division: Division D - Measurement and Research Methodology
Section 2: Quantitative Methods and Statistical Theory
Matched Pairs, ICCs, and R-Squared: Lessons From Several Effectiveness Trials in Education
Formative Evaluations of Educational Programs, Poster Session
Division: Division H - Research, Evaluation and Assessment in Schools
Section 2: Program Evaluation in School Settings
Addressing Challenges of Within-School Randomization
2010-05-20

Software Industry Sets High Standards for Product Evaluation Research

The Software & Information Industry Association (SIIA) announced the release of their new report, authored by our very own Dr. Denis Newman under the direction of the SIIA Education Division’s Research & Evaluation Working Group, the guidelines provide practical considerations and share best practices of product evaluation design, conduct, and reporting. Written primarily for publishers and developers of education technology, the guidelines reflect the high standards necessary to carry out rigorous, unbiased effectiveness research. Reviewers of the guidelines included Larry Hedges with Northwestern University, Robert Slavin with Johns Hopkins University, and Talbot Bielefeldt with the International Society for Technology in Education (ISTE). A delegation of software publishers presented the Guidelines May 17 at the US Department of Education to John Q. Easton (Director of IES) and Karen Cator (Director of the Office of Education Technology). The document is now available to the public at the link above.

2010-05-13
Archive