Using Machine Learning to Identify Impact Variation in Randomized Control Trials

Overview

In evaluating social policies, policymakers sometimes want to explore whether a program has different effects for different subgroups of participants. Research on such impact variation (different effects for different groups) can inform policy decisions by suggesting whether some participants may benefit more from a program than others. In research on educational interventions, researchers often conduct these subgroup analyses by looking for differences in impacts across a small number of socio-demographic characteristics, like gender, race, and socioeconomic status. To reduce the likelihood of a spurious impact finding, researchers specify these groups in advance and forgo any subgroup analyses beyond those pre-specified. As a result of this approach, potential heterogeneous impacts may go undiscovered.
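
The conventional approach described above can be illustrated with a minimal sketch, assuming a simple difference in mean outcomes between the program and control groups within each pre-specified subgroup. The function name `subgroup_impacts`, the field names, and the records are hypothetical and made up for illustration; they are not taken from any MDRC study or software, and a real analysis would also estimate standard errors and adjust for multiple comparisons.

```python
from collections import defaultdict
from statistics import mean


def subgroup_impacts(records, subgroup_key):
    """Estimate the impact (program mean minus control mean) within each subgroup.

    Each record is a dict with a 0/1 "treated" flag, a numeric "outcome",
    and the subgroup characteristic named by `subgroup_key`.
    """
    # For each subgroup, collect outcomes separately for control (0) and program (1).
    outcomes = defaultdict(lambda: {0: [], 1: []})
    for record in records:
        outcomes[record[subgroup_key]][record["treated"]].append(record["outcome"])
    # Impact estimate per subgroup: difference in mean outcomes between arms.
    return {group: mean(arms[1]) - mean(arms[0]) for group, arms in outcomes.items()}


# Made-up illustrative records, not data from any study.
records = [
    {"gender": "F", "treated": 1, "outcome": 12.0},
    {"gender": "F", "treated": 1, "outcome": 14.0},
    {"gender": "F", "treated": 0, "outcome": 10.0},
    {"gender": "F", "treated": 0, "outcome": 10.0},
    {"gender": "M", "treated": 1, "outcome": 11.0},
    {"gender": "M", "treated": 1, "outcome": 11.0},
    {"gender": "M", "treated": 0, "outcome": 10.0},
    {"gender": "M", "treated": 0, "outcome": 12.0},
]

impacts = subgroup_impacts(records, "gender")
print(impacts)
```

Because the subgroup characteristic is chosen before looking at the results, this kind of analysis can only detect variation along the dimensions named in advance, which is the limitation the paragraph above describes.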

Funded by a Statistical and Research Methodology in Education grant from the Institute of Education Sciences, this project in the MDRC Center for Data Insights will examine the usefulness of an alternative approach to looking for heterogeneous subgroup effects: using machine learning to “let the data speak” when identifying subgroups. Machine learning methods have the potential to identify heterogeneity that is based on a complex set of interactions across characteristics, which might be difficult to uncover using more typical methods. The project will investigate three primary research questions:

  • Can machine learning methods replicate the published findings on heterogeneity?
  • Do machine learning methods suggest new findings of heterogeneous effects across subgroups that published studies missed?
  • Which characteristics of studies make it more likely that machine learning will be a productive technique for investigating the presence of heterogeneous effects? Also, which machine learning methods perform better in different data settings?

The project will apply machine learning methods to three data sets from four MDRC multisite randomized controlled trials: the City University of New York’s Accelerated Study in Associate Programs (ASAP), its replication in Ohio, Career Academies, and Growth Mindset.

The project will result in the following products:

  • A methodological paper submitted to a peer-reviewed journal, which will provide an opportunity to share findings with methodologists and applied researchers.
  • An R program to assist applied researchers in using the identified methods in education research.