MDRC Center for Data Insights: Improving Programs and Systems with Actionable Data Science


Across the social sector, government agencies, educational institutions, and nonprofit organizations are all benefiting from greater access both to more detailed and frequent data and to a variety of options for increased computing power. With data-science tools and guidance in applying them, practitioners can harness multiple sources of data to gain new insights about the individuals they serve, the contexts in which they operate, their staff members, and their program features. When such tools are incorporated into daily operations in a responsible way, they can help practitioners improve their programs and the lives of those they serve.

With the launch of the MDRC Center for Data Insights, MDRC is furthering its long-standing commitment to helping its partners improve their programs and systems by harnessing operational data-science techniques: those that produce actionable insights that can affect daily practice. The Center's projects, which range from simple descriptive summaries to advanced machine learning algorithms, aim to use institutions’ increasingly rich data to provide new insights that can help them refine and target their services.

MDRC has more than 40 years of experience in developing, evaluating, and improving social and education programs, as well as in managing and analyzing vast amounts of data in ways that meet the highest standards of responsibility, security, and privacy. MDRC frequently partners with government agencies, educational institutions, and nonprofit organizations to help them develop their evidence-building agendas and to understand the challenges and opportunities inherent in adapting programs to changes in contexts and populations.

The MDRC Center for Data Insights provides tools and support services to help institutions better manage caseload dynamics, better understand patterns of behavior, and better target individuals for interventions. In projects that are affiliated with the Center, MDRC staff members meet partnering institutions where they are. We help build an agenda for continuous improvement and provide analytic tools that support a partner's priorities. We assess data systems and suggest strategies for improvements. We explore opportunities for data integration and innovative data-collection methods. And we provide training and assistance in implementing open-source analytical tools and in interpreting and acting on results.

The Center helps government agencies, educational institutions, and nonprofit organizations that want to use data analytics to improve their practices, organizational cultures, and work structures in a sustainable way, with the ultimate aims of lowering their costs and improving outcomes for their clients.

For more information, contact

Agenda, Scope, and Goals

The following projects are currently affiliated with the MDRC Center for Data Insights:

Temporary Assistance for Needy Families Data Innovations (TDI) will support the innovation and effectiveness of state-level Temporary Assistance for Needy Families (TANF) programs by enhancing the use of data from TANF and related human services programs. This work may include encouraging and strengthening state integrated data systems, promoting proper payments and program integrity, and enabling data analytics for TANF program improvement. Across its activities, the contract will support the use of data for understanding the broad impact that TANF has on families and will improve knowledge of how the federal government and state partners can use data to more efficiently and effectively serve TANF clients. Partners include Chapin Hall at the University of Chicago, the Center for Urban Science and Progress at New York University, and Actionable Intelligence for Social Policy at the University of Pennsylvania.

Partnership with the Center for Employment Opportunities (CEO) for Predictive Analytics focuses on harnessing granular, longitudinal administrative data to build a system for ongoing, advanced analytics that supports CEO's continuous improvement process. Foremost, the project is using predictive analytics to provide early warnings and frequent updates of participants’ risks of not reaching milestones in CEO’s employment training program. The goal is for these early warnings to be transmitted, practically in real time, to front-line case workers and leaders as part of their standard dashboards and data protocols. CEO plans to train staff members to act on this information and to work with MDRC to design, implement, and test new interventions based on insights from the predictive analytics results. The project is also incorporating additional iterative and automated data analytics that will provide real-time monitoring of program outcomes. These analytics capabilities include attendance and attrition reports, A/B testing, and data visualization.

Subprime Lending Data Exploration Project, a “big data” project funded by the MetLife Foundation, is designed to produce policy-relevant insights using an administrative data set that covers nearly 50 million individuals who have applied for or used subprime credit. The data set contains information on borrower demographics, loan types and terms, account types and balances, and repayment histories. To investigate whether there are distinct groups of borrowers in terms of loan usage patterns and outcomes, MDRC used a data discovery process called K-means clustering. The project used several other techniques to derive insights into payday lending behavior, including geospatial analysis and conjoint experiments. More recently, the project has exploited the enormous scale of the database to analyze cross-border variation in payday lending usage as a function of whether states took advantage of the Medicaid expansion.
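
As a rough illustration of the K-means idea mentioned above, the sketch below groups a handful of made-up borrowers by two invented features. It is not the project's actual code or data: the feature names, the values, the choice of two clusters, and the deterministic starting centroids are all assumptions made for the example.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    # Deterministic start (an assumption for reproducibility): take k points
    # spread evenly through the input instead of a random draw.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical borrowers: (loans taken per year, share of loans repaid on time).
# Real features would normally be standardized first so that no single
# feature dominates the distance calculation.
borrowers = [(1, 0.95), (2, 0.90), (1, 0.85), (9, 0.40), (10, 0.30), (8, 0.35)]

labels, centroids = kmeans(borrowers, k=2)
print(labels)
print(centroids)
```

In this toy data the occasional borrowers with high on-time repayment fall into one group and the frequent borrowers with low on-time repayment into the other. With real data, the number of clusters and the features to include are analytic choices that need to be examined rather than assumed.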

Validating and Improving a Pretrial Risk Assessment Tool is focused on assessing and potentially improving a tool that predicts defendants' risks of failing to appear for court, of committing a new crime while awaiting court, and of committing a new violent crime. The project team combines methodological expertise in cutting-edge predictive analytics with substantive knowledge of the pretrial criminal justice context and of the practical considerations that could affect a risk tool's usefulness in various jurisdictions and its potential to expand to a larger scale.

New Visions for Public Schools (NVPS) Researcher-Practitioner Partnership for Predictive Analytics was a partnership in which MDRC researchers developed and implemented a comprehensive predictive modeling framework that allows for rapid and iterative estimation of a continuous measure of risk for each student at a point in time. The framework was implemented with data from NVPS's network of 70 high schools to estimate students’ risk of not graduating on time and of not passing the state algebra exam required for graduation.
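
To make the idea of a continuous risk measure concrete, here is a minimal sketch that fits a one-variable logistic regression and turns it into a risk score between 0 and 1. It is only an illustration under stated assumptions: the single predictor (days absent), the invented records, and the plain gradient-descent fit are hypothetical and are not the framework that was built for NVPS.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit risk = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every record under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Step against the average gradient of the log loss.
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical records: days absent in a term, and whether the student
# later did not graduate on time (1) or did graduate on time (0).
days_absent = [0, 1, 2, 3, 8, 9, 10, 12]
off_track = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(days_absent, off_track)

# A continuous risk estimate for two hypothetical students.
risk_low = sigmoid(w * 1 + b)    # 1 day absent
risk_high = sigmoid(w * 11 + b)  # 11 days absent
print(round(risk_low, 2), round(risk_high, 2))
```

Because the output is a probability rather than a yes/no flag, it can be re-estimated as new data arrive and compared across students at any point in time.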

The Long-Term Outcomes Study is an effort to produce new findings from older studies using new matching techniques and approaches. The project is assessing the feasibility of linking administrative data sets to program evaluation records, a promising and potentially low-cost means of tracking the long-term impacts of social interventions. While social programs are often designed to have long-term benefits for participants, many evaluations do not (or are not able to) track outcomes in the long term. With recent interest in making administrative data more accessible — reflected in the recommendations of the Commission on Evidence-Based Policymaking — it is important for researchers and others to understand whether and how these data sets can be linked to evaluation data sets. For the Long-Term Outcomes project, data are being collected from 16 employment-related evaluations to assess the practical and legal feasibility of accomplishing these links, to assess potential costs, to determine who owns various sources of data, to identify any history of links to other projects, to catalogue past findings, and to gauge the current availability of relevant data and metadata. Information is also being collected on administrative data sources, including the availability and content of the data sources, the identifiers needed to facilitate data linking, and the restrictions that may exist on who can access the data and for what purpose. 

Chicago Community Networks Study uses social network analysis to explore ways to consider power in networks of neighborhood organizations, how power is configured differently in different Chicago neighborhoods, and how these patterns can help communities respond to local challenges. Social network analysis can allow researchers to understand how neighborhoods differ in the levels and extents of interactions among organizations across domains, that is, how “comprehensive” community connections are. MDRC has produced state-of-the-art network graphics to help measure comprehensiveness in ties among local organizations in Chicago neighborhoods, and to show how comprehensiveness can help neighborhoods work together to build needed affordable housing and improve schools.

Nonprofit Data Science Initiative. MDRC has carefully evaluated more than 30 major nonprofit agencies, and the findings from those studies have provided a blueprint for program improvement. As part of our commitment to sustaining relationships and building capacity, MDRC is now beginning to partner with several of those agencies to create a set of advanced predictive analytics tools to help them identify clients at risk of not graduating from high school, not completing training, not finding work, committing a crime, and more. These predictions are available for each individual student or program participant and use machine-learning methods to update risk predictions weekly. A hallmark of this work is that MDRC collaborates with the organizations when designing the tools and trains their staff to use the tools going forward. MDRC also shares other analytical insights that are uncovered as the tools are developed.

The MDRC Center for Data Insights aims to advance a culture of rigorous, data-driven decision-making among government agencies, educational institutions, and nonprofit organizations by helping them to:

Use their existing and new data to the fullest extent while meeting the highest standards for privacy and security:

  • We assess data systems for quality and completeness and suggest strategies for improvements.

  • We help programs securely integrate data from multiple sources, incorporating unstructured data (for example, text and case data) and publicly accessible data.

  • We build customized information systems to track programs' participants, their use of services, and their outcomes, and we introduce novel data-collection methods (for example, with mobile and web apps).

Foster a strong analytic mindset for insightful and responsible research:

  • We help organizations develop analytical strategies to transform themselves into institutions of data-driven learning.

  • We help programs gain new insights into their programs and populations, bringing to the surface ideas for future program improvement.

  • We identify the right analytic tools to address particular challenges or answer particular questions.

  • We add new information to existing data dashboards or build new dashboards that summarize the most up-to-date information.

  • We train participants how to interpret results and apply them to real-world scenarios.

Use customized tools we build, along with the latest advances in data science, to improve program outcomes:

  • We modernize case management across multiple systems.

  • We estimate program participants' risks of negative outcomes, allowing organizations to rank and target individuals or sites for interventions that are designed to be more effective for different risk levels.
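
The ranking-and-targeting step described above can be sketched in a few lines. The participant IDs, risk values, and tier cutoffs below are hypothetical; in practice, cutoffs would be chosen with the partner organization based on its capacity and the interventions available.

```python
def assign_risk_tiers(risk_by_id, high_cutoff=0.6, low_cutoff=0.3):
    """Rank participants from highest to lowest risk and label each with a tier."""
    ranked = sorted(risk_by_id.items(), key=lambda item: item[1], reverse=True)
    worklist = []
    for participant_id, risk in ranked:
        if risk >= high_cutoff:
            tier = "high"
        elif risk >= low_cutoff:
            tier = "medium"
        else:
            tier = "low"
        worklist.append((participant_id, risk, tier))
    return worklist


# Hypothetical predicted risks of a negative outcome (0 = none, 1 = certain).
predicted_risk = {"P001": 0.82, "P002": 0.15, "P003": 0.47, "P004": 0.66}

worklist = assign_risk_tiers(predicted_risk)
for participant_id, risk, tier in worklist:
    print(participant_id, risk, tier)
```

A worklist like this could feed a dashboard so that staff members see the highest-risk participants first, with a different intervention attached to each tier.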

Figure out next steps based on analytic results and new insights and evaluate their success:

  • We bring to bear MDRC's deep knowledge of the evidence base in many areas of social policy research.

  • We collaborate with MDRC's Center for Applied Behavioral Science (CABS) to develop innovative, low-cost interventions based on research from behavioral science (which includes behavioral economics, social psychology, cognitive psychology, and organizational behavior).

  • We rapidly measure the short-term impacts of new interventions or program refinements.


Design, Sites, and Data Sources

Government agencies, educational institutions, and nonprofit organizations are increasingly interested in using data to do better case management, to better understand patterns of behavior, to better manage caseload dynamics, and to better target individuals for interventions.

Sources of data for projects in the MDRC Center for Data Insights include:

  • Program data that are stored in various management information systems (for example, Salesforce) but are underused for program improvement

  • Student records data that are stored in various student management systems (for example, PowerSchool or PeopleSoft)

  • Outcome data stored in various government administrative records systems (for example, unemployment insurance wage records, TANF/Medicaid/food stamp benefit records)

The Center takes a very deep dive into the data. We think it is critical to understand the details and contextual factors relating to how data are collected, entered, stored, modified, and used. We ask our partners many questions, we do exhaustive checks, we provide details about all decisions for data processing, and we share our findings.

Staff members working on projects in the Center for Data Insights bring expertise as data scientists, data engineers, statisticians, economists, programmers, behavioral scientists, and public-policy experts. In addition, we draw on the expertise of MDRC staff members who are former social service and educational administrators, teachers, and front-line human service workers.

Our experienced team enables the Center to use whatever technique and level of sophistication make sense given the problem and the current organizational capabilities of our partners. Below are some of the techniques that we use, depending on the problem at hand:

  • Predictive analytics (a full suite, including ensemble models)

  • Clustering/segmentation

  • Simple descriptive/inferential statistics

  • Data visualization (static and interactive)

  • A/B testing and rapid-cycle randomized controlled trials

  • Social network analyses

  • Factoring/scale creation

  • Conjoint analysis

  • Caseload dynamics

  • Data integration

  • Text analysis

  • Advanced regression (panel/survival/geospatial/quantile regression/multilevel models/Bayesian shrinkage)

  • Data collection system design (building custom surveys/MIS systems, etc.)
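
As one example from the list above, the analysis behind a simple A/B test can be sketched as a two-proportion z-test. The counts below are invented for illustration, and the pooled z-test is only one reasonable choice; a real rapid-cycle test would also need a pre-specified outcome, sample size, and analysis plan.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two independent proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical counts: participants who reached a program milestone under the
# usual reminder message (A) and under a redesigned message (B).
difference, z, p_value = two_proportion_z_test(
    successes_a=80, n_a=200, successes_b=104, n_b=200
)
print(f"difference={difference:.2f}, z={z:.2f}, p={p_value:.3f}")
```

With these made-up counts, the 12-percentage-point difference is unlikely to be due to chance alone, which is the kind of quick read a program team can use to decide whether a refinement is worth scaling up or testing further.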