Lessons from MDRC’s Predictive Analytics Hackathon
Predictive analytics—the use of historical data to estimate the likelihood of future outcomes—can be a helpful way for social-service organizations to anticipate needs and work proactively. By uncovering patterns in data about program participants, predictive models can identify individuals or groups who may face challenges, enabling program staff members to target support services and allocate resources more effectively. MDRC works with partners to explore these approaches in contexts such as education, housing, and family support, always emphasizing transparency, fairness, and practical application.
Recently, MDRC hosted a predictive analytics hackathon to deepen staff expertise and explore new applications. The event focused on building skills, refining MDRC’s predictive analytics framework, and sparking innovation through fresh ideas and approaches.
This post, written by hackathon participants, shares six lessons they learned about predictive analytics during the event.
1. Thoughtful scoping is essential.
Scoping determines whether using predictive analytics is appropriate for a program and lays a foundation for meaningful results. It involves clarifying what will be predicted, whom the predictions are about, and when in the program flow they will occur.
For example, one hackathon team working with postsecondary education data scoped its project to predict, early in students’ college careers, which students were at risk of delayed graduation. Predictions were generated soon after students enrolled in college: late enough that initial performance data were available, but early enough for staff members to act on them and for an intervention to make a difference. The team defined the outcome (on-time graduation), identified predictors available at program intake and during the first term of the school year, and discussed how program staff members could use this information to provide timely support.
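The three scoping questions (what, whom, and when) can be written down explicitly, which also makes it easy to check that every predictor is actually available at the moment predictions are made. The following is a minimal sketch, not the team's actual code: the `Scope` class, the `TIMELINE` ordering, and all field and predictor names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical ordering of points in the program flow (an assumption).
TIMELINE = ["intake", "first_term", "second_term", "graduation"]

@dataclass
class Scope:
    outcome: str           # what will be predicted
    population: str        # whom the predictions are about
    prediction_point: str  # when in the program flow predictions are made

def usable_predictors(scope, available_at):
    """Return, sorted by name, the predictors observed at or before the prediction point."""
    cutoff = TIMELINE.index(scope.prediction_point)
    return sorted(
        name for name, point in available_at.items()
        if TIMELINE.index(point) <= cutoff
    )

scope = Scope("on_time_graduation", "newly enrolled students", "first_term")

# Made-up predictors mapped to the point at which each becomes available.
available_at = {
    "high_school_gpa": "intake",
    "first_term_credits": "first_term",
    "second_term_gpa": "second_term",  # not yet known at prediction time
}

predictors = usable_predictors(scope, available_at)
```

Filtering predictors this way guards against accidentally training on information that would not exist yet when the prediction is needed.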
2. Ethics require active planning.
During MDRC’s hackathon, teams considered how their results would be used, what limitations might arise, and the ethical implications. Predictive analytics can unintentionally reinforce bias or create harmful consequences if ethical risks aren’t addressed early. Teams learned to use consequence scans—a structured process for identifying any potential harms and unintended effects of predictive models. This process involves asking several questions:
- Could predictions disproportionately affect certain groups?
- Might predictions lead to punitive actions instead of supportive interventions?
- How can researchers design safeguards to prevent any misuse of the predictions?
For example, a team predicting student suspensions recognized that without careful planning, predictions could perpetuate disparities in which students were disciplined. They worked with program partners to ensure predictions would trigger supportive measures—such as counseling or family engagement—rather than punishment.
Ethical planning isn’t a one-time step; it’s an ongoing process that includes reviewing model inputs for bias, monitoring outcomes, and speaking with partners about equity. To make this process possible, models should be transparent so that they are interpretable and so that researchers’ decisions are explainable to stakeholders.
3. A strong proof of concept lays a foundation for the predictive analysis.
A proof of concept is a preliminary analysis that tests whether predictive analytics is feasible and useful for a given context. It helps answer questions such as: Do we have the right data? Can predictions be generated accurately? Is predictive performance similar across groups (that is, are the predictions unbiased)?
During the hackathon, teams built prototype models and assessed performance and bias metrics to determine whether their predictions were reliable and fair enough to inform decision-making. For example, one team working with K–12 education data tested whether early attendance and engagement indicators could predict chronic absenteeism. Their proof-of-concept analysis showed strong predictive power, confirming that the approach was technically viable before they began considering deployment.
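One way to check whether predictive performance is similar across groups is to compute the same metric separately for each group and compare the results. The sketch below uses the area under the ROC curve (AUC) as one possible metric; this post does not specify which metrics the hackathon teams used, and the group labels, function names, and data here are made up for illustration.

```python
from collections import defaultdict

def auc(labels, scores):
    """Probability that a random positive case is scored above a random
    negative case (ties count as half). Returns None if a class is missing."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both outcomes present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_group(records):
    """records: iterable of (group, label, score). Returns {group: AUC}."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: auc(ys, ss) for g, (ys, ss) in grouped.items()}

# Toy data: (group, chronically absent? 1/0, predicted risk score)
records = [
    ("A", 1, 0.9), ("A", 0, 0.2), ("A", 1, 0.7), ("A", 0, 0.4),
    ("B", 1, 0.6), ("B", 0, 0.7), ("B", 1, 0.8), ("B", 0, 0.3),
]
group_auc = auc_by_group(records)
```

A sizable gap between groups (here, the toy model ranks group A's cases better than group B's) is a signal to investigate before anyone acts on the predictions. With real data, small groups will produce noisy estimates, so differences should be read with that in mind.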
4. Simpler methods often perform just as well as complex ones—and trade-offs matter.
One of the most striking lessons from the hackathon was that more complex models, including machine learning, do not always outperform simpler approaches. MDRC’s research from multiple projects confirms this finding: Logistic regression or even simple indicator-based rules often predict outcomes nearly as well as advanced algorithms, especially when datasets are modest in size or when predictors are strong.[1]
Complex models come with trade-offs:
- Transparency: Machine-learning methods can be harder to interpret, making it difficult for program staff members to understand why a prediction was made.
- Cost and maintenance: Advanced models require more time, expertise, and computing resources to develop and sustain.
- Bias risk: Larger models can amplify biases if not carefully monitored.
For example, a team that works on projects related to families and children tested whether early indicators of marital satisfaction could predict relationship challenges among married couples. The team found that a straightforward logistic regression model performed nearly as well as gradient-boosted trees, with the added benefit of transparency for the program’s staff. Similarly, a team that focuses on K–12 education predicted chronic absenteeism among students and discovered that simpler models were easier to interpret and required fewer resources, while complex models offered only marginal gains.
The teams’ takeaway was that complexity should be empirically justified, not assumed. Researchers should start with simpler models, evaluate performance, and only move to more advanced methods if they offer meaningful gains toward the program’s goals.
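The "start simple, escalate only when justified" rule can be made concrete by setting, in advance, the minimum improvement that would justify a more complex model. The sketch below is an illustration under stated assumptions: the indicator rule (three or more absences in the first month), the `min_gain` threshold, and all data are hypothetical, and accuracy stands in for whatever metric best fits the program's goals.

```python
def accuracy(labels, predictions):
    """Share of cases where the prediction matches the observed outcome."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)

def choose_model(simple_score, complex_score, min_gain):
    """Keep the simple model unless the complex one beats it by at least min_gain."""
    return "complex" if complex_score - simple_score >= min_gain else "simple"

# Made-up held-out outcomes (1 = chronically absent) for 20 students.
outcomes = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical indicator rule: flag students absent 3+ days in the first month.
days_absent = [5, 4, 3, 6, 1, 0, 0, 1, 2, 0, 3, 4, 1, 0, 2, 1, 0, 0, 2, 1]
simple_preds = [int(d >= 3) for d in days_absent]

# Hypothetical predictions from a more complex model for the same students.
complex_preds = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

simple_acc = accuracy(outcomes, simple_preds)
complex_acc = accuracy(outcomes, complex_preds)

# Require a gain of at least 10 percentage points (an arbitrary, illustrative threshold).
choice = choose_model(simple_acc, complex_acc, min_gain=0.10)
```

In this toy example the complex model is slightly more accurate, but the gain falls short of the agreed threshold, so the simple, more transparent rule is kept. The right threshold is a program decision that weighs interpretability, cost, and maintenance against predictive gains.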
5. Predictions show patterns, not causes.
Hackathon teams emphasized that predictive models identify correlations, not underlying reasons. Predictions can flag which individuals are at risk, but they don’t explain why.
An MDRC team predicting household hardship requests found that while the model identified which families in a housing voucher program were likely to struggle to maintain their lease, it didn’t reveal whether the cause was job loss, health issues, or something else. Teams agreed that predictions should be paired with qualitative information and program expertise to design interventions that address root causes rather than symptoms.
6. Feasibility and fit matter as much as technical ability.
Even strong models won’t succeed without operational readiness. Teams learned to assess whether programs had the data infrastructure, available staff members, and flexibility to act on predictions.
Two teams evaluating income-based rent policies discovered that incomplete administrative data and limited resources would make it difficult to implement and sustain any additional support for program participants. For predictive analytics to deliver value, technical feasibility must align with practical realities.
Looking Ahead
The hackathon demonstrated that predictive analytics can inform program improvement in diverse areas—from housing and K–12 education to postsecondary success and family support. These lessons underscore that predictive analytics is not just about building models—it’s about aligning technical work with ethics, practicality, and program goals.
MDRC is building on this work by partnering with organizations to thoughtfully explore predictive approaches in new contexts while making fairness, transparency, and practical information a priority.
This post was written by several participants in MDRC’s predictive analytics hackathon, reflecting the insight and experiences gained during the event.
Read more about MDRC’s work in predictive analytics:
- The Value of Predictive Analytics and Machine Learning to Predict Social Service Milestones
- MDRC’s Approach to Using Predictive Analytics to Improve and Target Social Services Based on Risk
- Using Predictive Analytics to Combine Indicators of Third-Grade Reading Proficiency
- Pairing Predictive Analytics with Implementation Research
- Exploring the Value of Predictive Analytics for Strengthening Home Visiting
- How the MDRC Center for Data Insights Is Improving Programs and Systems with Actionable Evidence
- Keep It Simple: Picking the Right Data Science Method to Improve Workforce Training Programs
The analyses and conclusions presented herein are the work of the authors listed. Artificial Intelligence (AI) tools were used to help prepare the text.
[1] Camille Préel-Dumas, Richard Hendra, and Dakota Denison, “Keep It Simple: Picking the Right Data Science Method to Improve Workforce Training Programs,” OPRE Report 2023-058 (Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services, 2023), website: https://www.mdrc.org/work/publications/keep-it-simple-picking-right-data-science-method-improve-workforce-training; Kristin E. Porter, Polina Polskaia, Camille Préel-Dumas, Richard Hendra, and Dakota Denison, “When More Data Science Doesn’t Mean Better Predictions: Insights from Workforce Training Evaluations” (MDRC, forthcoming); Samantha Xia, Zarni Htet, Kristin E. Porter, and Meghan McCormick, “Exploring the Value of Predictive Analytics for Strengthening Home Visiting: Evidence from Child First” (MDRC, 2022), website: https://www.mdrc.org/work/publications/exploring-value-predictive-analytics-strengthening-home-visiting.