Keep It Simple: Picking the Right Data Science Method to Improve Workforce Training Programs


By Camille Préel-Dumas, Richard Hendra, and Dakota Denison

This brief summarizes findings and recommendations from a study designed to measure and compare the added value of models used to predict participant success in career pathways programs within the Health Profession Opportunity Grants (HPOG) Program. HPOG provided education and training in high-demand health care occupations to Temporary Assistance for Needy Families (TANF) recipients and other individuals with low incomes.

Through their management information systems, workforce program providers have access to enrollment and participation data that, when analyzed, have the potential to improve program outcomes. In particular, analyzing this wealth of program data allows providers to identify participants at greater risk of dropping out and to tailor the program accordingly for those participants.

From a provider perspective, predictive models vary in their value to the program, their cost, and their complexity. For example, machine learning methods can identify patterns that a data analyst would not have specified in advance, but they also bring additional costs and more complex processes. To help inform practitioner decision making and to explore these differences using real-world program data, this brief examines the tradeoffs of applying data science methods, from simple to complex, to career pathways program data.

Key findings from the brief include:

  • Program outcomes are predictable even with simple, cost-effective data science methods. 
  • Within HPOG 1.0 programs, the most important factor for predicting participant success is prior education level.
  • A simple model built on a single powerful indicator is only marginally less accurate than the best machine learning algorithm (a minimal sketch of this kind of comparison follows this list).
  • Complex methods such as machine learning yield only small gains in predicting program outcomes compared with simple methods. These small gains should be weighed against the demands on program staff resources, the reduced transparency of machine learning methods, and the risk that algorithmic bias can reinforce existing discrimination and inequity.
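
The brief does not publish its code, but a minimal sketch can illustrate the kind of comparison behind these findings. The Python sketch below uses synthetic stand-in data (the variable names, effect sizes, and models are illustrative assumptions, not HPOG data or the study's actual specification) to compare a one-predictor logistic regression with a gradient boosting model on the same prediction task.

    # Minimal illustrative sketch, NOT the study's code: synthetic stand-in
    # data compare a one-predictor logistic regression with gradient boosting.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 5000

    # Hypothetical predictors: one strong ordinal measure (standing in for
    # prior education level) plus several weaker covariates.
    education = rng.integers(0, 5, size=n)
    other = rng.normal(size=(n, 4))
    logit = 0.9 * education + other @ np.array([0.2, 0.1, 0.1, 0.05]) - 2.0
    success = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

    X = np.column_stack([education, other])
    X_train, X_test, y_train, y_test = train_test_split(
        X, success, random_state=0
    )

    # Simple model: logistic regression on the single strongest predictor.
    simple = LogisticRegression().fit(X_train[:, [0]], y_train)
    simple_auc = roc_auc_score(
        y_test, simple.predict_proba(X_test[:, [0]])[:, 1]
    )

    # Complex model: gradient boosting on all predictors.
    boosted = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
    boosted_auc = roc_auc_score(y_test, boosted.predict_proba(X_test)[:, 1])

    print(f"One-predictor logistic regression AUC: {simple_auc:.3f}")
    print(f"Gradient boosting AUC:                 {boosted_auc:.3f}")

On data like these, where a single predictor dominates, the two accuracy scores typically land close together, mirroring the brief's finding that the gains from added complexity can be small.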

To help decide whether to use machine learning models, workforce program providers should weigh:

  • The need for improved predictive performance. Consider how the results will be used and how much an improvement in predictive accuracy would actually matter.
  • The size of the dataset available for modeling. For smaller datasets, simple methods may capture patterns adequately; for larger datasets, more complex models such as machine learning may be worth considering.
  • The budget and study timeline. Testing and learning new predictive modeling methods takes staff resources and time. Consider whether the program has the capacity to develop a reliable model.
  • The potential sources of bias in the predictive study. Consider where bias might already be present in the dataset or might be introduced by the planned prediction model, and identify ways to mitigate those biases to strengthen the transparency and equity of the study (a simple subgroup check, like the sketch after this list, can help surface uneven error rates).
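
As one concrete way to act on the last point, the sketch below (the function and variable names are hypothetical, not from the brief) compares a model's misclassification rate across subgroups on held-out data; noticeably uneven rates are a signal to investigate before relying on the model's risk scores.

    # Hypothetical bias-audit sketch: compare error rates across subgroups.
    import numpy as np

    def subgroup_error_rates(y_true, y_pred, group):
        """Misclassification rate for each subgroup label."""
        return {
            g: float(np.mean(y_true[group == g] != y_pred[group == g]))
            for g in np.unique(group)
        }

    # Toy held-out labels and predictions; in practice these would come
    # from the program's own evaluation data.
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0])
    group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

    print(subgroup_error_rates(y_true, y_pred, group))
    # {'A': 0.25, 'B': 0.5} -- group B is misclassified twice as often,
    # a prompt to examine the data and model before deployment.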

Préel-Dumas, Camille, Richard Hendra, and Dakota Denison. 2023. Keep It Simple: Picking the Right Data Science Method to Improve Workforce Training Programs. OPRE Report 2023-058. Washington, DC: Office of Planning, Research, and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services.