College of Education/Human Dev
Twin Cities
Machine learning (ML) has been widely criticized as purely data-driven and lacking theoretical foundations, which has raised doubts about its utility in social science research. To address this limitation, these researchers have developed a literature-driven, ML-based study paradigm called data-based cross-study synthesis, which uses systematic reviews to construct the predictor sets for ML models. One application examined how well high-dimensional data on adolescent family experiences predict young adult educational attainment, using the public dataset from the National Longitudinal Study of Adolescent to Adult Health (Add Health). A review of 101 studies that used Add Health data to examine links between adolescents’ family experiences and young adult educational attainment yielded 55 family experience variables that had been examined across these studies. Despite this body of prior work, it was unknown how all of the tested variables work together to predict educational attainment. Accordingly, this study used ML-based predictive models to address three questions: When the adolescent family experience factors examined across prior studies are combined in a single analysis, how accurately can young adult educational attainment be predicted? Which family experience factors are the best predictors of educational attainment? And what complex patterns (e.g., nonlinearities, interactions) among family experience predictors merit further examination? More broadly, this research highlights the utility of data-driven ML approaches in answering “big picture” questions about family and adolescent development.
This researcher is expanding on this effort to answer even broader questions. Specifically, the current project moves beyond family experience variables to include all variables examined in previous studies that used Add Health data to study predictors of educational achievement. By incorporating not only family factors but also personal characteristics and factors in other contexts, such as school, peers, neighborhood, and culture, the updated ML-based cross-study synthesis can address the following questions: How important is the family context relative to other contexts and individual characteristics? Which adolescent predictors, across contexts, matter most for future achievement? And how do these key predictors from different contexts interact with one another in the predictions?
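
The predictor-set construction step described above can be illustrated with a minimal sketch. It assumes the systematic review has been coded as a mapping from each reviewed study to the variables it examined, plus a mapping from each variable to its context. All study IDs, variable names, and context labels below are hypothetical placeholders and are not actual Add Health variable names or results from the review.

```python
from collections import Counter

# Hypothetical review records: each reviewed study lists the predictors it examined.
# These names are illustrative placeholders, not actual Add Health variables.
reviewed_studies = {
    "study_a": ["parental_education", "family_structure", "parent_child_closeness"],
    "study_b": ["parental_education", "school_attachment"],
    "study_c": ["family_structure", "peer_college_plans", "parental_education"],
}

# Hypothetical coding of each variable into a developmental context.
variable_context = {
    "parental_education": "family",
    "family_structure": "family",
    "parent_child_closeness": "family",
    "school_attachment": "school",
    "peer_college_plans": "peers",
}


def build_predictor_set(studies):
    """Pool predictors across studies and count how many studies examined each."""
    counts = Counter()
    for variables in studies.values():
        counts.update(set(variables))  # count a variable at most once per study
    return counts


def group_by_context(predictors, contexts):
    """Group the pooled predictors by context, in alphabetical order within each group."""
    grouped = {}
    for name in sorted(predictors):
        grouped.setdefault(contexts.get(name, "unclassified"), []).append(name)
    return grouped


predictor_counts = build_predictor_set(reviewed_studies)
predictors_by_context = group_by_context(predictor_counts, variable_context)

for context, names in predictors_by_context.items():
    print(context, names)
```

The pooled set would then serve as the input features for a predictive model; grouping by context is one possible way to set up the comparison of family factors against school, peer, and other factors.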