Advances in new technologies, when incorporated into routine health screening, have tremendous promise to benefit children. The number of health screening tests, many of which have been developed with machine learning or genomics, has exploded. To assess efficacy of health screening, ideally, randomized trials of screening in youth would be conducted; however, these can take years to conduct and may not be feasible. Thus, innovative methods to evaluate the long-term outcomes of screening are needed to help clinicians and policymakers make informed decisions. These methods include using longitudinal and linked-data systems to evaluate screening in clinical and community settings, school data, simulation modeling approaches, and methods that take advantage of data available in the digital and genomic age. Future research is needed to evaluate how longitudinal and linked-data systems drawing on community and clinical settings can enable robust evaluations of the effects of screening on changes in health status. Additionally, future studies are needed to benchmark participating individuals and communities against similar counterparts and to link big data with natural experiments related to variation in screening policies. These novel approaches have great potential for identifying and addressing differences in access to screening and effectiveness of screening across population groups and communities.
The potential for incorporating new technologies into routine health screening of pediatric populations is enormous. With the introduction of machine learning, genomics, and novel diagnostics into the practice of medicine, the number of potential health screening tests is exponentially increasing. For example, newborn screening started with a test for 1 disorder, phenylketonuria, in the 1960s1 and has expanded to individual states testing for between 34 and 66 conditions in 2020.2
Historically, early diagnosis has been considered beneficial, and, because screening leads to early diagnosis, early screening has been presumed to be beneficial as well.3 However, early diagnosis and early screening only make sense in certain situations, for example, when screening leads to early interventions that improve outcomes. Ideally, direct evidence of the effect of screening on care received and ultimately on intermediate or long-term health outcomes would be available for policymakers to make decisions about screening appropriateness; however, such evidence is often lacking.
Randomized trials of screening in youth to assess efficacy of screening and treatment may be optimal, but they can take years to conduct and may not be feasible.3 Using longitudinal and linked-data systems on a more systematic basis is becoming increasingly feasible in a wide range of contexts,4,5 allowing researchers to address complex issues of selection and to get closer to causal inference. Building on these linkages and connecting them to screening will also allow us to advance our current models to incorporate greater breadth (eg, community-level factors) and depth (eg, longitudinal clinical record linkages) of social context measures when assessing the impact of screening in childhood. Therefore, novel methods to evaluate the long-term outcomes of screening are important, including using longitudinal and linked-data systems to evaluate screening in clinical and community settings, school data, simulation modeling approaches, and methods that take advantage of data available in the digital and genomic age.
To understand the impact of screening, we need to understand recommendations for care that follow from screening, care received as a result of screening and recommendations, and proximal and distal health outcomes. We review creative approaches for assessing long-term outcomes of universal health screenings conducted from birth to 18 years old, providing specific examples and suggesting research opportunities. Because disorders may present throughout the pediatric age spectrum and up to adolescence, continued vigilance around screenings is necessary to catch problems at an earlier, perhaps emergent, stage, and to deliver prevention services.6 Interpreting these outcomes requires insights across the various settings where children are screened and receive care. Because outcomes for pediatric populations can take a lifetime to observe, durable approaches to longitudinal follow-up assessment are essential.
Coordination of Data Within Clinical Settings
Screening sites range from primary care or more traditional clinical settings7 (eg, behavioral health or dental offices), to clinical settings embedded within the community to enhance care access (eg, school-based health centers [SBHCs]8,9). One potential obstacle is that the clinical settings where children are screened are not necessarily where they receive care.10–12 As a result, coordination of care13,14 and information flows become a central consideration. Institutional electronic health records (EHRs) are notoriously siloed across sites of care and have historically been difficult to extract information from. Even within an institution, data may be siloed across various information systems and not well integrated. The Family Educational Rights and Privacy Act (20 USC §1232g; 34 CFR Part 99) is a federal law that protects the privacy of student education records. These challenges are compounded when the referrals stretch across different organizations, such as between primary care and behavioral health organizations, which are unlikely to have compatible information technology.
The Centers for Medicare and Medicaid Services Meaningful Use15 program provided incentives for health care providers to adopt EHRs to, among other things, improve care coordination and improve health outcomes.16 Because Meaningful Use stage 1, which focused on EHR adoption and data collection by hospitals and health care professionals, launched in 2011, today’s children are likely to have their entire health history in electronic format.
Bridging patient records across organizations is needed to track health impacts of screening. Because EHR data are often site-specific, methods are needed to examine care across sites. Health information exchanges17,18 are one potential approach, but these organizations are generally regional, and many have failed to find sustainable business models. Care received across sites may be best captured from payer data. Longitudinally linking EHR data to payer claims data19 from the Centers for Medicare and Medicaid Services and private insurers represents one approach for increasing the completeness of data used in tracking screening outcomes.20,21
Even so, there are challenges. Patients often switch insurance plans when they move, or when family member employment changes. In addition, payer claims are informative about what care was billed for, which may differ from care a patient received, which, in turn, may not necessarily be what was needed. It has been proposed that enhanced integration of EHRs within practices as well as methods available in the digital age may be solutions.
The 21st Century Cures Act specified a new form of health information technology interoperability that will underpin redesign of screening and measurement processes by collecting patient-generated data, empowering patients with applications that have access to standardized health system data, and extracting standardized population data sets from EHRs.22 Cures and the federal rule that implements the interoperability provisions require that EHRs have an application programming interface (API) that gives access to all elements of a patient’s record “with no special effort.”23 APIs are how modern computer systems talk to each other in standardized, predictable ways. The Substitutable Medical Applications, Reusable Technologies (SMART) on Fast Healthcare Interoperability Resources (FHIR) API,24 required under the rule, enables researchers, clinicians, and patients to connect applications to the health system across EHR platforms. For example, Apple used SMART on FHIR to connect its native iPhone Health App to EHRs at hundreds of hospitals.25 With extensions to SMART, data from sensors, data from mobile devices, patient-generated data, and patient-reported outcomes could become additional measures in the evaluation of screening programs.26 The SMART/HL7 FHIR Bulk Data Access API enables the creation of standardized population-level data sets from EHRs.27
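To make the idea of a population-level extract more concrete, the following Python sketch builds (but does not send) a group-level Bulk Data kick-off request and parses the newline-delimited JSON that such an export is expected to return. It is a minimal illustration under stated assumptions, not a description of any particular EHR: the base URL, group identifier, and sample records are hypothetical, the request shape follows our reading of the published FHIR Bulk Data Access specification, and authorization (eg, SMART Backend Services) is omitted entirely.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(base_url, group_id, resource_types):
    """Build (without sending) a Bulk Data kick-off request for one patient group."""
    # _type limits the export to the listed FHIR resource types.
    query = urlencode({"_type": ",".join(resource_types)})
    url = f"{base_url.rstrip('/')}/Group/{group_id}/$export?{query}"
    headers = {
        "Accept": "application/fhir+json",
        "Prefer": "respond-async",  # the export runs asynchronously on the server
    }
    return Request(url, headers=headers, method="GET")


def parse_ndjson(text):
    """Parse newline-delimited JSON: one FHIR resource per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical server and cohort; nothing is transmitted in this sketch.
request = build_export_request(
    "https://ehr.example.org/fhir", "screened-cohort", ["Patient", "Observation"]
)

# Hypothetical export content standing in for a downloaded Observation file.
sample_ndjson = (
    '{"resourceType": "Observation", "id": "obs-1", "status": "final"}\n'
    '{"resourceType": "Observation", "id": "obs-2", "status": "final"}\n'
)
observations = parse_ndjson(sample_ndjson)
```

In practice, the kick-off response points to a status endpoint that is polled until the export files are ready; that polling loop and the download step are left out here for brevity.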
Moving Beyond the Clinic to the Community
To understand individual-level health screening in a broader context, it is important to link screening studies and clinical data to community and systems data at local and national levels. Such links will allow the influence of broader, systemic, community-level factors on health outcomes to be assessed in models evaluating screening impacts on health and wellbeing. For example, the importance of health-related social needs, such as housing insecurity and exposure to neighborhood violence, in shaping children’s health is well known.28–30 Research has also shown that community-level human and social capital dimensions and the organization of social networks affect community wellbeing as well as child health.31,32 Systematically integrating empirically and theoretically derived dimensions of community health will help us to better understand how screening outcomes may be conditioned by the following: (1) communities’ socioeconomic and physical infrastructure, connectivity, and safety and (2) social interactions, cohesion, and trust as they vary within and between community settings. Such dimensions of social ecology and community health are often critical selection factors and modifiers of health and screening outcomes, especially in the long term because children interact with their communities long after screening and often even during interventions. For example, community social capital dimensions, such as trust and close communication among neighbors, may influence both the chances for children to access screening (eg, for depression) and the outcome of screening. In such cases, omitting community social capital indices may bias estimates of screening effects on child health.
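The concern about omitted community factors can be illustrated with a small simulation. The Python sketch below is only a toy example: the data are synthetic, the effect sizes are arbitrary assumptions, and a single linear index stands in for community social capital. It compares a naive estimate of the screening effect with an estimate that adjusts for the community index (by regressing residuals on residuals, which is equivalent to including the index as a covariate in a linear model).

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, true_effect=1.0, seed=0):
    """Return (naive, adjusted) estimates of the screening effect on health."""
    rng = random.Random(seed)
    capital, screened, health = [], [], []
    for _ in range(n):
        c = rng.gauss(0, 1)  # synthetic community social capital index
        # Assumption: children in higher-capital communities are more often screened.
        s = 1.0 if c + rng.gauss(0, 1) > 0 else 0.0
        # Assumption: social capital also improves health directly.
        y = true_effect * s + 2.0 * c + rng.gauss(0, 1)
        capital.append(c)
        screened.append(s)
        health.append(y)
    naive = slope(screened, health)
    adjusted = slope(residuals(capital, screened), residuals(capital, health))
    return naive, adjusted


naive_estimate, adjusted_estimate = simulate()
```

Under these assumptions, the naive estimate substantially overstates the benefit of screening because it absorbs the direct effect of social capital, whereas the adjusted estimate stays close to the true effect built into the simulation.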
Developing novel approaches for considering the complex way that communities shape screening outcomes will depend on the following: (1) linking different types of data sets to explore linked complex systems across different levels of analysis,28,33 such as linking individual surveys with administrative data on communities and with data on employers and organizations, and (2) linking longitudinal data of different types of structures, such as individual surveys and administrative data about individuals, organizations, and communities, with dyadic data that highlight the links among organizations or communities. One example is to look at ties based on people’s shared exposures to risk and opportunities in different communities (such as schools and residential neighborhoods) or communities of work and home. Indeed, emerging evidence suggests that risk factors associated with neighborhood poverty and socioeconomic disadvantage can travel across commuting channels to diffuse health problems across space and communities over time.34 Such community- and systems-level longitudinal linkages are thus important to consider in advancing our understanding of health and long-term outcomes of prevention efforts.
For all their benefits, these approaches also often augment, and even more often create, joined data sets of high volume, high complexity, and rapid accumulation or change, creating great problems for analyses using standard tools and approaches. Big data analytics, by using computational approaches and machine learning, such as cross-classification and permutation techniques, can be of great help in dealing with the increased complexity issues.34–36
Longitudinal, national, and local surveys may be linked together to complement screening studies and clinical care data to better understand community health and levels of exposure to risk and to benchmark children’s outcomes to others in the community. Such integrated data systems are starting to emerge37–39 and are important for understanding theoretically derived dimensions of community health context. Some examples include the County Health Rankings and the Centers for Disease Control and Prevention 500 Cities project. Community-level systems may also be linked geographically to screening studies by using geocoordinates and geographic information system matching techniques, enabling important information to be included on the population-level context of geographically defined communities, such as census tracts or counties. For instance, indices of neighborhood socioeconomic infrastructure and safety may be created by using administrative data from the Census and American Community Survey and Uniform Crime Reporting, respectively. Longitudinal national and local surveys, such as the Behavioral Risk Factor Surveillance System and the Los Angeles Family and Neighborhood Study, can be valuable in creating measures of community health and exposures to risks and opportunities in nonresidential areas, respectively.40 “Organic,” always-on social media data, such as Twitter and Facebook, may be linked to screening data as a measure of social capital and engagement within and outside a community. Such linkages and systems have the potential to bridge the gap between clinical and public health approaches, helping to assess and address health risk factors and screening outcomes among both individuals and population groups.41
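As a simple illustration of geographic linkage, the Python sketch below attaches a tract-level context measure to individual screening records. It assumes that each record has already been geocoded to a census tract identifier; the identifiers, the field names, and the "disadvantage_index" values are hypothetical and are not drawn from any of the data sources named above.

```python
# Hypothetical tract-level context table keyed by census tract identifier.
TRACT_INDICES = {
    "55079000100": {"disadvantage_index": 0.82},
    "55079000200": {"disadvantage_index": 0.35},
}


def link_to_tracts(screenings, tract_indices):
    """Left-join screening records to tract-level context; unmatched rows are kept."""
    linked = []
    for rec in screenings:
        context = tract_indices.get(rec["tract_id"])
        merged = dict(rec)  # copy so the input records are not modified
        merged["disadvantage_index"] = (
            context["disadvantage_index"] if context else None
        )
        merged["linked"] = context is not None
        linked.append(merged)
    return linked


def link_rate(linked):
    """Share of screening records that matched a tract in the context table."""
    return sum(1 for rec in linked if rec["linked"]) / len(linked)
```

Keeping unmatched records and reporting the link rate is a deliberate choice in this sketch: children whose addresses cannot be geocoded may differ systematically from those whose addresses can, so silently dropping them could distort community-level comparisons.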
Clinical care data can also be aggregated to give a picture at the community level40 and linked to other individual- and community-level measures to enable a more complete view of the overall health and wellbeing context of people in the community as well as benchmarking comparisons of children’s risk exposures and screening outcomes to corresponding outcomes in the broader community.42 Examples include the following: (1) EHRs aggregated to the community level, (2) small-area health insurance estimates (eg, share uninsured), (3) area health resources file and national provider identification file (to understand the community availability of mental health providers and dentists), and (4) Mapping Medicare Disparities tool (to measure the local rate of preventable hospital stays, mammography screening, and influenza vaccinations).
Thinking creatively about linkages and data systems is important to advance the field and bridge current evidence gaps. Key needs that may be addressed with such systems include identifying vulnerable communities and individuals most in need of childhood screening, identifying and addressing demographic differences in access to screening, and balancing communities in child health outcomes of screening. Longitudinal, linked-data systems will allow for a more systematic creation of a comprehensive set of indicators that are comparable across space and time. This will be important to enable assessments43 of screening effects on changes in the health status of communities and individuals over time, establishing benchmarking capabilities against similar counterparts.
Integrating School Data for a Whole-Child Approach
School partnerships offer creative, innovative approaches for additional screening and for assessing the impact of screening on long-term child health outcomes. Health and education are linked, and every health risk can affect academic success. Interventions can narrow disparities and improve both learning and health.44 A whole-child approach to education is defined by policies, practices, and relationships that ensure each child, in each school and in each community, is healthy, safe, engaged, supported, and challenged.45 The focus on health and wellbeing encompassed in this approach is well aligned with the goals of pediatrics and pediatric screening.
School health services staff can help all students with preventive care, such as influenza immunizations and vision and hearing screening, as well as urgent care. For students for whom screening has revealed chronic health conditions, school nurses and other health care professionals play a large role in daily care and management. School health services staff are also responsible for coordinating care by communicating with the student’s family and primary care medical home so that the students can stay healthy and ready to learn.46
To optimize the potential of SBHCs to improve academic success, more comprehensive databases that include both health and education information are needed. Through analyzing the interconnections between children’s health and educational success (and the role SBHCs play in influencing both), health and educational professionals are better able to serve and support the school and the SBHC in achieving health and educational goals. To this end, several communities have made progress in building linked health and educational information systems in recent years.47
School health data can be useful to assess youth risk behavior and to evaluate school health profiles and the impact of health policies and programs.48 Child development data and statistics are available through several national surveys.49 The University of Pennsylvania Actionable Intelligence for Social Policy works with state and local governments to develop integrated data systems that link administrative data across government agencies. Integrated data systems give governments and their partners the ability to better understand the needs of individuals and communities and to improve programs and practices through evidence-based collaboration.50 In Milwaukee, a Medical College of Wisconsin research team is analyzing linked health department and public school data to better understand factors associated with kindergarten readiness and third grade reading.51 Strive Together is another data-driven school-community collaboration, aiming to impact cradle-to-career efforts.52
Multilevel Integration Through Simulation Modeling
Simulation modeling is an innovative method that uses a mathematical framework to assess long-term outcomes of health screening. Simulation models can reduce complex processes that typically play out over time to key simplified elements and their consequences and thereby can aid in health care decision-making. Relatively simple simulation models tend to be cohort-based, whereas agent-based simulation models that account for the history and future behavior of individuals require individual-level microsimulation methods and are more complex.
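To show what a relatively simple cohort-based model can look like, the Python sketch below advances a cohort through a 3-state Markov model under 2 strategies. All states and annual transition probabilities are hypothetical placeholders chosen for illustration; here, screening is assumed only to lower the yearly probability of death among children with the condition, which is a strong simplification of any real natural history.

```python
def run_cohort(transitions, start, cycles):
    """Advance a cohort through a Markov model; return (final shares, life-years)."""
    dist = dict(start)
    life_years = 0.0
    for _ in range(cycles):
        new = {state: 0.0 for state in dist}
        for src, share in dist.items():
            for dst, prob in transitions[src].items():
                new[dst] += share * prob
        dist = new
        # Each person alive at the end of a cycle contributes one life-year.
        life_years += sum(share for state, share in dist.items() if state != "dead")
    return dist, life_years


# Hypothetical annual transition probabilities (each row sums to 1).
NO_SCREENING = {
    "healthy": {"healthy": 0.98, "sick": 0.02},
    "sick": {"sick": 0.90, "dead": 0.10},
    "dead": {"dead": 1.0},
}
SCREENING = {
    "healthy": {"healthy": 0.98, "sick": 0.02},
    "sick": {"sick": 0.95, "dead": 0.05},  # assumed benefit of earlier treatment
    "dead": {"dead": 1.0},
}

START = {"healthy": 1.0, "sick": 0.0, "dead": 0.0}

final_none, ly_none = run_cohort(NO_SCREENING, START, cycles=50)
final_screen, ly_screen = run_cohort(SCREENING, START, cycles=50)
life_years_gained = ly_screen - ly_none
```

A microsimulation would instead draw individual life histories one at a time, which allows transition probabilities to depend on each person's past events at the cost of more computation and more input data.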
Whether cohort- or individual-based, simulation models involve many decisions and assumptions, including the perspective, time horizon, target population, interventions, and outcomes of interest. High-quality data are also foundational to the integrity of these models. Concerns about the credibility of simulation models have prompted calls for the development of guidelines for good practices. Assessing the impact of screening is one application of simulation modeling.
With respect to screening, simulation models can help clinicians decide when to screen and can help policymakers evaluate whether screening should be provided more broadly. A practical and feasible option is to develop a detailed mathematical model to simulate the natural history, clinical outcomes, and cost-effectiveness of integrating various genomic sequencing strategies into clinical care in the United States. A mathematical model provides an important link between scientific developments in genomics and the policy implications of using this information, both in clinical and economic terms. The mathematical model allows updating with the most current evidence in genomic medicine as it evolves. Thus, as new genomic technologies and screening tests are developed, their clinical use and economic value can be quickly assessed. The goal of using such a model is to project clinical and economic outcomes associated with alternative screening strategies to assess the potential value of genomic technologies for screening. This type of model provides a durable platform for integration of genomic information into clinical care and health policy over the next decades.
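One common way to summarize the projected clinical and economic outcomes of 2 strategies is the incremental cost-effectiveness ratio. It is shown here as a standard formula from cost-effectiveness analysis rather than as a quantity defined in this article; the symbols are our own notation, with C denoting expected lifetime cost per person and E denoting expected health effect per person (eg, quality-adjusted life-years).

```latex
\mathrm{ICER}
  = \frac{C_{\text{screening}} - C_{\text{no screening}}}
         {E_{\text{screening}} - E_{\text{no screening}}}
```

The ratio is interpreted as the additional cost per additional unit of health gained and is meaningful only when the strategies differ in effect, that is, when the denominator is not 0.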
An example of the potential role for simulation modeling is assessing the long-term outcomes of genetic testing. Advances in technology have led to the availability of childhood screening, including genetic testing for a wide range of conditions for healthy children or children at high risk. It is expected that the funds spent on genetic testing in the United States will reach 25 billion dollars by 2021. With the numerous uses of genomic information, understanding the clinical value and long-term impact of genomic technologies on morbidity, mortality, quality of life, and diagnosis and treatment costs is essential. Conducting genomic sequencing in the newborn period of life has compelling logic because it may provide insights for an active illness that an infant has or an early warning for future illnesses. Regardless of the cost of genomic sequencing in newborns, what remains unclear is how beneficial and valuable such population-based testing might be. Given the sample size and time horizon needed for a randomized clinical trial to study and provide timely estimates of the lifetime health impact and cost of population-based newborn genomic sequencing, this option is infeasible. Simulation modeling provides an alternative method for understanding the impact and cost-effectiveness of genomic sequencing.
Screening Pitfalls in the Era of Genomics and Artificial Intelligence
As we design screening strategies, it is essential to be aware of the pitfalls of screening for new biomarkers, particularly false positives, which will become more prevalent in the era of genomics and artificial intelligence. Artificial intelligence will involve ever-expanding approaches to passively and actively capture patient- and clinician-generated data. The use of wearables, trackers, home monitors, and other connected devices will produce a dizzying array of digital biomarkers. As genomic and other molecular approaches are increasingly used in routine clinical practice, the number of biomarkers measured will climb into the tens of thousands.
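The arithmetic behind this concern can be sketched in a few lines of Python. The functions below compute the chance of at least 1 false positive when many tests are run on a child without disease and the positive predictive value of a single test. The independence of tests is a simplifying assumption, and the sensitivity, specificity, and prevalence values in the usage lines are hypothetical.

```python
def prob_any_false_positive(n_tests, specificity):
    """Chance of at least one false positive across n independent tests
    applied to a person who has none of the conditions tested for."""
    return 1.0 - specificity ** n_tests


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a person with a positive result truly has the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: 100 biomarkers, each from a test that is 99% specific.
any_false_alarm = prob_any_false_positive(100, 0.99)

# Hypothetical inputs: a 99% sensitive, 99% specific test for a condition
# affecting 1 in 1000 children.
ppv_rare_condition = positive_predictive_value(0.99, 0.99, 0.001)
```

With these illustrative inputs, most children screened on 100 biomarkers would receive at least 1 false alarm, and fewer than 1 in 10 positive results for the rare condition would be true positives, even though each individual test appears highly accurate.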
Tests of our complex and always shifting anatomy and physiology will catch states that may signal disease or may be incidental or fleeting. For example, some people experience scares (eg, cancer scares) that are later resolved with follow-up testing. The less fortunate are overdiagnosed and treated for conditions they do not have. Others may simply experience “cascades of care” and additional testing after false positives or incidental findings on screening tests.53 Furthermore, because of the revenue generated by the tests themselves, the drugs prescribed, and procedures performed to treat diagnosed conditions, economic pressures can drive increased screening for biomarkers, a phenomenon known as “biomarkup.”25
However, even with these advances in interoperability, there are important challenges in linking data across sites of care. The lack of a universal medical identifier makes it challenging to link a patient’s record across sites of care, or to de-duplicate data sets in which the same patient may be counted multiple times across different data sets. Although the Common Rule allows disclosures of de-identified data for research, such data generally cannot be linked by using personal identifiers. The Health Insurance Portability and Accountability Act allows disclosures of health information for treatment, payment, and operations and specifies research uses. Often, linkage to external data sets requires specific consent from individuals, so substantial forethought is necessary when planning data collection.
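In the absence of a universal identifier, one simple (and imperfect) workaround is deterministic linkage on a key derived from demographic fields. The Python sketch below normalizes names, combines them with date of birth, and hashes the result with a shared secret so that sites can compare keys without exchanging raw identifiers. It is an illustration only: the field names, the salt, and the matching rule are hypothetical; real projects more often rely on probabilistic or formally privacy-preserving linkage; and any use would still need to satisfy the consent and regulatory requirements described above.

```python
import hashlib
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and drop spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.lower() if ch.isalnum())


def linkage_key(first, last, dob, salt):
    """Salted SHA-256 hash of normalized name plus date of birth (YYYY-MM-DD)."""
    raw = "|".join([normalize(first), normalize(last), dob])
    return hashlib.sha256((salt + raw).encode("utf-8")).hexdigest()


def group_by_person(records, salt):
    """Group record sources by linkage key; keys with several sources are
    candidate duplicates of the same child across data sets."""
    groups = {}
    for rec in records:
        key = linkage_key(rec["first"], rec["last"], rec["dob"], salt)
        groups.setdefault(key, []).append(rec["source"])
    return groups


# Hypothetical records from 2 data sets; all names and dates are invented.
records = [
    {"first": "José", "last": "Garcia-Lopez", "dob": "2012-03-04", "source": "ehr"},
    {"first": "Jose", "last": "GARCIA LOPEZ", "dob": "2012-03-04", "source": "claims"},
    {"first": "Ana", "last": "Nguyen", "dob": "2015-11-30", "source": "ehr"},
]
groups = group_by_person(records, salt="hypothetical-shared-secret")
```

Exact matching of this kind misses records with typographical errors or name changes and can falsely merge different children who share a name and birth date, which is one reason linkage quality should be assessed before screening outcomes are analyzed.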
Future studies are needed to evaluate how longitudinal and linked-data systems drawing on community and clinical settings can enable careful and rigorous assessments of screening effects on changes in health status that could occur under differing conditions. Other research can benchmark participating individuals and communities against similar counterparts. Moreover, matching ongoing big data (administrative; survey; organic big data, such as Facebook and Twitter; phone records) with natural experiments related to variation in screening policies is needed across space and time.
Together, these approaches have great potential for identifying and addressing differences in access and agreement to screening and in understanding the heterogeneity in effects across population groups and communities in child health screening outcomes. Additional attention needs to focus on educating clinicians and patients about return of results, privacy, and confidentiality issues, especially related to big data. Simulation modeling to assess long-term outcomes and artificial intelligence to passively and actively capture patient- and clinician-generated data are likely to play important roles in the future.
Dr Wu conceptualized, drafted, reviewed, and revised the manuscript; Drs Graif, Mitchell, Meurer, and Mandl conceptualized and drafted the manuscript; and all authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
FUNDING: Dr Wu received funding from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD; R01 HD090019-01 and R01 HD085993-01); Dr Graif received funding from NICHD (K01-HD093863 and P2C-HD041025); Dr Meurer received funding from the Medical College of Wisconsin, Advancing a Healthier Wisconsin Endowment, Research and Education Program Fund; Dr Mandl received funding from the National Center for Advancing Translational Sciences/NIH (U01TR002623). Funded by the National Institutes of Health (NIH).
POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.