Addressing the Covid‐19 pandemic and future public health challenges through global collaboration and a data‐driven systems approach – Ros – – Learning Health Systems


Covid‐19 has already taught us that the greatest public health challenges of our generation will show no respect for national boundaries, will impact lives and health of people of all nations, and will affect economies and quality of life in unprecedented ways. The collaboration required to address these challenges will be cross‐sector, multi‐stakeholder, and transdisciplinary in nature. Decades ago, the preeminent systems scientist and organizational learning pioneer, Peter Senge, recognized that the only sustainable source of competitive advantage is the ability to learn faster than one’s competition.1 Indeed, as the whole human race finds itself in competition with Covid‐19 for our lives, our health, our livelihoods, and our way of life, we share an imperative to learn rapidly together.

The types of rapid learning envisioned to address Covid‐19 and future public health crises require a data‐driven systems approach that enables sharing of information derived from multidisciplinary data, and lessons learned at scale. Agreement on a multi‐sectoral systems approach, with clinical, social, demographic, and administrative data, augmented by technology and standards, will be foundational to making such learning meaningful and to ensuring its scientific integrity and comprehensiveness. Decisions at local, regional, national, and global levels vis‐à‐vis reopening economies, mitigating risks, and understanding which regulations do and do not work are at the core. These are underpinned by the ability to reliably track and analyze key metrics, to learn from this information, and to translate the lessons learned back into practice. Such information will augment learning from experiences of patients and healthcare workers themselves; policy makers will inform and benefit from such learning as well. Shared understanding and the resulting data‐driven system will be foundational to ensuring trustworthiness and validity of the lessons derived from them.

The methods and values of Learning Health Systems (LHS) inform such an approach. In 2012, stakeholders spanning the health spectrum, with shared interest in learning from real‐world data to improve health, convened in Washington, DC and collaboratively developed a set of consensus LHS Core Values.2 These Core Values and the vision they represent have been endorsed and incorporated into strategic planning by public and private organizations, and at regional, national, and international levels.3 These values teach us that data alone cannot solve these problems. The collection, analysis, and application of data to address pandemics must be part of an integrated multi‐sectoral system and must also be informed by the key values of inclusiveness, transparency, and respect for privacy, in order for all people whose lives are affected by a pandemic to trust the shared learning process and the decisions and actions such learning informs. The common experiences thus far of Spain,4 Italy, and the United States related to the Covid‐19 pandemic, as well as the knowledge gleaned from these experiences, provide object lessons in the effects of ignoring and/or adhering to these values.

With this purpose in mind, a group of individuals from these three countries has formed a transatlantic collaboration. Its aim is to generate a proposed comprehensive, standards‐based, data‐driven systems approach for the collection, management, and processing of high‐quality data, followed by transformation of these data into actionable information that can drive well‐informed decisions to underpin the management of clinical responses and social measures to overcome the global Covid‐19 pandemic and to prepare for the future. It is well known that certain groups are critical in terms of the spread of infections, such as the elderly (e.g., nursing home residents), chronic disease patients, health professionals, temporary workers, and socially fragile or underserved communities; in fact, analyses show that these vulnerable groups are disproportionately suffering from the impacts of this global pandemic. Unfortunately, the data‐driven knowledge currently used to deal with the special needs of these highly vulnerable populations is woefully inadequate.5, 6 A pandemic in the digital era should not be fought with analog data and manual procedures of the previous century. Health systems operations and management can and should benefit from powerful 21st‐century technologies and tools, including mobile sensors, wireless connectivity, high‐power processing, big data analytics, deep learning, machine learning, and artificial intelligence.7

Although we have scaled this transnational effort to three nations, we hope to stimulate an international dialogue with a culmination of realizing a more global standards‐based, data‐driven system. We believe that what we are proposing will be foundational to enabling new pandemic management capabilities that are based on rapid learning from actual multidisciplinary data that will empower informed decision‐making by policy makers, accelerate research into treatments and vaccines, and ultimately save lives and control economic damage.

In the sections that follow, we first argue that the data inputs that drive the systems approach must be standardized, using regulated clinical research as an example; well‐defined and broadly adopted data inputs will be the essential fuel that will power a global system for addressing current and future pandemics. We then present a blueprint for a system that will convert these data into information and knowledge upon which a range of key decisions can be based. In the context of this system, we describe and categorize the specific types of data the system will require for different purposes and document the standards currently in use, if any, for each of these categories, with a focus on the three nations participating in this work. In so doing, we anticipate challenges in building and achieving consensus around definitions and data standards, but also suggest opportunities for further global collaboration in this regard.


This narrative will begin and end with discussions of data, the precursor to knowledge. As Nigam Shah wrote during the early months of this pandemic, “There are enough models, we need accurate inputs.”8 The accuracy, quality, and appropriateness of data inputs are critical to the validity and usefulness of a model. Johns Hopkins University, which has been a key tracker and reporter of cases of Covid‐19 globally, provides a lengthy explanation on why inconsistencies that exist in data reported from Covid‐19 testing have the potential to negatively influence the models that are based upon these data.9 When there is variation in the way data are collected and reported, the accuracy of the models and projections based upon these data are compromised, which may lead to inappropriate decisions that can ultimately cost lives. Conversely, broad adoption and use of standard definitions for the data inputs and the metrics or indicators being tracked will facilitate the sharing of data across organizations and aggregation of data from varied sources; in turn, this enables valid comparisons across reporting sites and increases the accuracy and statistical power of the information upon which learning and knowledge are based and decisions are made.

Data sharing has been studied extensively10; there are numerous factors that must be addressed, including alignment of incentives, ethics, patient privacy, and other issues that are included in the LHS Core Values. Acknowledging that the system proposed herein must respect all of these values, this discussion focuses on the overall data‐driven systems approach and the important role of data standards. Ethical and responsible data sharing entails the meaningful exchange of data, and thus the use of robust data standards.11 These must be considered in the foundational planning and design stages of such a system. Data standards must describe both the data itself and associated metadata (i.e., the data about the data) to ensure that the intended meaning of the data being shared can be readily understood and interpreted by the recipient (semantic interoperability). When data are collected without common definitions or without metadata, they cannot be readily aggregated, understood, or compared; hence, the meaning of the results may be lost or misleading. Robust data standards should be non‐redundant (i.e., unique and/or harmonized) and developed through a consensus‐based process such as that of standards development organizations (SDOs).
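The role of metadata in semantic interoperability can be illustrated with a minimal sketch. The element name, its definition, and the codelist below are hypothetical and are not taken from any published standard; the point is only that values reported without a shared definition cannot be aggregated with the rest.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ElementDefinition:
    """Hypothetical metadata record for one shared data element."""
    name: str                      # shared identifier for the element
    definition: str                # agreed human-readable meaning
    allowed_values: FrozenSet[str]  # controlled terminology (codelist)


# Illustrative element only; the name and codelist are assumptions.
TEST_RESULT = ElementDefinition(
    name="covid_test_result",
    definition="Outcome of a single diagnostic test for SARS-CoV-2",
    allowed_values=frozenset({"POSITIVE", "NEGATIVE", "INDETERMINATE"}),
)


def aggregate(reports: List[str], element: ElementDefinition) -> Dict[str, int]:
    """Count values that match the codelist; collect the rest as UNMAPPED."""
    counts = {value: 0 for value in sorted(element.allowed_values)}
    counts["UNMAPPED"] = 0
    for value in reports:
        key = value if value in element.allowed_values else "UNMAPPED"
        counts[key] += 1
    return counts


site_a = ["POSITIVE", "NEGATIVE", "NEGATIVE"]
site_b = ["pos", "NEGATIVE"]  # site B used a local code with no shared metadata
summary = aggregate(site_a + site_b, TEST_RESULT)
```

In this sketch the locally coded value from site B ends up in the UNMAPPED bucket, which is the small‐scale analogue of data that must be mapped after the fact or lost.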

Regulated clinical research can be used as an illustrative example of the value of global data standards, particularly when time is of the essence during a pandemic or public health crisis. When data are collected differently at the start of a research study, the necessary mapping and interpretation to compare or aggregate data after the fact are costly and time‐consuming; quality may be compromised and precious but uninterpretable data may be lost.12, 13 In contrast, incorporating global industry‐wide data standards into a research project at its outset can reduce the start‐up time for a research study by 70% to 90%, and can correspondingly reduce the time required to create tables and analysis datasets from results across studies by 40% to 60%.14 These are important time‐saving opportunities during a pandemic. The World Health Organization (WHO), the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC), and the Infectious Diseases Data Observatory (IDDO) learned this lesson through previous outbreaks, such as Ebola.15 For Ebola, a standard data collection form was developed so that the efficacy and safety of potential cures could readily be compared without spending unnecessary time mapping and interpreting data at the end. Based on the Ebola experience, there are now global research data standards available to support research on Ebola vaccines and virology. These augment a set of foundational clinical research standards that apply across all research studies (e.g., demographics, medical history, medications, adverse events). 
The relevant therapeutic‐area specific standards were leveraged such that a Covid‐19 standard was developed by the Clinical Data Interchange Standards Consortium (CDISC) and many important partners in rapid fashion (<6 weeks vs ~2 years).16 This standard is harmonized with the WHO/ISARIC/IDDO data collection form for Covid‐19 and can substantially decrease the time necessary to initiate and conduct research studies on various potential treatments and vaccines for Covid‐19. Comparisons across such studies are also facilitated since the results are expressed in a common manner. In addition, the CDISC standards for tables and analysis datasets are currently required by regulatory agencies (FDA and PMDA) to facilitate their reviews of the data submitted to support approval of new therapies and vaccines.17

Unfortunately, consensus‐based global standards, such as those available for clinical research, are not available or are not being implemented for many of the data inputs necessary to provide information upon which decisions should be made to manage other relevant aspects of a pandemic. Global collaboration in the development and adoption of data standards of the type that has been largely achieved in regulated clinical research,18 will be foundational to driving a real‐world rapid learning process for controlling the spread of Covid‐19 and future pandemics.

In the case of vaccine development, the importance of such collaboration was recognized in the comments of Seth Berkley, CEO of Gavi, in Science magazine19 when the Covid‐19 pandemic struck: “If ever there was a case for coordinated global vaccine development effort using a ‘big science’ approach, it is now. There is a strong track record for publicly‐funded, large‐scale scientific endeavors that bring together global expertise and resources toward a common goal.” While we should not expect such endeavors, whether directed at vaccine development, infection control, or treatment, to be coordinated in a top‐down hierarchical fashion, they cannot progress in the absence of shared standards. The Internet provides another useful model for global collaborative standards development and coordination. The Internet standards were developed by an ever‐expanding international community of practitioners that built on an initial pilot implementation. The Internet remains today a “loosely organized international collaboration”20 that is remarkably able to support a global communications infrastructure. The Internet has also been successful in large part because it benefits from an international public interest coalition of coordinating organizations, which serve to enable broad industry, government, and research collaboration. Today, we can embrace a comparable approach to enable our response to Covid‐19 and future pandemics. The data needed to manage pandemics are much broader in nature than the data required to develop therapies and vaccines; they will include health data, contact tracing, mobilization of supplies, availability of hospital beds, and other inputs into an entire data‐driven system. Global standards for all of these types of data inputs, along with a global coalition, will be needed to coordinate the effort to address the Covid‐19 pandemic and lay the foundation for more rapid and coordinated responses in the future.


An information ecosystem, established with a foundation of standardized data inputs, could become an integral part of health protocols to improve the management of Covid‐19 and future pandemics at regional, national, or global levels. If data are collected at a local level using a standard approach that has global consensus, then the results and information generated from these data can be compared broadly. It is widely accepted that most national information systems have had significant limitations in coping with this pandemic challenge.21 Clinical and medical data are not integrated among themselves and, moreover, they are not integrated with social, demographic, and mobility data. Certainly, they are not integrated with the results of clinical research. Furthermore, data are often incomplete and/or incompatible with the standards required for their integration with other data sources. Even when the data are adequate, the integrated processing system required to transform data into productive, tailored, and actionable knowledge is often fragmented or non‐existent. As a result, it is impossible to base healthcare decisions on the behavior of the pandemic and to evaluate the effectiveness of those decisions.

What we are proposing here is a comprehensive systems approach, placed at the core of the management of the pandemic, at regional or national level. Data are accessed from relevant healthcare units or public authorities and are processed in accord with a multidisciplinary Control Panel, which dictates the parameters and indicators required for intelligent production of the necessary information for comprehensive management and decision‐making. It is important to emphasize that access to clinical data alone will not be sufficient to manage this pandemic. Clinical data must be integrated, on a continuous basis, with social, contact tracing, and mobility data, as well as with health resources monitoring data and administrative data, to build an integral and reliable pandemic management system. The resulting complex system was initially conceived as one diagram in Figure S1 in Data S1. However, two simplified figures are shown herein to itemize the necessary connections and flows of data, knowledge, infected people, and decisions among the different system units (Figure 1) and to explore the people/patient flow aspect in more depth (Figure 2). The system includes information flows providing feedback loops enabling continuous and rapid learning based on day‐to‐day experience. This cycle of expanding knowledge, based on the processing of data and derivation of information from actual experience with corresponding research and practice, is the foundation for a continuous LHS. In other words, the main objective is to propose the use of the highly advanced digital methods of our 21st century to handle a very complex public health challenge. In fact, the WHO is currently promoting the acceleration of the digital transformation of health systems: “The transformative nature of digital technologies for health is undeniable … Long‐term systemic changes are needed, including a change in the culture of using data.”7


Figure 1. A systems approach to data management for Covid‐19


Figure 2. Flow of people/patients through clinical units

When fully mature, this digital data‐driven systems approach should:

  • Improve the quality, efficiency, and timeliness of public health decisions, including the management of clinical resources (always scarce at pandemic peak times) and the regulated closing/opening of regions through modulated control of restrictive measures.
  • Inform the actual development and testing of new therapies or vaccines and provide timely awareness of therapy reactions, virus mutation, and new symptoms.
  • Operate at regional, national, or global scales and, through standardization, enable compatible systems in different geographies to readily compare results leading to shared learning and improved decision‐making.
  • Gain public trust through a rigorous, independent, and systematized scheme, based on the LHS Core Values and full compliance with privacy regulations in the various regions.

Figure 1 is a high‐level block diagram of such a data‐driven dynamic system, showing the main institutions involved, data sources/inputs, information processing, and decision‐making. The different flows of data and decisions/recommendations are portrayed in different colors to facilitate their visualization and comprehension. A second representation (Figure 2) describes a detailed flow of people/patients, expanding upon the content of Block 2 of Figure 1.

Block 1 is the Processor Engine and Control Panel unit. The Control Panel should be managed by a multidisciplinary group of experts, including health professionals, medical and pharmaceutical researchers, IT experts, mathematicians, civil servants, and policy makers. This group will set both the list of primary data to be directly collected and the computed indicators required to transform the data into useful information or knowledge to be made available to managers and policy makers at different levels. For this systems approach, an indicator will be a sign, symptom, or index showing synthesized and relevant information, mostly quantitative, related to the pandemic. Indicators will be elaborated from primary or raw data from the various units and should be represented in tables or graphics showing their historic evolution. In some cases, an indicator will itself represent primary data to be reported by the corresponding unit, and the knowledge elaborated by the data processor will consist of the historic evolution and forward projection of that indicator. Some indicators will require rather straightforward processing of data, while others, like the basic reproductive number (R0) by territory or the forward projection of relevant indicators, will require more sophisticated computing. As an example, most of the territory‐related indicators will be knowledge elaborated by the processor engine from primary data reported by the various units. Outputs resulting from the elaborated computing of the processing unit represent knowledge or information upon which informed decisions can be made. That is, the units in Blocks 2, 3, and 4 will send data to the engine unit, which will convert these data, with the help of Control Panel experts, into useful information and knowledge that units in the other blocks can use to adopt and implement decisions.

The main objective of the indicators should be to yield data that are (a) accurate and meaningful, (b) consistently defined, (c) capable of being coded in standard data fields and formats, and (d) computable. These indicators should aim to identify the impact of the infection and of the adopted decisions. For the sake of simplicity, they can be organized in four categories or groups: case related, health system related, public administration related, and medically related. Examples of indicators are provided below for each unit in the diagram; the final list of indicators will always be set by local/regional authorities, customized as appropriate to their local conditions and needs. As discussed in Section 4, standardization of data, definitions, and metadata for all data inputs will facilitate data processing, sharing, aggregation, relevant comparisons, and analyses to generate knowledge.
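One way to picture an indicator that is consistently defined, coded, and computable is as a small record in a registry maintained by the Control Panel. The sketch below is illustrative only: the class and field names, the indicator code, and the record layout are assumptions; only the four categories and the source blocks come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Category(Enum):
    CASE = "case related"
    HEALTH_SYSTEM = "health system related"
    PUBLIC_ADMINISTRATION = "public administration related"
    MEDICAL = "medically related"


@dataclass(frozen=True)
class Indicator:
    """Hypothetical registry entry for one indicator."""
    code: str                      # standard field code (assumed naming)
    definition: str                # consistent, shared definition
    category: Category             # one of the four groups
    source_blocks: Tuple[str, ...]  # blocks that supply the primary data
    compute: Callable[[List[dict]], float]  # makes the indicator computable


def count_new_cases(records: List[dict]) -> float:
    """Count records flagged as new cases (the 'status' field is assumed)."""
    return float(sum(1 for r in records if r.get("status") == "new_case"))


NEW_CASES = Indicator(
    code="NEW_CASES",
    definition="Number of new confirmed cases in the reporting period",
    category=Category.CASE,
    source_blocks=("2.2", "2.3"),
    compute=count_new_cases,
)


def group_by_category(indicators: List[Indicator]) -> Dict[Category, List[str]]:
    """Organize indicator codes under the four categories."""
    grouped: Dict[Category, List[str]] = {c: [] for c in Category}
    for indicator in indicators:
        grouped[indicator.category].append(indicator.code)
    return grouped


by_category = group_by_category([NEW_CASES])
```

Keeping the definition, the source blocks, and the computation together in one entry is a design choice of this sketch; it makes it easy for a regional authority to customize the list while reusing shared definitions.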

Once the Control Panel experts have defined the indicators to be used for that region, Block 1 becomes the system’s main data receptor. Its objective is to process the data received from the other units using tools such as big data analytics, machine learning, and artificial intelligence to transform these data into useful information and knowledge, as prescribed by the Control Panel experts. That is to say, the processor should be able to learn how to predict the evolution of the key parameters, leveraging the aforementioned tools. This timely dynamic combination of past and future evolution provides the knowledge base undergirding the decisions from this data‐driven system. Critical analyses by the experts of the outcome information supplied by the processor should provide additional feedback to continuously monitor the processor and improve its learning algorithms. In addition, Block 1 will be the point of connection to other equivalent engines of regional, national, or international systems, sharing data as deemed appropriate to increase the common database to improve parameter estimations and learning algorithms.
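The combination of historic evolution and forward projection can be sketched in a few lines. The example below deliberately swaps in a moving average and an ordinary least‐squares trend line as stand‐ins for the machine learning tools mentioned above; the function names and the sample series are assumptions, and a straight‐line trend is far too simple for real epidemic curves.

```python
from typing import List


def moving_average(series: List[float], window: int) -> List[float]:
    """Smooth the historic evolution of an indicator with a trailing window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def linear_projection(series: List[float], steps: int) -> List[float]:
    """Project an indicator forward with a least-squares straight line.

    This is an illustrative stand-in, not the learning algorithm of Block 1.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + k) for k in range(steps)]


daily_cases = [10, 12, 14, 16, 18]       # hypothetical daily counts
history = moving_average(daily_cases, 3)  # smoothed past values
forecast = linear_projection(daily_cases, 2)  # next two days on the trend line
```

Comparing each forecast with the value later observed is one simple form of the expert feedback loop described above.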

Block 2 of Figure 1 will provide all the data related to people and a patient’s course through the clinical units. To better understand this course of actions, let us unfold Block 2 (Figure 2). People will enter the system if they are tested, if they have been in contact with someone who has symptoms, or if they have symptoms themselves. Depending on the situation, a person will either (i) go to one of the main “sickness‐oriented entrance doors” outlined in Block 2.3 (Primary Care, Emergency Centers, Telemedicine, or Nursing Homes) or (ii) take part in a testing or contact tracing program outlined in Block 2.2. If they test positive (Block 2.4), they may be either sent to a hospital (Block 2.5) or monitored via telemedical assistance (Block 2.6).
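The flow through Block 2 can be read as a small set of allowed transitions between sub‐blocks. The sketch below encodes only the transitions stated in the text; the names are hypothetical, and paths the text does not describe (for example, negative test results or transfers between hospital and telemedicine) are intentionally not modeled.

```python
from typing import Dict, List, Set

# Entry points: testing/contact tracing (2.2) and the entrance doors (2.3).
ENTRY_BLOCKS: Set[str] = {"2.2", "2.3"}

# Transitions described in the text; anything else is treated as invalid here.
TRANSITIONS: Dict[str, Set[str]] = {
    "2.2": {"2.4"},         # tested or traced, then detected positive
    "2.3": {"2.4"},         # entrance door, then detected positive
    "2.4": {"2.5", "2.6"},  # positive: hospital or telemedical monitoring
    "2.5": set(),
    "2.6": set(),
}


def is_valid_path(path: List[str]) -> bool:
    """Check that a person's recorded path follows the modeled flow."""
    if not path or path[0] not in ENTRY_BLOCKS:
        return False
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(path, path[1:])
    )


home_monitored = is_valid_path(["2.3", "2.4", "2.6"])
skipped_detection = is_valid_path(["2.2", "2.5"])
```

A check of this kind could flag records whose reported path is inconsistent with the agreed flow before they reach the processor.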

The vast abundance of telemedical devices allows for the treatment and necessary control of many infected people without having to send them to hospitals, where they might make unnecessary and wasteful use of scarce resources. In fact, because many chronic diseases are already monitored via telemedicine, telemedicine is both a door of entrance to Covid‐19 detection and a place of destination for infected people who require illness control that can be performed from their residence or nursing homes.7 It is also important to emphasize that the larger the role of primary care units in test and contact tracing programs, the better the balancing of clinical resources for the whole system. Contact Tracing, in particular, drives and registers the information related to tracing contacts of infected people22; it is, therefore, of growing importance and consideration. Many countries are now developing and allocating the necessary resources, including mobile applications, to improve contact tracing.23 Given the time that will be required to develop and fully distribute a vaccine, many experts see traceability as the most effective tool, much as a “technological vaccine” for overall pandemic control.24 Here again, global technology compatibility and full respect for privacy regulations are critical.

Examples of case‐related indicators that will be built from primary data requested from the units of Figure 2 are given below. Shown in parentheses are the blocks that will need to provide the primary data, ideally in real time, to build the graphics showing both the historic and the expected evolution of those indicators:

  • Number of new cases by age groups. (Blocks 2.2, 2.3)
  • Number of new cases, by age group, requiring hospitalization/ICU, by territory. (Block 2.4)
  • Number of closed cases, either by death or resolution‐cure. (Blocks 2.5, 2.6)
  • Rate of new cases among special groups: health professionals, social health, security forces, armed forces, nursing homes …. (Block 1)
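As a minimal sketch of how the first indicator in this list might be computed from primary case records: the age bands and the record layout below are assumptions chosen for illustration, since the text does not specify them.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical age bands: (label, lower bound, upper bound or None for open-ended).
AGE_BANDS: List[Tuple[str, int, Optional[int]]] = [
    ("0-17", 0, 17),
    ("18-64", 18, 64),
    ("65+", 65, None),
]


def age_band(age: int) -> str:
    """Map an age in years to its band label."""
    for label, low, high in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            return label
    raise ValueError(f"age out of range: {age}")


def new_cases_by_age_group(cases: List[dict]) -> Dict[str, int]:
    """Count new cases per age band; each record needs an 'age' field."""
    counts = Counter(age_band(case["age"]) for case in cases)
    return {label: counts.get(label, 0) for label, _, _ in AGE_BANDS}


reported = [{"age": 5}, {"age": 30}, {"age": 70}, {"age": 82}]
cases_by_band = new_cases_by_age_group(reported)
```

If two regions used different age bands, their counts could not be compared directly, which is why the band definitions themselves belong in the shared standard.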

Continuous coordination with public authorities regarding mobility control, specific rulings, lockdown regulations, and public force involvement, among other agency actions, is an important component of the management of the pandemic. Block 4 of Figure 1 is the component of the system diagram that handles data on these matters. Examples of indicators related to this block are shown below. In parentheses, again, are the key blocks providing the primary data to build those indicators. These examples are divided into two groups: the first relates to the management of overall Health System resources, and the second relates to indicators connected to Public Administration units directly involved in the management of the pandemic. Although all the indicators shown below are important to quantify relevant pandemic parameters, the first bullet should be emphasized, because the saturation of health resources by the pandemic, including staff, is having a very negative impact on the treatment of other diseases. Some experts are starting to talk about a syndemic, instead of a pandemic, to refer to the aggregate effect of Covid‐19 combined with other diseases, mainly those of a chronic nature, in socially vulnerable groups.5 For this calculation, it is important to include both public and private resources. An effort of this nature will require a strong and productive public‐private partnership.

Examples of indicators connected to the Health System are:

  • Number of new cases, by age group, which the Health System can absorb without compromising its ability to meet “normal” demand. (Blocks 2.5, 4)
  • Average time from onset of symptoms to hospitalization, by age and sex. (Block 2.4)
  • Average time from hospitalization to ICU admission, by age and sex. (Block 2.5)
  • Average time from onset of symptoms to death, by age and sex. (Blocks 2.5, 2.6)
  • Number of deaths classified by whether they were in wards, ICUs, or outside the hospital (nursing homes, other). (Blocks 2.5, 2.6)
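The time‐interval indicators in this list all follow the same pattern: subtract two dates per patient, then average within each age and sex group. A minimal sketch for onset of symptoms to hospitalization follows; the field names and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def mean_days_onset_to_hospitalization(
    records: List[dict],
) -> Dict[Tuple[str, str], float]:
    """Average days from symptom onset to hospitalization per (age group, sex).

    Records without a hospitalization date are skipped.
    """
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for record in records:
        if record["hospitalized_on"] is None:
            continue
        days = (record["hospitalized_on"] - record["onset_on"]).days
        key = (record["age_group"], record["sex"])
        totals[key][0] += days  # running sum of days
        totals[key][1] += 1     # number of patients in the group
    return {key: total / n for key, (total, n) in totals.items()}


patients = [
    {"age_group": "65+", "sex": "F",
     "onset_on": date(2020, 3, 1), "hospitalized_on": date(2020, 3, 5)},
    {"age_group": "65+", "sex": "F",
     "onset_on": date(2020, 3, 10), "hospitalized_on": date(2020, 3, 16)},
    {"age_group": "18-64", "sex": "M",
     "onset_on": date(2020, 3, 2), "hospitalized_on": None},
]
mean_days = mean_days_onset_to_hospitalization(patients)
```

The result is only comparable across sites if "onset of symptoms" and "hospitalization" are defined and dated in the same way everywhere, which is the standardization argument made earlier.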

Examples of indicators connected to involved Public Administration units are:

  • Origin of cases (autochthonous, other areas/territories), to adjust the need for mobility control. (Blocks 2.2, 2.3)
  • Number of people who have been in contact with confirmed or suspected infected people and who are being held in preventive quarantine, by location, age, and sex. (Block 2.2)
  • Rate of people susceptible to infection, by territory, defined as 100% minus the percentage of people who have developed immunity (detected by sampling with rapid tests, or other procedures). (Block 1)
  • R0 (basic reproductive number) by territory. (Block 1)
  • Volume of users of public transportation. (Block 4)
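The susceptible rate in this list is one of the simpler Block 1 computations. Reading its definition as one minus the share of people who have developed immunity, and estimating that share from a sample survey, it could be sketched as follows; the function names and the survey counts are hypothetical, and sampling error is ignored.

```python
from typing import Dict, Tuple


def susceptible_rate(immune_in_sample: int, sample_size: int) -> float:
    """Return 1 minus the sampled fraction of people with immunity."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= immune_in_sample <= sample_size:
        raise ValueError("immune_in_sample must be between 0 and sample_size")
    return 1 - immune_in_sample / sample_size


def susceptible_rate_by_territory(
    surveys: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Apply the rate to (immune, sampled) counts for each territory."""
    return {
        territory: susceptible_rate(immune, sampled)
        for territory, (immune, sampled) in surveys.items()
    }


# Hypothetical survey results: territory -> (immune in sample, sample size).
rates = susceptible_rate_by_territory({"A": (50, 1000), "B": (0, 200)})
```

The basic reproductive number in the same list requires epidemiological modeling and is not attempted here.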

Finally, it is important to also refer to Block 3 (Figure 1). This block represents the connection to medical, pharmaceutical, and R&D professionals who are working on developing, testing, analyzing, and investigating new or existing medical products, therapies, and vaccines. They will regularly provide the experts in the Control Panel with updated information regarding proposals and research results on new infection symptoms, new and existing treatments, virus mutations, and vaccine progress. The panel will evaluate the need to adjust test instructions and quantifiable parameters. There will be a close connection between the information provided by these “external” sources and the more operational units of Figure 2. To make cross‐fertilization between Block 3 and the Figure 2 units broader and more systematic, this exchange of information should take place through the Control Panel of Block 1. Based on the input received from Block 3, the Control Panel experts will provide recommendations and useful processed knowledge to the clinical units of Figure 2.

Examples of indicators related to Medical Activities in Block 3 could be:

  • Number of people with positive serological IgM and IgG tests, confirmed by negative PCR tests, who can be classified as immunized. (Blocks 2.2, 2.3)
  • Average time from infection to onset of symptoms in identified cases, according to age and sex. (Blocks 2.2, 2.3)
  • Tabulations of aggregated patient data and analysis datasets, from clinical research studies, indicating efficacy of existing therapies given to Covid‐19 positive patients. (Block 3)
  • Efficacy and safety data in tables and statistical analysis results of Phase 1, Phase 2, and Phase 3 research studies with new therapies and vaccines. (Block 3)
  • Types and numbers of adverse events from research studies with therapies (new or approved) and potential vaccines. (Block 3)

In summary, a systems approach to pandemic management relies on the ability to gather and monitor high‐quality data to enable efficient and productive management of the propagation and progressive containment of Covid‐19 or other pandemic infections. Such data are foundational to efforts aimed at improved practice based on learning; what is learned will be only as good as the data analyzed and the algorithms used for its processing. For data to be useful and operational, such data must be organized according to important components and features, as described below:

  1. Specification and standardized definitions of key indicators, both clinical and social in nature, cataloged as necessary for follow‐up and control of the pandemic, prioritizing those related to controlling scarce clinical resources and to testing and traceability of infected (or potentially to be infected) people, in particular those belonging to critical groups (elderly patients, chronic disease patients, medical professionals, and fragile social communities) and their contacts.
  2. A data capture plan, with clear obligations for those responsible for capturing data and with permanent updating routines, oriented to gathering and aggregating data from all relevant sources (sample tests, primary assistance, nursing homes, clinics, hospitals, medical histories, clinical experiences, private contributions).
  3. A schema for collecting and coding data in accordance with pre‐established or newly proposed standards, to enable efficient processing and interoperability and to allow comparisons across equivalent data sets from other geographical areas or complementary databases.
  4. A data repository that enables advanced processing and learning capabilities, through big data analytics, machine learning, and artificial intelligence, which could generate recommendations of specific anticipative actions based on future projections of relevant indicators.
  5. A public presentation of this data repository as a trustworthy data platform, with clear guidelines for public access, ownership, detailed management rules, and transparent privacy conditions, to generate maximum confidence and collaboration from society.

It is important to highlight, as has become evident through this text, that this systems approach is not exclusively a Health System issue, but a broader Public Administration matter. A problem of this nature requires the involvement of, and coordination with, multiple public and private administrative units. Furthermore, only strong and solid data‐driven management paradigms will reduce the negative impact of cognitively biased interpretation of pandemic signals, that is, the natural inclination of specialists to look at a multifaceted matter from a single angle and then to extrapolate from it.


As discussed, a data‐driven systems approach to managing a pandemic requires data standards, that is, well‐defined and commonly used data inputs. The types and definitions of data that should be collected in connection with the indicators to support the systems model fall into a number of different categories associated with various use cases. Types of standards also vary; they include quality standards (e.g., certifications/licensure), content standards (e.g., data, data definitions, terminologies, codelists, and metadata), and transport standards (e.g., when moving data out of EHRs for various “secondary” purposes such as research or safety surveillance or when exchanging content). There are opportunities to leverage existing standards and/or develop new global standards to optimize the data acquisition and sharing for and among each of these categories and the associated entities that provide the data. Eight categories or use cases were identified and are described in more detail in Table S1 in Data S1; these use cases align with the integrated block diagram for a systems approach to pandemics in Figure S1 in Data S1. (Figure S1 in Data S1 integrates the diagrams of Figures 1 and 2 into a single and more detailed diagram.) These categories include test kits/testing, laboratories, contact tracing, observational research and epidemiology, clinical research, clinical trial registration, public health research, and safety surveillance.

Table 1 in Section 5 summarizes the available standards of Table S1 in Data S1, with a focus on Italy, Spain, and the U.S. Included in Section 4 is a discussion of the data standards that would facilitate the proposed systems approach, based upon the prior Figures 1 and 2. First is a discussion of the available standards that may support data inputs for Block 2 (Figure 2), the Flow of People/Patients (e.g., contact tracing and testing), including the point at which a positive Covid‐19 test drives the Person to be a Patient, who enters the Health System. This is followed by a discussion of standards that may support data inputs more broadly for Figure 1, including the use of EHR data for observational research and clinical research. For a data‐driven system to be successful, it will be important to build consensus around the standards that are most useful, to encourage broad adoption of these standards, and to collaborate in the development of additional standards where necessary. Even with the use of sophisticated tools such as machine learning and artificial intelligence, the benefits of having comprehensible standard data inputs are clear; doing so leads to more valid, reliable, and meaningful results and outputs that are more rapidly available and usable.

Summary table of connections between examples of available standards and systems diagram blocks

Use Case: Standards Need and Availability | Link to Systems Diagram

Testing, Test Kits, Laboratory Test Results:

  • Laboratories and Test Kits require certifications (e.g., CLIA in the US) and approvals in each country.
  • Consensus around how to report test results is lacking.
  • The LOINC Codelist is available; however, new codes were added for Covid‐19.
  • Lab results requirements have been posted by HHS; these should be compared with healthcare and research data standards and aligned.

Link to Systems Diagram: People/Patient Flow, Blocks 2.2 and 2.4 (Figure 2)

Contact Tracing:

  • Apps are in development, piloting, and implementation; however, use is inconsistent across the U.S. and there is no standardization of data across apps.
  • Spain has recently developed an app and the Asturias region is piloting a new contact tracing methodology.
  • Italy is also piloting a new app for this purpose.

Link to Systems Diagram: People/Patient Flow, Blocks 2.2 and 2.4 (Figure 2)

Healthcare (EHR) Data:

  • Disparate data standards exist among EHRs/vendors; however, the U.S. has now identified a CORE set of data, which must be provided in the future to patients, from EHRs in HL7 FHIR.
  • HL7 FHIR is of interest globally, but adoption and resources are currently inadequate for Covid‐19 analyses or for research; further development and consensus building are needed.
  • “Real‐world data” from EHRs at large academic institutions are now being aggregated using common codelists and the OMOP data model (N3C) or by private companies (e.g., TriNetX) with proprietary data models.

Link to Systems Diagram: People/Patient Flow, Blocks 2.3, 2.4, 2.5, and 2.6 (Figure 2)

Clinical Research, Vaccine Development, Public Health Research, Safety (Adverse Event) Reporting; Clinical Trial Registration:

  • Global clinical research data standards (CDISC SDTM, ADaM, and define.xml) are required by the U.S. FDA and Japan’s PMDA (and are endorsed by Europe, China) to submit data in support of new treatment and vaccine approvals. Collection of data using CDISC CDASH is strongly encouraged to minimize “back‐end” mapping into SDTM and ADaM and to enable direct cross‐study comparisons of clinical trial results.
  • Standard controlled terminology complements the CDISC standards and is hosted by the NIH/NCI Enterprise Vocabulary Services.
  • The COVID‐19 CDISC TA standard user guide has been published. The WHO/ISARIC/IDDO data collection forms have been annotated with CDISC elements and are in use by ~40 countries.
  • Master protocols can standardize research studies to simultaneously compare multiple therapies. These are being encouraged by policy makers and regulators.
  • For registering clinical trials in the public domain, one standard (for Clinical Trial Registration) can populate three international registries: WHO ICTRP, EudraCT, and ct.gov. All clinical trials in progress for new therapies and/or vaccines should be registered in at least one of these registries.

Link to Systems Diagram: Medical Research and Vaccine Development, Block 3 (Figure 1)

Health System and Public Administration:

  • The indicators driving the data for these areas are largely centered around numbers of people, case numbers, time, outcome (e.g., death or resolution), race, and sex. These are deceptively simple metrics, currently without global standards. Developing such data standards will require collaboration to build consensus on the definitions of what is being counted and how to report the information. Standards for demographics and time/date data could be adopted from CDISC or HL7; there is incentive to align these standards.

Link to Systems Diagram: Health Regional System and Public Administration, Block 4 (Figure 1) (also relevant to People/Patient Flow Blocks 2.3, 2.4, 2.5, and 2.6)

4.1 The flow of people/patients

Block 2, the People/Patient Flow block, is complicated in a pandemic, which is no longer simply a Health System issue, but also a Public Health and Administration problem. Block 2 is, therefore, expanded further into a separate diagram (Figure 2). Data driving these aspects of a pandemic are often disparate, difficult to collect, and may rely largely on human behavior. Yet, they are critical to the successful management of such a widespread crisis. Included are test programs (testing, test kits, and laboratory data) and contact tracing, which are typically done separately from the actual patient care that may occur in hospitals, emergency centers, nursing homes, and via telemedicine. The standards available for such data inputs currently include, for example: (a) quality standards in the various regions for the certification of laboratories and test kits; (b) LOINC,25 a codelist for laboratory data that is widely used by laboratories and electronic health records; and (c) WHO codelists such as the International Classification of Diseases (ICD)‐9, ‐10, and ‐11.26 However, with the onset of Covid‐19, new tests and test kits had to be developed and it was necessary to identify new codes, including LOINC and ICD codes, within the EHR systems to represent certain Covid‐19 symptoms and procedures related to the treatment of this particular virus.

Challenges have arisen in the collection and reporting of data from Covid‐19 tests. For example, with respect to the Johns Hopkins University’s COVID‐19 tracking statistics, issues were noted that arose from combining data from different types of tests that should have been reported separately (e.g., specific viral tests and antigen tests vs antibody tests). Antigen tests may be misleading since they are not specific for coronavirus. Test kits normally require approvals before they are distributed, which provides an opportunity for standardization in terms of quality; however, some test kits were marketed without appropriate approvals.
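The effect of combining test types can be shown with a small worked example. This is a sketch under stated assumptions: the record structure, the test type labels, and the counts are all hypothetical and do not come from any reporting system or data set mentioned here. It compares a pooled positivity rate with rates reported separately by test type.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LabResult:
    """Hypothetical minimal test record; real reports carry many more elements."""
    test_type: str  # illustrative labels, e.g. "viral" or "antibody"
    positive: bool


def positivity_by_type(results: List[LabResult]) -> Dict[str, float]:
    """Share of positive results, reported separately for each test type."""
    totals: Counter = Counter()
    positives: Counter = Counter()
    for result in results:
        totals[result.test_type] += 1
        if result.positive:
            positives[result.test_type] += 1
    return {test_type: positives[test_type] / totals[test_type] for test_type in totals}


def pooled_positivity(results: List[LabResult]) -> float:
    """Share of positive results when all test types are mixed together."""
    return sum(result.positive for result in results) / len(results)


if __name__ == "__main__":
    # Invented counts: 100 viral tests (10 positive), 100 antibody tests (30 positive).
    results = [LabResult("viral", i < 10) for i in range(100)]
    results += [LabResult("antibody", i < 30) for i in range(100)]
    print("by type:", positivity_by_type(results))
    print("pooled: ", pooled_positivity(results))
```

With these invented counts, the pooled figure describes neither current infection (viral tests) nor past exposure (antibody tests), which is why a reporting standard would need to carry the test type as a required element.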

With respect to laboratory results, the U.S. Department of Health and Human Services announced on 4 June 2020 laboratory reporting requirements of test results for Covid‐19; specifically, this is a list of elements that are to be included in the reports.27 The definitions and structures of these elements could be harmonized with standards for healthcare and/or with global research data standards.

Contact tracing, which is now receiving significant attention,22, 24, 28 is an area ripe for standardization. Challenges relate not only to defining the data collected, but also to ensuring appropriate privacy around whether to share data, what data to share, and with whom. Compliance of citizens is also an important factor. For the traceability of those infected and their contacts, advanced connectivity and tracing technologies could and should be utilized. These technologies, anchored in strong privacy protections, would reduce the need to disturb citizens with more phone calls than necessary and cumbersome memory exercises. For these connectivity technologies to be implemented in the western world, explicit acceptance and collaboration of citizens will be required. To achieve this acceptance, trustworthiness, trust, and transparency regarding rules and ownership for this traceability platform will be mandatory. Furthermore, the platform will need “friendly procedures” that encourage citizens to freely provide their own health‐related data in exchange for better personal risk control and well‐being. This connectivity could be materialized through mobile phones and/or wearables. In the region of Asturias, in Spain, a pilot project on traceability is being initiated with the use of a very simple Bluetooth device, which can be worn by those who are not familiar with mobile phones and/or by those who are reluctant to use smartphones for such data gathering activities. Not surprisingly, the United States has developed a patchwork, state‐by‐state approach with respect to the use of contact tracing apps. As of October 2020, 10 states and Washington, DC had launched a Google‐Apple app and 11 states were piloting or building such tools. 
“But the lack of a national strategy—unlike in many European countries that have adopted such apps—adds a hurdle to making sure the tools work across state lines as case counts tick upward, tech and public health experts say.”29 Spain has recently finished a pilot of an app based on the Google‐Apple framework (Radar COVID), which has been offered to the various regions but has received uneven support from the regional authorities.30

A recent study for Spain indicates that the spread of the infection is showing some initially unexpected patterns, related more to the K dispersion factor than to the R reproductive number.31 Taking those new patterns into account, the authors claim that a more efficient contact tracing strategy could be designed. This is a good example of how advanced data processing and continuous learning mechanisms, if they were part of integral management systems, could help to improve the fight against this pandemic.
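To illustrate why dispersion can matter as much as the reproductive number, the following sketch simulates secondary cases under a negative binomial offspring distribution with mean R and dispersion k. This is a common modeling assumption in the epidemiological literature, not necessarily the model used in the cited study, and the parameter values are illustrative only. Under this assumption the probability that an infected person infects nobody is (1 + R/k)^(-k), so a small k means that most cases cause no onward transmission while a few cause many.

```python
import math
import random
from typing import List


def prob_no_transmission(r: float, k: float) -> float:
    """P(zero secondary cases) for a negative binomial with mean r, dispersion k."""
    return (1.0 + r / k) ** (-k)


def _poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate with Knuth's multiplication method."""
    if lam > 500:
        # exp(-lam) would approach underflow; fall back to a normal approximation
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def secondary_cases(rng: random.Random, r: float, k: float) -> int:
    """Gamma-Poisson mixture, equivalent to a negative binomial draw.

    Each person gets an individual transmission rate from a gamma
    distribution (shape k, scale r / k, hence mean r), and the number of
    people they infect is Poisson with that rate.
    """
    individual_rate = rng.gammavariate(k, r / k)
    return _poisson(rng, individual_rate)


def simulate(r: float, k: float, n: int, seed: int = 0) -> List[int]:
    """Simulate secondary case counts for n infected people."""
    rng = random.Random(seed)
    return [secondary_cases(rng, r, k) for _ in range(n)]


if __name__ == "__main__":
    for k in (0.1, 10.0):  # illustrative values: strong vs weak overdispersion
        cases = simulate(r=2.5, k=k, n=20000)
        zero_share = sum(c == 0 for c in cases) / len(cases)
        print(f"k={k}: mean={sum(cases) / len(cases):.2f}, "
              f"share infecting nobody={zero_share:.2f}, "
              f"theory={prob_no_transmission(2.5, k):.2f}")
```

Both settings share the same mean R, yet the share of people who infect nobody differs greatly. Under strong overdispersion, tracing backward to find the source event of a known case can be more productive than tracing only forward, which is the kind of strategy adjustment that continuous learning from data would support.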

4.2 From person to patient

Covid‐19 has proven elusive in the sense that a person may test positive and show no symptoms, yet they may be spreading the virus to others. This has created additional challenges not only for contact tracing, as mentioned in the prior section, but also for tracking of positive cases through public health measures. The U.S. has 56 different public health systems that all function differently; responsibility was distributed. Spain has 17 different regions; a national approach was in place for 3 months; the responsibility was then transferred to regional governments and, at the end of October 2020, another national approach was put in place. Italy has 20 different regions; however, during the pandemic, a national approach was instituted, and this is still in place. Unfortunately, the region‐specific or state‐specific approaches, without national or international standards for how to report or compare important information, have created havoc in terms of being able to reliably track the spread of the pandemic. In addition, overloaded hospitals and care facilities have lacked the resources to collect the necessary data. It has been estimated in a report by Resolve to Save Lives (21 July 2020) that there are “critical gaps in the availability of information necessary to track and control COVID‐19: across the 50 states, only 40% of essential data points are being monitored and reported publicly. More than half the essential information—strategic intelligence that leaders need to turn the tide against COVID‐19—is not reported at all.”32

Once a patient tests positive and experiences symptoms, they may at that point be tracked through a hospital system or via telemedicine, as a patient. Data for such patients are now typically entered into an electronic health record (EHR) system. The prior practice for reporting data on outbreaks in the U.S. was to report to the Centers for Disease Control and Prevention (CDC) through the National Healthcare Safety Network, which ~6000 hospitals used to report outbreak information. When Covid‐19 struck, the CDC added questions to this system to track the Covid‐19 outbreak.33 Unfortunately, the U.S. HHS granted a contract to a private company to implement a newly built system beginning in April. Both systems have proven too cumbersome for many hospitals to manage; thus, these hospitals have failed to report all of their data. In 2010, a standard for automating the reporting of outbreaks from EHRs was demonstrated at the HIMSS Interoperability Showcase; however, this was not moved forward by HHS nor widely adopted by EHRs.34 Hence, it remains a challenge to obtain adequate data on patients with Covid‐19.

4.3 Use of EHR data for observational research

In terms of conducting observational research with real‐world data from EHRs, these data are covered by HIPAA or GDPR.35, 36 Although regulatory agencies have access to such data, other researchers may need to access these data through Data Use Agreements (DUAs) or business agreements. Initially, this was a challenge, but the threat of the pandemic accelerated the implementation of a standard DUA by the U.S. National Covid Cohort Collaborative (N3C),37 a group of NIH‐funded large academic research centers that are using an architecture developed by FDA and NIH/NCATS to report EHR data on Covid patients. They have agreed to use the OMOP data model,38 which was initially based on claims data, over other common data models. Unfortunately, the use of HL7 FHIR as a transport standard for reporting of Covid‐19 data for this project did not appear to be a viable option. Currently, implementation of HL7 FHIR is not sufficiently widespread nor are there adequate FHIR resources to make this emerging global healthcare standard an option for this purpose. In addition, implementations of FHIR vary.

The Evidence Accelerator, led by the Reagan‐Udall Foundation and Friends of Cancer Research, in conjunction with FDA, is exploring many other projects that are leveraging real‐world data for the purpose of supporting regulatory decision‐making.39 Multiple organizations (private and public) have presented their findings to inform this effort.

4.4 Clinical research and development

In the area of regulated clinical research for evaluating existing therapies, new therapies, or vaccines, as mentioned previously, there are existing global data and metadata (and terminology) standards, which are currently required for regulatory submission of data (tables that aggregate patient data and analysis datasets) to regulators in the U.S. and Japan. Other regulators such as EMA have endorsed these standards, but do not require them (since their reviewers do not initially receive raw data). A provisional therapeutic area standard user guide specific to Covid‐19 (which augments the foundational CDISC standards in terms of data specific to this virus) is available as are standards for virology and vaccines.16, 28 As noted previously, this CDISC Covid‐19 standard has also been harmonized with the WHO/ISARIC/IDDO form recommended for collecting data from the start of a research study for the purpose of studying potential treatments of Covid‐19.40 These standards, which now cover both data collection and data aggregated, tabulated and analyzed for regulatory approval, can also apply to confirming efficacy for existing therapies with respect to a new indication, that is, Covid‐19. In addition, certain companies and regulators are recommending standardizing at the protocol level (conducting master protocols) to compare multiple therapies simultaneously in the same study.

Clinical trials are required to be registered openly, allowing patients or their caregivers to search to see if they are eligible. There is a harmonized standard that is currently available and can be leveraged to register trials in any of the following clinical trial registries: WHO’s International Clinical Trials Registry Platform (ICTRP), EMA’s EudraCT, or the U.S. NIH/NLM ClinicalTrials.gov. Lastly, in the area of public health research, the WHO/ISARIC/IDDO form for clinical research can also be used for data collected to support new clinical research or public health research to enable comparisons across treatments.40

4.5 Health system and public administration

For the purpose of Block 4 on Health System and Public Administration, many of the indicators rely on data such as counts of people, race, sex, supplies, and other such information. The data appear deceptively simple; however, building consensus around standards for such data and metadata to serve as indicators can be challenging. One option for doing this quickly would be to adopt the standards for these data from either CDISC or HL7. Unfortunately, the standards for research and healthcare are not yet harmonized completely and there is work to be done. Certain HL7 accelerators (Codex and Vulcan) have encouraged leveraging rich CDISC content standards to inform the HL7 FHIR resources that need to be developed to support research.41
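Dates are one example of a deceptively simple data element. Both CDISC and HL7 FHIR represent dates in an ISO 8601 style (year, then month, then day), so converting local formats at the point of capture is a plausible first step toward comparable counts. The following is a minimal sketch: the site names, their date formats, and the case counts are invented for illustration.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical declaration of the date format each reporting site uses.
SOURCE_FORMATS = {
    "us_site": "%m/%d/%Y",  # month first
    "es_site": "%d/%m/%Y",  # day first
    "it_site": "%d.%m.%Y",  # day first, dot separated
}


def to_iso_date(raw: str, site: str) -> str:
    """Convert a site-specific date string to ISO 8601 (YYYY-MM-DD)."""
    parsed = datetime.strptime(raw.strip(), SOURCE_FORMATS[site])
    return parsed.date().isoformat()


def daily_counts(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum case counts per ISO date from (raw_date, site, count) rows."""
    counts: Dict[str, int] = {}
    for raw, site, count in rows:
        day = to_iso_date(raw, site)
        counts[day] = counts.get(day, 0) + count
    return counts


if __name__ == "__main__":
    rows = [
        ("06/04/2020", "us_site", 5),  # 4 June 2020 written month first
        ("04/06/2020", "es_site", 7),  # 4 June 2020 written day first
        ("04.06.2020", "it_site", 3),
    ]
    print(daily_counts(rows))
```

The same string, "04/06/2020", means 6 April at a month-first site and 4 June at a day-first site, so counts pooled without an agreed representation would silently land on the wrong day.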


The current situation in the three countries (Italy, Spain, and the U.S.), in terms of available standards to support each of these blocks within the system, is described in more detail in Table S1 in Data S1, where the standards are grouped by category or use case as described in the systems diagram in Figure S1 in Data S1. Table 1 is a summarized version of this information with examples of data inputs that align with the indicators and units depicted in Figures 1 and 2.

A data‐driven systems approach with consensus‐based standards, applied to the definitions of the metrics captured and the means of data exchange, will be critical to realizing an LHS that can change the trajectory of this pandemic and, hopefully, reduce the impact of future pandemics. As discussed, there are numerous opportunities to leverage such an approach, to build upon the existing standards, and to develop new ones where standards do not currently exist. This systems approach can be applied, as indicated previously, to varying settings and institutions. The broader the adoption of the approach and the standards, the more value there will be to citizens in different countries around the world. It is now more important than ever to create all necessary standards (including definitions and metadata), build global consensus and broaden adoption to collaboratively improve data quality, and accelerate meaningful data sharing to support a systems approach in the battle against Covid‐19.


Rapid learning in the truest sense of the phrase—including capturing human health experiences as data, analyzing the data to synthesize new knowledge, implementing the knowledge in practice to impact lives and health, and continuing learning cycles by assessing these impacts—will be the key to addressing the Covid‐19 pandemic and future public health crises; doing so promises to save lives and money. Rendering such rapid learning possible will require unprecedented and sustained collaboration among diverse stakeholders and among multiple nations. We aim to seed the possibilities for realizing a true public LHS built atop an adaptable and extensible infrastructure that will stand the test of time. This process begins with the multi‐country methodology and collaboration we propose, aiming to catalyze an international learning community and an interconnected global dialogue.

A key desired outcome from working to achieve these objectives will be to reduce the spread of the virus and to curb hospitalizations and deaths based upon standards‐based metrics and data‐informed decisions by the system and the diverse people comprising it. In his timely and forward‐looking article,42 which focuses on EHRs (but which, in our view, should be extended more generally to health data collection and data processing), John Glaser highlights that health systems must start “shifting their focus from reactive sick care to the proactive management of health” in order to move not only EHRs but, in our opinion, the whole Health System “from transaction‐oriented to intelligence‐oriented”.

In fact, the systems approach, which we are applying herein to the regulation of a Health System during a pandemic, could and should apply to the management of public health issues moving forward. Systemic, intelligent, proactive health management is far more socially rewarding and economically productive than “a posteriori” panic/stress reactions. Realizing that political and cultural factors will need to be addressed to build consensus around development and adoption of standards that are fundamental to the data sharing necessary to truly realize the systems approach we are proposing, we believe that now is the time to work together globally in removing these barriers.

Spain, Italy, and the United States have been three of the countries hardest hit by the Covid‐19 pandemic, although no country and no person has been untouched. Individuals across these three countries—desiring to transform health by mobilizing health information technology to unleash human potential and by bringing together stakeholders to collaboratively realize LHSs—are proposing a standards‐based, data‐driven systems approach upon which the envisioned international learning community can be built. We hope to offer to those working to learn from Covid‐19‐related health experiences a framework they can implement and from which they can share learnings. To quote Dr. Glaser, “Health care delivery is in the early stages of an extraordinary change. This change is being driven by the relentless movement to the value‐based care model, as well as the pressing problems and systemic inequities exposed by the Covid‐19 crisis. This ongoing transformation is paving the way for a new EHR design: a platform that fuses the current EHR with complementary systems, capabilities, and technologies.” This move should be global.

To be more specific in terms of recommendations to policy makers and country leaders (and appropriate others who drive the response to a pandemic), we recommend the following:

  • To manage a pandemic, it is essential to implement an integral systems approach driven by high‐quality standardized data from all relevant sources and by highly advanced AI‐powered processing to enhance its learning capabilities.
  • Global data standards and terminologies to support clinical research are available now; they should be leveraged, aligned with other standards as appropriate, and widely adopted for all necessary data inputs into the system.
  • The systems approach must be multi‐stakeholder and multidisciplinary, engaging data providers and decision makers from the public health, clinical, research, and technology domains, as well as the governance of all included nations and regions.
  • The effort should be a multinational collaboration, to share experiences and expand learning capabilities.

The goal is to make this happen in ways that merit trustworthiness and correspondingly engender trust. Collectively, we believe this work can create a platform upon which infrastructures and systems empowering multiple and diverse health stakeholders can potentially address the Covid‐19 global pandemic and future international public health crises. This work promises to catalyze and incent a larger ongoing global dialogue to shape and refine the future of key foundational elements underpinning global public health.


All authors have declared that they do not have a conflict of interest.