By Allison Proffitt
March 1, 2021 | Josh Denny, CEO of NIH’s All of Us program, gave an update on the population sequencing program in the opening session of the virtual AGBT General Meeting this week.
Using numbers current as of January 19, 2021, Denny reported that the All of Us program has more than 366,000 participants. More than 233,000 have shared access to their medical records, and more than 279,000 have contributed some biosamples, which include physical measurements, blood, saliva, and/or urine samples. The All of Us program paused recruitment in March 2020 because of the pandemic, but recruitment resumed in late 2020, Denny said, and numbers are again increasing.
Participants come from all 50 states and all eligible ages (18 and older), though the majority of participants are between 50 and 69 years old. A little less than half of the current participants are white and about 22% are black or African American. Over 80% of the participants are underrepresented in biomedical research, Denny said, considering race, age, socioeconomic status, ethnicity, rural residence, and sexual orientation/gender identity.
All of Us continues to collect data from participants in a variety of ways: surveys, biosample contribution, shared medical records, and data from Fitbit and Apple HealthKit. And the data are deep. Survey data, of course, only dates back to the All of Us pilot and launch in 2017 and 2018, but Denny emphasized that EHR data goes back decades. ICD codes date back to 1980, and data from CPT codes and some EHR labs and measurements date back to 1985.
This means, Denny said, “that even though we haven’t recruited children, we have over 10,000 individuals for whom we have pediatric data, including people from birth to age 18.” These individuals have shared their childhood data as adults, and such deep datasets enable in silico-based prospective studies of long-range and childhood indications.
Participant Data Return
The return of findings to research participants has been a central goal of All of Us from the beginning, and Denny reported that the program began returning genetic findings to participants in December 2020. The first results are what Denny called “recreational genetics,” including genetic ancestry and physical traits (so far lactose tolerance, bitter taste perception, cilantro preference, and earwax type). These results are based on a custom genotyping array created by Illumina for All of Us (and sold publicly as the Global Diversity Array) and run by the Broad Institute, University of Washington Genome Science, and the Center for Inherited Disease Research at Johns Hopkins University.
Whole genome sequencing for All of Us is being done by the Broad Institute, University of Washington Genome Science, and the Human Genome Sequencing Center clinical laboratory. Whole genome sequencing results, including pharmacogenomic findings and results from ACMG’s list of 59 genes, will first be returned to participants in about a year, Denny said.
The health-related genetic findings are what All of Us participants have expressed the most interest in, Denny said: 97% of participants have expressed interest in receiving genetic results. All participants can choose which types of results they receive.
As of January 12, 2021, All of Us has generated 27,987 genotyping results and 10,196 whole genomes. Right now, Denny said, whole genome sequencing is running behind genotyping, but All of Us plans a “sequencing first” approach. “We hope that they come up to the point where we run whole genome sequencing at the same pace or nearly the same pace as genotyping in terms of our release schedules,” he said.
Of the participants to whom genotyping results have been returned, more than 70% have accessed the findings. “Almost everyone is viewing the genetic ancestry piece,” Denny said. Giving updated numbers today, Denny reported that about 34,000 participants have been notified that their ancestry and traits data are available, and about 25,000 participants have viewed the results. Right now, All of Us is processing about 4,500 DNA samples each week, Denny said.
The data within All of Us is available to browse without login credentials. At DataBrowser.ResearchAllofUs.org, researchers can search survey and EHR data across disease types. This itself is quite the feat: mapping common disease names across multiple EHRs. “Of course I say this is a journey, not a destination, with electronic health record data. We’re always seeking to expand the data we get and harmonize it better,” Denny clarified.
To dig deeper, All of Us launched Researcher Workbench Beta in May 2020. In the beta launch, access is restricted to U.S. researchers with eRA Commons accounts. As of January, more than 230 Institutional Data Use and Registration Agreements have been completed. The median time to complete a DUA has been 24 days, Denny said. Once a researcher’s institution has completed the DUA, the process of registering as an individual researcher—including the All of Us “responsible conduct of research training”—takes about 2-3 hours.
“This Researcher Workbench is really designed as a web-based environment to allow you to do a full range of analyses from having point-and-click cohort builders that let you even select a class of medication and get all the medications under there for your population and build other kinds of demographic and phenotypic descriptors to fill that cohort, to doing richer analyses in languages such as R and Python through Jupyter Notebooks,” Denny said.
Currently, researchers need to be able to use R or Python and Jupyter Notebooks to do more in-depth analyses, Denny said, but the Researcher Workbench does support collaborative workbenches, so that groups can bring various skills to work together on a problem. Individual researchers access the Workbench via a passport.
Roadmap and the All of Us Future
All of Us has also launched more than 40 “demonstration projects” across the portfolio, Denny explained. “The idea on these demonstration projects was to do things that we think we know what we expect as the result,” he said. “Not to discover new science, but to test the system.”
These projects—prebuilt cohorts or full studies—are published or publicly shared on medRxiv as notebooks, so that other groups can borrow and repurpose the code and cohorts and see examples of how to use the system, Denny said.
While COVID-19 did slow All of Us enrollment in 2020, Denny expects steady progress on the All of Us roadmap. All of Us just released refreshed data into the platform, he said, including the COVID-19 surveys that All of Us began in May 2020 and the first genomic returns. In about a year, Denny said, they expect to launch a Controlled Tier, which will include genomic data and more COVID-19 data, including serology. Further along, All of Us plans to incorporate clinical notes, imaging data, and data linkages and then, later, to offer participant re-contact and biospecimen access.