The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants
BACKGROUND: The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers.
METHODS: Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implications of the results to each participant.
RESULTS: Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set (n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants — associated with cancer, cardiac or neurodegenerative phenotypes — remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual.
INTERPRETATION: Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care.
Rapid technological advances are enabling a view of human genetic variation in ever-increasing detail and at plummeting costs.1 Until recently, analysis has been targeted largely to defined genes, but pan-genomic approaches, such as microarrays, gene-panel testing and exome sequencing, have become mainstream. Now, whole genome sequencing can capture all of the genes (about 1% of the whole genome) and most of the rest of the genome in a single experiment, with the potential to recognize all types of genetic variation and thereby usurp the less comprehensive technologies (Box 1).2 Information from whole genome sequencing can already identify the molecular causes of suspected heritable conditions and cancer;2–7 however, we anticipate that genomic analysis will become a standard component of proactive health care, given its potential to identify predisposition to medically actionable conditions, explain uncharacterized disease and reveal carriers for recessive disorders and predictors of medication safety and response.8 Interpretation of sequence data remains challenging, with unknown clinical utility and predictive value among the general population.9
Box 1: Human genome variation
The genome is the complete set of genetic material (DNA), contained in the cell’s nucleus and mitochondria. Genes are functional units that instruct the cell to produce specific proteins. They are segmented into exons (coding units) and introns (noncoding spacers), with regulatory sequences at either end and at intron/exon junctions. Noncoding DNA between genes includes various regulatory or structural elements but is largely uncharacterized. Each of the 2 versions of a gene (1 maternal and 1 paternal) is called an allele. The Human Genome Project provided the initial draft reference DNA sequence (23 pairs of chromosomes encompassing about 25 000 genes) against which to compare future genome sequences. Despite much similarity, each person’s genome is unique because of variations in DNA sequence, gene copy number, genome organization and epigenetic changes. Some variations may be inconsequential, contribute to the differences among healthy humans or provide protection against environmental challenges; others have health-related consequences. Genome interpretation involves distinguishing among these. Variant alleles may be null, missense, nonsense, splice variants, deleted, duplicated, disrupted, etc., depending on their effect on the related gene products. Their impact on characteristics of the individual (the phenotype) is described as recessive, semidominant, codominant or dominant. Some traits or diseases result from single-gene variants, with outcomes that are predictable using principles of classical Mendelian genetics. Most involve much more complex interactions among gene variations, with epigenetic and environmental influences. Risk alleles are found more often among people with a particular condition than among those without. Few alleles are deterministic; most have variable expression. Penetrance reflects the proportion of individuals with a particular underlying genetic variant who display a given trait.
Mosaicism occurs when a variant arises postfertilization, so that not all cells in the individual have it. Similarly, mitochondrial genomes in each cell may not all be identical, and the presence of a variant in only a subset of them is called heteroplasmy. The size of genetic variants can range from 1 base pair (bp) to thousands (kb) or millions (Mb) of base pairs. Canada’s Genetic Non-Discrimination Act S.C. 2017, c.3, which received royal assent on May 4, 2017, prohibits anyone from requiring individuals to undergo a genetic test or disclose the results of a genetic test.
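The quantitative terms in Box 1 (penetrance as a proportion, a variant present in only a subset of cells or mitochondrial genomes, and variant sizes in bp, kb or Mb) can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical and are not taken from the study. Estimating the variant-carrying fraction from sequencing read counts is an assumed simplification, not a description of the authors' pipeline.

```python
def penetrance(carriers_with_trait: int, carriers_total: int) -> float:
    """Proportion of variant carriers who display the trait."""
    if carriers_total <= 0:
        raise ValueError("carriers_total must be positive")
    if not 0 <= carriers_with_trait <= carriers_total:
        raise ValueError("carriers_with_trait must be between 0 and carriers_total")
    return carriers_with_trait / carriers_total


def variant_fraction(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads carrying a variant.

    Assumption: used here as a rough proxy for the share of cells
    (mosaicism) or mitochondrial genomes (heteroplasmy) with the variant.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must be between 0 and total_reads")
    return variant_reads / total_reads


def format_variant_size(length_bp: int) -> str:
    """Express a variant length in bp, kb (thousands) or Mb (millions)."""
    if length_bp < 1:
        raise ValueError("length_bp must be at least 1")
    if length_bp >= 1_000_000:
        return f"{length_bp / 1_000_000:g} Mb"
    if length_bp >= 1_000:
        return f"{length_bp / 1_000:g} kb"
    return f"{length_bp} bp"


# Hypothetical illustration: 30 of 40 carriers show the trait.
example_penetrance = penetrance(30, 40)
# Hypothetical illustration: 12 of 80 mitochondrial reads carry the variant.
example_heteroplasmy = variant_fraction(12, 80)
```

For instance, `format_variant_size(2500)` yields a size in kb, while a single-nucleotide change is reported in bp.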
The Personal Genome Project Canada was launched in 2007, and shares the guiding principles and open consent policy of the parent project in the United States.10 It aims to develop a public data set of fully annotated genomic information, connected with human trait information. It can provide control data for other studies, but it also aims to forecast effects of integrating DNA-derived knowledge into routine clinical practice. The project will evaluate the utility of such information, and how best to gather and apply it within Canada’s provincially administered, publicly funded health care system. Participants in this ongoing project are highly motivated to promote genomic research and explicitly forgo privacy commitments. We report the data and experiences from whole genome sequencing and medical annotation of genomes of the first 56 participants in the Personal Genome Project Canada.
Information about the Personal Genome Project Canada was posted online (www.personalgenomes.ca) and disseminated through newspaper articles, by word-of-mouth and through Medcan Health Management Inc. Registered volunteers from across Canada underwent an in-person (n = 54) or phone (n = 2) interview and entrance exam (Figure 1), to ensure that they were aware of the potential risks associated with participation and that research results should not substitute for clinical diagnostic testing. To enrol in the project, participants must be over the age of 18 and state their intention to share their genomic data publicly. Self-reported baseline trait data included birth month/year, medications, allergies, vaccines, personal medical history, ethnicity/ancestry, blood pressure, height and weight. We did not exclude individuals based on known health conditions. Blood was drawn at the Medcan clinic (n = 54) or at a community laboratory (n = 2). Participation in the project is an ongoing process, both for the participants described here and for additional volunteers.
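The enrolment criteria described above can be summarized as a small screening check. This is only an illustrative sketch: the `Volunteer` record and its field names are hypothetical. Two readings are assumptions rather than stated rules of the project: that "over the age of 18" means strictly older than 18, and that completing the interview and entrance exam is a condition of enrolment.

```python
from dataclasses import dataclass


@dataclass
class Volunteer:
    """Hypothetical screening record; field names are illustrative only."""
    age_years: int
    intends_public_sharing: bool        # stated intention to share genomic data publicly
    completed_interview_and_exam: bool  # assumption: treated here as required


def is_eligible(volunteer: Volunteer) -> bool:
    """Apply the enrolment criteria described in the text.

    Known health conditions are deliberately not checked, because the
    project did not exclude individuals on that basis.
    """
    return (
        volunteer.age_years > 18  # assumption: strict reading of "over the age of 18"
        and volunteer.intends_public_sharing
        and volunteer.completed_interview_and_exam
    )


# Hypothetical examples
adult_volunteer = Volunteer(age_years=34, intends_public_sharing=True,
                            completed_interview_and_exam=True)
no_sharing = Volunteer(age_years=52, intends_public_sharing=False,
                       completed_interview_and_exam=True)
```

Under these assumptions, `is_eligible(adult_volunteer)` is true and `is_eligible(no_sharing)` is false.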