Since the first sequencing of the human genome more than 20 years ago, the study of human genomes has relied almost exclusively on a single reference genome to which others are compared to identify genetic variations. Scientists have long recognized that a single reference genome cannot represent human diversity and that using it introduces a pervasive bias into these studies. Now, they finally have a practical alternative.
In a paper published in Science, researchers have introduced a new tool, called Giraffe, that can efficiently map new genome sequences to a “pangenome” representing many diverse human genome sequences. They show that this approach allows a more comprehensive characterization of genetic variations and can improve the genomic analyses used by a wide range of researchers and clinicians.
“We’ve been working toward this for years, and now for the first time we have something practical that works fast and works better than the single reference genome,” said the corresponding author. “It’s important for the future of biomedicine that genomics helps everyone equally, so we need tools that account for the diversity of human populations and are not biased.”
All humans have the same genes, but there are many variations in the exact sequences of the genes—meaning the sequence of DNA subunits (abbreviated A, C, T, G) that spell out the genetic code—as well as in the vast stretches of the genome outside of the protein-coding genes. A difference in a single letter of code is called a single nucleotide variant (SNV), and insertions or deletions of short sequences are known collectively as “indels”.
The most complex variants are structural variations involving rearrangements of large segments of code (50 or more letters). These are especially hard to find using a single reference genome, yet they can have significant effects and are known to play an important role in some diseases. The average person has millions of SNVs and indels and tens of thousands of larger structural variants, and collectively the structural variants actually involve more letters of code than the other types of variants do.
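To make the three size classes concrete, here is a minimal Python sketch that sorts a variant into one of them by comparing the reference and alternate alleles. The function name `classify_variant` and its parameter names are illustrative assumptions, not part of any tool described here. The 50-letter cutoff is the one given above. The sketch looks only at length differences, so it would not recognize rearrangements such as inversions that leave the length unchanged.

```python
def classify_variant(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Classify a variant by allele lengths (hypothetical helper).

    ref / alt are the reference and alternate alleles as strings of A, C, G, T.
    sv_min_len is the size cutoff for structural variants (50 letters in the text).
    """
    # A single-letter substitution is a single nucleotide variant.
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"

    # Net number of letters inserted or deleted.
    size = abs(len(ref) - len(alt))
    if size == 0:
        # Same length but not a one-letter change (or no change at all);
        # this simple sketch does not try to name these cases.
        return "other"

    return "structural variant" if size >= sv_min_len else "indel"


if __name__ == "__main__":
    print(classify_variant("A", "G"))            # SNV
    print(classify_variant("A", "ACGT"))         # indel (3 letters inserted)
    print(classify_variant("A" * 61, "A"))       # structural variant (60 letters deleted)
```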
“The workhorses of genomics have been SNVs and short indels, because structural variants have been hidden from view,” the author said. “Pangenomics is making structural variants visible so we can study them the same way we do SNVs and short indels. There are a lot of structural variants and they can have a big impact, so this is critical for the future of genetic studies of disease.”
A pangenome reference can be created from multiple genome sequences using a mathematical graph structure to represent the relationships between different sequences. In the new paper, the researchers built two human genome reference graphs using publicly available data. These were used to evaluate the new tool, Giraffe, which is a set of algorithms for mapping new sequence data to a pangenome reference.
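The idea of a graph-based reference can be illustrated with a toy example. The Python sketch below stores short DNA segments as nodes, allowed adjacencies as edges, and each embedded genome as a walk through the graph. All sequences, node numbers, and sample names are invented for illustration, and this is not a description of the data structures Giraffe itself uses.

```python
# Toy sequence graph: node id -> DNA segment (made-up sequences).
nodes = {1: "GATT", 2: "A", 3: "C", 4: "ACA"}

# Directed edges: which segment may follow which.
# Nodes 2 and 3 form a "bubble": two alternative letters at one site.
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}

# Each embedded genome is recorded as a walk through the graph.
haplotypes = {"sample_A": [1, 2, 4], "sample_B": [1, 3, 4]}


def spell(path):
    """Return the DNA sequence spelled by a walk, checking every step is an edge."""
    for a, b in zip(path, path[1:]):
        if b not in edges[a]:
            raise ValueError(f"no edge from node {a} to node {b}")
    return "".join(nodes[n] for n in path)


def all_sequences(start, end):
    """Enumerate every sequence spelled by a walk from start to end."""
    if start == end:
        return [nodes[end]]
    return [nodes[start] + rest
            for nxt in edges[start]
            for rest in all_sequences(nxt, end)]


if __name__ == "__main__":
    for name, path in haplotypes.items():
        print(name, spell(path))
    print(all_sequences(1, 4))
```

In this toy graph the two samples differ by a single letter, so the shared flanking segments are stored once and only the variable site is duplicated.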
Giraffe can accurately map new sequence data to thousands of genomes embedded in a pangenome reference as quickly as existing tools map to a single reference genome. The study also showed that using Giraffe reduces mapping bias, the tendency to incorrectly map sequences that differ from the reference genome.
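Mapping bias can be shown with a deliberately simplified sketch. Here a read "maps" only if it matches a reference sequence exactly, which is far stricter than real mappers (they tolerate mismatches and gaps), so the effect is exaggerated. The sequences and the helper `maps_exactly` are hypothetical and unrelated to Giraffe's actual algorithms.

```python
# One linear reference versus a small set of sequences embedded in a pangenome.
# All sequences are invented for illustration.
linear_reference = "GATTAACA"
pangenome_sequences = ["GATTAACA", "GATTCACA"]


def maps_exactly(read, sequences):
    """Return True if the read occurs verbatim in at least one sequence."""
    return any(read in seq for seq in sequences)


# The first read carries the reference letter at the variable site,
# the second carries the alternative letter.
reads = ["TTAAC", "TTCAC"]

if __name__ == "__main__":
    for read in reads:
        print(read,
              "linear:", maps_exactly(read, [linear_reference]),
              "pangenome:", maps_exactly(read, pangenome_sequences))
```

The read carrying the non-reference letter is lost against the single reference but placed against the pangenome, which is the kind of asymmetry that mapping bias refers to.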
“Not only is the analysis better, it is also as fast as current methods that use a linear reference genome,” said the co-first author.
Inexpensive short-read sequencing is a mainstay of modern genomics, yielding snippets of sequence that must be mapped to a reference genome to make sense of them. Mapping shows where each snippet belongs among the 23 pairs of human chromosomes, which makes it possible to identify the variants present at each location in an individual’s genome, a process known as genotyping.
The researchers found that Google Health’s deep-learning variant caller, DeepVariant, could more accurately identify SNVs and indels using Giraffe’s alignments against a pangenome than it could using alignments against a single reference genome.
“A lot of structural variants have been discovered recently using long-read sequencing,” the author said. “With pangenomes, we can look for these structural variants in large datasets of short-read sequencing. It's exciting because this will allow us to study those new structural variants across many people and ask questions about their functional impact, association with disease, or role in evolution.”
The researchers used Giraffe to map sequence reads from a diverse group of 5,202 people and determine their genotypes for 167,000 recently discovered structural variations. This enabled them to estimate the frequency of different versions of these structural variants in the human population as a whole and within individual subpopulations. They showed that the frequency of some variants differs considerably between subpopulations and could be misinterpreted if analyzed only in, for example, European-ancestry populations where the frequency of a particular variant is low.
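The frequency comparison described above comes down to simple counting once genotypes are known. The following Python sketch computes the frequency of one variant within each group and overall. It assumes diploid genotypes recorded as the number of copies of the variant (0, 1, or 2). The population labels and genotype values are invented and are not data from the study.

```python
from collections import defaultdict

# Hypothetical genotype calls for a single structural variant:
# (population label, copies of the variant carried by one person).
genotypes = [
    ("POP1", 0), ("POP1", 0), ("POP1", 1), ("POP1", 0),
    ("POP2", 2), ("POP2", 1), ("POP2", 1), ("POP2", 0),
]


def allele_frequencies(calls):
    """Return the variant frequency per population and under the key 'ALL'."""
    variant_copies = defaultdict(int)
    total_copies = defaultdict(int)
    for population, copies in calls:
        variant_copies[population] += copies
        total_copies[population] += 2  # each person carries two copies of the site
    freqs = {pop: variant_copies[pop] / total_copies[pop] for pop in total_copies}
    freqs["ALL"] = sum(variant_copies.values()) / sum(total_copies.values())
    return freqs


if __name__ == "__main__":
    for population, freq in allele_frequencies(genotypes).items():
        print(population, freq)
```

With these made-up numbers the variant is four times as common in one group as in the other, so a study restricted to the first group would understate how common it is overall.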
A single reference genome must choose one version of any variation to represent, leaving the other versions unrepresented. By making more broadly representative pangenome references practical, Giraffe can make genomics more inclusive.
https://www.science.org/doi/10.1126/science.abg8871
Pangenomics removes bias from human genotyping