A haplotype is a set of genetic variants that sit side by side on the same chromosome and are passed on together to the next generation. Examining haplotypes makes it possible to understand the heritability of certain complex traits, such as the risk of developing a disease. However, this analysis usually requires genome analysis of family members (parents and their child), a tedious and expensive process.
To overcome this problem, researchers have developed SHAPEIT4, a computer algorithm that allows the haplotypes of hundreds of thousands of unrelated individuals to be identified very quickly. The results are as detailed as those obtained from family analysis, which cannot be conducted on such a large scale. The tool is now available online under an open source license and is free for the entire research community to use. Details are published in Nature Communications.
The analysis of genetic data is becoming increasingly important, particularly in the field of personalized medicine. The number of human genomes sequenced each year is growing exponentially, and the largest databases contain more than one million individuals. This wealth of data is extremely valuable for better understanding the genetic destiny of humanity, whether to determine the contribution of genetics to a particular disease or to better understand the history of human migration. To be meaningful, however, this data must be processed computationally. "However, the processing power of computers remains relatively stable, unlike the ultra-fast growth of genomic Big Data", says the research lead. "Our algorithm thus aims to optimize the processing of genetic data in order to absorb this amount of information and make it usable by scientists, despite the gap between its quantity and the comparatively limited power of computers."
Genotyping makes it possible to know an individual's alleles, i.e. the genetic variations received from his or her parents. However, without knowing the parental genomes, we do not know which alleles are transmitted together to children, and in which combinations. "This information, the haplotypes, is crucial if we really want to understand the genetic basis of human variation", explains the co-senior author. "This is true both for population genetics and from the perspective of precision medicine."
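The ambiguity described above can be made concrete with a small Python sketch. It is purely illustrative and is not how SHAPEIT4 works: it assumes a hypothetical encoding in which each site of a genotype is the count of the alternate allele (0, 1 or 2), and it simply lists every pair of haplotypes that could produce a given unphased genotype. The function name `compatible_haplotype_pairs` is invented for this example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List all unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of alternate-allele counts per site (0, 1 or 2).
    This encoding is an assumption made for the example.
    """
    # Only heterozygous sites (count 1) are ambiguous: we do not know
    # which of the two chromosomes carries the alternate allele.
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for assignment in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are fixed: 0 -> allele 0 on both, 2 -> allele 1 on both.
        hap_a = [g // 2 for g in genotype]
        hap_b = [g // 2 for g in genotype]
        for site, allele in zip(het_sites, assignment):
            hap_a[site] = allele
            hap_b[site] = 1 - allele
        # Sort the two haplotypes so that (a, b) and (b, a) count once.
        pairs.add(tuple(sorted((tuple(hap_a), tuple(hap_b)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs([1, 1, 0]):
        print(pair)
```

With k heterozygous sites there are 2^(k-1) possible pairs, so the number of candidate haplotypes grows very quickly with the number of variants. Parental genomes, or a statistical method run on a large cohort, are what allow one of these candidates to be selected.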
To determine the genetic risk of disease, for example, scientists assess whether a genetic variation is more or less common in individuals who have developed the disease, in order to determine the role of this variation in the disease being studied. "By knowing the haplotypes, we conduct the same type of analysis", says the author. "However, we are moving from a single variant to a combination of many variants, which allows us to determine which allelic combinations on the same chromosome have the greatest impact on disease risk. It is much more accurate!"
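A toy Python sketch can show why moving from single variants to haplotypes may reveal more. The data below are invented for illustration and do not come from the study; each tuple is one phased chromosome with two sites (0 = reference allele, 1 = alternate allele), and the helper names are hypothetical. Only raw frequencies are compared; a real association study would apply a proper statistical test.

```python
from collections import Counter

# Invented toy data: phased chromosomes for cases and controls.
CASES = [(1, 1)] * 3 + [(0, 0)] * 3 + [(1, 0), (0, 1)]
CONTROLS = [(1, 1), (0, 0)] + [(1, 0)] * 3 + [(0, 1)] * 3


def allele_frequency(haplotypes, site):
    """Frequency of the alternate allele at one site (single-variant view)."""
    return sum(h[site] for h in haplotypes) / len(haplotypes)


def haplotype_frequency(haplotypes, target):
    """Frequency of one specific allelic combination (haplotype view)."""
    return Counter(haplotypes)[target] / len(haplotypes)


if __name__ == "__main__":
    for site in (0, 1):
        print("site", site,
              allele_frequency(CASES, site),
              allele_frequency(CONTROLS, site))
    print("haplotype (1, 1)",
          haplotype_frequency(CASES, (1, 1)),
          haplotype_frequency(CONTROLS, (1, 1)))
```

In this constructed example each variant taken alone has the same frequency in cases and controls, while the combination (1, 1) on the same chromosome is three times more frequent in cases. That difference is visible only once the haplotypes are known.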
The method developed by the researchers makes it possible to process an extremely large number of genomes, about 500,000 to 1,000,000 individuals, and to determine their haplotypes without knowing their ancestry or progeny, while using standard computing power. The SHAPEIT4 tool has been successfully tested on the 500,000 individual genomes present in the UK Biobank, a scientific database developed in the United Kingdom. "We have here a typical example of what Big Data is", says the author. "Such a large amount of data makes it possible to build very high-precision statistical models, as long as they can be interpreted without drowning in them."
The researchers have decided to make their tool accessible to all under an open source MIT license: the entire code is available and can be modified at will, according to the needs of researchers. This decision was made mainly for the sake of transparency and reproducibility, as well as to encourage work by researchers from all over the world. "But we only give access to the analysis tool, under no circumstances to a corpus of data", the author explains. "It is then up to each individual to use it on the data he or she has."
This tool is much more efficient than older tools, as well as faster and cheaper. It also helps limit the environmental impact of computing: the very powerful computers used to process Big Data are very energy-intensive, so reducing their use also reduces their negative impact.
https://www.nature.com/articles/s41467-019-13225-y
http://sciencemission.com/site/index.php?page=news&type=view&id=publications%2Faccurate-scalable-and&filter=22
Accurate, scalable and integrative haplotype estimation