Researchers have developed a new tool that makes it easier to maximize the power of deep learning for studying genomics. They describe the new approach, Janggu, in the journal Nature Communications.
Imagine that before you could make dinner, you first had to rebuild the kitchen, designing it specifically for each recipe. You'd spend far more time on preparation than on actually cooking. For computational biologists, analyzing genomics data has been a similarly time-consuming process. Before they can even begin their analysis, they spend a lot of valuable time formatting and preparing huge data sets to feed into deep learning models.
To streamline this process, researchers developed a universal programming tool that converts a wide variety of genomics data into the required format for analysis by deep learning models. "Before, you ended up wasting a lot of time on the technical aspect, rather than focusing on the biological question you were trying to answer," says the first author of the paper. "With Janggu, we are aiming to relieve some of that technical burden and make it accessible to as many people as possible."
Janggu is named after a traditional Korean drum shaped like an hourglass turned on its side. The two large sections of the hourglass represent the areas on which Janggu focuses: pre-processing of genomics data on one side, and results visualization and model evaluation on the other. The narrow connector in the middle represents a placeholder for any type of deep learning model researchers wish to use.
Deep learning models involve algorithms that sort through massive amounts of data to find relevant features or patterns. While deep learning is a very powerful tool, its use in genomics has been limited. Most published models work only with fixed types of data and can answer only one specific question. Swapping out or adding new data often requires starting over from scratch, with extensive programming effort.
Janggu converts different genomics data types into a universal format that can be plugged into any machine learning or deep learning model that uses Python, a widely used programming language.
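To make the idea of a "universal format" concrete, here is a minimal sketch of one common way to turn DNA sequences into numeric input for a model: one-hot encoding. It is an illustration only and does not use Janggu's own API. The names `ALPHABET`, `one_hot_encode` and `encode_regions` are hypothetical, and the choice to encode unknown bases as all zeros is an assumption made for this example.

```python
# Illustrative sketch only: this is NOT Janggu's API.
# All names below are hypothetical and chosen for this example.
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Return a len(sequence) x 4 matrix of 0/1 values.

    Bases outside ALPHABET (for example N) become all-zero rows;
    that convention is an assumption of this sketch.
    """
    rows = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        index = ALPHABET.find(base)
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows


def encode_regions(sequences):
    """Encode equal-length sequences as a (regions, length, 4) nested list."""
    lengths = {len(s) for s in sequences}
    if len(lengths) > 1:
        raise ValueError("all regions must have the same length")
    return [one_hot_encode(s) for s in sequences]


# Two toy regions of length 4; the second mixes lower case and an unknown base.
batch = encode_regions(["ACGT", "ggNa"])
```

Because every region ends up with the same shape, a batch like this could be handed to whatever Python model sits in the "narrow connector" of the hourglass, regardless of where the sequences originally came from.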
"What makes our approach special is that you can easily use any genomic data set for your deep learning problem, anything goes in any format," says the research group head.
The research group has a dual mission: developing new machine learning tools, and using them to investigate questions in biology and medicine. During their own research, they were continually frustrated by how much time was spent formatting data. They realized that part of the problem was that each deep learning model included its own data pre-processing. Separating data extraction and formatting from the analysis makes it much easier to interchange, combine or reuse sections of data. It's like having all the kitchen tools and ingredients at your fingertips, ready to try out a new recipe.
"The difficulty was finding the right balance between flexibility and usability," the group leader says. "If it is too flexible, people will be drowned in different options and it will be difficult to get started."
The Nature Communications paper demonstrates Janggu's versatility in handling very large volumes of data, combining data streams, and answering different types of questions, such as predicting binding sites from DNA sequences, from chromatin accessibility, or from both. The examples cover both classification and regression tasks.
While most of Janggu's benefit is on the front end, the researchers wanted to provide a complete solution for deep learning. Janggu therefore also visualizes results after the deep learning analysis and evaluates what the model has learned. Notably, the team incorporated "higher-order sequence encoding" into the package, which captures correlations between neighboring nucleotides. This helped to increase the accuracy of some analyses. By making deep learning easier and more user-friendly, Janggu helps throw open the door to answering all kinds of biological questions.
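As a rough illustration of what a higher-order encoding can look like, the sketch below one-hot encodes overlapping k-mers instead of single nucleotides, so that each position carries information about its neighbor as well. This is one plausible reading of the idea, not Janggu's implementation. The function name `higher_order_encode`, its `order` parameter, and the handling of unknown bases are assumptions made for this example.

```python
# Illustrative sketch only: this is NOT Janggu's implementation.
# Function and parameter names are hypothetical.
from itertools import product

ALPHABET = "ACGT"


def higher_order_encode(sequence, order=2):
    """One-hot encode overlapping k-mers of length `order`.

    With order=2 each position is described by one of 16 dinucleotides
    (AA, AC, ..., TT) rather than one of 4 single bases, so adjacent
    nucleotides are represented jointly. K-mers containing a base outside
    ALPHABET become all-zero rows (an assumption of this sketch).
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=order)]
    index_of = {kmer: i for i, kmer in enumerate(kmers)}
    seq = sequence.upper()
    rows = []
    for start in range(len(seq) - order + 1):
        row = [0] * len(kmers)
        index = index_of.get(seq[start:start + order])
        if index is not None:
            row[index] = 1
        rows.append(row)
    return rows


# "ACGT" contains three overlapping dinucleotides: AC, CG and GT.
encoded = higher_order_encode("ACGT", order=2)
```

With `order=1` the same function reduces to ordinary one-hot encoding. Raising the order multiplies the number of input channels by four each time, which is the trade-off for representing neighboring bases jointly.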
"One of the most interesting applications is predicting the effect of mutations on gene regulation," the author says. "This is exciting because now we can start understanding individual genomes. For instance, we can pinpoint genetic variants that cause regulatory changes, or we can interpret regulatory mutations occurring in tumors."
https://www.nature.com/articles/s41467-020-17155-y
http://sciencemission.com/site/index.php?page=news&type=view&id=publications%2Fdeep-learning-for&filter=22
https://www.eurekalert.org/pub_releases/2020-07/mdcf-jmd070920.php
New tool Janggu to incorporate diverse genomic data into deep learning