It’s been over a decade since the Human Genome Project, and in the years since, scientists have become quite efficient at sequencing DNA. Entire genomes can now be sequenced at incredible speeds, but sorting through nucleotides and creating hypotheses about their role remains guesswork. This week, Google released a new tool that aims to use artificial intelligence and machine learning techniques to begin to bridge that gap — helping scientists build a more accurate picture of the human genome from sequencing data.
The new tool, known as DeepVariant, was designed to turn high-throughput sequencing readouts into a more accurate picture of a full human genome. It automatically identifies small insertion and deletion mutations, along with single-base-pair mutations, so that a more complete genome can be assembled quickly and with little manual effort.
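To make those mutation types concrete, here is a minimal Python sketch that labels a variant by comparing the reference sequence with the observed one. The function name `classify_variant` and the VCF-style allele strings are illustrative assumptions, not part of DeepVariant itself.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a variant from reference/alternate allele strings.

    Hypothetical helper for illustration only; real variant callers
    handle many more cases (multi-allelic sites, normalization, etc.).
    """
    if not ref or not alt:
        raise ValueError("ref and alt must be non-empty")
    if ref == alt:
        return "no_variant"
    if len(ref) == 1 and len(alt) == 1:
        # One base swapped for another: a single-base-pair mutation.
        return "snv"
    if len(alt) > len(ref):
        # Extra bases relative to the reference.
        return "insertion"
    if len(alt) < len(ref):
        # Bases missing relative to the reference.
        return "deletion"
    # Same length, more than one base: several bases substituted at once.
    return "multi_base_substitution"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATT"), ("ACG", "A")]:
        print(ref, alt, classify_variant(ref, alt))
```

Labeling a variant is the easy part; the hard part, as the rest of the article describes, is deciding whether a difference seen in the reads is real in the first place.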
The process of high-throughput sequencing has been widely available since the early 2000s, but scientists initially lacked the ability to interpret the data being collected. Over the years, different technologies have begun to help researchers analyze some of these large datasets, but the complete picture has consistently remained elusive.
Researchers from Harvard’s School of Public Health, where early versions of DeepVariant have been tested, say the tool could go a long way toward helping scientists unravel some of the most difficult parts of the human genome. For years, scientists have struggled to distinguish small mutations from random errors generated during the sequencing process, particularly in repetitive portions of the genome. Scientists have theorized that many of these mutations may be directly linked to various diseases, including cancer.
For years, researchers have attempted to analyze these readouts with a variety of software programs. However, these tools typically relied on simple statistical approaches that identify mutations by ruling out certain read errors. The end result was essentially a limited, error-prone snapshot of a human genome.
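As a rough illustration of what a "simple statistics" approach can look like, the Python sketch below asks whether the number of reads disagreeing with the reference at one position is too large to be explained by sequencing error alone. It is a toy model, not the method used by any particular tool: the function names, the fixed 1% error rate, and the significance cutoff are all assumptions chosen for the example.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def looks_like_variant(
    alt_reads: int,
    depth: int,
    error_rate: float = 0.01,  # assumed per-base sequencing error rate
    alpha: float = 1e-6,       # assumed significance cutoff
) -> bool:
    """Flag a site as a likely variant if its mismatching reads are
    unlikely to be random errors under a fixed, independent error rate."""
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    return binomial_tail(alt_reads, depth, error_rate) < alpha


if __name__ == "__main__":
    # Half of 30 reads disagree with the reference: hard to blame on noise.
    print(looks_like_variant(alt_reads=15, depth=30))
    # A single disagreeing read out of 30: easily explained by error.
    print(looks_like_variant(alt_reads=1, depth=30))
```

The weakness is the assumption baked into `error_rate`: a single, fixed, independent error probability. In repetitive stretches of the genome, errors are neither rare nor independent, so a model this simple can mistake noise for mutations and vice versa.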
This is where researchers from the Google Brain team stepped in. They developed a tool that takes millions of high-throughput reads and feeds them through a deep-learning system to help interpret them. To avoid the errors that other tools produced, the team kept adjusting the model until it could interpret all of the sequenced data with high accuracy, relying on deep-learning techniques that let the system improve automatically through training.
The team believes that the tool could be used to begin establishing links between genetic variants and various diseases, including cancers. Considering that many doctors already use family history when diagnosing and treating patients, imagine a world where your entire sequenced genome could be analyzed by artificial intelligence. Doctors may be able to identify specific risk factors for each individual patient and start treatment regimens that help prevent disease and cancer before they even begin to take shape in the body.
As the group moves forward with its research, the next step is to map these genetic variants so that doctors can use that knowledge to identify life-saving therapies. Google has already invested heavily in machine learning techniques and AI technologies, as both will be needed to help take genomic medicine to the next level.