AI in the Hand of the Artist

Humans are wielding AI to create art, and a virtual exhibit that’s part of NVIDIA’s GPU Technology Conference showcases the stunning results.

The AI Art Gallery at NVIDIA GTC features pieces by a broad collection of artists, developers and researchers from around the world who are using AI to push the limits of artistic expression.

When AI is introduced into the artistic process, the artist feeds the machine data and code, explains Heather Schoell, senior art director at NVIDIA, who curated the online exhibit.

Once the output reveals itself, it’s up to the artist to determine whether it lives up to their artistic style and desired message, or whether the input needs to be adjusted, according to Schoell.

“The output reflects both the artist’s hand and the medium, in this case data, used for creation,” Schoell says.

The exhibit complements what has become the world’s premier AI conference.

GTC, running Oct. 5-9, will bring together researchers from industry and academia, startups and Fortune 500 companies.

So it’s only natural that artists would be among those putting modern AI to work.

“Through this collection we aim to share how the artist can partner with AI as both an artistic medium and creative collaborator,” Schoell explains.

The artists featured in the AI Art Gallery include:

  • Daniel Ambrosi – Dreamscapes fuses computational photography and AI to create a deeply textural environment.
  • Refik Anadol – Machine Hallucinations, by the Turkish-born, Los Angeles-based conceptual artist known for his immersive architectural digital installations, such as a project at New York’s Chelsea Market that used projectors to splash AI-generated images of New York cityscapes to create what Anadol called a “machine hallucination.”
  • Sofia Crespo and Dark Fractures – Work from the Argentina-born artist and Berlin-based studio led by Feileacan McCormick uses GANs and NLP models to generate 3D insects in a virtual, digital space.
  • Scott Eaton – An artist, educator and creative technologist residing in London, who combines a deep understanding of human anatomy, traditional art techniques and modern digital tools in his uncanny, figurative artworks.
  • Oxia Palus – The NVIDIA Inception startup will uncover a new masterpiece by resurrecting a hidden sketch and reconstructing the painting style of Leonardo da Vinci, one of the most famous artists of all time.
  • Anna Ridler – Three displays showing images of tulips that change based on Bitcoin’s price, created by the U.K. artist and researcher known for her work exploring the intersection of machine learning, nature and history.
  • Helena Sarin – Using her own drawings, sketches and photographs as datasets, Sarin trains her models to generate new visuals that serve as the basis of her compositions — in this case with a type of neural network known as a generative adversarial network, or GAN. The Moscow-born artist has embedded 12 of these creations in a book of puns on the acronym GAN.
  • Pindar Van Arman – Driven by a collection of algorithms programmed to work with — and against — one another, the U.S.-based artist and roboticist’s creation uses a paintbrush, paint and canvas to create portraits that fuse the look and feel of a photo and a handmade sketch.

For a closer look, registered GTC attendees can go on a live, personal tour of two of our featured artists’ studios.

On Thursday, Oct. 8, you can virtually tour Van Arman’s Fort Worth, Texas, studio from 11 a.m. to 12 p.m. Pacific time. And at 2 p.m. Pacific, you can tour Refik Anadol’s Los Angeles studio.

In addition, a pair of panel discussions with AI Art Gallery artists on Thursday, Oct. 8, will explore what led them to connect AI and fine art.

And starting Oct. 5, you can tune in to an on-demand GTC session featuring Oxia Palus co-founder George Cann, a Ph.D. candidate in space and climate physics at University College London.

Join us at the AI Art Gallery.

Register for GTC

OpenAI Licenses GPT-3 Technology to Microsoft

OpenAI released its first commercial product back in June: an API for developers to access advanced technologies for building new applications and services. The API features a powerful general-purpose language model, GPT-3, and has received tens of thousands of applications to date.

In addition to offering GPT-3 and future models via the OpenAI API, and as part of a multiyear partnership announced last year, OpenAI has agreed to license GPT-3 to Microsoft for its own products and services. The deal has no impact on continued access to the GPT-3 model through OpenAI’s API; existing and future users will continue building applications with our API as usual.

Unlike most AI systems, which are designed for one use case, OpenAI’s API today provides a general-purpose “text in, text out” interface, allowing users to try it on virtually any English language task. GPT-3 is the most powerful model behind the API, with 175 billion parameters. Several other models are also available via the API, as well as other technologies and filters that allow developers to customize GPT-3 and other language models for their own use.
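
To make the “text in, text out” idea concrete, here is a minimal sketch of a completion request using the Python client from the beta period; the engine name, prompt, and sampling parameters are illustrative assumptions rather than recommendations from OpenAI.

    import openai

    openai.api_key = "YOUR_API_KEY"  # granted after acceptance into the limited beta

    # "Text in, text out": send a plain-text prompt, get back a plain-text completion.
    response = openai.Completion.create(
        engine="davinci",   # assumed name for the largest GPT-3 model exposed through the API
        prompt="Write a one-sentence summary of what a general-purpose language API can do:",
        max_tokens=64,
        temperature=0.7,
    )

    print(response["choices"][0]["text"].strip())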

Today, the API remains in a limited beta as OpenAI and academic partners test and assess the capabilities and limitations of these powerful language models. To learn more about the API or to sign up for the beta, please visit beta.openai.com.

Li Auto Aims to Extend Lead in Chinese EV Market with NVIDIA DRIVE

One of the leading EV startups in China is charging up its compute capabilities.

Li Auto announced today it would develop its next generation of electric vehicles using the high-performance, energy-efficient NVIDIA DRIVE AGX Orin. These new vehicles will be developed in collaboration with tier 1 supplier Desay SV and will offer advanced autonomous driving features, as well as extended battery range, for truly intelligent mobility.

The startup has become a standout brand in China over the past year. Its electric model lineup has led domestic sales of medium and large SUVs for eight consecutive months. With this latest announcement, the automaker can extend its lead into the autonomous driving industry.

NVIDIA Orin, the SoC at the heart of the future fleet, achieves 200 TOPS — nearly 7x the performance and 3x the energy efficiency of our previous generation SoC — and is designed to handle the large number of applications and deep neural networks that run simultaneously for automated and autonomous driving. Orin is also designed to meet systematic safety standards such as ISO 26262 ASIL-D.

This centralized, high-performance system will enable software-defined, intelligent features in Li Auto’s upcoming electric vehicles, making them a smart choice for eco-friendly, safe and convenient driving.

“By cooperating with NVIDIA, Li Auto can benefit from stronger performance and the energy-efficient compute power needed to deliver both advanced driving and fully autonomous driving solutions to market,” said Kai Wang, CTO of Li Auto.

A Software-Defined Architecture

Today, a vehicle’s software functions are powered by dozens of electronic control units, known as ECUs, that are distributed throughout the car. Each is specialized — one unit controls windows and one the door locks, for example, and others control power steering and braking.

This fixed-function architecture is not compatible with intelligent and autonomous features. These AI-powered capabilities are software-defined, meaning they are constantly improving, and require a hardware architecture that supports frequent upgrades.

Vehicles equipped with NVIDIA Orin have the powerful, centralized compute necessary for this software-defined architecture. The SoC was born out of the data center, built with approximately 17 billion transistors to handle the large number of applications and deep neural networks for autonomous systems and AI-powered cockpits.

The NVIDIA Orin SoC

This high-performance platform will enable Li Auto to become one of the first automakers in China to deploy an independent, advanced autonomous driving system with its next-generation fleet.

The Road Ahead

This announcement is just the first step of a long-term collaboration between NVIDIA and Li Auto.

“The next-generation NVIDIA Orin SoC offers a significant leap in compute performance and energy efficiency,” said Rishi Dhall, vice president of autonomous vehicles at NVIDIA. “NVIDIA works closely with companies like Li Auto to help bring new AI-based autonomous driving capabilities to cutting-edge EVs in China and around the globe.”

By combining NVIDIA’s leadership in AI software and computing with Li Auto’s momentum in the electric vehicle space, the two companies will develop vehicles that are better for the environment and safer for everyone.

Advancing NLP with Efficient Projection-Based Model Architectures

Posted by Prabhu Kaliamoorthi, Software Engineer, Google Research

Deep neural networks have radically transformed natural language processing (NLP) in the last decade, primarily through their application in data centers using specialized hardware. However, issues such as preserving user privacy, eliminating network latency, enabling offline functionality, and reducing operation costs have rapidly spurred the development of NLP models that can be run on-device rather than in data centers. Yet mobile devices have limited memory and processing power, which requires models running on them to be small and efficient — without compromising quality.

Last year, we published a neural architecture called PRADO, which at the time achieved state-of-the-art performance on many text classification problems, using a model with less than 200K parameters. While most models use a fixed number of parameters per token, the PRADO model used a network structure that required extremely few parameters to learn the most relevant or useful tokens for the task.

Today we describe a new extension to the model, called pQRNN, which advances the state of the art for NLP performance with a minimal model size. The novelty of pQRNN is in how it combines a simple projection operation with a quasi-RNN encoder for fast, parallel processing. We show that the pQRNN model is able to achieve BERT-level performance on a text classification task with orders of magnitude fewer parameters.

What Makes PRADO Work?
When developed a year ago, PRADO exploited NLP domain-specific knowledge on text segmentation to reduce the model size and improve the performance. Normally, the text input to NLP models is first processed into a form that is suitable for the neural network, by segmenting text into pieces (tokens) that correspond to values in a predefined universal dictionary (a list of all possible tokens). The neural network then uniquely identifies each segment using a trainable parameter vector, which comprises the embedding table. However, the way in which text is segmented has a significant impact on the model performance, size, and latency. The figure below shows the spectrum of approaches used by the NLP community and their pros and cons.
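
As a toy illustration of this conventional setup (not PRADO itself), the sketch below segments text into word tokens, maps each token to an id in a small fixed dictionary, and looks up a trainable vector in an embedding table; the vocabulary, embedding size, and random initialization are arbitrary choices for the example.

    import numpy as np

    # A tiny fixed "universal dictionary"; id 0 is reserved for unknown tokens.
    vocab = {"<unk>": 0, "the": 1, "movie": 2, "was": 3, "really": 4, "great": 5}
    embedding_dim = 8

    # One trainable parameter vector per dictionary entry: the embedding table.
    rng = np.random.default_rng(seed=0)
    embedding_table = rng.normal(size=(len(vocab), embedding_dim))

    def embed(text):
        tokens = text.lower().split()                       # word-level segmentation
        ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
        return embedding_table[ids]                         # shape: (num_tokens, embedding_dim)

    print(embed("The movie was really great").shape)        # (5, 8)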

Since the number of text segments is such an important parameter for model performance and compression, it raises the question of whether an NLP model needs to be able to distinctly identify every possible text segment. To answer this question, we look at the inherent complexity of NLP tasks.

Only a few NLP tasks (e.g., language models and machine translation) need to know subtle differences between text segments and thus need to be capable of uniquely identifying all possible text segments. In contrast, the majority of other tasks can be solved by knowing a small subset of these segments. Furthermore, this subset of task-relevant segments will likely not be the most frequent, as a significant fraction of segments will undoubtedly be dedicated to articles, such as a, an, the, etc., which for many tasks are not necessarily critical. Hence, allowing the network to determine the most relevant segments for a given task results in better performance. In addition, the network does not need to be able to uniquely identify these segments, but only needs to recognize clusters of text segments. For example, a sentiment classifier just needs to know segment clusters that are strongly correlated to the sentiment in the text.

Leveraging these insights, PRADO was designed to learn clusters of text segments from words rather than word pieces or characters, which enabled it to achieve good performance on low-complexity NLP tasks. Since word units are more meaningful, and the set of words most relevant to most tasks is reasonably small, many fewer model parameters are needed to learn such a reduced subset of relevant word clusters.

Improving PRADO
Building on the success of PRADO, we developed an improved NLP model, called pQRNN. This model is composed of three building blocks: a projection operator that converts tokens in text to a sequence of ternary vectors, a dense bottleneck layer, and a stack of QRNN encoders.

The implementation of the projection layer in pQRNN is identical to that used in PRADO and helps the model learn the most relevant tokens without a fixed set of parameters to define them. It first fingerprints the tokens in the text and converts each one to a ternary feature vector using a simple mapping function. This results in a ternary vector sequence with a balanced symmetric distribution that uniquely represents the text. This representation is not directly useful, since it contains no information relevant to the task of interest and the network has no control over it. We combine it with a dense bottleneck layer to allow the network to learn a per-word representation that is relevant for the task at hand. The representation resulting from the bottleneck layer still does not take the context of the word into account, so we learn a contextual representation by using a stack of bidirectional QRNN encoders. The result is a network that is capable of learning a contextual representation from just the text input, without employing any kind of preprocessing.
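
The sketch below approximates the projection idea: each token is hashed to a bit fingerprint, and pairs of bits are mapped to ternary values in {-1, 0, +1}, so no embedding table is needed. The hash function, feature size, and bit-to-ternary mapping are illustrative assumptions, not the exact operations used in PRADO or pQRNN.

    import hashlib
    import numpy as np

    def ternary_projection(token, num_features=128):
        # Fingerprint the token with a hash (the real models use their own hashing scheme).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bits = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))
        bits = np.resize(bits, 2 * num_features)             # two bits per ternary feature
        pairs = bits.reshape(num_features, 2).astype(np.int8)
        # Map bit pairs to {-1, 0, +1}: 00 -> 0, 01 -> +1, 10 -> -1, 11 -> 0.
        return pairs[:, 1] - pairs[:, 0]

    def project_text(text, num_features=128):
        # One ternary feature vector per token; in pQRNN this sequence would feed the
        # dense bottleneck layer and the stack of bidirectional QRNN encoders.
        return np.stack([ternary_projection(tok, num_features) for tok in text.lower().split()])

    features = project_text("projection models need no embedding table")
    print(features.shape, np.unique(features))               # (6, 128) [-1  0  1]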

Performance
We evaluated pQRNN on the civil_comments dataset and compared it with the BERT model on the same task. Because model size is proportional to the number of parameters, pQRNN is much smaller than BERT; in addition, pQRNN is quantized, which further reduces the model size by a factor of 4x. The publicly available pretrained version of BERT performed poorly on the task, so the comparison is made against a BERT version pretrained on several relevant multilingual data sources to achieve the best possible performance.

We capture the area under the curve (AUC) for the two models. Without any kind of pre-training and just trained on the supervised data, the AUC for pQRNN is 0.963 using 1.3 million quantized (8-bit) parameters. With pre-training on several different data sources and fine-tuning on the supervised data, the BERT model gets 0.976 AUC using 110 million floating point parameters.
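
A back-of-the-envelope comparison of the implied model sizes, assuming roughly 1 byte per 8-bit quantized parameter and 4 bytes per 32-bit floating point parameter, looks like this:

    # Rough serialized sizes implied by the parameter counts above (illustrative only).
    pqrnn_bytes = 1.3e6 * 1    # 1.3M parameters quantized to 8 bits
    bert_bytes = 110e6 * 4     # 110M parameters stored as 32-bit floats

    print(f"pQRNN: ~{pqrnn_bytes / 1e6:.1f} MB")              # ~1.3 MB
    print(f"BERT:  ~{bert_bytes / 1e6:.0f} MB")               # ~440 MB
    print(f"BERT is ~{bert_bytes / pqrnn_bytes:.0f}x larger than pQRNN")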

Conclusion
We have demonstrated how our previous-generation model, PRADO, can be used as the foundation for the next generation of state-of-the-art lightweight text classification models. We present one such model, pQRNN, and show that this new architecture can nearly achieve BERT-level performance, despite using 300x fewer parameters and being trained on only the supervised data. To stimulate further research in this area, we have open-sourced the PRADO model and encourage the community to use it as a jumping-off point for new model architectures.

Acknowledgements
We thank Yicheng Fan, Márius Šajgalík, Peter Young and Arun Kandoor for contributing to the open sourcing effort and helping improve the models. We would also like to thank Amarnag Subramanya, Ashwini Venkatesh, Benoit Jacob, Catherine Wah, Dana Movshovitz-Attias, Dang Hien, Dmitry Kalenichenko, Edgar Gonzàlez i Pellicer, Edward Li, Erik Vee, Evgeny Livshits, Gaurav Nemade, Jeffrey Soren, Jeongwoo Ko, Julia Proskurnia, Rushin Shah, Shirin Badiezadegan, Sidharth KV, Victor Cărbune and the Learn2Compress team for their support. We would like to thank Andrew Tomkins and Patrick Mcgregor for sponsoring this research project.

Meet the Maker: Mr. Fascinate Encourages Kids to Get on the Cool Bus and Study STEM

STEM is dope. That’s the simple message that Justin “Mr. Fascinate” Shaifer evangelizes to young people around the world.

Through social media and other platforms, Shaifer fascinates children with STEM projects — including those that can be created using AI with NVIDIA Jetson products — in hopes that more students from underrepresented groups will be inspired to dive into the field. NVIDIA Jetson embedded systems allow anyone to create their own AI-based projects.

Growing up on Chicago’s South Side, Shaifer didn’t know anyone with a career in STEM he could look up to — at least no one he could relate to. Now, he’s become that role model for thousands of kids, working to prove that STEM is cool and attainable for anyone who has a passion for it.

About the Maker

Shaifer is a STEM advocate, animator and TV host who educates students about the importance of STEM and diversity within it. He has a YouTube channel, gives keynote speeches and hosts the Escape Lab live science show on Twitch.

He’s also the founder of Fascinate Inc., a nonprofit with the mission of exciting underrepresented students about careers in STEM and providing schools and after-school programs with fun science curricula.

The organization also launched the Magic Cool Bus project, filling a real-life bus with cutting-edge tech gadgets and bringing it to schools so students can hop on board and explore.

Growing up in a single-parent home, Shaifer was fascinated by science, earning scholarships from NASA and NOAA that covered his expenses to study marine and environmental science at Hampton University. He’s currently working toward a Ph.D. in science education at Columbia University.

His Inspiration

Shaifer was inspired to transition from being a scientist in a lab to a science educator for others in 2017, while volunteering at a museum in Washington.

“I was freestyle rapping about a carbon cycle exhibit, and this nine-year-old Black kid came up to me and said, ‘What do you do, man?’” said Shaifer.

When Shaifer told him he was a scientist, the child said, “That’s so cool. When I grow up, I want to be a scientist just like you!”

“That made me reflect on the fact that at nine years old, I’d never seen an example of a scientist that looked like me,” said Shaifer. “I realized that students need to be exposed to a role model in STEM that they can identify with, at scale.”

Later that year, Shaifer founded Fascinate Inc.

His Favorite Jetson Projects

Shaifer is passionate about exposing students to the world of AI, and he says using the NVIDIA Jetson platform is a great way to do so.

Watch him highlight Jetson products:

NVIDIA Jetson Xavier NX Unboxing and Impression

NVIDIA SparkFun JetBot AI Kit Unboxing and Impression

One of Shaifer’s favorite real-world applications that uses the NVIDIA Jetson Nano developer kit is Qrio. The bot, created by Agustinus Nalwan, recognizes a toddler’s toy and plays a relevant YouTube video.

“Especially since I work with young kids, I think that’s a really cool application that allows a child to be engaged, interactive and always learning as they play with their toys,” said Shaifer.

Where to Learn More 

Get fascinated by STEM on Shaifer’s website and YouTube channel.

Discover tools, inspiration and three easy steps to help kickstart your project with AI on our “Get AI, Learn AI, Build AI” page.

Improving the Accuracy of Genomic Analysis with DeepVariant 1.0

Posted by Andrew Carroll, Product Lead, and Pi-Chuan Chang, Technical Lead, Google Health

Sequencing genomes involves sampling short pieces of the DNA from the ~6 billion pairs of nucleobases — i.e., adenine (A), thymine (T), guanine (G), and cytosine (C) — we inherit from our parents. Genome sequencing is enabled by two key technologies: DNA sequencers (hardware) that “read” relatively small fragments of DNA, and variant callers (software) that combine the reads to identify where and how an individual’s genome differs from a reference genome, like the one assembled in the Human Genome Project. Such variants may be indicators of genetic disorders, such as an elevated risk for breast cancer, pulmonary arterial hypertension, or neurodevelopmental disorders.

In 2017, we released DeepVariant, an open-source tool which identifies genome variants in sequencing data using a convolutional neural network (CNN). The sequencing process begins with a physical sample being sequenced by any of a handful of instruments, depending on the end goal of the sequencing. The raw data, which consists of numerous reads of overlapping fragments of the genome, are then mapped to a reference genome. DeepVariant analyzes these mappings to identify variant locations and distinguish them from sequencing errors.

Soon after it was first published in 2018, DeepVariant underwent a number of updates and improvements, including significant changes to improve accuracy for whole exome sequencing and polymerase chain reaction (PCR) sequencing.

We are now releasing DeepVariant v1.0, which incorporates a large number of improvements for all sequencing types. DeepVariant v1.0 is an improved version of our submission to the PrecisionFDA v2 Truth Challenge, which achieved Best Overall accuracy for 3 of 4 instrument categories. Compared to previous state-of-the-art models, DeepVariant v1.0 significantly reduces the errors for widely-used sequencing data types, including Illumina and Pacific Biosciences. In addition, through a collaboration with the UCSC Genomics Institute, we have also released a model that combines DeepVariant with the UCSC’s PEPPER method, called PEPPER-DeepVariant, which extends coverage to Oxford Nanopore data for the first time.

Sequencing Technologies and DeepVariant
For the last decade, the majority of sequence data were generated using Illumina instruments, which produce short (75-250 bases) and accurate sequences. In recent years, new technologies have become available that can sequence much longer pieces, including Pacific Biosciences, which can produce long and accurate sequences up to ~15,000 bases in length, and Oxford Nanopore, which can produce reads up to 1 million bases long, but with higher error rates. The particular type of sequencing data a researcher might use depends on the ultimate use-case.

Because DeepVariant is a deep learning method, we can quickly re-train it for these new instrument types, ensuring highly accurate sequence identification. Accuracy is important because a missed variant call could mean missing the causal variant for a disorder, while a false positive variant call could lead to identifying an incorrect one. Earlier state-of-the-art methods could reach ~99.1% accuracy (~73,000 errors) on a 35-fold coverage Illumina whole genome, whereas an early version of DeepVariant (v0.10) had ~99.4% accuracy (46,000 errors), corresponding to a 38% error reduction. DeepVariant v1.0 reduces Illumina errors by another ~22% and PacBio errors by another ~52% relative to the last DeepVariant release (v0.10).

DeepVariant Overview
DeepVariant is a convolutional neural network (CNN) that treats the task of identifying genetic variants as an image classification problem. DeepVariant constructs tensors, essentially multi-channel images, where each channel represents an aspect of the sequence, such as the bases in the sequence (called read base), the quality of alignment between different reads (mapping quality), whether a given read supports an alternate allele (read supports variant), etc. It then analyzes these data and outputs three genotype likelihoods, corresponding to how many copies (0, 1, or 2) of a given alternate allele are present.

Example of DeepVariant data. Each row of pixels in each panel corresponds to a single read, i.e., a short genetic sequence. The top, middle, and bottom rows of panels present examples with a different number of variant alleles. Only two of the six data channels are shown: Read base — the pixel value is mapped to each of the four bases, A, C, G, or T; Read supports variant — white means that the read is consistent with a given allele and grey means it is not. Top: Classified by DeepVariant as a “2”, which means that both chromosomes match the variant allele. Middle: Classified as a “1”, meaning that one chromosome matches the variant allele. Bottom: Classified as a “0”, implying that the variant allele is missing from both chromosomes.
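
For intuition only, the sketch below mirrors the setup just described with a stand-in pileup tensor and a small Keras CNN that outputs three genotype probabilities; the tensor dimensions, channel ordering, and layers are illustrative assumptions and do not reflect DeepVariant’s actual input encoding or architecture.

    import numpy as np
    import tensorflow as tf

    # Stand-in pileup tensor: batch of 1, 100 read rows x 221 positions x 6 channels
    # (read base, base quality, mapping quality, strand, read supports variant, ...).
    pileup = np.random.rand(1, 100, 221, 6).astype("float32")

    # Schematic CNN classifier with three outputs, one likelihood per genotype
    # (0, 1, or 2 copies of the alternate allele).
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(100, 221, 6)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])

    genotype_likelihoods = model(pileup).numpy()
    print(genotype_likelihoods)   # e.g. [[0.31 0.36 0.33]] for an untrained model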

Technical Improvements in DeepVariant v1.0
Because DeepVariant uses the same codebase for each data type, improvements apply to each of Illumina, PacBio, and Oxford Nanopore. Below, we show the numbers for Illumina and PacBio for two types of small variants: SNPs (single nucleotide polymorphisms, which change a single base without changing sequence length) and INDELs (insertions and deletions).

  • Training on an extended truth set

    The Genome in a Bottle consortium from the National Institute of Standards and Technology (NIST) creates gold-standard samples with known variants covering most of the genome. These are used as labels to train DeepVariant. Using long-read technologies, the Genome in a Bottle consortium expanded the set of confident variants, increasing the regions described by the standard set from 85% of the genome to 92%. These more difficult regions were already used in training the PacBio models, and including them in the Illumina models reduced errors by 11%. By relaxing the filter for reads of lower mapping quality, we further reduced errors by 4% for Illumina and 13% for PacBio.

  • Haplotype sorting of long reads

    We inherit one copy of DNA from our mother and another from our father. PacBio and Oxford Nanopore reads are long enough that sequences can be separated by parental origin, which is called a haplotype. By providing this information to the neural network, DeepVariant improves its identification of random sequence errors and can better determine whether a variant has a copy from one or both parents.

  • Re-aligning reads to the alternate (ALT) allele

    DeepVariant uses input sequence fragments that have been aligned to a reference genome. The optimal alignment for variants that include insertions or deletions could be different if the aligner knew they were present. To capture this information, we implemented an additional alignment step relative to the candidate variant. The figure below shows an additional second row where the reads are aligned to the candidate variant, which is a large insertion. You can see that sequences which abruptly stop in the first row can now be fully aligned, providing additional information.

    Example of DeepVariant data with realignment to the ALT allele. DeepVariant is presented the information in both rows of data for the same example. Only two of the six data channels are shown: Read base (channel #1) and Read supports variant (channel #5). Top: Shows the reads aligned to the reference (in DeepVariant v0.10 and earlier this is all DeepVariant sees). Bottom: Shows the reads aligned to the candidate variant, in this case a long insertion of sequence. The red arrow indicates where the inserted sequence begins.
  • Use a small network to post-process outputs

    Variants can have multiple alleles, with a different base inherited from each parent. DeepVariant’s classifier generates a probability for only one potential variant at a time. In previous versions, simple hand-written rules converted the probabilities into a composite call, but these rules failed in some edge cases. They also separated the way a final call was made from the backpropagation used to train the network. By adding a small, fully connected neural network to the post-processing step, we are able to better handle these tricky multi-allelic cases (a minimal sketch of this idea appears after this list).

  • Adding data to train the release model

    The timeframe for the competition was compressed, so we trained only with data similar to the challenge data (PCR-Free NovaSeq) to speed model training. In our production releases, we seek high accuracy for multiple instruments as well as PCR+ preparations. Training with data from these diverse classes helps the model generalize, so our DeepVariant v1.0 release model outperforms the one submitted.
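
To make the post-processing idea above concrete, here is a minimal sketch of a small fully connected network that consumes the CNN’s per-allele genotype probabilities and emits a single composite call; the number of alternate alleles, layer sizes, and output classes are illustrative assumptions, not DeepVariant’s actual post-processing network.

    import tensorflow as tf

    # Assume up to two alternate alleles per site; the CNN produces three genotype
    # probabilities for each, which are concatenated into a single input vector.
    max_alt_alleles = 2
    inputs = tf.keras.Input(shape=(max_alt_alleles * 3,))

    # A small fully connected network replaces the hand-written combination rules.
    hidden = tf.keras.layers.Dense(32, activation="relu")(inputs)
    # Six composite genotypes are possible with two alternate alleles:
    # 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.
    outputs = tf.keras.layers.Dense(6, activation="softmax")(hidden)

    postprocess_net = tf.keras.Model(inputs, outputs)
    postprocess_net.summary()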

The charts below show the error reduction achieved by each improvement.

Training a Hybrid Model
DeepVariant v1.0 also includes a hybrid model for PacBio and Illumina reads. In this case, the model leverages the strengths of both input types, without needing new logic.

Example of DeepVariant merging data from both PacBio and Illumina. Only two of the six data channels are shown: Read base (channel #1) and Read supports variant (channel #5). The longer PacBio reads (at the upper part of the image) span the entire region being called, while the shorter Illumina reads span only a portion of the region.

We observed no change in SNP errors, suggesting that PacBio reads are strictly superior for SNP calling. We observed a further 49% reduction in Indel errors relative to the PacBio model, suggesting that the Indel error modes of Illumina and PacBio HiFi can be used in a complementary manner.

PEPPER-DeepVariant: A Pipeline for Oxford Nanopore Data Using DeepVariant
Until the PrecisionFDA competition, a DeepVariant model was not available for Oxford Nanopore data, because the higher base error rate created too many candidates for DeepVariant to classify. We partnered with the UC Santa Cruz Genomics Institute, which has extensive expertise with Nanopore data. They had previously trained a deep learning method called PEPPER, which could narrow down the candidates to a more tractable number. The larger neural network of DeepVariant can then accurately characterize the remaining candidates with a reasonable runtime.

The combined PEPPER-DeepVariant pipeline with the Oxford Nanopore model is open-source and available on GitHub. This pipeline achieved superior SNP-calling accuracy compared to DeepVariant on Illumina data in the PrecisionFDA challenge, the first time Nanopore data has been shown to outperform Illumina in this way.

Conclusion
DeepVariant v1.0 isn’t the end of development. We look forward to working with the genomics community to further maximize the value of genomic data to patients and researchers.
