Brain Gain: NVIDIA DRIVE Orin Now Central Computer for Intelligent Vehicles

NVIDIA DRIVE Orin, our breakthrough autonomous vehicle system-on-a-chip, is the new mega brain of the software-defined vehicle.

Beyond self-driving features, NVIDIA CEO and founder Jensen Huang announced today during his GTC keynote that the SoC can power all the intelligent computing functions inside vehicles, including confidence view visualization of autonomous driving capabilities, digital clusters, infotainment and passenger interaction AI.

Slated for 2022 vehicle product lines, Orin processes more than 250 trillion operations per second while achieving systematic safety standards such as ISO 26262 ASIL-D.

Typically, vehicle functions are controlled by tens of electronic control units distributed throughout a vehicle. By centralizing control of these core domains, Orin can replace these components and simplify what has been an incredibly complex supply chain for automakers.

“The future is one central computer — four domains, virtualized and isolated, architected for functional safety and security, software-defined and upgradeable for the life of the car — in addition to super-smart AI and beautiful graphics,” Huang said.

Secure Computing for Every Need

Managing a system with multiple complex applications is incredibly difficult. And when it comes to automotive, safety is critical.

DRIVE Orin supports multiple operating systems, including Linux, QNX and Android, to enable this wide range of applications. As a high-performance compute platform architected for the highest level of safety, it does so in a way that is secure, virtualized and accelerated.

The digital cluster, driver monitoring system and AV confidence view are all crucial to ensuring the safety of a vehicle’s occupants. Each must be functionally secure, with the ability to update each application individually without requiring a system reboot.

DRIVE Orin is designed for software-defined operation, meaning it’s purpose-built to handle these continuous upgrades throughout the life of the vehicle.

The Highest Levels of Confidence

As vehicles become more and more autonomous, visualization within the cabin will be critical for building trust with occupants. And with the DRIVE Orin platform, manufacturers can integrate enhanced capability into their fleets over the life of their vehicles.

The confidence view is a rendering of the mind of the vehicle’s AI. It shows exactly what the sensor suite and perception system are detecting in real time and constructs it into a 3D surround model.

By incorporating this view in the cabin interior, the vehicle can communicate the accuracy and reliability of the autonomous driving system at every step of the journey. And occupants can gain a better understanding of how the vehicle’s AI sees the world.

As a high-performance AI compute platform, DRIVE Orin enables this visualization alongside the digital cluster, infotainment, and driver and occupant monitoring, while maintaining enough compute headroom to add new features that delight customers through the life of their vehicles.

The ability to support this multi-functionality safely and securely is what makes NVIDIA DRIVE Orin truly central to the next-generation intelligent vehicle experience.

The post Brain Gain: NVIDIA DRIVE Orin Now Central Computer for Intelligent Vehicles appeared first on The Official NVIDIA Blog.

NVIDIA Triton Tames the Seas of AI Inference

You don’t need a hunky sea god with a three-pronged spear to make AI work, but a growing group of companies from car makers to cloud service providers say you’ll feel a sea change if you sail with Triton.

More than half a dozen companies share hands-on experiences this week in deep learning with the NVIDIA Triton Inference Server, open-source software that takes AI into production by simplifying how models run in any framework on any GPU or CPU for all forms of inference.

For instance, in a talk at GTC (free with registration) Fabian Bormann, an AI engineer at Volkswagen Group, conducts a virtual tour through the Computer Vision Model Zoo, a repository of solutions curated from the company’s internal teams and future partners.

The car maker integrates Triton into its Volkswagen Computer Vision Workbench so users can make contributions to the Model Zoo without needing to worry about whether they are based on ONNX, PyTorch or TensorFlow frameworks. Triton simplifies model management and deployment, and that’s key for VW’s work serving up AI models in new and interesting environments, Bormann says in a description of his talk (session E32736) at GTC.

Salesforce Sold on Triton Benchmarks

A leader in customer-relationship management software and services, Salesforce recently benchmarked Triton’s performance on some of the world’s largest AI models — the transformers used for natural-language processing.

“Triton not only has excellent serving performance, but also comes included with several critical functions like dynamic batching, model management and model prioritization. It is quick and easy to set up and works for many deep learning frameworks including TensorFlow and PyTorch,” said Nitish Shirish Keskar, a senior research manager at Salesforce who’s presenting his work at GTC (session S32713).

Keskar described in a recent blog his work validating that Triton can handle 500-600 queries per second (QPS) while processing 100 concurrent threads and staying under 200ms latency on the well-known BERT models used to understand speech and text. He tested Triton on the much larger CTRL and GPT2-XL models, finding that despite their billions of neural-network nodes, Triton still cranked out an amazing 32-35 QPS.
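
As a rough sanity check on the BERT figures, Little's law ties concurrency, throughput and latency together. The sketch below assumes a closed-loop test in which each of the 100 threads keeps exactly one request in flight; that setup is an assumption for illustration, not something the benchmark description states.

```python
# Little's law: requests in flight = throughput (QPS) x mean latency (seconds).
# Assumption: each of the 100 client threads keeps exactly one request in flight.

def mean_latency_ms(concurrency: int, qps: float) -> float:
    """Mean latency implied by Little's law, in milliseconds."""
    return concurrency / qps * 1000.0

for qps in (500, 600):
    print(f"{qps} QPS with 100 in flight -> {mean_latency_ms(100, qps):.0f} ms")
```

Under that assumption, the reported 500-600 QPS range lines up with the 200ms latency ceiling.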

A Model Collaboration with Hugging Face

More than 5,000 organizations turn to Hugging Face for help summarizing, translating and analyzing text with its 7,000 AI models for natural-language processing. Jeff Boudier, its product director, will describe at GTC (session S32003) how his team drove 100x improvements in AI inference on its models, thanks to a flow that included Triton.

“We have a rich collaboration with NVIDIA, so our users can have the most optimized performance running models on a GPU,” said Boudier.

Hugging Face aims to combine Triton with TensorRT, NVIDIA’s software for optimizing AI models, to drive the time to process an inference with a BERT model down to less than a millisecond. “That would push the state of the art, opening up new use cases with benefits for a broad market,” he said.

Deployed at Scale for AI Inference

American Express uses Triton in an AI service that operates within a 2ms latency requirement to detect fraud in real time across $1 trillion in annual transactions.

As for throughput, Microsoft uses Triton on its Azure cloud service to power the AI behind GrammarLink, its online editor for Microsoft Word that’s expected to serve as many as half a trillion queries a year.
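
To put half a trillion queries a year in perspective, the short calculation below converts it to an average per-second rate. It assumes queries are spread evenly across a 365-day year, which real traffic will not be, so peak load would be higher.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; ignores leap years

def average_qps(queries_per_year: float) -> float:
    """Average queries per second for a given yearly volume."""
    return queries_per_year / SECONDS_PER_YEAR

print(f"{average_qps(0.5e12):,.0f} queries per second on average")
```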

Less well known but well worth noting, LivePerson, based in New York, plans to run thousands of models on Triton in a cloud service that provides conversational AI capabilities to 18,000 customers including GM Financial, Home Depot and European cellular provider Orange.

Triton Inference Server
Triton simplifies the job of executing multiple styles of inference with models based on various frameworks while maintaining highest throughput and system utilization.

And the chief technology officer of London-based Intelligent Voice will describe at GTC (session S31452) its LexIQal system, which uses Triton for AI inference to detect fraud in insurance and financial services.

They are among many companies using NVIDIA for AI inference today. In the past year alone, users downloaded the Triton software more than 50,000 times.

Triton’s Swiss Army Spear

Triton is getting traction in part because it can handle any kind of AI inference job, whether it’s one that runs in real time, batch mode, as a streaming service or even if it involves a chain or ensemble of models. That flexibility eliminates the need for users to adopt and manage custom inference servers for each type of task.

In addition, Triton assures high system utilization, distributing work evenly across GPUs whether inference is running in a cloud service, in a local data center or at the edge of the network. And its open, extensible code lets users customize Triton to their specific needs.

NVIDIA keeps improving Triton, too. A recently added model analyzer combs through all the options to show users the optimal batch size or instances-per-GPU for their job. A new tool automates the job of translating and validating a model trained in TensorFlow or PyTorch into a TensorRT format; in the future, it will support translating models to and from any neural-network format.
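
The kind of search a model analyzer performs can be pictured as a sweep over batch sizes and instance counts, keeping the fastest configuration that still meets a latency budget. The sketch below is not the Triton tool's API: `measure` is a hypothetical stand-in with a made-up cost model, where a real sweep would benchmark the deployed model.

```python
from itertools import product

def measure(batch_size, instances):
    """Hypothetical stand-in for a real benchmark run.

    Returns (throughput in inferences/sec, latency in ms) from a made-up
    cost model: 2 ms fixed overhead per batch, 0.5 ms per item, and a
    slowdown factor of 1.0 + 0.3 * (instances - 1) for sharing one GPU.
    """
    latency_ms = (2.0 + 0.5 * batch_size) * (1.0 + 0.3 * (instances - 1))
    throughput = instances * batch_size / (latency_ms / 1000.0)
    return throughput, latency_ms

def best_config(batch_sizes, instance_counts, latency_budget_ms):
    """Return the (batch_size, instances) pair with the highest throughput within budget."""
    best, best_tp = None, 0.0
    for bs, n in product(batch_sizes, instance_counts):
        tp, lat = measure(bs, n)
        if lat <= latency_budget_ms and tp > best_tp:
            best, best_tp = (bs, n), tp
    return best, best_tp

config, tp = best_config([1, 2, 4, 8, 16, 32], [1, 2, 4], latency_budget_ms=10.0)
print(config, round(tp))
```

An exhaustive sweep is the simplest design; with more tunable options, a smarter search would be needed to keep the number of benchmark runs manageable.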

Meet Our Inference Partners

Triton has attracted several partners who support the software in their cloud services, including Amazon, Google, Microsoft and Tencent. Others such as Allegro, Seldon and Red Hat support Triton in their software for enterprise data centers, for workflows including MLOps, the extension to DevOps for AI.

At GTC (session S33118), Arm will describe how it adapted Triton as part of its neural-network software that runs inference directly on edge gateways. Two engineers from Dell EMC will show how to boost performance in video analytics 6x using Triton (session S31437), and NetApp will talk about its work integrating Triton with its solid-state storage arrays (session S32187).

To learn more, register for GTC and check out one of two introductory sessions (S31114, SE2690) with NVIDIA experts on Triton for deep learning inference.

The post NVIDIA Triton Tames the Seas of AI Inference appeared first on The Official NVIDIA Blog.

Like Magic: NVIDIA Merlin Gains Adoption for Training and Inference

Recommenders personalize the internet. They suggest videos, foods, sneakers and advertisements that seem magically clairvoyant in knowing your tastes and interests.

It’s an AI that makes online experiences more enjoyable and efficient, quickly taking you to the things you want to see. While delivering content you like, it also targets tempting ads for jeans, or recommends comfort dishes that fit those midnight cravings.

But not all recommender systems can handle the data requirements to make smarter suggestions. That leads to slower training and less intuitive internet user experiences.

NVIDIA Merlin is turbocharging recommenders, boosting training and inference. Leaders in media, entertainment and on-demand delivery use the open source recommender framework for running accelerated deep learning on GPUs. Improving recommendations increases clicks, purchases — and satisfaction.

Merlin-Accelerated Recommenders 

NVIDIA Merlin enables businesses of all types to build recommenders accelerated by NVIDIA GPUs.

Its collection of libraries includes tools for building deep learning-based systems that provide better predictions than traditional methods and increase clicks. Each stage of the pipeline is optimized to support hundreds of terabytes of data, all accessible through easy-to-use APIs.

Merlin is in testing with hundreds of companies worldwide. Social media and video services are evaluating it for suggestions on next views and ads. And major on-demand apps and retailers are looking at it for suggestions on new items to purchase.

Videos with Snap

With Merlin, Snap is improving the customer experience with better load times by ranking content and ads 60% faster while also reducing its infrastructure costs. Using GPUs and Merlin provides Snap with additional compute capacity to explore more complex and accurate ranking models. These improvements allow Snap to deliver even more engaging experiences at a lower cost.

Tencent: Ads that Click

China’s leading online video media platform uses Merlin HugeCTR to help connect over 500 million monthly active users with ads that are relevant and engaging. With such a huge dataset, training speed matters and determines the performance of the recommender model. Tencent deployed its real-time training with Merlin and achieved more than a 7x speedup over the original TensorFlow solution on the same GPU platform. Tencent dives into this further at its GTC presentation.

Postmates Food Picks

Merlin was designed to streamline and support recommender workflows. Postmates uses recommenders to help people decide what’s for dinner. Postmates utilizes Merlin NVTabular to optimize training time, reducing it from 1 hour on CPUs to just 5 minutes on GPUs.

Using NVTabular for feature engineering, the company reduced training costs by 95 percent and is exploring more advanced deep learning models. Postmates delves more into this in its GTC presentation.
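
The Postmates figures can be related with a little arithmetic. The sketch below assumes training cost is simply an hourly rate multiplied by training time; that cost model is an illustration, not something Postmates has described.

```python
cpu_minutes, gpu_minutes = 60, 5           # training times reported above
speedup = cpu_minutes / gpu_minutes
time_saved = 1 - gpu_minutes / cpu_minutes

cost_ratio = 1 - 0.95                      # a 95 percent reduction leaves 5 percent
# Assumption: cost = hourly rate x training time. The implied ratio of GPU
# to CPU hourly rates is then the cost ratio divided by the time ratio.
implied_rate_ratio = cost_ratio / (gpu_minutes / cpu_minutes)

print(f"speedup: {speedup:.0f}x, time saved: {time_saved:.1%}")
print(f"implied GPU/CPU hourly rate ratio: {implied_rate_ratio:.2f}")
```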

Merlin Streamlines Recommender Workflows at Scale

As Merlin is interoperable, it provides flexibility to accelerate recommender workflow pipelines.

The open beta release of the Merlin recommendation engine delivers leaps in data loading and training of deep learning systems.

NVTabular reduces data preparation time by GPU-accelerating feature transformations and preprocessing. NVTabular, which makes loading massive data lakes into training pipelines easier, gets multi-GPU support and improved interoperability with TensorFlow and PyTorch.
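
For readers new to recommender preprocessing, here is what two typical feature transformations do to a column of data. This is a plain-Python, CPU-only illustration with hypothetical function names; it is not NVTabular's API, which runs this kind of work on GPUs.

```python
from statistics import mean, pstdev

def categorify(values):
    """Map each distinct category to a contiguous integer ID, in first-seen order."""
    ids = {}
    return [ids.setdefault(v, len(ids)) for v in values]

def normalize(values):
    """Rescale a numeric column to zero mean and unit variance.

    Assumes the column is not constant (the standard deviation must be non-zero).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

cuisines = ["pizza", "sushi", "pizza", "tacos"]   # hypothetical categorical column
prices = [10.0, 20.0, 30.0]                       # hypothetical numeric column

print(categorify(cuisines))
print([round(p, 3) for p in normalize(prices)])
```

Integer IDs let a model look up an embedding per category, and normalized numeric columns keep features on comparable scales.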

Merlin’s Magic for Training

Merlin HugeCTR is the main training component. It is a deep neural network training framework designed specifically for recommender workflows, capable of distributed training across multiple GPUs and nodes for maximum performance. It comes with its own optimized data loader, vastly outperforming generic deep learning frameworks, and provides a Parquet data reader to digest the NVTabular preprocessed data.

NVIDIA Triton Inference Server accelerates production inference on GPUs for feature transforms and neural network execution.

Learn more about the technology advances behind Merlin since its initial launch, including its support for NVTabular, HugeCTR and NVIDIA Triton Inference Server.

The post Like Magic: NVIDIA Merlin Gains Adoption for Training and Inference appeared first on The Official NVIDIA Blog.

NVIDIA Maxine Hits the Scene to Create Real-Time Video Experiences

The next time you’re in a virtual meeting or streaming a game, live event or TV program, the star of the show may be NVIDIA Maxine, which took center stage at GTC today when NVIDIA CEO Jensen Huang announced the availability of the GPU-accelerated software development kit during his keynote address.

Developers from video conferencing, content creation and streaming providers are using the Maxine SDK to create real-time video-based experiences. And it’s easily deployed to PCs, data centers or in the cloud.

Shift Towards Remote Work

Virtual collaboration continues to grow with 70 million hours of web meetings daily, and more global organizations are looking at technologies to support an increasingly remote workforce.

Pexip, a scalable video conferencing platform that enables interoperability between different video conferencing systems, was looking to push the boundaries of its video communications offering to meet this growing demand.

“We’re already using NVIDIA Maxine for audio noise removal and working on integrating virtual backgrounds to support premium video conferencing experiences for enterprises of all sizes,” said Giles Chamberlin, CTO and co-founder of Pexip.

Working with NVIDIA, Pexip aims to provide AI-powered video communications that support virtual meetings that are better than meetings in person.

It joins other companies in the video collaboration space like Avaya, which incorporated Maxine audio noise reduction into its Spaces app last October and has now implemented virtual background, which allows presenters to overlay their video over presentations.

Headroom uses AI to take distractions out of video conferencing, so participants can focus on interactions during meetings instead. This includes flagging when people have questions, note taking, transcription and smart meeting summarization.

Seeing Face Value for Virtual Events

Research has shown that there are over 1 million virtual events yearly, with more event marketers planning to invest in them in the future. As a result, everyone from event organizers to visual effects artists is looking for faster, more efficient ways to create digital experiences.

Among them is Touchcast, which combines AI and mixed reality to reimagine virtual events. It’s using Maxine’s super-resolution features to convert 1080p streams into 4K for delivery.

“NVIDIA Maxine is paving the future of video communications — a future where AI and neural networks enhance and enrich content in entirely new ways,” said Edo Segal, founder and CEO of Touchcast.

Another example is Notch, which creates tools that enable real-time visual effects and motion graphics for live events. Maxine provides it with real-time, AI-driven face and body tracking along with background removal.

Artists can track and mask performers in a live performance setting for a variety of creative use cases — all using a standard camera feed and eliminating the challenges of special hardware-tracking solutions.

“The integration of the Maxine SDK was very easy and took just a few days to complete,” said Matt Swoboda, founder and director of Notch.

Field of Streams

With nearly 10 million content creators on Twitch per month, becoming a live broadcaster has also never been easier. Live streamers are looking for powerful yet easy-to-use features to excite their audiences.

BeLive, which provides a platform for live streaming user-generated talk shows, is using Maxine to process its video streams in the cloud so customers don’t have to invest in expensive equipment. By running Maxine in the cloud, users can benefit from high-quality background replacement regardless of the hardware they’re running on the client.

With BeLive, live interactive call-in talk shows can be produced easily and streamed to YouTube or Facebook Live, with participants calling in from around the world.

OBS, the leading platform for streaming and recording, is a free and open source software solution broadly used for game streaming and live production. Users with NVIDIA RTX GPUs can now take advantage of noise removal, improving the clarity of their audio during production.

Maxine users
Developers are using the Maxine SDK for building virtual collaboration and content creation applications.

A Look Into NVIDIA Maxine

NVIDIA Maxine includes three AI SDKs covering video effects, audio effects and augmented reality — each with pre-trained deep learning models, so developers can quickly build or enhance their real-time applications.

Starting with the NVIDIA Video Effects SDK, enterprises can now apply AI effects to improve video quality without special cameras or other hardware. Features include super-resolution, which generates 720p live video output from 360p input, along with artifact reduction to remove defects for crisper pictures.
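
To get a feel for how much work super-resolution does, compare pixel counts before and after upscaling. The frame dimensions below are the common 16:9 conventions for each label; they are assumptions, as the text names only the resolutions.

```python
# Common 16:9 frame sizes (width, height). These dimensions are standard
# conventions and are not stated in the text.
RESOLUTIONS = {
    "360p": (640, 360),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}

def pixel_ratio(src: str, dst: str) -> float:
    """Number of output pixels produced per input pixel."""
    sw, sh = RESOLUTIONS[src]
    dw, dh = RESOLUTIONS[dst]
    return (dw * dh) / (sw * sh)

print(pixel_ratio("360p", "720p"))
print(pixel_ratio("1080p", "4K"))
```

Both the 360p-to-720p and 1080p-to-4K conversions quadruple the pixel count, so three of every four output pixels have no direct counterpart in the source frame.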

Video noise removal eliminates low-light camera noise introduced in the video capture process while preserving all of the details. To hide messy rooms or other visual distractions, the Video Effects SDK removes the background of a webcam feed in real time, so only a user’s face and body show up in a livestream.

The NVIDIA Augmented Reality SDK enables real-time 3D face tracking using a standard web camera, delivering a more engaging virtual communication experience by automatically zooming into the face and keeping that face within view of the camera.

It’s now possible to detect human faces in images or video feeds, track the movement of facial expressions, create a 3D mesh representation of a person’s face, use video to track the movement of a human body in 3D space, simulate eye contact through gaze estimation and much more.

The NVIDIA Audio Effects SDK uses AI to remove distracting background noise from incoming and outgoing audio feeds, improving the clarity and quality of any conversation.

This includes the removal of unwanted background noises — like a dog barking or baby crying — to make conversations easier to understand. For meetings in large spaces, it’s also possible to remove room echoes from the background to make voices clearer.

Developers can add Maxine AI effects into their existing applications or develop new pipelines from scratch using NVIDIA DeepStream, an SDK for building intelligent video analytics, and NVIDIA Video Codec, an SDK for accelerated video encode and decode on Windows and Linux.

Maxine can also be used with NVIDIA Jarvis, a framework for building conversational AI applications, to offer world-class language-based capabilities such as transcription and translation.

Availability

Get started with NVIDIA Maxine.

And don’t let the curtain close on the opportunity to learn more about NVIDIA Maxine during GTC, running April 12-16. Registration is free.

A full list of Maxine-focused sessions can be found here. Be sure to watch Huang’s keynote address on-demand. And check out a demo (below) of Maxine.

The post NVIDIA Maxine Hits the Scene to Create Real-Time Video Experiences appeared first on The Official NVIDIA Blog.

Fast Track to Enterprise AI: New NVIDIA Workflow Lets Any User Choose, Adapt, Deploy Models Easily

AI is the most powerful new technology of our time, but it’s been a force that’s hard to harness for many enterprises — until now.

Many companies lack the specialized skills, access to large datasets or accelerated computing that deep learning requires. Others are realizing the benefits of AI and want to spread them quickly across more products and services.

For both, there’s a new roadmap to enterprise AI. It leverages technology that’s readily available, then simplifies the AI workflow with NVIDIA TAO and NVIDIA Fleet Command to make the trip shorter and less costly.

Grab and Go AI Models

The journey begins with pre-trained models. You don’t have to design and train a neural network from scratch in 2021. You can choose one of many available today in our NGC catalog.

We’ve curated models that deliver skills to advance your business. They span the spectrum of AI jobs from computer vision and conversational AI to natural-language understanding and more.

Models Show Their AI Resumes

So users know what they’re getting, many models in the catalog come with credentials. They’re like the resume for a prospective hire.

Model credentials show you the domain the model was trained for, the dataset that trained it, how often the model was deployed and how it’s expected to perform. They provide transparency and confidence you’re picking the right model for your use case.

Leveraging a Massive Investment

NVIDIA invested hundreds of millions of GPU compute hours over more than five years refining these models. We did this work so you don’t have to.

Here are three quick examples of the R&D you can leverage:

For computer vision, we devoted 3,700 person-years to labeling 500 million objects from 45 million frames. We used voice recordings to train our speech models on GPUs for more than a million hours. A database of biomedical papers packing 6.1 billion words educated our models for natural-language processing.

Transfer Learning, Your AI Tailor

Once you choose a model, you can fine tune it to fit your specific needs using NVIDIA TAO, the next stage of our expedited workflow for enterprise AI.

TAO enables transfer learning, a process that harvests features from an existing neural network and plants them in a new one using NVIDIA’s Transfer Learning Toolkit, an integrated part of TAO. It leverages small datasets users have on hand to give models a custom fit without the cost, time and massive datasets required to build and train a neural network from scratch.
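
Here is a toy picture of transfer learning: the pre-trained layers stay frozen while a small new head is trained on a handful of examples. Everything in the sketch (the weights, function names and dataset) is hypothetical, and it does not use the Transfer Learning Toolkit's API.

```python
import math

# Stand-in for a pre-trained network's frozen layers: fixed weights, never updated.
FROZEN_W = [[1.0, -1.0], [0.5, 0.5]]

def features(x):
    """Frozen feature extractor: a linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def predict(head, x):
    """New task-specific head: logistic regression on the frozen features."""
    f = features(x)
    z = head[-1] + sum(h * fi for h, fi in zip(head, f))  # head[-1] is the bias
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(data, epochs=500, lr=0.5):
    """Train only the head with gradient descent; FROZEN_W is left untouched."""
    head = [0.0, 0.0, 0.0]  # two feature weights plus a bias
    for _ in range(epochs):
        for x, y in data:
            err = predict(head, x) - y   # gradient of log loss with respect to z
            f = features(x) + [1.0]      # append 1.0 for the bias term
            head = [h - lr * err * fi for h, fi in zip(head, f)]
    return head

# Tiny labeled dataset: the label is 1 when the first input exceeds the second.
small_dataset = [([2.0, 0.0], 1), ([0.0, 2.0], 0), ([1.5, 0.5], 1), ([0.5, 1.5], 0)]
head = fine_tune(small_dataset)
print([round(predict(head, x)) for x, _ in small_dataset])
```

Because only three numbers are trained, four examples are enough here; that is the appeal of reusing features rather than learning them from scratch.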

Sometimes companies have an opportunity to further enhance models by training them across larger, more diverse datasets maintained by partners outside the walls of their data center.

TAO Lets Partners Collaborate with Privacy 

Federated learning, another part of TAO, lets different sites securely collaborate to refine a model for the highest accuracy. With this technique, users share components of models such as their partial weights. Datasets remain inside each company’s data center so data privacy is preserved.
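
One widely used way to combine models across sites is federated averaging, sketched below for a tiny linear model. The text does not say which aggregation scheme TAO uses, so treat the algorithm choice, the function names and the datasets as assumptions for illustration.

```python
def local_update(weights, data, lr=0.1, steps=20):
    """One site's training pass on its private data for the model y = w * x + b."""
    w, b = weights
    for _ in range(steps):
        grad_w = sum((w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum((w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine site weights, weighting each site by its number of samples."""
    total = sum(sizes)
    return tuple(
        sum(u[i] * n for u, n in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    )

# Hypothetical private datasets, each following y = 2x + 1. They never leave their site.
site_data = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, d) for d in site_data]
    global_weights = federated_average(updates, [len(d) for d in site_data])

print(tuple(round(v, 2) for v in global_weights))
```

Only the weight tuples cross site boundaries in this loop; the raw `(x, y)` records are read solely inside `local_update`.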

In one recent example, 20 research sites collaborated to raise the accuracy of the so-called EXAM model that predicts whether a patient has COVID-19. After applying federated learning, the model also could predict the severity of the infection and whether the patient would need supplemental oxygen. Patient data stayed safely behind the walls of each partner.

Taking Enterprise AI to Production

Once a model is fine tuned, it needs to be optimized for deployment.

It’s a pruning process that makes models lean, yet robust, so they function efficiently on your target platform whether it’s an array of GPUs in a server or a Jetson-powered robot on the factory floor.

NVIDIA TensorRT, another part of TAO, dials a model’s mathematical coordinates to an optimal balance of the smallest size with the highest accuracy for the system it will run on. It’s a crucial step, especially for real-time services like speech recognition or fraud detection that won’t tolerate system latency.
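
To make the idea of pruning concrete, the sketch below applies magnitude pruning, a common technique that zeroes the weights closest to zero. It is a generic illustration chosen for simplicity; the text does not specify which pruning method is used, and this is not TensorRT's or TAO's API.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute values."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, dropped = [], 0
    for w in weights:
        if abs(w) <= threshold and dropped < k:
            pruned.append(0.0)
            dropped += 1
        else:
            pruned.append(w)
    return pruned

layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]  # hypothetical layer weights
print(prune_by_magnitude(layer, sparsity=0.5))
```

In practice a pruned model is usually retrained briefly so the remaining weights can compensate for the ones removed.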

Then, with the Triton Inference Server, users can select the optimal configuration to deploy, whatever the model’s architecture, the framework it uses or target CPU or GPU it will run on.

Once a model is optimized and ready for deployment, users can easily integrate it with whatever application framework fits their use case or industry. For example, it could be Jarvis for conversational AI, Clara for healthcare, Metropolis for video analytics or Isaac for robotics, to name just a few that NVIDIA provides.

NGC TAO Fleet Command workflow
Pre-trained models in NGC, along with TAO and Fleet Command for a simple, but powerful AI workflow.

With the chosen application framework, users can launch NVIDIA Fleet Command to deploy and manage the AI application across a variety of GPU-powered devices. It’s the last key step in the journey.

Zero to AI in Minutes

Fleet Command connects NVIDIA-Certified servers deployed at the network’s edge to the cloud. With it, users can work from a browser to securely pair, orchestrate and manage millions of servers, deploy AI to any remote location and update software as needed.

Administrators monitor health and update systems with one click to simplify AI operations at scale.

Fleet Command uses end-to-end security protocols to ensure application data and intellectual property remain safe.

Data is sent between the edge and the cloud, fully encrypted, ensuring it’s protected. And applications are scanned for malware and vulnerabilities before they are deployed.

An AI Workflow That’s on the Job

Fleet Command and elements of TAO are already in use in warehouses, in retail, in hospitals and on the factory floor. Users include companies such as Accenture, BMW and Siemens Digital Industries.

A demo (below) from the GTC keynote shows how the one-two-three combination of NGC models, TAO and Fleet Command can quickly tailor and deploy an application using multiple AI models.

You can sign up for Fleet Command today.

Core parts of TAO, such as the Transfer Learning Toolkit and federated learning, are available today. Apply now for early access to them all, fully integrated into TAO.

The post Fast Track to Enterprise AI: New NVIDIA Workflow Lets Any User Choose, Adapt, Deploy Models Easily appeared first on The Official NVIDIA Blog.

Dream State: Cybersecurity Vendors Detect Breaches in an Instant with NVIDIA Morpheus

In the geography of data center security, efforts have long focused on protecting north-south traffic — the data that passes between the data center and the rest of the network. But one of the greatest risks has become east-west traffic — network packets passing between servers within a data center.

That’s due to the growth of cloud-native applications built from microservices, whose connections across a data center are changing constantly. With a typical 1,000-server data center having over 1 billion network paths, it’s extremely difficult to write fixed rules that control the blast radius should a malicious actor get inside.
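
A quick count shows how path totals of that size can arise. The number of endpoints per server below is an assumption picked for illustration; the text gives only the server count and the overall figure.

```python
def directed_paths(endpoints: int) -> int:
    """Ordered (source, destination) pairs among distinct endpoints."""
    return endpoints * (endpoints - 1)

servers = 1_000
endpoints_per_server = 32  # assumption for illustration; not from the text

print(f"{directed_paths(servers):,} server-to-server pairs")
print(f"{directed_paths(servers * endpoints_per_server):,} endpoint-to-endpoint pairs")
```

Server pairs alone total just under a million, but once each server hosts a few dozen containers or virtual machines, the pair count passes a billion, far too many for hand-written rules.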

The new NVIDIA Morpheus AI application framework gives security teams complete visibility into security threats by bringing together unmatched AI processing and real-time monitoring on every packet through the data center. It lets them respond to anomalies and update policies immediately as threats are identified.

Combining the security superpowers of AI and NVIDIA BlueField data processing units (DPUs), Morpheus provides cybersecurity developers a highly optimized AI pipeline and pre-trained AI skills that, for the first time, allow them to instantaneously inspect all IP network communication through their data center fabric.

Bringing a new level of security to data centers, the framework provides the dynamic protection, monitoring, adaptive policies and cyber defenses required to detect and remediate threats.

Continuous AI Analytics on Network Traffic

Morpheus combines event streaming from NVIDIA Cumulus NetQ and GPU-accelerated computing with RAPIDS data analytics pipelines, deep learning frameworks and Triton Inference Server, and it runs on mainstream NVIDIA-Certified enterprise servers. It simplifies the analysis of computer logs and helps detect and mitigate security threats. Pre-trained AI models help find leaked credentials, keys, passwords, credit card numbers and bank account numbers, and identify security policies that need to be hardened.
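
For contrast with a learned model, here is the kind of rule-based baseline often used to spot card numbers in logs: a regular expression plus the Luhn checksum. It is a hand-written illustration with hypothetical names and sample log lines, not how Morpheus or its pre-trained models work.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(log_line: str):
    """Return digit runs in a log line that look like valid card numbers."""
    return [m.group() for m in CARD_PATTERN.finditer(log_line) if luhn_valid(m.group())]

# Hypothetical log lines; 4111 1111 1111 1111 is a well-known test card number.
print(find_card_numbers("user=bob card=4111 1111 1111 1111 status=declined"))
print(find_card_numbers("order id 1234 5678 9012 3456 shipped"))
```

Fixed rules like these catch well-formatted leaks but miss obfuscated or unusual ones, which is the gap trained models aim to close.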

Integrating the framework into a third-party cybersecurity offering brings the world’s best AI computing to communication networks. Morpheus can receive rich telemetry feeds from every NVIDIA BlueField DPU-accelerated server in the data center without impacting server performance. BlueField-2 DPUs act both as a sensor to collect real-time packet flows and as a policy enforcement point to limit communication between any microservice container or virtual machine in a data center.

By placing BlueField-2 DPUs in servers across the data center, Morpheus can automatically write and change policies to immediately remediate security threats. Actions range from changing the logs being collected and altering the volume of data ingested to dynamically redirecting certain log events, blocking traffic newly identified as malicious, rewriting rules to enforce policy updates, and more.

Accelerate and Secure the Data Center with NVIDIA BlueField DPUs 

The NVIDIA BlueField-2 DPU, available today, enables true software-defined, hardware-accelerated data center infrastructure. By having software-defined networking policies and telemetry collection run on the BlueField DPU before entering the server, the DPU offloads, accelerates, and isolates critical data center functions without burdening the server’s CPU. The DPU also extends the simple static security logging model and implements sophisticated dynamic telemetry that evolves with new policies being determined and adjusted.

Learn more about NVIDIA Morpheus and apply for early access, currently available in the U.S. and Israel.

The post Dream State: Cybersecurity Vendors Detect Breaches in an Instant with NVIDIA Morpheus appeared first on The Official NVIDIA Blog.

NVIDIA’s New CPU to ‘Grace’ World’s Most Powerful AI-Capable Supercomputer

NVIDIA’s new Grace CPU will power the world’s most powerful AI-capable supercomputer.

The Swiss National Supercomputing Center’s (CSCS) new system, Alps, will use Grace, a revolutionary Arm-based data center CPU introduced by NVIDIA today, to enable breakthrough research in a wide range of fields.

From climate and weather to materials sciences, astrophysics, computational fluid dynamics, life sciences, molecular dynamics, quantum chemistry and particle physics, as well as domains like economics and social sciences, Alps will play a key role in advancing science throughout Europe and worldwide when it comes online in 2023.

“We are thrilled to announce the Swiss National Supercomputing Center will build a supercomputer powered by Grace and our next-generation GPU,” NVIDIA CEO Jensen Huang said Monday during his keynote at NVIDIA’s GPU Technology Conference.

Alps will be built by Hewlett Packard Enterprise using the new HPE Cray EX supercomputer product line as well as the NVIDIA HGX supercomputing platform, including NVIDIA GPUs and the NVIDIA HPC SDK as well as the new Grace CPU.

The Alps system will replace CSCS’s existing Piz Daint supercomputer.

A New Kind of Supercomputing

Alps is one of the new generation of machines that are expanding supercomputing beyond traditional modeling and simulation by taking advantage of GPU-accelerated deep learning.

“Deep learning is just an incredibly powerful set of tools that we add to the toolbox,” said CSCS Director Thomas Schulthess.

Taking advantage of the tight coupling between NVIDIA CPUs and GPUs, Alps is expected to be able to train GPT-3, the world’s largest natural language processing model, in only two days — 7x faster than NVIDIA’s 2.8-AI exaflops Selene supercomputer, currently recognized as the world’s leading supercomputer for AI by MLPerf.

CSCS users will be able to apply this incredible AI performance to a wide range of emerging scientific research that can benefit from natural language understanding.

This includes, for example, analyzing and understanding massive amounts of knowledge available in scientific papers and generating new molecules for drug discovery.

Soul of the New Machine

Based on the hyper-efficient Arm microarchitecture found in billions of smartphones and other edge computing devices, Grace will deliver 10x the performance of today’s fastest servers on the most complex AI and high-performance computing workloads.

Grace will support the next generation of NVIDIA’s coherent NVLink interconnect technology, allowing data to move more quickly between system memory, CPUs and GPUs.

And thanks to growing GPU support for data science acceleration at ever-larger scales, Alps will also be able to accelerate a bigger chunk of its users’ workflows, such as ingesting the vast quantities of data needed for modern supercomputing.

“The scientists will not only be able to carry out simulations, but also pre-process or post-process their data,” Schulthess said. “This makes the whole workflow more efficient for them.”

From Particle Physics to Weather Forecasts

CSCS has long supported scientists who are working at the cutting edge, particularly in materials science, weather forecasting and climate modeling, and understanding data streaming in from a new generation of scientific instruments.

CSCS designs and operates a dedicated system for numerical weather prediction (NWP) on behalf of MeteoSwiss, the Swiss meteorological service. This system has been running on GPUs since 2016.

That long-standing experience with operational NWP on GPUs will be key to future climate simulations as well — key not only to modeling long-term changes to climate, but to building models able to more accurately predict extreme weather events, saving lives.

One of that team’s goals is to run global climate models with a spatial resolution of 1 km that can map convective clouds such as thunderclouds.

The CSCS supercomputer is also used by Swiss scientists for the analysis of data from the Large Hadron Collider (LHC) at CERN, the European Organization for Nuclear Research. It is the Swiss Tier-2 system in the Worldwide LHC Computing Grid.

Based in Geneva, the LHC — at $9 billion, one of the most expensive scientific instruments ever built — generates 90 petabytes of data a year.

Alps uses a new software-defined infrastructure that can support a wide range of projects.

As a result, in the future, different teams, such as those from MeteoSwiss, will be able to use one or more partitions on a single, unified infrastructure, rather than different machines.

These can be virtual ad-hoc clusters for individual users or predefined clusters that research teams can put together with CSCS and then operate themselves.

Featured image source: Steve Evans, from Citizen of the World.

The post NVIDIA’s New CPU to ‘Grace’ World’s Most Powerful AI-Capable Supercomputer appeared first on The Official NVIDIA Blog.


What Is Quantum Computing?

Twenty-seven years before Steve Jobs unveiled a computer you could put in your pocket, physicist Paul Benioff published a paper showing it was theoretically possible to build a much more powerful system you could hide in a thimble — a quantum computer.

Named for the subatomic physics it aimed to harness, the concept Benioff described in 1980 still fuels research today, including efforts to build the next big thing in computing: a system that could make a PC look in some ways as quaint as an abacus.

Richard Feynman, a Nobel Prize winner whose wit-laced lectures brought physics to a broad audience, helped establish the field, sketching out how such systems could simulate quirky quantum phenomena more efficiently than traditional computers.

So, What Is Quantum Computing?

Quantum computing uses the physics that governs subatomic particles to perform sophisticated parallel calculations, replacing more simplistic transistors in today’s computers.

Quantum computers calculate using qubits, computing units that can be on, off or any value between, instead of the bits in traditional computers that are either on or off, one or zero. The qubit’s ability to live in the in-between state — called superposition — adds a powerful capability to the computing equation, making quantum computers superior for some kinds of math.

What Does a Quantum Computer Do?

Using qubits, quantum computers could buzz through calculations that would take classical computers a loooong time — if they could even finish them.

For example, today’s computers use eight bits to represent any number between 0 and 255. Thanks to features like superposition, a quantum computer can use eight qubits to represent every number between 0 and 255, simultaneously.
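The eight-qubit example can be sketched numerically. This is a minimal classical illustration using NumPy, not a description of real quantum hardware: it assumes the simplest case, a uniform superposition in which all 256 values carry equal weight. The variable names are illustrative.

```python
import numpy as np

n_qubits = 8
dim = 2 ** n_qubits  # 256 basis states, one per integer 0..255

# Uniform superposition: every basis state gets the same amplitude.
state = np.full(dim, 1 / np.sqrt(dim), dtype=np.complex128)

# Measurement probabilities are the squared magnitudes of the amplitudes.
probabilities = np.abs(state) ** 2

print(dim)                  # number of amplitudes held at once
print(probabilities.sum())  # total probability (should be 1 up to rounding)
```

Note that reading out such a register still returns only one of the 256 values, each with probability 1/256 here, so quantum algorithms have to arrange the amplitudes so that useful answers become likely.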

It’s a feature like parallelism in computing: All possibilities are computed at once rather than sequentially, providing tremendous speedups.

So, while a classical computer steps through long division calculations one at a time to factor a humongous number, a quantum computer running a factoring algorithm such as Shor’s could get the answer in vastly fewer steps. Boom!

That means quantum computers could reshape whole fields, like cryptography, that are based on factoring what are today impossibly large numbers.

A Big Role for Tiny Simulations

That could be just the start. Some experts believe quantum computers will bust through limits that now hinder simulations in chemistry, materials science and anything involving worlds built on the nano-sized bricks of quantum mechanics.

Quantum computers could even extend the life of semiconductors by helping engineers create more refined simulations of the quantum effects they’re starting to find in today’s smallest transistors.

Indeed, experts say quantum computers ultimately won’t replace classical computers; they’ll complement them. And some predict quantum computers will be used as accelerators much as GPUs accelerate today’s computers.

How Does Quantum Computing Work?

Don’t expect to build your own quantum computer like a DIY PC with parts scavenged from discount bins at the local electronics shop.

The handful of systems operating today typically require refrigeration that creates working environments just north of absolute zero. They need that computing arctic to handle the fragile quantum states that power these systems.

In a sign of how hard constructing a quantum computer can be, one prototype suspends an atom between two lasers to create a qubit. Try that in your home workshop!

Quantum computing takes nano-Herculean muscles to create something called entanglement. That’s when two or more qubits exist in a single quantum state, a condition sometimes measured by electromagnetic waves just a millimeter wide.
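To make entanglement concrete, here is a small classical sketch in NumPy that builds the textbook two-qubit Bell state from a Hadamard gate and a CNOT gate. It simulates the math only and says nothing about how physical qubits are entangled in a lab.

```python
import numpy as np

# Single-qubit |0> state and the gates used below.
zero = np.array([1, 0], dtype=np.complex128)
H = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
I2 = np.eye(2, dtype=np.complex128)
# CNOT with qubit 0 as control and qubit 1 as target, basis order |q0 q1>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=np.complex128)

# Start in |00>, put qubit 0 into superposition, then entangle with CNOT.
state = np.kron(zero, zero)
state = np.kron(H, I2) @ state
bell = CNOT @ state

# Probabilities of measuring 00, 01, 10 and 11.
probs = np.abs(bell) ** 2
print(probs.real)
```

Only the outcomes 00 and 11 have nonzero probability, so measuring one qubit fixes the result for the other. The pair can no longer be described as two independent qubits.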

Crank up that wave with a hair too much energy and you lose entanglement or superposition, or both. The result is a noisy state called decoherence, the equivalent in quantum computing of the blue screen of death.

What’s the Status of Quantum Computers?

A handful of companies such as Alibaba, Google, Honeywell, IBM, IonQ and Xanadu operate early versions of quantum computers today.

Today they provide tens of qubits. But qubits can be noisy, making them sometimes unreliable. To tackle real-world problems reliably, systems need tens or hundreds of thousands of qubits.

Experts believe it could be a couple of decades before we get to a high-fidelity era when quantum computers are truly useful.

Quantum computers are slowly moving toward commercial use. (Source: ISSCC 2017 talk by Lieven Vandersypen.)

Just when we will reach so-called quantum computing supremacy, the point at which quantum computers execute tasks classical ones can’t, is a matter of lively debate in the industry.

Accelerating Quantum Circuit Simulations Today

The good news is the world of AI and machine learning put a spotlight on accelerators like GPUs, which can perform many of the types of operations quantum computers would calculate with qubits.

So, classical computers are already finding ways to host quantum simulations with GPUs today. For example, NVIDIA ran a leading-edge quantum simulation on Selene, our in-house AI supercomputer.

NVIDIA announced in the GTC keynote the cuQuantum SDK to speed quantum circuit simulations running on GPUs. Early work suggests cuQuantum will be able to deliver orders of magnitude speedups.

The SDK takes an agnostic approach, providing a choice of tools users can pick to best fit their approach. For example, the state vector method provides high-fidelity results, but its memory requirements grow exponentially with the number of qubits.

That creates a practical limit of roughly 50 qubits on today’s largest classical supercomputers. Nevertheless we’ve seen great results (below) using cuQuantum to accelerate quantum circuit simulations that use this method.

State vector: 1,000 circuits, 36 qubits, depth m=10, complex 64 | CPU: Qiskit on dual AMD EPYC 7742 | GPU: Qgate on DGX A100
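The exponential memory growth of the state vector method is easy to check with a few lines of arithmetic. The sketch below assumes one amplitude per basis state stored as a NumPy-style complex64 value (8 bytes). The helper name state_vector_bytes is hypothetical, and real simulators need extra working memory on top of this.

```python
def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Bytes needed to hold a full state vector of 2**n_qubits amplitudes.

    The default of 8 bytes per amplitude assumes complex64
    (two 32-bit floats); use 16 for complex128.
    """
    return (2 ** n_qubits) * bytes_per_amplitude


for n in (30, 36, 40, 50):
    gib = state_vector_bytes(n) / 2 ** 30
    print(f"{n} qubits: {gib:,.0f} GiB")
```

Each added qubit doubles the requirement. Under these assumptions, 36 qubits already need 512 GiB, and 50 qubits need 8 PiB, which is why roughly 50 qubits is a practical ceiling.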

Researchers from the Jülich Supercomputing Centre will provide a deep dive on their work with the state vector method in session E31941 at GTC (free with registration).

A newer approach, tensor network simulation, uses less memory and more computation to perform similar work.
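A toy sketch of the tensor network idea, using NumPy’s einsum: each gate is a small tensor, and a single output amplitude is computed by contracting those tensors along shared indices, without ever building the full state vector. This only illustrates the general technique on a two-qubit circuit. It is not how cuQuantum, Quimb or Cotengra are implemented, and the function name amplitude is hypothetical.

```python
import numpy as np

zero = np.array([1.0, 0.0])  # |0>
one = np.array([0.0, 1.0])   # |1>
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

# CNOT as a rank-4 tensor indexed [out_control, out_target, in_control, in_target].
CNOT = np.zeros((2, 2, 2, 2))
for c in (0, 1):
    for t in (0, 1):
        CNOT[c, t ^ c, c, t] = 1.0


def amplitude(out0, out1):
    """Amplitude <out0 out1| CNOT (H on qubit 0) |00> as one contraction."""
    # a, b: input wires; c: qubit 0 after H; d, e: output wires.
    return np.einsum("a,b,ca,decb,d,e->", zero, zero, H, CNOT, out0, out1)


print(amplitude(zero, zero), amplitude(one, one), amplitude(zero, one))
```

For large circuits, the order in which tensors are contracted dominates the cost. Finding good contraction orders is the extra computation this method trades for its lower memory use.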

Using this method, NVIDIA and Caltech accelerated a state-of-the-art quantum circuit simulator with cuQuantum running on NVIDIA A100 Tensor Core GPUs. It generated a sample from a full-circuit simulation of the Google Sycamore circuit in 9.3 minutes on Selene, a task that 18 months ago experts thought would take days using millions of CPU cores.

Tensor Network – 53 qubits, depth m=20 | CPU: Quimb on Dual AMD EPYC 7742 estimated | GPU: Quimb on DGX-A100

“Using the Cotengra/Quimb packages, NVIDIA’s newly announced cuQuantum SDK, and the Selene supercomputer, we’ve generated a sample of the Sycamore quantum circuit at depth m=20 in record time — less than 10 minutes,” said Johnnie Gray, a research scientist at Caltech.

“This sets the benchmark for quantum circuit simulation performance and will help advance the field of quantum computing by improving our ability to verify the behavior of quantum circuits,” said Garnet Chan, a chemistry professor at Caltech whose lab hosted the work.

NVIDIA expects the performance gains and ease of use of cuQuantum will make it a foundational element in every quantum computing framework and simulator at the cutting edge of this research.

Sign up to show early interest in cuQuantum here.

The post What Is Quantum Computing? appeared first on The Official NVIDIA Blog.


Drug Discovery Gets Jolt of AI via NVIDIA Collaborations with AstraZeneca, U of Florida Health

NVIDIA is collaborating with biopharmaceutical company AstraZeneca and the University of Florida’s academic health center, UF Health, on new AI research projects using breakthrough transformer neural networks.

Transformer-based neural network architectures — which have become available only in the last several years — allow researchers to leverage massive datasets using self-supervised training methods, avoiding the need for manually labeled examples during pre-training. These models, equally adept at learning the syntactic rules to describe chemistry as they are at learning the grammar of languages, are finding applications across research domains and modalities.

NVIDIA is collaborating with AstraZeneca on a transformer-based generative AI model for chemical structures used in drug discovery that will be among the very first projects to run on Cambridge-1, which is soon to go online as the U.K.’s largest supercomputer. The model will be open sourced, available to researchers and developers in the NVIDIA NGC software catalog, and deployable in the NVIDIA Clara Discovery platform for computational drug discovery.

Separately, UF Health is harnessing NVIDIA’s state-of-the-art Megatron framework and BioMegatron pre-trained model — available on NGC — to develop GatorTron, the largest clinical language model to date.

New NGC applications include AtacWorks, a deep learning model that identifies accessible regions of DNA, and MELD, a tool for inferring the structure of biomolecules from sparse, ambiguous or noisy data.

Megatron Model for Molecular Insights

The MegaMolBART drug discovery model being developed by NVIDIA and AstraZeneca is slated for use in reaction prediction, molecular optimization and de novo molecular generation. It’s based on AstraZeneca’s MolBART transformer model and is being trained on the ZINC chemical compound database — using NVIDIA’s Megatron framework to enable massively scaled-out training on supercomputing infrastructure.

The large ZINC database allows researchers to pretrain a model that understands chemical structure, bypassing the need for hand-labeled data. Armed with a statistical understanding of chemistry, the model will be specialized for a number of downstream tasks, including predicting how chemicals will react with each other and generating new molecular structures.
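How pretraining can avoid hand-labeled data can be sketched with a simple denoising setup: hide part of each unlabeled example and ask the model to restore it, so the data supplies its own targets. The code below is a generic illustration, not the MegaMolBART training pipeline. It assumes molecules are written as SMILES-style text strings, which the article does not specify, and the names MASK and make_denoising_pair are hypothetical.

```python
import random

MASK = "<mask>"


def make_denoising_pair(tokens, mask_prob=0.15, rng=None):
    """Return (corrupted, target) built from one unlabeled sequence.

    The target is the original sequence; the corrupted copy has some
    tokens replaced by MASK. No human annotation is involved.
    """
    rng = rng or random.Random()
    corrupted = [MASK if rng.random() < mask_prob else tok for tok in tokens]
    return corrupted, list(tokens)


# Illustrative input: a molecule written as text, split into characters.
molecule = list("CC(=O)OC1=CC=CC=C1C(=O)O")
corrupted, target = make_denoising_pair(molecule, mask_prob=0.3,
                                        rng=random.Random(0))
print("".join(target))
print(" ".join(corrupted))
```

A model trained to map the corrupted sequence back to the target has to pick up the statistical regularities of the strings. That is the sense in which a large unlabeled database can stand in for labels.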

“Just as AI language models can learn the relationships between words in a sentence, our aim is that neural networks trained on molecular structure data will be able to learn the relationships between atoms in real-world molecules,” said Ola Engkvist, head of molecular AI, discovery sciences, and R&D at AstraZeneca. “Once developed, this NLP model will be open source, giving the scientific community a powerful tool for faster drug discovery.”

The model, trained using NVIDIA DGX SuperPOD, gives researchers ideas for molecules that don’t exist in databases but could be potential drug candidates. Computational methods, known as in-silico techniques, allow drug developers to search through more of the vast chemical space and optimize pharmacological properties before shifting to expensive and time-consuming lab testing.

This collaboration will use the NVIDIA DGX A100-powered Cambridge-1 and Selene supercomputers to run large workloads at scale. Cambridge-1 is the largest supercomputer in the U.K., ranking No. 3 on the Green500 and No. 29 on the TOP500 list of the world’s most powerful systems. NVIDIA’s Selene supercomputer topped the most recent Green500 and ranks fifth on the TOP500.

Language Models Speed Up Medical Innovation

UF Health’s GatorTron model — trained on records from more than 50 million interactions with 2 million patients — is a breakthrough that can help identify patients for lifesaving clinical trials, predict and alert health teams about life-threatening conditions, and provide clinical decision support to doctors.

“GatorTron leveraged over a decade of electronic medical records to develop a state-of-the-art model,” said Joseph Glover, provost at the University of Florida, which recently boosted its supercomputing facilities with NVIDIA DGX SuperPOD. “A tool of this scale will enable healthcare researchers to unlock insights and reveal previously inaccessible trends from clinical notes.”

Beyond clinical medicine, the model also accelerates drug discovery by making it easier to rapidly create patient cohorts for clinical trials and for studying the effect of a certain drug, treatment or vaccine.

It was created using BioMegatron, the largest biomedical transformer model ever trained, developed by NVIDIA’s applied deep learning research team using data from the PubMed corpus. BioMegatron is available on NGC through Clara NLP, a collection of NVIDIA Clara Discovery models pretrained on biomedical and clinical text.

“The GatorTron project is an exceptional example of the discoveries that happen when experts in academia and industry collaborate using leading-edge artificial intelligence and world-class computing resources,” said David R. Nelson, M.D., senior vice president for health affairs at UF and president of UF Health. “Our partnership with NVIDIA is crucial to UF emerging as a destination for artificial intelligence expertise and development.”

Powering Drug Discovery Platforms

NVIDIA Clara Discovery libraries and NVIDIA DGX systems have been adopted by computational drug discovery platforms, too, boosting pharmaceutical research.

  • Schrödinger, a leader in chemical simulation software development, today announced a strategic partnership with NVIDIA that includes research in scientific computing and machine learning, optimization of Schrödinger applications on NVIDIA platforms, and a joint solution around NVIDIA DGX SuperPOD to evaluate billions of potential drug compounds within minutes.
  • Biotechnology company Recursion has installed BioHive-1, a supercomputer based on the NVIDIA DGX SuperPOD reference architecture that, as of January, is estimated to rank at No. 58 on the TOP500 list of the world’s most powerful computer systems. BioHive-1 will allow Recursion to complete within a day deep learning projects that previously took a week on its existing cluster.
  • Insilico Medicine, a partner in the NVIDIA Inception accelerator program, recently announced the discovery of a novel preclinical candidate to treat idiopathic pulmonary fibrosis — the first example of an AI-designed molecule for a new disease target nominated for clinical trials. Compounds were generated on a system powered by NVIDIA Tensor Core GPUs, taking less than 18 months and under $2 million from target hypothesis to preclinical candidate selection.
  • Vyasa Analytics, a member of the NVIDIA Inception accelerator program, is using Clara NLP and NVIDIA DGX systems to give its users access to pretrained models for biomedical research. The company’s GPU-accelerated Vyasa Layar Data Fabric is powering solutions for multi-institutional cancer research, clinical trial analytics and biomedical data harmonization.

Learn more about NVIDIA’s work in healthcare at this week’s GPU Technology Conference, which kicks off with a keynote address by NVIDIA CEO Jensen Huang. Registration is free. The healthcare track includes 16 live webinars, 18 special events and over 100 recorded sessions.

Subscribe to NVIDIA healthcare news and follow NVIDIA Healthcare on Twitter.

The post Drug Discovery Gets Jolt of AI via NVIDIA Collaborations with AstraZeneca, U of Florida Health appeared first on The Official NVIDIA Blog.


An Engine of Innovation: Sony Levels Up for the AI Era

If you want to know what the next big thing will be, ask someone at a company that invents it time and again.

“AI is a key tool for the next era, so we are providing the computing resources our developers need to generate great AI results,” said Yuichi Kageyama, general manager of Tokyo Laboratory 16, in R&D Center for Sony Group Corporation.

Called GAIA internally, the lab’s computing resources act as a digital engine serving all Sony Group companies. And it’s about to get a second fuel injection of accelerated computing for AI efforts across the corporation.

Sony’s engineers are packing machine-learning smarts into products ranging from its Xperia smartphones and its entertainment robot, aibo, to a portfolio of imaging components for everything from professional and consumer cameras to factory automation and satellites. It’s even using AI to build the next generation of advanced imaging chips.

More Zip, Fewer Tolls

To move efficiently into the AI era, Sony is installing a cluster of NVIDIA DGX A100 systems linked on an NVIDIA Mellanox InfiniBand network. It expands an existing system now running at near full utilization with NVIDIA V100 Tensor Core GPUs, commissioned in October when the company brought AI training in house.

“When we were using cloud services, AI developers worried about the costs, but now they can focus on AI development on GAIA,” said Kageyama.

An in-house AI engine torques performance, too. One team designed a deep-learning model for delivering super-resolution images and trained it nearly 16x faster by adding more resources to the job, shortening a month’s workload to a day.

“With the computing power of the DGX A100, its expanded GPU memory and faster InfiniBand networking, we expect to see even greater performance on larger datasets,” said Yoshiki Tanaka, who oversees HPC and distributed deep learning technologies for Sony’s developers.

Powering an AI Pipeline

Sony posted fast speeds in deep learning back in 2018, accelerating its Neural Network Libraries on a system at Japan’s National Institute of Advanced Industrial Science and Technology. And it’s already rolling out products powered with machine learning, such as its Airpeak drone for professional filmmakers shown at CES this year.

There’s plenty more to come.

“We will see good results in our fiscal 2021 because we have collaborations with many business teams who have started some good projects,” Kageyama said.

NVIDIA is putting its shoulder to the wheel with software and services to “build a culture of using GPUs,” he added.

For example, Sony developers use NGC, NVIDIA’s online container registry, for all the software components they need to get an AI app up and running.

Sony even created a container of its own, now available on NGC, sporting its Neural Network Libraries and other utilities. It supplements NVIDIA’s containers for work in popular environments like PyTorch and TensorFlow.

Drivers Give a Thumbs Up

Developers tell Kageyama’s team that having their code in one place helps simplify and speed their work.

Some researchers use the system for high performance computing, tapping into NVIDIA’s CUDA software that accelerates a diverse set of technical applications including AI.

To keep it all running smoothly, NVIDIA provided a job scheduler as well as Sony-specific additions to NVIDIA’s libraries for scaling apps across multiple GPUs.

“Good management software is important for achieving fairness and high utilization on such a complex system,” said Masahiro Hara, who leads development of the GAIA system.

An Eye Toward Analytics

NVIDIA also helped Sony create training programs on how to use its software on GAIA.

Looking ahead, Sony is interested in expanding its work in data analytics and simulations. It’s evaluating RAPIDS, open-source software NVIDIA helped design to let Python programmers access the power of GPUs for data science.

At the end of a work-from-home day keeping Sony ahead of the pack in AI, Kageyama enjoys playing with his kids who keep their dad on his digital toes. “I’m a beginner in Minecraft, and they’re much better than me,” he said.

The post An Engine of Innovation: Sony Levels Up for the AI Era appeared first on The Official NVIDIA Blog.
