In its debut in the industry-standard MLPerf benchmarks, NVIDIA Orin, a low-power system-on-chip based on the NVIDIA Ampere architecture, set new records in AI inference, raising the bar in per-accelerator performance at the edge.
Overall, NVIDIA with its partners continued to show the highest performance and broadest ecosystem for running all machine-learning workloads and scenarios in this fifth round of the industry metric for production AI.
In edge AI, a pre-production version of our NVIDIA Orin led in five of six performance tests. It ran up to 5x faster than our previous generation Jetson AGX Xavier, while delivering an average of 2x better energy efficiency.
NVIDIA Orin is available today in the NVIDIA Jetson AGX Orin developer kit for robotics and autonomous systems. More than 6,000 customers including Amazon Web Services, John Deere, Komatsu, Medtronic and Microsoft Azure use the NVIDIA Jetson platform for AI inference or other tasks.
It’s also a key component of our NVIDIA Hyperion platform for autonomous vehicles. China’s largest EV maker, BYD, is the latest automaker to announce it will use the Orin-based DRIVE Hyperion architecture for its next-generation automated EV fleets.
Servers and devices with NVIDIA GPUs including Jetson AGX Orin were the only edge accelerators to run all six MLPerf benchmarks.
With its JetPack SDK, Orin runs the full NVIDIA AI platform, a software stack already proven in the data center and the cloud. And it’s backed by a million developers using the NVIDIA Jetson platform.
NVIDIA and partners continue to show leading performance across all tests and scenarios in the latest MLPerf inference round.
The MLPerf benchmarks enjoy broad backing from organizations including Amazon, Arm, Baidu, Dell Technologies, Facebook, Google, Harvard, Intel, Lenovo, Microsoft, Stanford and the University of Toronto.
Most Partners, Submissions
The NVIDIA AI platform again attracted the largest number of MLPerf submissions from the broadest ecosystem of partners.
Azure followed up its solid December debut on MLPerf training tests with strong results in this round on AI inference, both using NVIDIA A100 Tensor Core GPUs. Azure’s ND96amsr_A100_v4 instance matched our highest performing eight-GPU submissions in nearly every inference test, demonstrating the power that’s readily available from the public cloud.
System makers ASUS and H3C made their MLPerf debut in this round with submissions using the NVIDIA AI platform. They joined returning system makers Dell Technologies, Fujitsu, GIGABYTE, Inspur, Lenovo, Nettrix and Supermicro, which submitted results on more than two dozen NVIDIA-Certified Systems.
Why MLPerf Matters
Our partners participate in MLPerf because they know it’s a valuable tool for customers evaluating AI platforms and vendors.
MLPerf’s diverse tests cover today’s most popular AI workloads and scenarios. That gives users confidence the benchmarks will reflect performance they can expect across the spectrum of their jobs.
Software Makes It Shine
All the software we used for our tests is available from the MLPerf repository.
Two key components that enabled our inference results — NVIDIA TensorRT for optimizing AI models and NVIDIA Triton Inference Server for deploying them efficiently — are available free on NGC, our catalog of GPU-optimized software.
Organizations around the world are embracing Triton, including cloud service providers such as Amazon and Microsoft.
We continuously fold all our optimizations into containers available on NGC. That way every user can get started putting AI into production with leading performance.
Square Enix presents the fictional city of Midgar in Final Fantasy VII Remake at a filmic level of detail. Epic’s Fortnite bathes its environments in ray-traced sunlight, simulating how light bounces in the real world. And artists at Lucasfilm revolutionized virtual production techniques in The Mandalorian, using synchronized NVIDIA RTX GPUs to drive pixels on LED walls that act as photoreal backdrops.
In the eight years since Epic Games launched Unreal Engine 4, graphics has evolved at an unprecedented rate. UE4’s advances in world-building, animation, lighting and simulation enabled creators to bring to life environments only hinted at in the past.
In that same time, NVIDIA produced the optimal GPUs, libraries and APIs for supporting the new features the engine introduced. Tens of thousands of developers have enjoyed the benefits of pairing Unreal Engine with NVIDIA technology. That support continues with today’s debut of Unreal Engine 5.
Epic and NVIDIA: Building the Future of Graphics
From the launch of the GeForce GTX 680 in 2012 to the recent release of the RTX 30 Series, NVIDIA has supported UE4 developers in their quest to stay on the bleeding edge of technology.
At Game Developers Conference 2013, Epic showed off what Unreal Engine 4 could do on a single GTX 680 with its “Infiltrator” demo. It would be one of many times Unreal Engine and NVIDIA raised the bar.
In 2015, NVIDIA founder and CEO Jensen Huang appeared as a surprise guest at an Epic Games event to announce the GTX TITAN X. Onstage, Tim Sweeney was given the very first GTX TITAN X off the production line. It’s a moment in tech history that’s still discussed today.
At GDC 2018, the development community got their first look at real-time ray tracing running in UE4 with the reveal of “Reflections,” a Star Wars short video. The results were so convincing you’d have been forgiven for thinking the clip was pulled directly out of a J.J. Abrams movie.
Textured area lights, ray-traced area light shadows, reflections, and cinematic depth of field all combined to create a sequence that redefined what was possible with real-time graphics. It was shown on an NVIDIA DGX workstation powered by four Volta architecture GPUs.
Later in the year at Gamescom, that same demo was shown running on one consumer-grade GeForce RTX graphics card, thanks to the Turing architecture’s RT Cores, which greatly accelerate ray-tracing performance.
In 2019, Unreal Engine debuted a short called “Troll” (from Goodbye Kansas and Deep Forest Films), running on a GeForce RTX 2080 Ti. It showed what could be done with complex soft shadows and reflections. The short broke ground by rendering convincing human faces in real time, capturing a broad range of emotional states.
Epic and NVIDIA sponsored three installments in the DXR Spotlight Contest, which showed that even one-person teams could achieve remarkable results with DXR, Unreal Engine 4 and NVIDIA GeForce RTX.
One standout was “Attack from Outer Space,” a video demo developed solely by artist Christian Hecht.
Today, Epic debuts Unreal Engine 5. The launch introduces Nanite and Lumen, which enable developers to create games and apps containing massive amounts of geometric detail with fully dynamic global illumination.
Nanite enables film-quality source art consisting of billions of polygons to be directly imported into Unreal Engine — all while maintaining a real-time frame rate and without sacrificing fidelity.
With Lumen, developers can create more dynamic scenes where indirect lighting adapts on the fly, such as changing the sun angle with the time of day, turning on a flashlight or opening an exterior door. Lumen removes the need for authoring lightmap UVs, waiting for lightmaps to bake or placing reflection captures, which results in crucial time savings in the development process.
NVIDIA is supporting Unreal Engine 5 with plugins for key technologies, including Deep Learning Super Sampling (DLSS), NVIDIA Reflex and RTX Global Illumination.
DLSS taps into the power of a deep learning neural network to boost frame rates and generate beautiful, sharp images. Reflex aligns CPU work to complete just in time for the GPU to start processing, minimizing latency and improving system responsiveness. RTX Global Illumination computes multibounce indirect lighting without bake times, light leaks or expensive per-frame costs.
You can see DLSS and Reflex in action on Unreal Engine 5 by playing Epic’s Fortnite on an NVIDIA GeForce RTX-powered PC.
NVIDIA Omniverse is the ideal companion to the next generation of Unreal Engine. The platform enables artists and developers to connect their 3D design tools for more collaborative workflows, build their own tools for 3D worlds, and use NVIDIA AI technologies. The Unreal Engine Connector enables creators and developers to achieve live-sync workflows between Omniverse and Unreal Engine. This connector will supercharge any game developer’s art pipeline.
A dozen companies today received NVIDIA’s highest award for partners, recognizing their impact on AI education and adoption across industries including education, the federal government, healthcare and technology.
The winners of the 2021 NPN Americas Partner of the Year Awards have created a profound impact on AI by helping customers meet the demands of recommender systems, conversational AI applications, computer vision services and more.
“From systems to software, NVIDIA’s leadership in creating opportunities for its partner ecosystem is unmatched,” said Rob Enderle, president and principal analyst at the Enderle Group. “The winners of the 2021 NPN Awards reflect a diverse group of trusted technology providers who have cultivated deep expertise in NVIDIA-accelerated AI to serve their markets and industries.”
The past few years have brought new ways of working to every business. Companies have adopted new processes that apply AI to customer service, supply chain optimization, manufacturing, safety and more. NVIDIA’s accelerated computing platforms open new markets to create growth opportunities for our partner ecosystem.
The 2021 NPN award winners for the Americas are:
Cambridge Computer – awarded 2021 Americas Higher Education Partner of the Year for its continued focus on the higher-ed market, resulting in broad growth across platforms and NVIDIA DGX AI infrastructure solutions.
CDW Canada – awarded 2021 Canadian Partner of the Year for fostering extensive growth of AI in the Canadian market through strategic collaboration with NVIDIA and customers.
Colfax – awarded 2021 Americas Networking Partner of the Year for driving end-to-end NVIDIA AI solutions through a skilled team with robust resources, enabling the company to become a leader in the NVIDIA networking space across industries, including manufacturing, higher education, healthcare and life sciences.
Deloitte Consulting – awarded 2021 Americas Global Consulting Partner of the Year for building specialized practices around Omniverse Enterprise, NVIDIA Metropolis and new NVIDIA DGX-Ready Managed Services, plus adding the NVIDIA DGX POD to its Innovation Center.
Future Tech – awarded 2021 Americas Public Sector Partner of the Year for leading the federal government through the world’s largest AI transformation. Future Tech is the first company to bring Omniverse Enterprise real-time 3D design collaboration and simulation to federal customers, helping to improve their workflows in the physical world.
Insight Enterprises – awarded 2021 Americas Software Partner of the Year for the second year in a row, for broad collaboration with NVIDIA across AI, virtualization and simulation software, with leadership in making continued investment in NVIDIA technology with proof-of-concept labs, NVIDIA certifications, sales and technical training.
Lambda – awarded 2021 Americas Solution Integration Partner of the Year for the second consecutive year for its extensive expertise and commitment to providing the full NVIDIA portfolio with AI and deep learning hardware and software solutions across industries, including higher education and research, the federal and public sector, health and life sciences.
Mark III – awarded 2021 Americas Rising Star Partner of the Year – a new category added to recognize growing excellence in innovation, go-to-market strategies and growth in the AI business landscape. Mark III won for creatively setting the pace for NVIDIA partners as they guide clients toward architecting AI Centers of Excellence.
PNY – awarded 2021 Americas Distribution Partner of the Year for being a value-added partner and trusted advisor to the channel that has delivered NVIDIA’s accelerated computing platforms and software across the media and entertainment and healthcare industries, and many other vertical markets, as well as with cloud service providers.
Quantiphi – awarded 2021 Americas Service Delivery Partner of the Year for its diverse engineering services, application-first approach and commitment to solving customer problems using NVIDIA DGX and software development kits, positioning itself to capitalize on the rapidly growing field of data science enablement services.
World Wide Technology – awarded 2021 Americas AI Solution Provider of the Year for its leadership and commitment in driving adoption of the complete NVIDIA portfolio of AI and accelerated computing solutions, as well as continued investments in AI infrastructure for customer testing and labs in the WWT Advanced Technology Center.
World Wide Technology – also named 2021 Americas Healthcare Partner of the Year for expertise in driving NVIDIA AI solutions and accelerated computing to healthcare and life sciences organizations, demonstrating strong capabilities in end-to-end scalable AI solutions and professional development to support biopharma, genomics, medical imaging and more.
Congratulations to all of the 2021 NPN award winners in the Americas, and our thanks to all NVIDIA partners supporting customers worldwide as they work to integrate the transformative potential of AI into their businesses.
Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use NVIDIA Omniverse to accelerate their 3D workflows and create virtual worlds.
Pekka Varis’s artistry has come a long way from his early days as a self-styled “punk activist” who spray painted during the “old school days of hip hop in Finland.”
More than 30 years later, he’s embracing the future of graphics as a videographer who uses NVIDIA Omniverse, a physically accurate 3D design collaboration and simulation platform, to create vibrant virtual worlds.
His freelance video production company Catchline translates complex technology services into attention-snagging, easily digestible video messages or commercials.
A recent project was an animated ad for Artio, an app that brings a global digital art gallery to individual smart devices.
In the commercial, a digital family searches for the perfect piece of art to enliven their new home, visualizing options on the walls in real time. Plus, the parents unpack boxes filled with detailed items, a child plays with block toys, the dog shakes its head and wags its tail — all in a physically accurate way, thanks to Omniverse.
“NVIDIA Omniverse has given me more artistic power and a totally new level of freedom,” Varis said. “It’s a trusty playground where I can bring in quality content and focus on creating designs, visuals, variations — all that fun stuff.”
Vivifying the Videography
The Helsinki-based videographer takes viewers through his creative process in a three-part tutorial that details his recent commercial project, in which Artio helps the digital family make their blank walls pop.
Watch the first installment of the tutorial series on demand.
His multi-app workflow — bringing characters to life with Reallusion software and perfecting the lighting with Lightmap HDR Light Studio — is made possible by Omniverse Connectors, plugins that connect third-party design apps with Omniverse, and NVIDIA Studio drivers.
He estimates a 90 percent savings in time and cost using Omniverse. “With Omniverse Connectors to Reallusion software like Character Creator and iClone, I can make commercial-quality characters quickly and easily, which I wouldn’t have even dreamed about before,” he said.
“I see myself using Omniverse with every single work commission I get moving forward,” Varis said. “And I hope my video tutorials inspire viewers to try using these superb tools to create their own artwork.”
When not creating videos, Varis makes music as a drummer who loves metal, rock and hip hop — and spends time with his six-year-old daughter, his source of inspiration.
Learn more about Omniverse by watching GTC sessions on demand — featuring visionaries from the Omniverse team, Adobe, Autodesk, Epic Games, Pixar, Unity and Walt Disney Studios.
In addition to GFN Thursday, it’s National Tater Day. Hooray!
To honor the spud-tacular holiday, we’re closing out March with seven new games streaming this week. And a loaded 20+ titles are coming to the GeForce NOW library in April to play — even on a potato PC, thanks to GeForce NOW.
Plus, the GeForce NOW app is available on Chromebook. Get the app today to instantly transform Chromebooks into gaming rigs capable of playing 1,000+ PC titles with and against millions of other players — without waiting for downloads, installs, patches or updates.
GeForce NOW is Your Ride or Fry
At its roots, GeForce NOW is about playing great games. The final seven titles of March are ready to stream today. Plus, keep your eyes peeled for what’s coming to the cloud in April with 20 games revealed today and some nice surprises to be announced throughout the month.
On top of the 27 titles announced in March, an extra eight ended up coming to the cloud. Check out all the additional games that were added last month:
The Legend of Heroes: Trails of Cold Steel II (Steam)
Finally, GeForce NOW is growing. GeForce NOW Powered by ABYA Free and Priority plans are available again, but only for a limited time and while supplies last in Brazil, Argentina, Uruguay, Paraguay and Chile. Access local servers for lightning-fast gameplay with the same legendary GeForce NOW experience.
We have an extra fresh challenge for you this week. Let us know your answer on Twitter or in the comments below.
write a love letter to your potato PC/device in one sentence
we’ll start: “to my ride or fry, we’ll never say goodbye “
Four words: smart, sustainable, Super Bowl. Polestar’s commercial during the big game made it clear no-compromise electric vehicles are now mainstream.
Polestar Chief Operating Officer Dennis Nobelius sees driving enjoyment and autonomous-driving capabilities complementing one another in sustainable vehicles that keep driving — and the driver — front and center.
NVIDIA’s Katie Washabaugh spoke with Nobelius for the latest episode of the AI Podcast about the role the performance brand will play as vehicles become greener and more autonomous.
Nobelius touched on the sustainable automaker’s plans to unveil its third vehicle, the Polestar 3, the tech inside it, and what the company’s racing heritage brings to the intersection of smarts and sustainability.
Robots aren’t limited to the assembly line. Liila Torabi, senior product manager for Isaac Sim, a robotics and AI simulation platform powered by NVIDIA Omniverse, talks about where the field’s headed.
Humans playing games against machines is nothing new, but now computers can develop their own games for people to play. Programming enthusiast and social media influencer Harrison Kinsley created GANTheftAuto, an AI-based neural network that generates a playable chunk of the classic video game Grand Theft Auto V.
The neural networks powering autonomous vehicles require petabytes of driving data to learn how to operate. Nikita Jaipuria and Rohan Bhasin from Ford Motor Company explain how they use generative adversarial networks (GANs) to fill in the gaps of real-world data used in AV training.
Subscribe to the AI Podcast: Now available on Amazon Music
“I am a visionary,” says an AI, kicking off the latest installment of NVIDIA’s I AM AI video series.
Launched in 2017, I AM AI has become the iconic opening for GTC keynote addresses by NVIDIA founder and CEO Jensen Huang. Each video, with its AI-created narration and soundtrack, documents the newest advances in artificial intelligence and their impact on the world.
The latest, which debuted at GTC last week, showcases how NVIDIA technologies enable AI to take on complex tasks in the world’s most challenging environments, from farms and traffic intersections to museums and research labs.
Here’s a sampling of the groundbreaking AI innovations featured in the video.
Accuray Radiotherapy System Treats Lung Tumors
Lung tumors can move as much as two inches with every breath — making it difficult to protect healthy lung tissue while targeting the tumor for treatment.
Bay Area-based radiation therapy company Accuray offers Radixact, an AI-powered system that uses motion-tracking capabilities to follow a tumor’s movement and deliver treatment with sub-millimeter accuracy.
The system’s respiratory motion synchronization feature, which works in real time, matches treatment to the natural rhythm of patients’ breathing cycles, allowing them to breathe as normal during the process.
Radixact, which can take precise imagery of the tumor from any angle, is powered by NVIDIA RTX GPUs.
ANYmal Robots Learn to Walk on Their Own
The Robotic Systems Lab at ETH Zurich, in collaboration with Swiss-Mile, is embracing the future of robotic mobility.
The Swiss research lab fitted the four-legged robot ANYmal with wheels so that it can learn to stand, walk and drive — all on its own and in a matter of minutes.
Built on the NVIDIA Jetson edge AI platform and trained with Isaac Gym, the robot’s combination of legs and wheels enables it to carry tools and overcome obstacles like steps or stairs. Its AI-powered cameras and processing of laser scanning data allow it to perceive and create maps of its environment — indoors or outdoors.
The robot can help with delivery services, search-and-rescue missions, industrial inspection and more.
Sanctuary AI Robots Give a Helping Hand
Canadian startup Sanctuary AI aims “to create the world’s first human-like intelligence in general-purpose robots to help people work more safely, efficiently and sustainably.”
Built using NVIDIA Isaac Sim, Sanctuary AI’s general-purpose robots are highly dexterous — that is, great with their hands. They use their human-like fingers for a myriad of complex, precision tasks like opening Ziploc bags, handling pills or using almost any hand tool designed for a person.
The robots’ built-in cognitive architecture enables them to observe, assess and act on any task humans might need help with. Sanctuary AI aims to one day see its technology help with construction on the moon.
Sanctuary AI is a member of NVIDIA Inception, a program designed to nurture cutting-edge startups. Every member receives a custom set of ongoing benefits, such as NVIDIA Deep Learning Institute credits, opportunities to connect with investors, awareness support and technology assistance.
Scopio Accelerates Blood Cell Analysis
Another NVIDIA Inception member, Scopio, uses NVIDIA RTX GPUs to perform real-time, super-resolution analysis of blood, searching for hidden threats in every cell.
The company is transforming cell morphology with its microscopy scanning devices and Full-Field Peripheral Blood Smear application, which for the first time gives hematology labs and clinicians access to full-field scans of blood, with all cells imaged at 100x resolution.
The application runs Scopio’s machine learning algorithms to detect, classify and quantify blood cells — and help flag abnormalities, which are automatically documented in a digital report. This enhances workflow efficiency for labs and clinicians by more than 60 percent.
To learn more about the latest AI innovations, watch NVIDIA founder and CEO Jensen Huang’s GTC keynote address in replay.
When Tanish Tyagi published his first research paper a year ago on deep learning to detect dementia, it started a family-driven pursuit.
Great-grandparents in his family had suffered from Parkinson’s, a neurodegenerative disease that affects more than 10 million people worldwide. So the now 16-year-old turned to that next, together with his sister, Riya, 14.
The siblings, from Short Hills, New Jersey, published a research paper in the fall about using machine learning to detect Parkinson’s disease by focusing on micrographia, a handwriting disorder that’s a marker for Parkinson’s.
They aim to make a model widely accessible so that early detection is possible for people around the world with limited access to clinics.
“Can we make some real change, can we not only impact our own family, but also see what’s out there and explore what we can do about something that might be a part of our lives in the future?” said Riya.
The Tyagis, who did the research over their summer break, attend prestigious U.S. boarding school Phillips Exeter Academy, alma mater to Mark Zuckerberg, Nobel Prize winners and one U.S. president.
When they aren’t busy with school or extracurricular research, they might be found pitching their STEM skills-focused board game (pictured above), available to purchase through Kickstarter.
Spotting Signs of Micrographia
Tanish decided to pursue research on Parkinson’s in February 2021, when he was just 15. He had recently learned about micrographia, a handwriting disorder that is a common symptom of Parkinson’s.
Micrographia shows up as abnormally small handwriting and reflects the tremors, involuntary muscle contractions and slowed hand movements characteristic of Parkinson’s.
Not long after, Tanish heard a talk by Penn State University researchers Ming Wang and Lijun Zhang on Parkinson’s. So he sought their guidance on pursuing it for detection, and they agreed to supervise the project. Wang is also working with labs at Massachusetts General Hospital in connection with this research.
“Tanish and Riya’s work aims to enhance prediction of micrographia by performing secondary analysis of public handwriting images and adopting state-of-the-art machine learning methods. The findings could help patients receive early diagnosis and treatment for better healthcare outcomes,” said Dr. Zhang, an associate professor at the Institute for Personalized Medicine at Penn State University.
In their paper, the Tyagis used NVIDIA GPU-driven machine learning for feature extraction of micrographia characteristics. Their dataset included open-source images of drawing exams from 53 healthy people and 105 Parkinson’s patients. They extracted several features from these images that allowed them to analyze tremors in writing.
“These are features that we had identified from different papers, and that we saw others had had success with,” said Riya.
With a larger and more balanced dataset, their high prediction accuracy of about 93 percent could get even better, said Tanish.
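As a rough sketch of that kind of pipeline — hand-crafted tremor-style features extracted from drawing images and fed to a classifier — the snippet below shows the general shape of such an approach. The specific features, the random data and the RandomForest choice are illustrative assumptions, not the Tyagis’ published method or results.

```python
# Rough sketch of an image-features -> classifier pipeline for handwriting-based screening.
# The features below are illustrative stand-ins, not the ones used in the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def extract_features(image):
    """Toy tremor-style features from a grayscale drawing (pixel values in [0, 1])."""
    ink = image < 0.5                                     # pixels covered by pen strokes
    row_profile = ink.sum(axis=1).astype(float)           # ink per image row
    return np.array([
        ink.mean(),                                       # overall ink density (smaller writing -> less ink)
        row_profile.std() / (row_profile.mean() + 1e-6),  # stroke-thickness variation, a crude tremor proxy
        np.abs(np.diff(row_profile)).mean(),              # high-frequency jitter along the strokes
    ])

rng = np.random.default_rng(0)
images = rng.random((158, 64, 64))                        # stand-in for 53 healthy + 105 Parkinson's exams
labels = np.array([0] * 53 + [1] * 105)
X = np.stack([extract_features(img) for img in images])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())       # accuracy on the toy data, not the paper's 93 percent
```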
Developing a CNN for Diagnosis
Tanish had previously used his lab’s NVIDIA GeForce RTX 3080 GPU on a natural language processing project for dementia research. But neither sibling had much experience with computer vision before they began the Parkinson’s project.
“We’re working on processing the image from a user by feeding it into the model and then returning comprehensive results so that the user can really understand the diagnosis that the model is making,” Tanish said.
But first the Tyagis said they would like to increase the size of their dataset to improve the model’s accuracy. Their aim is to develop the model further and build a website. They want Parkinson’s detection to be so easy that people can fill out a handwriting assessment form and submit it for detection.
“It could be deployed to the general public and used in clinical settings, and that would be just amazing,” said Tanish.
If you want to ride the next big wave in AI, grab a transformer.
They’re not the shape-shifting toy robots on TV or the trash-can-sized tubs on telephone poles.
So, What’s a Transformer Model?
A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence.
Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways even distant data elements in a series influence and depend on each other.
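At its core, self-attention is only a few matrix operations: project each element into queries, keys and values, score every pair of elements, normalize the scores, then mix the values accordingly. The sketch below is a minimal, illustrative NumPy version with made-up toy dimensions — not code from the 2017 paper.

```python
# Minimal sketch of scaled dot-product self-attention (illustrative only).
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project inputs to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                               # weighted sum of values: context-aware representations

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8                      # toy sizes chosen for the example
x = rng.normal(size=(seq_len, d_model))              # stand-in for embedded tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (4, 8): one context-aware vector per token
```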
First described in a 2017 paper from Google, transformers are among the newest and most powerful classes of models invented to date. They’re driving a wave of advances in machine learning some have dubbed transformer AI.
Stanford researchers called transformers “foundation models” in an August 2021 paper because they see them driving a paradigm shift in AI. The “sheer scale and scope of foundation models over the last few years have stretched our imagination of what is possible,” they wrote.
What Can Transformer Models Do?
Transformers are translating text and speech in near real-time, opening meetings and classrooms to diverse and hearing-impaired attendees.
They’re helping researchers understand the chains of genes in DNA and amino acids in proteins in ways that can speed drug design.
Transformers can detect trends and anomalies to prevent fraud, streamline manufacturing, make online recommendations or improve healthcare.
People use transformers every time they search on Google or Microsoft Bing.
The Virtuous Cycle of Transformer AI
Any application using sequential text, image or video data is a candidate for transformer models.
That enables these models to ride a virtuous cycle in transformer AI. Created with large datasets, transformers make accurate predictions that drive their wider use, generating more data that can be used to create even better models.
“Transformers made self-supervised learning possible, and AI jumped to warp speed,” said NVIDIA founder and CEO Jensen Huang in his keynote address this week at GTC.
Transformers Replace CNNs, RNNs
Transformers are in many cases replacing convolutional and recurrent neural networks (CNNs and RNNs), the most popular types of deep learning models just five years ago.
Indeed, 70 percent of arXiv papers on AI posted in the last two years mention transformers. That’s a radical shift from a 2017 IEEE study that reported RNNs and CNNs were the most popular models for pattern recognition.
No Labels, More Performance
Before transformers arrived, users had to train neural networks with large, labeled datasets that were costly and time-consuming to produce. By finding patterns between elements mathematically, transformers eliminate that need, making available the trillions of images and petabytes of text data on the web and in corporate databases.
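The shift is easiest to see in the training objective itself: in self-supervised language modeling, the “label” for each position is simply the next token of the raw text, so no human annotation is required. The snippet below is a deliberately tiny illustration of that idea — a toy vocabulary and a trivial stand-in model, not a transformer.

```python
# Sketch: self-supervised next-token prediction — the "labels" are just the input shifted by one.
import torch
import torch.nn as nn

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}   # toy vocabulary (assumed)
text = ["the", "cat", "sat", "on", "the", "mat"]
ids = torch.tensor([vocab[w] for w in text])

inputs, targets = ids[:-1], ids[1:]        # targets come from the data itself, not from annotators

model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
logits = model(inputs)                     # predictions for the next word at each position
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()                            # gradients flow with no labeled dataset required
print(float(loss))
```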
In addition, the math that transformers use lends itself to parallel processing, so these models can run fast.
Transformers now dominate popular performance leaderboards like SuperGLUE, a benchmark developed in 2019 for language-processing systems.
How Transformers Pay Attention
Like most neural networks, transformer models are basically large encoder/decoder blocks that process data.
Small but strategic additions to these blocks make transformers uniquely powerful.
Transformers use positional encoders to tag data elements coming in and out of the network. Attention units follow these tags, calculating a kind of algebraic map of how each element relates to the others.
Attention queries are typically executed in parallel by calculating a matrix of equations in what’s called multi-headed attention.
With these tools, computers can see the same patterns humans see.
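As a rough illustration of those two pieces, the sketch below tags a toy sequence with the classic sinusoidal positional encoding and runs it through PyTorch’s built-in nn.MultiheadAttention. The sequence length, model width and head count are arbitrary choices for the example.

```python
# Sketch: sinusoidal positional encoding + multi-head self-attention with PyTorch (toy sizes).
import math
import torch
import torch.nn as nn

def positional_encoding(seq_len, d_model):
    """Classic sinusoidal tags so the model knows where each token sits in the sequence."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

seq_len, d_model, n_heads = 16, 64, 8              # assumed toy dimensions
tokens = torch.randn(seq_len, 1, d_model)          # (sequence, batch, features)
tokens = tokens + positional_encoding(seq_len, d_model).unsqueeze(1)

attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads)
out, weights = attn(tokens, tokens, tokens)        # queries, keys and values all come from the same sequence
print(out.shape, weights.shape)                    # one attended vector per token, plus the attention map
```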
Self-Attention Finds Meaning
For example, in the sentence:
She poured water from the pitcher to the cup until it was full.
We know “it” refers to the cup, while in the sentence:
She poured water from the pitcher to the cup until it was empty.
We know “it” refers to the pitcher.
“Meaning is a result of relationships between things, and self-attention is a general way of learning relationships,” said Ashish Vaswani, a former senior staff research scientist at Google Brain who led work on the seminal 2017 paper.
“Machine translation was a good vehicle to validate self-attention because you needed short- and long-distance relationships among words,” said Vaswani.
“Now we see self-attention is a powerful, flexible tool for learning,” he added.
How Transformers Got Their Name
Attention is so key to transformers that the Google researchers almost used the term as the name for their 2017 model. Almost.
“Attention Net didn’t sound very exciting,” said Vaswani, who started working with neural nets in 2011.
Jakob Uszkoreit, a senior software engineer on the team, came up with the name Transformer.
“I argued we were transforming representations, but that was just playing semantics,” Vaswani said.
The Birth of Transformers
In the paper for the 2017 NeurIPS conference, the Google team described their transformer and the accuracy records it set for machine translation.
Thanks to a basket of techniques, they trained their model in just 3.5 days on eight NVIDIA GPUs, a small fraction of the time and cost of training prior models. They trained it on datasets with up to a billion pairs of words.
“It was an intense three-month sprint to the paper submission date,” recalled Aidan Gomez, a Google intern in 2017 who contributed to the work.
“The night we were submitting, Ashish and I pulled an all-nighter at Google,” he said. “I caught a couple hours sleep in one of the small conference rooms, and I woke up just in time for the submission when someone coming in early to work opened the door and hit my head.”
It was a wakeup call in more ways than one.
“Ashish told me that night he was convinced this was going to be a huge deal, something game changing. I wasn’t convinced, I thought it would be a modest gain on a benchmark, but it turned out he was very right,” said Gomez, now CEO of Cohere, a startup providing a language processing service based on transformers.
A Moment for Machine Learning
Vaswani recalls the excitement of seeing the results surpass similar work published by a Facebook team using CNNs.
“I could see this would likely be an important moment in machine learning,” he said.
A year later, another Google team tried processing text sequences both forward and backward with a transformer. That helped capture more relationships among words, improving the model’s ability to understand the meaning of a sentence.
Their Bidirectional Encoder Representations from Transformers (BERT) model set 11 new records and became part of the algorithm behind Google search.
Within weeks, researchers around the world were adapting BERT for use cases across many languages and industries “because text is one of the most common data types companies have,” said Anders Arpteg, a 20-year veteran of machine learning research.
Putting Transformers to Work
Soon transformer models were being adapted for science and healthcare.
DeepMind, in London, advanced the understanding of proteins, the building blocks of life, using a transformer called AlphaFold2, described in a recent Nature article. It processed amino acid chains like text strings to set a new high-water mark for describing how proteins fold, work that could speed drug discovery.
AstraZeneca and NVIDIA developed MegaMolBART, a transformer tailored for drug discovery. It’s a version of the pharmaceutical company’s MolBART transformer, trained on a large, unlabeled database of chemical compounds using the NVIDIA Megatron framework for building large-scale transformer models.
Reading Molecules, Medical Records
“Just as AI language models can learn the relationships between words in a sentence, our aim is that neural networks trained on molecular structure data will be able to learn the relationships between atoms in real-world molecules,” said Ola Engkvist, head of molecular AI, discovery sciences and R&D at AstraZeneca, when the work was announced last year.
Separately, the University of Florida’s academic health center collaborated with NVIDIA researchers to create GatorTron. The transformer model aims to extract insights from massive volumes of clinical data to accelerate medical research.
Transformers Grow Up
Along the way, researchers found larger transformers performed better.
For example, researchers from the Rostlab at the Technical University of Munich, which helped pioneer work at the intersection of AI and biology, used natural-language processing to understand proteins. In 18 months, they graduated from using RNNs with 90 million parameters to transformer models with 567 million parameters.
The OpenAI lab showed bigger is better with its Generative Pretrained Transformer (GPT). The latest version, GPT-3, has 175 billion parameters, up from 1.5 billion for GPT-2.
With the extra heft, GPT-3 can respond to a user’s query even on tasks it was not specifically trained to handle. It’s already being used by companies including Cisco, IBM and Salesforce.
Tale of a Mega Transformer
NVIDIA and Microsoft hit a high watermark in November, announcing the Megatron-Turing Natural Language Generation model (MT-NLG) with 530 billion parameters. It debuted along with a new framework, NVIDIA NeMo Megatron, that aims to let any business create its own billion- or trillion-parameter transformers to power custom chatbots, personal assistants and other AI applications that understand language.
MT-NLG had its public debut as the brain for TJ, the Toy Jensen avatar that gave part of the keynote at NVIDIA’s November 2021 GTC.
“When we saw TJ answer questions — the power of our work demonstrated by our CEO — that was exciting,” said Mostofa Patwary, who led the NVIDIA team that trained the model.
Creating such models is not for the faint of heart. MT-NLG was trained using hundreds of billions of data elements, a process that required thousands of GPUs running for weeks.
“Training large transformer models is expensive and time-consuming, so if you’re not successful the first or second time, projects might be canceled,” said Patwary.
Trillion-Parameter Transformers
Today, many AI engineers are working on trillion-parameter transformers and applications for them.
“We’re constantly exploring how these big models can deliver better applications. We also investigate in what aspects they fail, so we can build even better and bigger ones,” Patwary said.
To provide the computing muscle those models need, our latest accelerator — the NVIDIA H100 Tensor Core GPU — packs a Transformer Engine and supports a new FP8 format. That speeds training while preserving accuracy.
With those and other advances, “transformer model training can be reduced from weeks to days,” said Huang at GTC.
MoE Means More for Transformers
Last year, Google researchers described the Switch Transformer, one of the first trillion-parameter models. It uses AI sparsity, a complex mixture-of-experts (MoE) architecture and other advances to drive performance gains in language processing and up to 7x increases in pre-training speed.
Now some researchers aim to develop simpler transformers with fewer parameters that deliver performance similar to the largest models.
“I see promise in retrieval-based models that I’m super excited about because they could bend the curve,” said Gomez, of Cohere, noting the Retro model from DeepMind as an example.
Retrieval-based models learn by submitting queries to a database. “It’s cool because you can be choosy about what you put in that knowledge base,” he said.
The ultimate goal is to “make these models learn like humans do from context in the real world with very little data,” said Vaswani, now co-founder of a stealth AI startup.
He imagines future models that do more computation upfront so they need less data and sport better ways users can give them feedback.
“Our goal is to build models that will help people in their everyday lives,” he said of his new venture.
Safe, Responsible Models
Other researchers are studying ways to eliminate bias or toxicity when models amplify wrong or harmful language. For example, Stanford created the Center for Research on Foundation Models to explore these issues.
“These are important problems that need to be solved for safe deployment of models,” said Shrimai Prabhumoye, a research scientist at NVIDIA who’s among many across the industry working in the area.
“Today, most models look for certain words or phrases, but in real life these issues may come out subtly, so we have to consider the whole context,” added Prabhumoye.
“That’s a primary concern for Cohere, too,” said Gomez. “No one is going to use these models if they hurt people, so it’s table stakes to make the safest and most responsible models.”
Beyond the Horizon
Vaswani imagines a future where self-learning, attention-powered transformers approach the holy grail of AI.
“We have a chance of achieving some of the goals people talked about when they coined the term ‘general artificial intelligence’ and I find that north star very inspiring,” he said.
“We are in a time where simple methods like neural networks are giving us an explosion of new capabilities.”
When the first instant photo was taken 75 years ago with a Polaroid camera, it was groundbreaking to rapidly capture the 3D world in a realistic 2D image. Today, AI researchers are working on the opposite: turning a collection of still images into a digital 3D scene in a matter of seconds.
Known as inverse rendering, the process uses AI to approximate how light behaves in the real world, enabling researchers to reconstruct a 3D scene from a handful of 2D images taken at different angles. The NVIDIA Research team has developed an approach that accomplishes this task almost instantly — making it one of the first models of its kind to combine ultra-fast neural network training and rapid rendering.
NVIDIA applied this approach to a popular new technology called neural radiance fields, or NeRF. The result, dubbed Instant NeRF, is the fastest NeRF technique to date, achieving more than 1,000x speedups in some cases. The model requires just seconds to train on a few dozen still photos — plus data on the camera angles they were taken from — and can then render the resulting 3D scene within tens of milliseconds.
“If traditional 3D representations like polygonal meshes are akin to vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object or within a scene,” says David Luebke, vice president for graphics research at NVIDIA. “In that sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography — vastly increasing the speed, ease and reach of 3D capture and sharing.”
Showcased in a session at NVIDIA GTC this week, Instant NeRF could be used to create avatars or scenes for virtual worlds, to capture video conference participants and their environments in 3D, or to reconstruct scenes for 3D digital maps.
In a tribute to the early days of Polaroid images, NVIDIA Research recreated an iconic photo of Andy Warhol taking an instant photo, turning it into a 3D scene using Instant NeRF.
What Is a NeRF?
NeRFs use neural networks to represent and render realistic 3D scenes based on an input collection of 2D images.
Collecting data to feed a NeRF is a bit like being a red carpet photographer trying to capture a celebrity’s outfit from every angle — the neural network requires a few dozen images taken from multiple positions around the scene, as well as the camera position of each of those shots.
In a scene that includes people or other moving elements, the quicker these shots are captured, the better. If there’s too much motion during the 2D image capture process, the AI-generated 3D scene will be blurry.
From there, a NeRF essentially fills in the blanks, training a small neural network to reconstruct the scene by predicting the color of light radiating in any direction, from any point in 3D space. The technique can even work around occlusions — when objects seen in some images are blocked by obstructions such as pillars in other images.
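As a rough sketch of that idea, the core of a NeRF is just a small MLP that maps a 3D position and viewing direction to a color and a density, which are then composited along camera rays. The layer sizes below are illustrative assumptions, not the architecture of any particular NeRF implementation, and real NeRFs also apply a positional encoding to the inputs.

```python
# Sketch of the core NeRF mapping: (3D position, view direction) -> (RGB color, density).
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),   # xyz position + viewing direction
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density = nn.Linear(hidden, 1)        # how much "stuff" is at this point in space
        self.color = nn.Linear(hidden, 3)          # RGB radiance toward the viewer

    def forward(self, xyz, direction):
        h = self.backbone(torch.cat([xyz, direction], dim=-1))
        return torch.sigmoid(self.color(h)), torch.relu(self.density(h))

model = TinyNeRF()
points = torch.rand(1024, 3)                       # sample points along camera rays
dirs = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
rgb, sigma = model(points, dirs)                   # colors and densities to composite along each ray
print(rgb.shape, sigma.shape)
```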
Accelerating 1,000x With Instant NeRF
While estimating the depth and appearance of an object based on a partial view is a natural skill for humans, it’s a demanding task for AI.
Creating a 3D scene with traditional methods takes hours or longer, depending on the complexity and resolution of the visualization. Bringing AI into the picture speeds things up. Early NeRF models rendered crisp scenes without artifacts in a few minutes, but still took hours to train.
Instant NeRF, however, cuts rendering time by several orders of magnitude. It relies on a technique developed by NVIDIA called multi-resolution hash grid encoding, which is optimized to run efficiently on NVIDIA GPUs. Using a new input encoding method, researchers can achieve high-quality results using a tiny neural network that runs rapidly.
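The idea behind that encoding can be sketched in a few lines: look each 3D point up in several grids of increasing resolution, hash each grid vertex into a small learned feature table, and concatenate the per-level features before handing them to the tiny MLP. The code below is a heavily simplified, illustrative approximation (nearest-vertex lookup, no trilinear interpolation); the table size and level count are assumptions, and it is not NVIDIA’s optimized CUDA implementation.

```python
# Simplified sketch of a multi-resolution hash grid encoding (nearest-vertex lookup only).
import torch
import torch.nn as nn

class HashGridEncoding(nn.Module):
    def __init__(self, n_levels=8, features_per_level=2, table_size=2**14, base_res=16):
        super().__init__()
        self.resolutions = [base_res * 2**i for i in range(n_levels)]
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, features_per_level))
             for _ in range(n_levels)]
        )
        self.primes = (1, 2654435761, 805459861)    # constants for the spatial hash

    def forward(self, xyz):                         # xyz in [0, 1]^3, shape (N, 3)
        feats = []
        for res, table in zip(self.resolutions, self.tables):
            cell = (xyz * res).long()               # nearest grid vertex at this resolution
            h = (cell[:, 0] * self.primes[0]) \
                ^ (cell[:, 1] * self.primes[1]) \
                ^ (cell[:, 2] * self.primes[2])     # hash the vertex into the feature table
            feats.append(table[h % table.shape[0]])
        return torch.cat(feats, dim=-1)             # concatenated features feed the tiny MLP

encoder = HashGridEncoding()
points = torch.rand(1024, 3)                        # sample points along camera rays
print(encoder(points).shape)                        # (1024, 16): 8 levels x 2 features each
```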
The model was developed using the NVIDIA CUDA Toolkit and the Tiny CUDA Neural Networks library. Since it’s a lightweight neural network, it can be trained and run on a single NVIDIA GPU — running fastest on cards with NVIDIA Tensor Cores.
The technology could be used to train robots and self-driving cars to understand the size and shape of real-world objects by capturing 2D images or video footage of them. It could also be used in architecture and entertainment to rapidly generate digital representations of real environments that creators can modify and build on.
Beyond NeRFs, NVIDIA researchers are exploring how this input encoding technique might be used to accelerate multiple AI challenges including reinforcement learning, language translation and general-purpose deep learning algorithms.