NVIDIA Research Wins CVPR Autonomous Grand Challenge for End-to-End Driving

NVIDIA Research Wins CVPR Autonomous Grand Challenge for End-to-End Driving

Making moves to accelerate self-driving car development, NVIDIA was today named an Autonomous Grand Challenge winner at the Computer Vision and Pattern Recognition (CVPR) conference, running this week in Seattle.

Building on last year’s win in 3D Occupancy Prediction, NVIDIA Research topped the leaderboard this year in the End-to-End Driving at Scale category with its Hydra-MDP model, outperforming more than 400 entries worldwide.

This milestone shows the importance of generative AI in building applications for physical AI deployments in autonomous vehicle (AV) development. The technology can also be applied to industrial environments, healthcare, robotics and other areas.

The winning submission received CVPR’s Innovation Award as well, recognizing NVIDIA’s approach to improving “any end-to-end driving model using learned open-loop proxy metrics.”

In addition, NVIDIA announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.

How End-to-End Driving Works

The race to develop self-driving cars isn’t a sprint but more a never-ending triathlon, with three distinct yet crucial parts operating simultaneously: AI training, simulation and autonomous driving. Each requires its own accelerated computing platform, and together, the full-stack systems purpose-built for these steps form a powerful triad that enables continuous development cycles, always improving in performance and safety.

To accomplish this, a model is first trained on an AI supercomputer such as NVIDIA DGX. It’s then tested and validated in simulation — using the NVIDIA Omniverse platform and running on an NVIDIA OVX system — before entering the vehicle, where, lastly, the NVIDIA DRIVE AGX platform processes sensor data through the model in real time.

Building an autonomous system to navigate safely in the complex physical world is extremely challenging. The system needs to perceive and understand its surrounding environment holistically, then make correct, safe decisions in a fraction of a second. This requires human-like situational awareness to handle potentially dangerous or rare scenarios.

AV software development has traditionally been based on a modular approach, with separate components for object detection and tracking, trajectory prediction, and path planning and control.

End-to-end autonomous driving systems streamline this process using a unified model to take in sensor input and produce vehicle trajectories, helping avoid overcomplicated pipelines and providing a more holistic, data-driven approach to handle real-world scenarios.

Watch a video about the Hydra-MDP model, winner of the CVPR Autonomous Grand Challenge for End-to-End Driving:

Navigating the Grand Challenge 

This year’s CVPR challenge asked participants to develop an end-to-end AV model, trained using the nuPlan dataset, to generate driving trajectory based on sensor data.

The models were submitted for testing inside the open-source NAVSIM simulator and were tasked with navigating thousands of scenarios they hadn’t experienced yet. Model performance was scored based on metrics for safety, passenger comfort and deviation from the original recorded trajectory.

NVIDIA Research’s winning end-to-end model ingests camera and lidar data, as well as the vehicle’s trajectory history, to generate a safe, optimal vehicle path for five seconds post-sensor input.

The workflow NVIDIA researchers used to win the competition can be replicated in high-fidelity simulated environments with NVIDIA Omniverse. This means AV simulation developers can recreate the workflow in a physically accurate environment before testing their AVs in the real world. NVIDIA Omniverse Cloud Sensor RTX microservices will be available later this year. Sign up for early access.

In addition, NVIDIA ranked second for its submission to the CVPR Autonomous Grand Challenge for Driving with Language. NVIDIA’s approach connects vision language models and autonomous driving systems, integrating the power of large language models to help make decisions and achieve generalizable, explainable driving behavior.

Learn More at CVPR 

More than 50 NVIDIA papers were accepted to this year’s CVPR, on topics spanning automotive, healthcare, robotics and more. Over a dozen papers will cover NVIDIA automotive-related research, including:

Sanja Fidler, vice president of AI research at NVIDIA, will speak on vision language models at the CVPR Workshop on Autonomous Driving.

Learn more about NVIDIA Research, a global team of hundreds of scientists and engineers focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics.

Read More

NVIDIA Advances Physical AI at CVPR With Largest Indoor Synthetic Dataset

NVIDIA Advances Physical AI at CVPR With Largest Indoor Synthetic Dataset

NVIDIA contributed the largest ever indoor synthetic dataset to the Computer Vision and Pattern Recognition (CVPR) conference’s annual AI City Challenge — helping researchers and developers advance the development of solutions for smart cities and industrial automation.

The challenge, garnering over 700 teams from nearly 50 countries, tasks participants to develop AI models to enhance operational efficiency in physical settings, such as retail and warehouse environments, and intelligent traffic systems.

Teams tested their models on the datasets that were generated using NVIDIA Omniverse, a platform of application programming interfaces (APIs), software development kits (SDKs) and services that enable developers to build Universal Scene Description (OpenUSD)-based applications and workflows.

Creating and Simulating Digital Twins for Large Spaces

In large indoor spaces like factories and warehouses, daily activities involve a steady stream of people, small vehicles and future autonomous robots. Developers need solutions that can observe and measure activities, optimize operational efficiency, and prioritize human safety in complex, large-scale settings.

Researchers are addressing that need with computer vision models that can perceive and understand the physical world. It can be used in applications like multi-camera tracking, in which a model tracks multiple entities within a given environment.

To ensure their accuracy, the models must be trained on large, ground-truth datasets for a variety of real-world scenarios. But collecting that data can be a challenging, time-consuming and costly process.

AI researchers are turning to physically based simulations — such as digital twins of the physical world — to enhance AI simulation and training. These virtual environments can help generate synthetic data used to train AI models. Simulation also provides a way to run a multitude of “what-if” scenarios in a safe environment while addressing privacy and AI bias issues.

Creating synthetic data is important for AI training because it offers a large, scalable, and expandable amount of data. Teams can generate a diverse set of training data by changing many parameters including lighting, object locations, textures and colors.

Building Synthetic Datasets for the AI City Challenge

This year’s AI City Challenge consists of five computer vision challenge tracks that span traffic management to worker safety.

NVIDIA contributed datasets for the first track, Multi-Camera Person Tracking, which saw the highest participation, with over 400 teams. The challenge used a benchmark and the largest synthetic dataset of its kind — comprising 212 hours of 1080p videos at 30 frames per second spanning 90 scenes across six virtual environments, including a warehouse, retail store and hospital.

Created in Omniverse, these scenes simulated nearly 1,000 cameras and featured around 2,500 digital human characters. It also provided a way for the researchers to generate data of the right size and fidelity to achieve the desired outcomes.

The benchmarks were created using Omniverse Replicator in NVIDIA Isaac Sim, a reference application that enables developers to design, simulate and train AI for robots, smart spaces or autonomous machines in physically based virtual environments built on NVIDIA Omniverse.

Omniverse Replicator, an SDK for building synthetic data generation pipelines, automated many manual tasks involved in generating quality synthetic data, including domain randomization, camera placement and calibration, character movement, and semantic labeling of data and ground-truth for benchmarking.

Ten institutions and organizations are collaborating with NVIDIA for the AI City Challenge:

  • Australian National University, Australia
  • Emirates Center for Mobility Research, UAE
  • Indian Institute of Technology Kanpur, India
  • Iowa State University, U.S.
  • Johns Hopkins University, U.S.
  • National Yung-Ming Chiao-Tung University, Taiwan
  • Santa Clara University, U.S.
  • The United Arab Emirates University, UAE
  • University at Albany – SUNY, U.S.
  • Woven by Toyota, Japan

Driving the Future of Generative Physical AI 

Researchers and companies around the world are developing infrastructure automation and robots powered by physical AI — which are models that can understand instructions and autonomously perform complex tasks in the real world.

Generative physical AI uses reinforcement learning in simulated environments, where it perceives the world using accurately simulated sensors, performs actions grounded by laws of physics, and receives feedback to reason about the next set of actions.

Developers can tap into developer SDKs and APIs, such as the NVIDIA Metropolis developer stack — which includes a multi-camera tracking reference workflow — to add enhanced perception capabilities for factories, warehouses and retail operations. And with the latest release of NVIDIA Isaac Sim, developers can supercharge robotics workflows by simulating and training AI-based robots in physically based virtual spaces before real-world deployment.

Researchers and developers are also combining high-fidelity, physics-based simulation with advanced AI to bridge the gap between simulated training and real-world application. This helps ensure that synthetic training environments closely mimic real-world conditions for more seamless robot deployment.

NVIDIA is taking the accuracy and scale of simulations further with the recently announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines.

This technology will allow autonomous systems, whether a factory, vehicle or robot, to gather essential data to effectively perceive, navigate and interact with the real world. Using these microservices, developers can run large-scale tests on sensor perception within realistic, virtual environments, significantly reducing the time and cost associated with real-world testing.

Omniverse Cloud Sensor RTX microservices will be available later this year. Sign up for early access.

Showcasing Advanced AI With Research

Participants submitted research papers for the AI City Challenge and a few achieved top rankings, including:

All accepted papers will be presented at the AI City Challenge 2024 Workshop, taking place on June 17.

At CVPR 2024, NVIDIA Research will present over 50 papers, introducing generative physical AI breakthroughs with potential applications in areas like autonomous vehicle development and robotics.

Papers that used NVIDIA Omniverse to generate synthetic data or digital twins of environments for model simulation, testing and validation include:

Read more about NVIDIA Research at CVPR, and learn more about the AI City Challenge.

Get started with NVIDIA Omniverse by downloading the standard license free, access OpenUSD resources and learn how Omniverse Enterprise can connect teams. Follow Omniverse on Instagram, Medium, LinkedIn and X. For more, join the Omniverse community on the forums, Discord server, Twitch and YouTube channels.

Read More

‘Believe in Something Unconventional, Something Unexplored,’ NVIDIA CEO Tells Caltech Grads

‘Believe in Something Unconventional, Something Unexplored,’ NVIDIA CEO Tells Caltech Grads

NVIDIA founder and CEO Jensen Huang on Friday encouraged Caltech graduates to pursue their craft with dedication and resilience — and to view setbacks as new opportunities.

“I hope you believe in something. Something unconventional, something unexplored. But let it be informed, and let it be reasoned, and dedicate yourself to making that happen,” he said. “You may find your GPU. You may find your CUDA. You may find your generative AI. You may find your NVIDIA.”

Trading his signature leather jacket for black and yellow academic regalia, Huang addressed the nearly 600 graduates at their commencement ceremony in Pasadena, Calif., starting with the tale of the computing industry’s decades-long evolution to reach this pivotal moment of AI transformation.

“Computers today are the single most important instrument of knowledge, and it’s foundational to every single industry in every field of science,” Huang said. “As you enter industry, it’s important you know what’s happening.”

He shared how, over a decade ago, NVIDIA — a small company at the time — bet on deep learning, investing billions of dollars and years of engineering resources to reinvent every computing layer.

“No one knew how far deep learning could scale, and if we didn’t build it, we’d never know,” Huang said. Referencing the famous line from Field of Dreams — if you build it, he will come — he said, “Our logic is: If we don’t build it, they can’t come.”

Looking to the future, Huang said, the next wave of AI is robotics, a field where NVIDIA’s journey resulted from a series of setbacks.

He reflected on a period in NVIDIA’s past when the company each year built new products that “would be incredibly successful, generate enormous amounts of excitement. And then one year later, we were kicked out of those markets.”

These roadblocks pushed NVIDIA to seek out untapped areas — what Huang refers to as “zero-billion-dollar markets.”

“With no more markets to turn to, we decided to build something where we are sure there are no customers,” Huang said. “Because one of the things you can definitely guarantee is where there are no customers, there are also no competitors.”

Robotics was that new market. NVIDIA built the first robotics computer, Huang said, processing a deep learning algorithm. Over a decade later, that pivot has given the company the opportunity to create the next wave of AI.

“One setback after another, we shook it off and skated to the next opportunity. Each time, we gain skills and strengthen our character,” Huang said. “No setback that comes our way doesn’t look like an opportunity these days.”

Huang stressed the importance of resilience and agility as superpowers that strengthen character.

“The world can be unfair and deal you with tough cards. Swiftly shake it off,” he said, with a tongue-in-cheek reference to one of Taylor Swift’s biggest hits. “There’s another opportunity out there — or create one.”

Huang concluded by sharing a story from his travels to Japan, where, as he watched a gardener painstakingly tending to Kyoto’s famous moss garden, he realized that when a person is dedicated to their craft and prioritizes doing their life’s work, they always have plenty of time.

“Prioritize your life,” he said, “and you will have plenty of time to do the important things.”

Main image courtesy of Caltech. 

Read More

NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models

NVIDIA today announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models (LLMs) for commercial applications across healthcare, finance, manufacturing, retail and every other industry.

High-quality training data plays a critical role in the performance, accuracy and quality of responses from a custom LLM — but robust datasets can be prohibitively expensive and difficult to access.

Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct and reward models that form a pipeline to generate synthetic data used for training and refining LLMs. The models are optimized to work with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization and evaluation. They’re also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded now from Hugging Face. Developers will soon be able to access the models at ai.nvidia.com, where they’ll be packaged as an NVIDIA NIM microservice with a standard application programming interface that can be deployed anywhere.

Navigating Nemotron to Generate Synthetic Data

LLMs can help developers generate synthetic training data in scenarios where access to large, diverse labeled datasets is limited.

The Nemotron-4 340B Instruct model creates diverse synthetic data that mimics the characteristics of real-world data, helping improve data quality to increase the performance and robustness of custom LLMs across various domains.

Then, to boost the quality of the AI-generated data, developers can use the Nemotron-4 340B Reward model to filter for high-quality responses. Nemotron-4 340B Reward grades responses on five attributes: helpfulness, correctness, coherence, complexity and verbosity. It’s currently first place on the Hugging Face RewardBench leaderboard, created by AI2, for evaluating the capabilities, safety and pitfalls of reward models.

nemotron synthetic data generation pipeline diagram
In this synthetic data generation pipeline, (1) the Nemotron-4 340B Instruct model is first used to produce synthetic text-based output. An evaluator model, (2) Nemotron-4 340B Reward, then assesses this generated text — providing feedback that guides iterative improvements and ensures the synthetic data is accurate, relevant and aligned with specific requirements.

Researchers can also create their own instruct or reward models by customizing the Nemotron-4 340B Base model using their proprietary data, combined with the included HelpSteer2 dataset.

Fine-Tuning With NeMo, Optimizing for Inference With TensorRT-LLM

Using open-source NVIDIA NeMo and NVIDIA TensorRT-LLM, developers can optimize the efficiency of their instruct and reward models to generate synthetic data and to score responses.

All Nemotron-4 340B models are optimized with TensorRT-LLM to take advantage of tensor parallelism, a type of model parallelism in which individual weight matrices are split across multiple GPUs and servers, enabling efficient inference at scale.

Nemotron-4 340B Base, trained on 9 trillion tokens, can be customized using the NeMo framework to adapt to specific use cases or domains. This fine-tuning process benefits from extensive pretraining data and yields more accurate outputs for specific downstream tasks.

A variety of customization methods are available through the NeMo framework, including supervised fine-tuning and parameter-efficient fine-tuning methods such as low-rank adaptation, or LoRA.

To boost model quality, developers can align their models with NeMo Aligner and datasets annotated by Nemotron-4 340B Reward. Alignment is a key step in training LLMs, where a model’s behavior is fine-tuned using algorithms like reinforcement learning from human feedback (RLHF) to ensure its outputs are safe, accurate, contextually appropriate and consistent with its intended goals.

Businesses seeking enterprise-grade support and security for production environments can also access NeMo and TensorRT-LLM through the cloud-native NVIDIA AI Enterprise software platform, which provides accelerated and efficient runtimes for generative AI foundation models.

Evaluating Model Security and Getting Started

The Nemotron-4 340B Instruct model underwent extensive safety evaluation, including adversarial tests, and performed well across a wide range of risk indicators. Users should still perform careful evaluation of the model’s outputs to ensure the synthetically generated data is suitable, safe and accurate for their use case.

For more information on model security and safety evaluation, read the model card.

Download Nemotron-4 340B models via Hugging Face. For more details, read the research papers on the model and dataset.

See notice regarding software product information.

Read More

‘The Proudest Refugee’: How Veronica Miller Charts Her Own Path at NVIDIA

‘The Proudest Refugee’: How Veronica Miller Charts Her Own Path at NVIDIA

When she was five years old, Veronica Miller (née Teklai) and her family left their homeland of Eritrea, in the Horn of Africa, to escape an ongoing war with Ethiopia and create a new life in the U.S.

She grew up in East Orange, New Jersey, watching others judge her parents and turn them away from jobs they were qualified for because of their appearance, their accented English or their unfamiliar names.

After working in the shipping industry for 20 years, Miller’s dad eventually became a New York City cab driver, an often-dangerous job in the 1980s. Her mom, despite earning a computer science degree in the U.S., trained to become a home health aide, where jobs were more available.

“My parents’ resilience and courage made my life possible,” Miller said.

After graduating from Ramapo College of New Jersey with a degree in international business, Miller worked at large automotive companies in client support, production support and project management.

Now working as a technical program manager in product security at NVIDIA, she feels like her family’s journey has come full circle.

“It’s the honor of my life being here at NVIDIA: I’m the proudest refugee,” she said.

In her role, Miller functions like a conductor in an orchestra. She works with engineers to bridge gaps and understand challenges to define solutions — always trying to create opportunities to turn a “no” into a “yes” through collaboration.

At NVIDIA, Miller feels like she can be herself, helping her thrive. She no longer feels the pressure to conform to fit in, allowing her creativity to flow freely and solve problems.

“Previously in my career, I never wore my hair curly. After someone once asked to touch my curly hair, I believed it would be easier to make myself look like everyone else. I thought it was the best way to let my work be the focus instead of my hair,” she said. “NVIDIA is the first employer that encouraged me to bring my full self to work.”

Outside of work, Veronica and her husband, Nathan, are passionate about paying it forward and helping local youth in Trenton, New Jersey. Together, they’ve developed The Miller Family Foundation to help with community needs, including education. The foundation’s scholarship fund has donated $20,000 to low-income high school students to provide support for college tuition and career mentorship.

“I truly believe anyone could get here. There wasn’t anyone that showed me the path. It was belief in myself, a ton of research and endless hard work,” she said. “We’re in a special place where my husband and I can give the next generation some of the financial support and career guidance we didn’t have.”

Learn more about NVIDIA life, culture and careers

Read More

Cloud Ahoy! Treasure Awaits With ‘Sea of Thieves’ on GeForce NOW

Cloud Ahoy! Treasure Awaits With ‘Sea of Thieves’ on GeForce NOW

Set sail for adventure, pirates. Sea of Thieves makes waves in the cloud this week. It’s an adventure-filled GFN Thursday with four new games joining the GeForce NOW library.

#GreetingsfromGFN
#GreetingsFromGFN by Cloud Gaming Photography.

Plus, members are sharing their favorite locations they can access from the cloud. Follow along all month on @NVIDIAGFN social media accounts and post your own favorite cloud screenshots using #GreetingsfromGFN.

Seas the Day

Live the pirate life in the smash-hit pirate adventure game from Rare and Xbox Game Studios. Sea of Thieves takes place in an open world where players can explore the vast seas, engage in ship battles, hunt for treasure and embark on exciting quests.

Sea of Thieves on GeForce NOW
Come sea what’s possible in the cloud.

The Sea of Thieves environment is always changing, as various seasons bring new features to the game and offer rich rewards for pirates old and new. Visit uncharted islands in search of treasure, dive deep into narrative-focused Tall Tales, take part in events and forge a path to become a true Pirate Legend. The newest season features the mysterious Sunken City, Cursed Sloop skeleton ships and fresh cosmetics.

Every pirate needs a crew, so grab some mateys and carve a fearsome reputation across the open seas, or adventure solo to keep all the bountiful treasure. Make the journey more rewarding with a GeForce NOW Ultimate membership, and play with gamers across the world with up to eight-hour gaming sessions for a kraken good time.

New Games Zoom Onto the Cloud

Disney Speedstorm on GeForce NOW
Take the tracks by storm.

Drift into the ultimate hero-based combat racing game in Disney Speedstorm, a free-to-play kart-racing game that features characters and high-speed circuits inspired by beloved Disney and Pixar worlds. Customize racers and karts, master each character’s unique skills and engage in thrilling multiplayer races. Whether exploring the docks of the Pirates’ Island track from Pirates of the Caribbean or the wilds of the Jungle Ruins map from The Jungle Book, players can experience iconic environments in the game.

Check out the list of new games this week:

  • SunnySide (New release on Steam, June 14)
  • Disney Speedstorm (Steam and Xbox, available on PC Game Pass)
  • Sea of Thieves (Steam and Xbox, available on PC Game Pass)
  • Bodycam (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

 

Read More

Every Company’s Data Is Their ‘Gold Mine,’ NVIDIA CEO Says at Databricks Data + AI Summit

Every Company’s Data Is Their ‘Gold Mine,’ NVIDIA CEO Says at Databricks Data + AI Summit

Accelerated computing is transforming data processing and analytics for enterprises, declared NVIDIA founder and CEO Jensen Huang Wednesday during an on-stage chat with Databricks cofounder and CEO Ali Ghodsi at the Databricks Data + AI Summit 2024.

“Every company’s business data is their gold mine,” Huang said, explaining that every company has enormous amounts of data, but extracting insights and distilling intelligence from it has been challenging.

Databricks Leverages NVIDIA’s Full Stack to Accelerate Generative AI Applications

To unlock all that intelligence, Huang and Ghodsi announced the integration of NVIDIA’s accelerated computing with Databricks Photon, Databricks’ engine for fast data processing, designed to power Databricks SQL with top-tier performance and cost efficiency.

“This is a big announcement,” Huang said, adding that accelerated computing and generative AI are the two most important technological trends today. “NVIDIA and Databricks are going to partner to combine our skills in these areas and bring them to all of you.”

Huang shared that it’s taken NVIDIA five years to build a set of libraries that make it possible to accelerate Photon, allowing users to “wrangle data faster, more cost-effectively and consume a lot less energy.”

“We are super-excited to partner with you to use GPU acceleration on the Photon engine to enhance core data processing and get them to also run on NVIDIA GPUs,” Ghodsi said.

Creating Generative AI Factories With NVIDIA NIM

NVIDIA and Databricks also announced that Databricks’ open-source model DBRX is now available as an NVIDIA NIM microservice hosted on the NVIDIA API catalog.

NVIDIA NIM inference microservices provide models as fully optimized, pre-built containers for deployment anywhere.

“Creating these endpoints is complicated,” Huang explained. “We optimized everything into a microservice, which runs on every cloud and on premises.”

Microservices dramatically increase enterprise developer productivity by providing a simple, standardized way to add generative AI models to applications.

Launched in March, DBRX was built entirely on top of Databricks, leveraging all the tools and techniques available to Databricks customers and partners, and was trained with NVIDIA DGX Cloud, a scalable end-to-end AI platform for developers.

Organizations can customize DBRX with enterprise data to create high-quality, organization-specific models or use it to build a custom DBRX-style mixture of expert models as a reference architecture.

Huang said that accelerating data processing is a huge opportunity, encouraging everyone to put accelerated computing and generative AI to work.

“Whatever you do, just start — you have to engage in this incredibly fast-moving train,” Huang said. “Remember, generative AI is growing exponentially — you don’t want to wait and observe an exponential trend, because in a couple of years, you’ll be so far behind.”

Joining the Conversation

Attendees at the summit are encouraged to participate in sessions and engage with NVIDIA experts to learn more about how NVIDIA and Databricks are driving the future of AI and data intelligence.

Key sessions, taking place June 13, include:

  • “Development and Deployment of Generative AI with NVIDIA” at 12:30 p.m. PT
  • “Architecture Analysis for ETL Processing: CPU vs. GPU” at 4:30 p.m. PT;
  • “Spark RAPIDS ML: GPU Accelerated Distributed ML in Spark Clusters” at 1:30 p.m. PT

Read More

Scaling to New Heights: NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

Scaling to New Heights: NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks.

NVIDIA more than tripled the performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA  achieved this remarkable feat through larger scale — more than triple that of the 3,584 H100 GPU submission a year ago — and extensive full-stack engineering.

Thanks to the scalability of the NVIDIA AI platform, Eos can now train massive AI models like GPT-3 175B even faster, and this great AI performance translates into significant business opportunities. For example, in NVIDIA’s recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60/M tokens, with an HGX H200 server throughput of 24,000 tokens/second.

NVIDIA H200 GPU Supercharges Generative AI and HPC 

The NVIDIA H200 Tensor GPU builds upon the strength of the Hopper architecture, with 141GB of HBM3 memory and over 40% more memory bandwidth compared to the H100 GPU. Pushing the boundaries of what’s possible in AI training, the NVIDIA H200 Tensor Core GPU extended the H100’s performance by up to 47% in its MLPerf Training debut.

NVIDIA Software Drives Unmatched Performance Gains

Additionally, our submissions using a 512 H100 GPU configuration are now up to 27% faster compared to just one year ago due to numerous optimizations to the NVIDIA software stack. This improvement highlights how continuous software enhancements can significantly boost performance, even with the same hardware.

This work also delivered nearly perfect scaling. As the number of GPUs increased by 3.2x — going from 3,584 H100 GPUs last year to 11,616 H100 GPUs with this submission — so did the delivered performance.

Learn more about these optimizations on the NVIDIA Technical Blog.

Excelling at LLM Fine-Tuning

As enterprises seek to customize pretrained large language models, LLM fine-tuning is becoming a key industry workload. MLPerf introduced a new LLM fine-tuning benchmark this round, based on the popular low-rank adaptation (LoRA) technique applied to Meta Llama 2 70B.

The NVIDIA platform excelled at this task, scaling from eight to 1,024 GPUs, with the largest-scale NVIDIA submission completing the benchmark in a record 1.5 minutes.

Accelerating Stable Diffusion and GNN Training

NVIDIA also accelerated Stable Diffusion v2 training performance by up to 80% at the same system scales submitted last round. These advances reflect numerous enhancements to the NVIDIA software stack, showcasing how software and hardware improvements go hand-in-hand to deliver top-tier performance.

On the new graph neural network (GNN) test based on R-GAT, the NVIDIA platform with H100 GPUs excelled at both small and large scales. The H200 delivered a 47% boost on single-node GNN training compared to the H100. This showcases the powerful performance and high efficiency of NVIDIA GPUs, which make them ideal for a wide range of AI applications.

Broad Ecosystem Support

Reflecting the breadth of the NVIDIA AI ecosystem, 10 NVIDIA partners submitted results, including ASUS, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Lenovo, Oracle, Quanta Cloud Technology, Supermicro and Sustainable Metal Cloud. This broad participation, and their own impressive benchmark results, underscores the widespread adoption and trust in NVIDIA’s AI platform across the industry.

MLCommons’ ongoing work to bring benchmarking best practices to AI computing is vital. By enabling peer-reviewed comparisons of AI and HPC platforms, and keeping pace with the rapid changes that characterize AI computing, MLCommons provides companies everywhere with crucial data that can help guide important purchasing decisions.

And with the NVIDIA Blackwell platform, next-level AI performance on trillion-parameter generative AI models for both training and inference is coming soon.

Read More

TOPS of the Class: Decoding AI Performance on RTX AI PCs and Workstations

TOPS of the Class: Decoding AI Performance on RTX AI PCs and Workstations

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC users.

The era of the AI PC is here, and it’s powered by NVIDIA RTX and GeForce RTX technologies. With it comes a new way to evaluate performance for AI-accelerated tasks, and a new language that can be daunting to decipher when choosing between the desktops and laptops available.

While PC gamers understand frames per second (FPS) and similar stats, measuring AI performance requires new metrics.

Coming Out on TOPS

The first baseline is TOPS, or trillions of operations per second. Trillions is the important word here — the processing numbers behind generative AI tasks are absolutely massive. Think of TOPS as a raw performance metric, similar to an engine’s horsepower rating. More is better.

Compare, for example, the recently announced Copilot+ PC lineup by Microsoft, which includes neural processing units (NPUs) able to perform upwards of 40 TOPS. Performing 40 TOPS is sufficient for some light AI-assisted tasks, like asking a local chatbot where yesterday’s notes are.

But many generative AI tasks are more demanding. NVIDIA RTX and GeForce RTX GPUs deliver unprecedented performance across all generative tasks — the GeForce RTX 4090 GPU offers more than 1,300 TOPS. This is the kind of horsepower needed to handle AI-assisted digital content creation, AI super resolution in PC gaming, generating images from text or video, querying local large language models (LLMs) and more.

Insert Tokens to Play

TOPS is only the beginning of the story. LLM performance is measured in the number of tokens generated by the model.

Tokens are the output of the LLM. A token can be a word in a sentence, or even a smaller fragment like punctuation or whitespace. Performance for AI-accelerated tasks can be measured in “tokens per second.”

Another important factor is batch size, or the number of inputs processed simultaneously in a single inference pass. As an LLM will sit at the core of many modern AI systems, the ability to handle multiple inputs (e.g. from a single application or across multiple applications) will be a key differentiator. While larger batch sizes improve performance for concurrent inputs, they also require more memory, especially when combined with larger models.

The more you batch, the more (time) you save.

RTX GPUs are exceptionally well-suited for LLMs due to their large amounts of dedicated video random access memory (VRAM), Tensor Cores and TensorRT-LLM software.

GeForce RTX GPUs offer up to 24GB of high-speed VRAM, and NVIDIA RTX GPUs up to 48GB, which can handle larger models and enable higher batch sizes. RTX GPUs also take advantage of Tensor Cores — dedicated AI accelerators that dramatically speed up the computationally intensive operations required for deep learning and generative AI models. That maximum performance is easily accessed when an application uses the NVIDIA TensorRT software development kit (SDK), which unlocks the highest-performance generative AI on the more than 100 million Windows PCs and workstations powered by RTX GPUs.

The combination of memory, dedicated AI accelerators and optimized software gives RTX GPUs massive throughput gains, especially as batch sizes increase.

Text-to-Image, Faster Than Ever

Measuring image generation speed is another way to evaluate performance. One of the most straightforward ways uses Stable Diffusion, a popular image-based AI model that allows users to easily convert text descriptions into complex visual representations.

With Stable Diffusion, users can quickly create and refine images from text prompts to achieve their desired output. When using an RTX GPU, these results can be generated faster than processing the AI model on a CPU or NPU.

That performance is even higher when using the TensorRT extension for the popular Automatic1111 interface. RTX users can generate images from prompts up to 2x faster with the SDXL Base checkpoint — significantly streamlining Stable Diffusion workflows.

ComfyUI, another popular Stable Diffusion user interface, added TensorRT acceleration last week. RTX users can now generate images from prompts up to 60% faster, and can even convert these images to videos using Stable Video Diffuson up to 70% faster with TensorRT.

TensorRT acceleration can be put to the test in the new UL Procyon AI Image Generation benchmark, which delivers speedups of 50% on a GeForce RTX 4080 SUPER GPU compared with the fastest non-TensorRT implementation.

TensorRT acceleration will soon be released for Stable Diffusion 3 — Stability AI’s new, highly anticipated text-to-image model — boosting performance by 50%. Plus, the new TensorRT-Model Optimizer enables accelerating performance even further. This results in a 70% speedup compared with the non-TensorRT implementation, along with a 50% reduction in memory consumption.

Of course, seeing is believing — the true test is in the real-world use case of iterating on an original prompt. Users can refine image generation by tweaking prompts significantly faster on RTX GPUs, taking seconds per iteration compared with minutes on a Macbook Pro M3 Max. Plus, users get both speed and security with everything remaining private when running locally on an RTX-powered PC or workstation.

The Results Are in and Open Sourced

But don’t just take our word for it. The team of AI researchers and engineers behind the open-source Jan.ai recently integrated TensorRT-LLM into its local chatbot app, then tested these optimizations for themselves.

Source: Jan.ai

The researchers tested its implementation of TensorRT-LLM against the open-source llama.cpp inference engine across a variety of GPUs and CPUs used by the community. They found that TensorRT is “30-70% faster than llama.cpp on the same hardware,” as well as more efficient on consecutive processing runs. The team also included its methodology, inviting others to measure generative AI performance for themselves.

From games to generative AI, speed wins. TOPS, images per second, tokens per second and batch size are all considerations when determining performance champs.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Read More

Nerding About NeRFs: How Neural Radiance Fields Transform 2D Images Into Hyperrealistic 3D Models

Nerding About NeRFs: How Neural Radiance Fields Transform 2D Images Into Hyperrealistic 3D Models

Let’s talk about NeRFs — no, not the neon-colored foam dart blasters, but neural radiance fields, a technology that might just change the nature of images forever. In this episode of NVIDIA’s AI Podcast recorded live at GTC, host Noah Kravitz speaks with Michael Rubloff, founder and managing editor of radiancefields.com, about radiance field-based technologies. NeRFs allow users to take a series of 2D images or video to create a hyperrealistic 3D model — something like a photograph of a scene, but that can be looked at from multiple angles. Tune in to learn more about the technology’s creative and commercial applications and how it might transform the way people capture and experience the world.

Watch the replay of Rubloff’s GTC session on the intersection of generative AI and extended reality.

Time Stamps

1:18: What are NeRFs?
3:01: How do NeRFs work?
3:44: What’s the difference between NeRFs and Gaussian splatting?
4:36: How are NeRFs being used?
7:22: What is a radiance field?
14:18: How might radiance fields affect creative applications?
17:50: Examples of NeRFs in action in the media right now
21:00: Rubloff’s insight on where NeRFs will go in the future

You Might Also Like…

Media.Monks’ Lewis Smithingham on Enhancing Media and Marketing With AI – Ep. 222

Meet Media.Monks’ Wormhole, an alien-like, conversational robot with a quirky personality and the ability to offer keen marketing expertise. Lewis Smithingham, senior vice president of innovation and special ops at Media.Monks, a global marketing and advertising company, discusses the creation of Wormhole and AI’s potential to enhance media and entertainment.

Living Optics CEO Robin Wang on Democratizing Hyperspectral Imaging – Ep. 219

Step into the realm of the unseen with Robin Wang, CEO of Living Optics. The startup cofounder discusses the power of Living Optics’ hyperspectral imaging camera, which can capture visual data across 96 colors, reveals details invisible to the human eye.

Exploring Filmmaking With Cuebric’s AI: Insights from Pinar Seyhan Demirdag – Ep. 214

Pinar Seyhan Demirdag, co-founder and CEO of Cuebric, which is on a mission to offer new filmmaking and content creation solutions through immersive, two-and-a-half-dimensional cinematic environments.

Deepdub’s Ofir Krakowski on Redefining Dubbing From Hollywood to Bollywood – Ep. 202

Deepdub acts as a digital bridge, providing access to content by using generative AI to break down language and cultural barriers. The Israel-based startup’s co-founder and CEO, Ofir Krakowski, describes how Deepdub uses AI-driven dubbing to help entertainment companies boost efficiency and cut costs while increasing accessibility.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Read More