NVIDIA AI Summit Panel Outlines Autonomous Driving Safety

The autonomous driving industry is shaped by rapid technological advancement and the need for standardized guidelines to ensure the safety of autonomous vehicles (AVs) and their interactions with human-driven vehicles.

At the NVIDIA AI Summit this week in Washington, D.C., industry experts shared viewpoints on this AV safety landscape from regulatory and technology perspectives.

Danny Shapiro, vice president of automotive at NVIDIA, led the wide-ranging conversation with Mark Rosekind, former administrator of the National Highway Traffic Safety Administration, and Marco Pavone, director of AV research at NVIDIA.

To frame the discussion, Shapiro kicked off with a sobering comment about the high number of crashes, injuries and fatalities on the world’s roadways. Human error remains a serious problem and the primary cause of these incidents.

“Improving safety on our roads is critical,” Shapiro said, noting that NVIDIA has worked with the auto industry for over two decades on everything from advanced driver assistance systems to fully autonomous driving technology.

NVIDIA’s approach to AV development is centered on the integration of three computers: one for training the AI, one for simulation to test and validate the AI, and one in the vehicle to process sensor data in real time to make safe driving decisions. Together, these systems enable continuous development cycles, always improving the AV software in performance and safety.

Rosekind, a highly regarded automotive safety expert, spoke about the patchwork of regulations that exists across the U.S., explaining that federal agencies focus on the vehicle, while the states focus on the operator, including driver education, insurance and licensing.

Pavone observed that the explosion of new technologies related to generative AI and neural rendering, among others, is giving researchers and developers tools to rethink how AV development is carried out.

These technologies are enabling new developments in simulation, for example to generate complex scenarios aimed at stress-testing vehicles for safety purposes. Developers are also harnessing foundation models, such as vision language models, to build more robust autonomy software, Pavone said.

One of the relevant and timely topics discussed during the panel was an announcement made during the AI Summit by MITRE, a government-sponsored nonprofit research organization.

MITRE announced its partnership with Mcity at the University of Michigan to develop a virtual and physical AV validation platform for industry deployment.

MITRE will use Mcity’s simulation tools and a digital twin of the Mcity Test Facility, a real-world AV test environment, in its digital proving ground. The jointly developed platform will deliver physically based sensor simulation enabled by NVIDIA Omniverse Cloud Sensor RTX application programming interfaces.

By combining these simulation capabilities with the MITRE digital proving ground reporting and analysis framework, developers will be able to perform exhaustive testing in a simulated world to safely validate AVs before real-world deployment.

The MITRE announcement “represents an opportunity to have a trusted source who’s done this in many other areas, especially in aviation, to create an independent, neutral setting to test safety assurance,” Rosekind said.

“One of the most exciting things about this endeavor is that simulation is going to have a key role,” added Pavone. “Simulation allows you to test very dangerous conditions in a repeatable and varied way, so you can simulate different cases at scale.”

“That’s the beauty of simulation,” said Shapiro. “It’s repeatable, it’s controllable. We can control the weather in the simulation. We can change the time of day, and then we can control all the scenarios and inject hazards. Once the simulation is created, we can run it over and over, and as the software develops, we can ensure we are solving the problem, and can fine-tune as necessary.”

The panel wrapped up with a reminder that the key goal of autonomous driving is one that businesses and regulators alike share: to reduce death and injuries on our roadways.

Watch a replay of the session. (Registration required.)

To learn more about NVIDIA’s commitment to bringing safety to our roads, read the NVIDIA Self-Driving Safety Report.

Game-Changer: How the World’s First GPU Leveled Up Gaming and Ignited the AI Era

In 1999, fans lined up at Blockbuster to rent chunky VHS tapes of The Matrix. Y2K preppers hoarded cash and canned Spam, fearing a worldwide computer crash. Teens gleefully downloaded Britney Spears and Eminem on Napster.

But amid the caffeinated fizz of turn-of-the-millennium tech culture, something more transformative was unfolding.

The release of NVIDIA’s GeForce 256 twenty-five years ago today, overlooked by all but hardcore PC gamers and tech enthusiasts at the time, would go on to lay the foundation for today’s generative AI.

The GeForce 256 wasn’t just another graphics card — it was introduced as the world’s first GPU, setting the stage for future advancements in both gaming and computing.

With hardware transform and lighting (T&L), it took the load off the CPU, a pivotal advancement. As Tom’s Hardware emphasized: “[The GeForce 256] can take the strain off the CPU, keep the 3D-pipeline from stalling, and allow game developers to use much more polygons, which automatically results in greatly increased detail.”

Where Gaming Changed Forever

For gamers, starting up Quake III Arena on a GeForce 256 was a revelation. “Immediately after firing up your favorite game, it feels like you’ve never even seen the title before this moment,” as the enthusiasts at AnandTech put it.

The GeForce 256 paired beautifully with breakthrough titles such as Unreal Tournament, one of the first games with realistic reflections, which would go on to sell more than 1 million copies in its first year.

Over the next quarter-century, the collaboration between game developers and NVIDIA would continue to push boundaries, driving advancements such as increasingly realistic textures, dynamic lighting, and smoother frame rates — innovations that delivered far more than just immersive experiences for gamers.

NVIDIA’s GPUs evolved into a platform that transformed new silicon and software into powerful, visceral innovations that reshaped the gaming landscape.

In the decades that followed, NVIDIA GPUs drove ever-higher frame rates and visual fidelity, allowing for smoother, more responsive gameplay.

This leap in performance was embraced by platforms such as Twitch, YouTube Gaming, and Facebook, as gamers were able to stream content with incredible clarity and speed.

These performance boosts not only transformed the gaming experience but also turned players into entertainers. This helped fuel the global growth of esports.

Major events like The International (Dota 2), the League of Legends World Championship, and the Fortnite World Cup attracted millions of viewers, solidifying esports as a global phenomenon and creating new opportunities for competitive gaming.

From Gaming to AI: The GPU’s Next Frontier

As gaming worlds grew in complexity, so too did the computational demands.

The parallel power that transformed gaming graphics caught the attention of researchers, who realized these GPUs could also unlock massive computational potential in AI, enabling breakthroughs far beyond the gaming world.

Deep learning — a software model that relies on billions of neurons and trillions of connections — requires immense computational power.

Traditional CPUs, designed for sequential tasks, couldn’t efficiently handle this workload. But GPUs, with their massively parallel architecture, were perfect for the job.

By 2011, AI researchers had discovered NVIDIA GPUs and their ability to handle deep learning’s immense processing needs.

Researchers at Google, Stanford and New York University began using NVIDIA GPUs to accelerate AI development, achieving performance that previously required supercomputers.

In 2012, a breakthrough came when Alex Krizhevsky from the University of Toronto used NVIDIA GPUs to win the ImageNet image recognition competition. His neural network, AlexNet, trained on a million images, crushed the competition, beating handcrafted software written by vision experts.

This marked a seismic shift in technology. What once seemed like science fiction — computers learning and adapting from vast amounts of data — was now a reality, driven by the raw power of GPUs.

By 2015, AI had reached superhuman levels of perception, with Google, Microsoft and Baidu surpassing human performance in tasks like image recognition and speech understanding — all powered by deep neural networks running on GPUs.

In 2016, NVIDIA CEO Jensen Huang donated the first NVIDIA DGX-1 AI supercomputer — a system packed with eight cutting-edge GPUs — to OpenAI, which would harness GPUs to train ChatGPT, launched in November 2022.

In 2018, NVIDIA debuted GeForce RTX (20 Series) with RT Cores and Tensor Cores, designed specifically for real-time ray tracing and AI workloads.

This innovation accelerated the adoption of ray-traced graphics in games, bringing cinematic realism to gaming visuals and AI-powered features like NVIDIA DLSS, which enhanced gaming performance by leveraging deep learning.

Meanwhile, ChatGPT reached more than 100 million users within months of its November 2022 launch, demonstrating how NVIDIA GPUs continue to drive the transformative power of generative AI.

Today, GPUs aren’t only celebrated in the gaming world — they’ve become icons of tech culture, appearing in Reddit memes, Twitch streams and T-shirts at Comic-Con, and even being immortalized in custom PC builds and digital fan art.

Shaping the Future

The revolution that began with the GeForce 256 continues to unfold today in gaming and entertainment, in personal computing where AI powered by NVIDIA GPUs is now part of everyday life — and inside the trillion-dollar industries building next-generation AI into the core of their businesses.

GPUs are not just enhancing gaming; they are shaping the future of AI itself.

And now, with innovations like NVIDIA DLSS, which uses AI to boost gaming performance and deliver sharper images, and NVIDIA ACE, designed to bring more lifelike interactions to in-game characters, AI is once again reshaping the gaming world.

The GeForce 256 laid the foundation for a future where gaming, computing and AI are not just evolving — together, they’re transforming the world.

AI’ll Be by Your Side: Mental Health Startup Enhances Therapist-Client Connections

Half of the world’s population will experience a mental health disorder — but the median number of mental health workers per 100,000 people is just 13, according to the World Health Organization.

To help tackle this disparity — which can vary by over 40x between high-income and low-income countries — a Madrid-based startup is offering therapists AI tools to improve the delivery of mental health services.

Therapyside, a member of the NVIDIA Inception program for cutting-edge startups, is bolstering its online therapy platform using NVIDIA NIM inference microservices. These AI microservices serve as virtual assistants and notetakers, letting therapists focus on connecting with their clients.

“In a therapy setting, having a strong alliance between counselor and client is everything,” said Alessandro De Sario, founder and CEO of Therapyside. “When a therapist can focus on the session without worrying about note-taking, they can reach that level of trust and connection much quicker.”

For the therapists and clients who have opted in to test these AI tools, a speech recognition model transcribes their conversations. A large language model summarizes the session into clinical notes, saving time for therapists so they can speak with more clients and work more efficiently. Another model powers a virtual assistant, dubbed Maia, that can answer therapists’ questions using retrieval-augmented generation, aka RAG.

Therapyside aims to add features over time, such as support for additional languages and an offline version that can transcribe and summarize in-person therapy sessions.

“We’ve just opened the door,” said De Sario. “We want to make the tool much more powerful so it can handle administrative tasks like calendar management and patient follow-up, or remind therapists of topics they should cover in a given session.”

AI’s in Session: Enhancing Therapist-Client Relationships

Therapyside, founded in 2017, works with around 1,000 licensed therapists in Europe offering counseling in English, Italian and Spanish. More than 500,000 therapy sessions have been completed through its virtual platform to date.

The company’s AI tools are currently available through a beta program. Therapists who choose to participate can invite their clients to opt in to the AI features.

“It’s incredibly helpful to have a personalized summary with a transcription that highlights the most important points from each session I have with my patients,” said Alejandro A., one of the therapists participating in the beta program. “I’ve been pleasantly surprised by its ability to identify the most significant areas to focus on with each patient.”

A speech recognition AI model can capture live transcriptions of sessions.

The therapists testing the tool rated the transcriptions and summaries as highly accurate, helping them focus on listening without worrying about note-taking.

“The recaps allow me to be fully present with the clients in my sessions,” said Maaria A., another therapist participating in the beta program.

During sessions, clients share details about their life experiences that are captured in the AI-powered transcriptions and summaries. Therapyside’s RAG-based Maia connects to these resources to help therapists quickly recall minutiae like the name of a client’s sibling, or track how a client’s main challenges have evolved over time. This information can help therapists pose more personalized questions and provide better support.

“Maia is a valuable tool to have when you’re feeling a little stuck,” said Maaria A. “I have clients all over the world, so Maia helps remind me where they live. And if I ask Maia to suggest exercises clients could do to boost their self-esteem, it helps me find resources I can send to them, which helps save time.”

Maia can answer therapists’ questions based on session transcripts and summaries.

Take Note: AI Microservices Enable Easy Deployment

Therapyside’s AI pipeline runs on NVIDIA GPUs in a secure cloud environment and is built with NVIDIA NIM, a set of easy-to-use microservices designed to speed up AI deployment.

For transcription, the pipeline uses NVIDIA Riva NIM microservices, which include NVIDIA Parakeet, a record-setting family of models, to deliver highly accurate automatic speech recognition.

Once the transcript is complete, the text is processed by a NIM microservice for Meta’s Llama 3.1 family of open-source AI models to generate a summary that’s added to the client’s clinical history.

The Maia virtual assistant, which also uses a Llama 3.1 NIM microservice, accesses these clinical records using a RAG pipeline powered by NVIDIA NeMo Retriever NIM microservices. RAG techniques enable organizations to connect AI models to their private datasets to deliver contextually accurate responses.
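To make that flow concrete, here’s a minimal sketch of a transcribe-summarize-answer pipeline in the spirit of the one described above, using the OpenAI-compatible API that NIM microservices expose. The endpoint URL, model name, sample transcript and the retrieve_notes helper are illustrative assumptions, not Therapyside’s production setup.

```python
# A minimal sketch (not Therapyside's actual code) of a summarize-then-answer
# pipeline against a self-hosted Llama 3.1 NIM microservice, which exposes an
# OpenAI-compatible API. URL, model name and transcript are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
MODEL = "meta/llama-3.1-8b-instruct"

transcript = "Therapist: How was your week? Client: Better. I tried the breathing exercise..."

# Step 1: summarize the session transcript into clinical notes.
summary = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Summarize this therapy transcript into concise clinical notes."},
        {"role": "user", "content": transcript},
    ],
).choices[0].message.content

# Step 2: answer a therapist's question grounded on retrieved notes (RAG).
# retrieve_notes stands in for a NeMo Retriever query over stored summaries.
def retrieve_notes(question: str) -> str:
    return summary  # placeholder: return the notes most relevant to the question

question = "What coping strategies has this client tried so far?"
answer = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Answer using only these clinical notes:\n" + retrieve_notes(question)},
        {"role": "user", "content": question},
    ],
).choices[0].message.content
print(answer)
```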

Therapyside plans to further customize Maia with capabilities that support specific therapeutic methods, such as cognitive behavioral therapy and psychodynamic therapy. The team is also integrating NVIDIA NeMo Guardrails to further enhance the tools’ safety and security.

Kimberly Powell, vice president of healthcare at NVIDIA, will discuss Therapyside and other healthcare innovators in a keynote address at HLTH, a conference taking place October 20-23 in Las Vegas.

Learn more about NVIDIA Inception and get started with NVIDIA NIM microservices at ai.nvidia.com.

The Next Chapter Awaits: Dive Into ‘Diablo IV’s’ Latest Adventure ‘Vessel of Hatred’ on GeForce NOW

Prepare for a devilishly good time this GFN Thursday as the critically acclaimed Diablo IV: Vessel of Hatred downloadable content (DLC) joins the cloud, one of six new games available this week.

GeForce NOW also extends its game-library sync feature to Battle.net accounts, so members can seamlessly bring their favorite Blizzard games into their cloud-streaming libraries.

Hell’s Bells and Whistles

Get ready to rage. New DLC for the hit title Diablo IV: Vessel of Hatred is available to stream at launch this week, with thrilling content and gameplay for GeForce NOW members to experience.

Hate is in the air.

Diablo IV: Vessel of Hatred DLC is the highly anticipated expansion of the latest installment in Blizzard’s iconic action role-playing game series. It introduces players to the lush and dangerous jungles of Nahantu. Teeming with both beauty and dangers, this new environment offers a fresh backdrop for action-packed battles against the demonic forces of Hell. A new playable class, the Spiritborn, offers unique gameplay mechanics tied to four guardian spirits: the eagle, gorilla, jaguar and centipede.

The DLC extends the main Diablo IV story and includes new features such as recruitable Mercenaries, a co-op player-vs-environment endgame activity, Party Finder to help members team up and take down challenges together, and more. Vessel of Hatred arrives alongside major updates, including revamped leveling, a new difficulty system and Paragon adjustments, that will continue to enhance the world of Diablo IV.

Ultimate members can experience the wrath at up to 4K resolution and 120 frames per second with support for NVIDIA DLSS and ray-tracing technologies. And members can jump right into the latest DLC without having to wait around for updates. Hell never looked so good, even on low-powered devices.

Let That Sync In

Connection junction.

With game syncing for Blizzard’s Battle.net game library coming to GeForce NOW this week, members can connect their digital game store accounts so that all of their supported games are part of their streaming libraries.

Members can now easily find and stream popular titles such as StarCraft II, Overwatch 2, Call of Duty HQ and Hearthstone from their cloud gaming libraries, enhancing the games’ accessibility across a variety of devices.

Battle.net joins other digital storefronts that already have game sync support, including Steam, Epic Games Store, Xbox and Ubisoft Connect. This allows members to consolidate their gaming experiences in one place.

Plus, GeForce NOW members can play high-quality titles without the need for high-end hardware, streaming from GeForce RTX-powered servers in the cloud. Whether battling demons in Sanctuary or engaging in epic firefights, GeForce NOW members get a seamless gaming experience anytime, anywhere.

Hot and New

Soar through serenity and uncover destiny, all from the cloud.

Europa is a peaceful game of adventure, exploration and meditation from Future Friends Games, ready for members to stream at launch this week. On the moon Europa, a lush terraformed paradise in Jupiter’s shadow, an android named Zee sets out in search of answers. Run, glide and fly across the landscape, solve mysteries in the ruins of a fallen utopia, and discover the story of the last human alive.

Members can look for the following games available to stream in the cloud this week:

  • Empyrion – Galactic Survival (New release on Epic Games Store, Oct. 10)
  • Europa (New release on Steam, Oct. 11)
  • Dwarven Realms (Steam)
  • Star Trek Timelines (Steam)
  • Star Trucker (Steam)
  • Starcom: Unknown Space (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

AI Summit: US Energy Secretary Highlights AI’s Role in Science, Energy and Security

AI can help solve some of the world’s biggest challenges — whether climate change, cancer or national security — U.S. Secretary of Energy Jennifer Granholm emphasized today during her remarks at the AI for Science, Energy and Security session at the NVIDIA AI Summit, in Washington, D.C.

Granholm went on to highlight the pivotal role AI is playing in tackling major national challenges, from energy innovation to bolstering national security.

“We need to use AI for both offense and defense — offense to solve these big problems and defense to make sure the bad guys are not using AI for nefarious purposes,” she said.

Granholm, who calls the Department of Energy “America’s Solutions Department,” highlighted the agency’s focus on solving the world’s biggest problems.

“Yes, climate change, obviously, but a whole slew of other problems, too … quantum computing and all sorts of next-generation technologies,” she said, pointing out that AI is a driving force behind many of these advances.

“AI can really help to solve some of those huge problems — whether climate change, cancer or national security,” she said. “The possibilities of AI for good are awesome, awesome.”

Following Granholm’s 15-minute address, a panel of experts from government, academia and industry took the stage to further discuss how AI accelerates advancements in scientific discovery, national security and energy innovation.

“AI is going to be transformative to our mission space.… We’re going to see these big step changes in capabilities,” said Helena Fu, director of the Office of Critical and Emerging Technologies at the Department of Energy, underscoring AI’s potential in safeguarding critical infrastructure and addressing cyber threats.

During her remarks, Granholm also stressed that AI’s increasing energy demands must be met responsibly.

“We are going to see about a 15% increase in power demand on our electric grid as a result of the data centers that we want to be located in the United States,” she explained.

However, the DOE is taking steps to meet this demand with clean energy.

“This year, in 2024, the United States will have added 30 Hoover Dams’ worth of clean power to our electric grid,” Granholm announced, emphasizing that the clean energy revolution is well underway.

AI’s Impact on Scientific Discovery and National Security

The discussion then shifted to how AI is revolutionizing scientific research and national security.

Tanya Das, director of the Energy Program at the Bipartisan Policy Center, pointed out that “AI can accelerate every stage of the innovation pipeline in the energy sector … starting from scientific discovery at the very beginning … going through to deployment and permitting.”

Das also highlighted the growing interest in Congress to support AI innovations, adding, “Congress is paying attention to this issue, and, I think, very motivated to take action on updating what the national vision is for artificial intelligence.”

Fu reiterated the department’s comprehensive approach, stating, “We cross from open science through national security, and we do this at scale.… Whether they be around energy security, resilience, climate change or the national security challenges that we’re seeing every day emerging.”

She also touched on the DOE’s future goals: “Our scientific systems will need access to AI systems,” Fu said, emphasizing the need to bridge both scientific reasoning and the new kinds of models we’ll need to develop for AI.

Collaboration Across Sectors: Government, Academia and Industry

Karthik Duraisamy, director of the Michigan Institute for Computational Discovery and Engineering at the University of Michigan, highlighted the power of collaboration in advancing scientific research through AI.

“Think about the scientific endeavor as 5% creativity and innovation and 95% intense labor. AI amplifies that 5% by a bit, and then significantly accelerates the 95% part,” Duraisamy explained. “That is going to completely transform science.”

Duraisamy further elaborated on the role AI could play as a persistent collaborator, envisioning a future where AI can work alongside scientists over weeks, months and years, generating new ideas and following through on complex projects.

“Instead of replacing graduate students, I think graduate students can be smarter than the professors on day one,” he said, emphasizing the potential for AI to support long-term research and innovation.

Learn more about how this week’s AI Summit highlights the ways AI is shaping the future across industries and how NVIDIA’s solutions are laying the groundwork for continued innovation.

What’s the ROI? Getting the Most Out of LLM Inference

Large language models and the applications they power enable unprecedented opportunities for organizations to get deeper insights from their data reservoirs and to build entirely new classes of applications.

But with opportunities often come challenges.

Both on premises and in the cloud, applications expected to run in real time place significant demands on data center infrastructure, which must deliver high throughput and low latency simultaneously from a single platform investment.

To drive continuous performance improvements and improve the return on infrastructure investments, NVIDIA regularly optimizes state-of-the-art community models, including Meta’s Llama, Google’s Gemma, Microsoft’s Phi and our own NVLM-D-72B, released just a few weeks ago.

Relentless Improvements

Performance improvements let our customers and partners serve more complex models and reduce the infrastructure needed to host them. NVIDIA optimizes performance at every layer of the technology stack, including TensorRT-LLM, a purpose-built library to deliver state-of-the-art performance on the latest LLMs. With software improvements for the open-source Llama 70B model, which delivers very high accuracy, we’ve already improved minimum-latency performance by 3.5x in less than a year.

We’re constantly improving our platform performance and regularly publish updates. Each week, improvements to NVIDIA software libraries are published, allowing customers to get more from the very same GPUs.

Over the past 10 months, NVIDIA has increased performance on the Llama 70B model by 3.5x through a combination of optimized kernels, multi-head attention techniques and a variety of parallelization techniques.

In the most recent round of MLPerf Inference 4.1, we made our first-ever submission with the Blackwell platform. It delivered 4x more performance than the previous generation.

This submission was also the first-ever MLPerf submission to use FP4 precision. Narrower precision formats, like FP4, reduce memory footprint and memory traffic, and also boost computational throughput. The submission takes advantage of Blackwell’s second-generation Transformer Engine, and with advanced quantization techniques that are part of TensorRT Model Optimizer, it met the strict accuracy targets of the MLPerf benchmark.

MLPerf Inference v4.1 Closed, Data Center. Results retrieved from www.mlperf.org on August 28, 2024. Blackwell results measured on single GPU and retrieved from entry 4.1-0074 in the Closed, Preview category. H100 results from entry 4.1-0043 in the Closed, Available category on 8x H100 system and divided by GPU count for per GPU comparison. Per-GPU throughput is not a primary metric of MLPerf Inference. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
Blackwell B200 delivers up to 4x more performance versus previous generation on MLPerf Inference v4.1’s Llama 2 70B workload.
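To build intuition for why narrower formats help, the toy NumPy sketch below “fake quantizes” a weight tensor to the 16 values representable in FP4 (E2M1). It is a conceptual illustration only, not the calibration recipe TensorRT Model Optimizer actually uses.

```python
# Toy FP4 (E2M1) fake quantization: each weight snaps to one of 16 representable
# values, so it can be stored in 4 bits instead of 16 or 32. Illustrative only.
import numpy as np

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # positive E2M1 values

def fake_quantize_fp4(w: np.ndarray) -> np.ndarray:
    """Round each weight to the nearest FP4 value after per-tensor scaling."""
    scale = np.abs(w).max() / FP4_GRID.max()          # map the tensor range onto the grid
    idx = np.abs(np.abs(w)[..., None] / scale - FP4_GRID).argmin(axis=-1)
    return np.sign(w) * FP4_GRID[idx] * scale         # dequantized ("fake quant") weights

w = np.random.randn(4, 4).astype(np.float32)
print("max abs error:", np.abs(w - fake_quantize_fp4(w)).max())
print("storage: 4 bits/weight vs. 32 bits/weight -> 8x smaller")
```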

Improvements in Blackwell haven’t stopped the continued acceleration of Hopper. In the last year, Hopper performance has increased 3.4x in MLPerf on H100 thanks to regular software advancements. This means that NVIDIA’s peak performance today, on Blackwell, is 10x faster than it was just one year ago on Hopper.

MLPerf Inference v4.1 Closed, Data Center. Results retrieved from www.mlperf.org from multiple dates and entries. The October 2023, December 2023, May 2024 and October 24 data points are from internal measurements. The remaining data points are from official submissions. All results using eight accelerators. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
These results track progress on the MLPerf Inference Llama 2 70B Offline scenario over the past year.

Our ongoing work is incorporated into TensorRT-LLM, a purpose-built library that contains state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM is built on top of the TensorRT deep learning inference library and leverages much of TensorRT’s optimizations with additional LLM-specific improvements.
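For a feel of the developer experience, here is a minimal sketch using TensorRT-LLM’s high-level Python API; the model identifier and sampling settings are illustrative, and the exact API surface may vary between releases.

```python
# A minimal sketch of TensorRT-LLM's high-level LLM API (check your installed
# version for the exact surface). Model name and settings are illustrative.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # builds or loads an optimized engine
params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(["Why does batching improve GPU utilization?"], params):
    print(output.outputs[0].text)
```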

Improving Llama in Leaps and Bounds

More recently, we’ve continued optimizing variants of Meta’s Llama models, including versions 3.1 and 3.2 as well as the 70B model and the biggest model, the 405B. These optimizations include custom quantization recipes and efficient use of parallelization techniques to split the model across multiple GPUs, leveraging NVIDIA NVLink and NVSwitch interconnect technologies. Cutting-edge LLMs like Llama 3.1 405B are very demanding and require the combined performance of multiple state-of-the-art GPUs for fast responses.

Parallelism techniques require a hardware platform with a robust GPU-to-GPU interconnect fabric to get maximum performance and avoid communication bottlenecks. Each NVIDIA H200 Tensor Core GPU features fourth-generation NVLink, which provides a whopping 900GB/s of GPU-to-GPU bandwidth. Every eight-GPU HGX H200 platform also ships with four NVLink Switches, enabling every H200 GPU to communicate with any other H200 GPU at 900GB/s, simultaneously.

Many LLM deployments use parallelism rather than keeping the workload on a single GPU, which can become a compute bottleneck. LLMs must balance low latency and high throughput, with the optimal parallelization technique depending on application requirements.

For instance, if lowest latency is the priority, tensor parallelism is critical, as the combined compute performance of multiple GPUs can be used to serve tokens to users more quickly. However, for use cases where peak throughput across all users is prioritized, pipeline parallelism can efficiently boost overall server throughput.

The table below shows that tensor parallelism can deliver over 5x more throughput in minimum latency scenarios, whereas pipeline parallelism brings 50% more performance for maximum throughput use cases.

For production deployments that seek to maximize throughput within a given latency budget, a platform needs to be able to combine both techniques effectively, as TensorRT-LLM does.

Read the technical blog on boosting Llama 3.1 405B throughput to learn more about these techniques.

Different scenarios have different requirements, and parallelism techniques bring optimal performance for each of these scenarios.
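In code, the choice between the two mappings is a configuration decision. The sketch below uses TensorRT-LLM’s LLM API; the tensor_parallel_size and pipeline_parallel_size keyword names reflect our understanding of the API and should be verified against the installed version.

```python
# Two alternative parallel mappings for the same 70B model on 8 GPUs --
# instantiate one or the other depending on the serving goal, not both at once.
from tensorrt_llm import LLM

# Minimum latency: tensor parallelism splits every layer across all 8 GPUs.
llm_low_latency = LLM(model="meta-llama/Llama-3.1-70B-Instruct",
                      tensor_parallel_size=8)

# Maximum throughput: combine tensor and pipeline parallelism so pipeline
# stages work on different requests concurrently.
llm_high_throughput = LLM(model="meta-llama/Llama-3.1-70B-Instruct",
                          tensor_parallel_size=4,
                          pipeline_parallel_size=2)
```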

The Virtuous Cycle

Over the lifecycle of our architectures, we deliver significant performance gains from ongoing software tuning and optimization. These improvements translate into additional value for customers who train and deploy on our platforms. They’re able to create more capable models and applications and deploy their existing models using less infrastructure, enhancing their ROI.

As new LLMs and other generative AI models continue to come to market, NVIDIA will continue to run them optimally on its platforms and make them easier to deploy with technologies like NIM microservices and NIM Agent Blueprints.

Flux and Furious: New Image Generation Model Runs Fastest on RTX AI PCs and Workstations

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for GeForce RTX PC and NVIDIA RTX workstation users.

Image generation models — a popular subset of generative AI — can parse and understand written language, then translate words into images in almost any style.

Representing the cutting edge of what’s possible in image generation, a new series of models from Black Forest Labs — now available to try on PCs and workstations — runs fastest on GeForce RTX and NVIDIA RTX GPUs.

Fluxible Capabilities

FLUX.1 is a text-to-image generation model suite developed by Black Forest Labs. The models are built on the diffusion transformer (DiT) architecture, which allows models with a high number of parameters to maintain efficiency. The FLUX models have 12 billion parameters for high-quality image generation.

DiT models are efficient but computationally intensive — and NVIDIA RTX GPUs are essential for handling these new models, the largest of which can’t run on non-RTX GPUs without significant tweaking. Flux models now support the NVIDIA TensorRT software development kit, which improves their performance by up to 20%. Users can try Flux and other models with TensorRT in ComfyUI.

Prompt: “A magazine photo of a monkey bathing in a hot spring in a snowstorm with steam coming off the water.” Source: NVIDIA

Flux Appeal

FLUX.1 excels in generating high-quality, diverse images with exceptional prompt adherence, which refers to how accurately the AI interprets and executes instructions. High prompt adherence means the generated image closely matches the text prompt’s described elements, style and mood. Low prompt adherence results in images that may partially or completely deviate from given instructions.

FLUX.1 is noted for its ability to render the human anatomy accurately, including for challenging, intricate features like hands and faces. FLUX.1 also significantly improves the generation of legible text within images, addressing another common challenge in text-to-image models. This makes FLUX.1 models suitable for applications that require precise text representation, such as promotional materials and book covers.

FLUX.1 is available in three variants, offering users choices to best fit their workflows without sacrificing quality:

  • FLUX.1 pro: State-of-the-art quality for enterprise users; accessible through an application programming interface.
  • FLUX.1 dev: A distilled, free version of FLUX.1 pro that still provides high quality.
  • FLUX.1 schnell: The fastest model, ideal for local development and personal use; has a permissive Apache 2.0 license.

The dev and schnell models are open source, and Black Forest Labs provides access to its weights on the popular platform Hugging Face. This encourages innovation and collaboration within the image generation community by allowing researchers and developers to build upon and enhance the models.
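Because the weights are hosted on Hugging Face, the open variants can be tried locally in a few lines of diffusers code. Below is a minimal sketch for FLUX.1 schnell, assuming a recent diffusers release with FluxPipeline support and a GPU with enough memory; the generation settings are illustrative.

```python
# A minimal sketch of running FLUX.1 [schnell] locally with Hugging Face
# diffusers; assumes a recent diffusers build and a capable NVIDIA GPU.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "A magazine photo of a monkey bathing in a hot spring in a snowstorm",
    num_inference_steps=4,   # schnell is distilled for few-step sampling
    guidance_scale=0.0,      # schnell does not use classifier-free guidance
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("flux_schnell.png")
```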

Embraced by the Community

The Flux models’ dev and schnell variants were downloaded more than 2 million times on Hugging Face in less than three weeks after their launch.

Users have praised FLUX.1 for its abilities to produce visually stunning images with exceptional detail and realism, as well as to process complex prompts without requiring extensive parameter adjustments.

Prompt: “A highly detailed professional close-up photo of an animorphic Bengal tiger wearing a white, ribbed tank top, sunglasses and headphones around his neck as a DJ with its paws on the turntable on stage at an outdoor electronic dance music concert in Ibiza at night; party atmosphere, wispy smoke with caustic lighting.” Source: NVIDIA

Prompt: “A photographic-quality image of a bustling city street during a rainy evening with a yellow taxi cab parked at the curb with its headlights on, reflecting off the wet pavement. A woman in a red coat is standing under a bright green umbrella, looking at her smartphone. On the left, there is a coffee shop with a neon sign that reads ‘Café Mocha’ in blue letters. The shop has large windows, through which people can be seen enjoying their drinks. Streetlights illuminate the area, casting a warm glow over the scene, while raindrops create a misty effect in the air. In the background, a tall building with a large digital clock displays the time as 8:45 p.m.” Source: NVIDIA

In addition, FLUX.1’s versatility in handling various artistic styles and its efficiency in quickly generating images make it a valuable tool for both personal and professional projects.

Get Started

Users can access FLUX.1 using popular community webpages like ComfyUI. The community-run ComfyUI Wiki includes step-by-step instructions for getting started.

Many YouTube creators also offer video tutorials on Flux models, like this one from MDMZ:

Share your generated images on social media using the hashtag #fluxRTX for a chance to be featured on NVIDIA AI’s channels.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

NVIDIA AI Summit Highlights Game-Changing Energy Efficiency and AI-Driven Innovation

Accelerated computing is sustainable computing, Bob Pette, NVIDIA’s vice president and general manager of enterprise platforms, explained in a keynote at the NVIDIA AI Summit on Tuesday in Washington, D.C.

NVIDIA’s accelerated computing isn’t just efficient. It’s critical to the next wave of industrial, scientific and healthcare transformations.

“We are in the dawn of a new industrial revolution,” Pette told an audience of policymakers, press, developers and entrepreneurs gathered for the event. “I’m just here to tell you that we’re designing our systems with not just performance in mind, but with energy efficiency in mind.”

NVIDIA’s Blackwell platform has achieved groundbreaking energy efficiency in AI computing, reducing energy consumption by up to 2,000x over the past decade for training models like GPT-4.

NVIDIA accelerated computing is cutting energy use for token generation — the output from AI models — by 100,000x, underscoring the value of accelerated computing for sustainability amid the rapid adoption of AI worldwide.

“These AI factories produce products. Those products are tokens, tokens are intelligence, and intelligence is money,” Pette said. That’s what “will revolutionize every industry on this planet.”

NVIDIA’s CUDA libraries, which have been fundamental in enabling breakthroughs across industries, now power over 4,000 accelerated applications, Pette explained.

“CUDA enables acceleration…. It also turns out to be one of the most impressive ways to reduce energy consumption,” Pette said.

These libraries are central to the company’s energy-efficient AI innovations driving significant performance gains while minimizing power consumption.

Pette also detailed how NVIDIA’s AI software helps organizations deploy AI solutions quickly and efficiently, enabling businesses to innovate faster and solve complex problems across sectors.

Pette discussed the concept of agentic AI, which goes beyond traditional AI by enabling intelligent agents to perceive, reason and act autonomously.

Agentic AI is capable of “reasoning, of learning and taking action,” Pette said, and it’s transforming industries like manufacturing, customer service and healthcare.

These AI agents automate complex tasks and accelerate innovation, he explained, empowering businesses to drive advances in healthcare, manufacturing, scientific research and climate modeling.

With agentic AI, “you can do in minutes what used to take days,” Pette said.

NVIDIA, in collaboration with its partners, is tackling some of the world’s greatest challenges, including improving diagnostics and healthcare delivery, advancing climate modeling efforts and even helping find signs of life beyond our planet.

NVIDIA is collaborating with SETI to conduct real-time AI searches for fast radio bursts from distant galaxies, helping continue the exploration of space, Pette said.

Pette emphasized that NVIDIA is unlocking a $10 trillion opportunity in healthcare.

Through AI, NVIDIA is accelerating innovations in diagnostics, drug discovery and medical imaging, helping transform patient care worldwide.

Solutions like the NVIDIA Clara medical imaging platform are revolutionizing diagnostics, Parabricks is enabling breakthroughs in genomics research and the MONAI AI framework is advancing medical imaging capabilities.

Pette highlighted partnerships with leading institutions, including Carnegie Mellon University and the University of Pittsburgh, fostering AI innovation and development.

Pette also described how NVIDIA’s collaboration with federal agencies illustrates the importance of public-private partnerships in advancing AI-driven solutions in healthcare, climate modeling and national security.

Pette also announced a new NVIDIA NIM Agent Blueprint for cybersecurity, a powerful tool that enables organizations to safeguard critical infrastructure with AI-driven, real-time threat detection and analysis.

This blueprint reduces threat response times from days to seconds, representing a significant leap forward in protecting industries.

“Agentic systems can access tools and reason through full lines of thought to provide instant one-click assessments,” Pette said. “This boosts productivity by allowing security analysts to focus on the most critical tasks while AI handles the heavy lifting of analysis, delivering fast and actionable insights.”

NVIDIA’s accelerated computing solutions are advancing climate research by enabling more accurate and faster climate modeling. This technology is helping scientists tackle some of the most urgent environmental challenges, from monitoring global temperatures to predicting natural disasters.

Pette described how the NVIDIA Earth-2 platform enables climate experts to import data from multiple sources and fuse it for analysis using NVIDIA Omniverse. “NVIDIA Earth-2 brings together the power of simulation, AI and visualization to empower the climate tech ecosystem,” Pette said.

NVIDIA’s Greg Estes on Building the AI Workforce of the Future

Following Pette’s keynote, Greg Estes, NVIDIA’s vice president of corporate marketing and developer programs, underscored the company’s dedication to workforce training through initiatives like the NVIDIA AI Tech Community.

And through its Deep Learning Institute, NVIDIA has already trained more than 600,000 people worldwide, equipping the next generation with the critical skills to navigate and lead in the AI-driven future.

Exploring AI’s Role in Cybersecurity and Sustainability

Throughout the week, industry leaders are exploring AI’s role in solving critical issues in fields like cybersecurity and sustainability.

Upcoming sessions will feature U.S. Secretary of Energy Jennifer Granholm, who will discuss how AI is advancing energy innovation and scientific discovery.

Other speakers will address AI’s role in climate monitoring and environmental management, further showcasing the technology’s ability to address global sustainability challenges.

Learn more about how this week’s AI Summit highlights the ways AI is shaping the future across industries and how NVIDIA’s solutions are laying the groundwork for continued innovation.

US Healthcare System Deploys AI Agents, From Research to Rounds

The U.S. healthcare system is adopting digital health agents to harness AI across the board, from research laboratories to clinical settings.

The latest AI-accelerated tools — on display at the NVIDIA AI Summit taking place this week in Washington, D.C. — include NVIDIA NIM, a collection of cloud-native microservices that support AI model deployment and execution, and NVIDIA NIM Agent Blueprints, a catalog of pretrained, customizable workflows. 

These technologies are already in use in the public sector to advance the analysis of medical images, aid the search for new therapeutics and extract information from massive PDF databases containing text, tables and graphs. 

For example, researchers at the National Cancer Institute, part of the National Institutes of Health (NIH), are using several AI models built with NVIDIA MONAI for medical imaging — including the VISTA-3D NIM foundation model for segmenting and annotating 3D CT images. A team at NIH’s National Center for Advancing Translational Sciences (NCATS) is using the NIM Agent Blueprint for generative AI-based virtual screening to reduce the time and cost of developing novel drug molecules.

With NVIDIA NIM and NIM Agent Blueprints, medical researchers across the public sector can jump-start their adoption of state-of-the-art, optimized AI models to accelerate their work. The pretrained models are customizable based on an organization’s own data and can be continually refined based on user feedback.

NIM microservices and NIM Agent Blueprints are available at ai.nvidia.com and accessible through a wide variety of cloud service providers, global system integrators and technology solutions providers. 

Building With NIM Agent Blueprints

Dozens of NIM microservices and a growing set of NIM Agent Blueprints are available for developers to experience and download for free. They can be deployed in production with the NVIDIA AI Enterprise software platform.

  • The blueprint for generative virtual screening for drug discovery brings together three NIM microservices to help researchers search and optimize libraries of small molecules to identify promising candidates that bind to a target protein.
  • The multimodal PDF data extraction blueprint uses NVIDIA NeMo Retriever NIM microservices to extract insights from enterprise documents, helping developers build powerful AI agents and chatbots.
  • The digital human blueprint supports the creation of interactive, AI-powered avatars for customer service. These avatars have potential applications in telehealth and nonclinical aspects of patient care, such as scheduling appointments, filling out intake forms and managing prescriptions.

Two new NIM microservices for drug discovery are now available on ai.nvidia.com to help researchers understand how proteins bind to target molecules, a crucial step in drug design. By conducting more of this preclinical research digitally, scientists can narrow down their pool of drug candidates before testing in the lab — making the discovery process more efficient and less expensive. 

With the AlphaFold2-Multimer NIM microservice, researchers can accurately predict protein structures from their sequences in minutes, reducing the need for time-consuming tests in the lab. The RFdiffusion NIM microservice uses generative AI to design novel proteins that are promising drug candidates because they’re likely to bind with a target molecule.
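As a sketch of what calling one of these hosted microservices can look like, the snippet below posts two protein sequences to an AlphaFold2-Multimer endpoint. The endpoint path, payload fields and response shape are assumptions modeled on NVIDIA’s hosted API conventions; consult the model card on ai.nvidia.com for the exact contract.

```python
# Hypothetical invocation of a hosted protein-structure NIM over REST.
# Endpoint path, payload fields and response shape are assumptions; check the
# model card on ai.nvidia.com. Requires NVIDIA_API_KEY in the environment.
import os
import requests

URL = "https://health.api.nvidia.com/v1/biology/deepmind/alphafold2-multimer"  # assumed path
headers = {"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"}

payload = {
    # Two chains of a made-up complex, given as amino-acid sequences.
    "sequences": [
        "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
        "MADEEKLPPGWEKRMSRSSGRVYYFNHITNASQ",
    ],
}

resp = requests.post(URL, headers=headers, json=payload, timeout=600)
resp.raise_for_status()
print(resp.json())  # predicted structure(s), typically returned as PDB text
```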

NCATS Accelerates Drug Discovery Research

ASPIRE, a research laboratory at NCATS, is evaluating the NIM Agent Blueprint for virtual screening and is using RAPIDS, a suite of open-source software libraries for GPU-accelerated data science, to accelerate its drug discovery research. Using the cuGraph library for graph data analytics and cuDF library for accelerating data frames, the lab’s researchers can map chemical reactions across the vast unknown chemical space. 
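A small sketch of that stack appears below: cuDF loads a reaction edge list directly onto the GPU, and cuGraph ranks compounds in the resulting network. The CSV file and column names are invented for illustration; the API calls follow standard RAPIDS usage.

```python
# Illustrative RAPIDS workflow: GPU dataframes (cuDF) plus graph analytics
# (cuGraph). The reactions.csv file and its columns are hypothetical.
import cudf
import cugraph

# Load a chemical-reaction edge list (reactant -> product) onto the GPU.
edges = cudf.read_csv("reactions.csv", header=None,
                      names=["reactant_id", "product_id"],
                      dtype=["int32", "int32"])

G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(edges, source="reactant_id", destination="product_id")

# Rank compounds by centrality in the reaction network.
ranks = cugraph.pagerank(G)
print(ranks.sort_values("pagerank", ascending=False).head(10))
```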

The NCATS informatics team reported that with NVIDIA AI, processes that used to take hours on CPU-based infrastructure are now done in seconds.

Massive quantities of healthcare data — including research papers, radiology reports and patient records — are unstructured and locked in PDF documents, making it difficult for researchers to quickly search for information. 

The Genetic and Rare Diseases Information Center, also run by NCATS, is exploring using the PDF data extraction blueprint to develop generative AI tools that enhance the center’s ability to glean information from previously unsearchable databases. These tools will help answer questions from those affected by rare diseases.

“The center analyzes data sources spanning the National Library of Medicine, the Orphanet database and other institutes and centers within the NIH to answer patient questions,” said Sam Michael, chief information officer of NCATS. “AI-powered PDF data extraction can make it massively easier to extract valuable information from previously unsearchable databases.”  

Mi-NIM-al Effort, Maximum Benefit: Getting Started With NIM 

A growing number of startups, cloud service providers and global systems integrators include NVIDIA NIM microservices and NIM Agent Blueprints as part of their platforms and services, making it easy for federal healthcare researchers to get started.   

Abridge, an NVIDIA Inception startup and NVentures portfolio company, was recently awarded a contract from the U.S. Department of Veterans Affairs to help transcribe and summarize clinical appointments, reducing the burden on doctors to document each patient interaction.

The company uses NVIDIA TensorRT-LLM to accelerate AI inference and NVIDIA Triton Inference Server for deploying its audio-to-text and content summarization models at scale, some of the same technologies that power NIM microservices.

The NIM Agent Blueprint for virtual screening is now available through AWS HealthOmics, a purpose-built service that helps customers orchestrate biological data analyses. 

Amazon Web Services (AWS) is a partner of the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability Initiative, aka STRIDES Initiative, which aims to modernize the biomedical research ecosystem by reducing economic and process barriers to accessing commercial cloud services. NVIDIA and AWS are collaborating to make NIM Agent Blueprints broadly accessible to the biomedical research community. 

ConcertAI, another NVIDIA Inception member, is an oncology AI technology company focused on research and clinical standard-of-care solutions. The company is integrating NIM microservices, NVIDIA CUDA-X microservices and the NVIDIA NeMo platform into its suite of AI solutions for large-scale clinical data processing, multi-agent models and clinical foundation models. 

NVIDIA NIM microservices are supporting ConcertAI’s high-performance, low-latency AI models through its CARA AI platform. Use cases include clinical trial design, optimization and patient matching — as well as solutions that can help boost the standard of care and augment clinical decision-making.

Global systems integrator Deloitte is bringing the NIM Agent Blueprint for virtual screening to its customers worldwide. With Deloitte Atlas AI, the company can help clients at federal health agencies easily use NIM to adopt and deploy the latest generative AI pipelines for drug discovery. 

Experience NVIDIA NIM microservices and NIM Agent Blueprints today.

NVIDIA AI Summit Highlights Healthcare Innovation

At the NVIDIA AI Summit in Washington, NVIDIA leaders, customers and partners are presenting over 50 sessions highlighting impactful work in the public sector. 

Register for a free virtual pass to hear how healthcare researchers are accelerating innovation with NVIDIA-powered AI.

Watch the AI Summit special address by Bob Pette, vice president of enterprise platforms at NVIDIA.

Accelerated Computing Key to Yale’s Quantum Research

A recently released joint research paper by Yale, Moderna and NVIDIA reviews how techniques from quantum machine learning (QML) may enhance drug discovery methods by better predicting molecular properties.

Ultimately, this could lead to the more efficient generation of new pharmaceutical therapies.

The review also emphasizes that the key tool for exploring these methods is GPU-accelerated simulation of quantum algorithms.

The study focuses on how future quantum neural networks can use quantum computing to enhance existing AI techniques.

Applied to the pharmaceutical industry, these advances offer researchers the ability to streamline complex tasks in drug discovery.

Researching how such quantum neural networks impact real-world use cases like drug discovery requires intensive, large-scale simulations of future noiseless quantum processing units (QPUs).

This is just one example of how, as quantum computing scales up, an increasing number of challenges are only approachable with GPU-accelerated supercomputing.

The review article explores how NVIDIA’s CUDA-Q quantum development platform provides a unique tool for running such multi-GPU accelerated simulations of QML workloads.

The study also highlights CUDA-Q’s ability to simulate multiple QPUs in parallel. This is a key capability for studying realistic large-scale devices and, in this study, also allowed for the exploration of quantum machine learning tasks that batch training data.

Many of the QML techniques covered by the review — such as hybrid quantum convolution neural networks — also require CUDA-Q’s ability to write programs interweaving classical and quantum resources.
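To illustrate that interweaving, here is a minimal CUDA-Q sketch in the style of the library’s variational examples: a classical loop minimizes the expectation value of a Hamiltonian measured on a parameterized quantum kernel. The target and Hamiltonian follow CUDA-Q’s documented examples; this is a sketch, not code from the Yale, Moderna and NVIDIA study.

```python
# Hybrid classical/quantum loop with CUDA-Q: a GPU-accelerated simulator
# evaluates a parameterized kernel inside a classical optimization sweep.
import cudaq
from cudaq import spin

cudaq.set_target("nvidia")  # GPU-accelerated state-vector simulation

@cudaq.kernel
def ansatz(theta: float):
    q = cudaq.qvector(2)
    x(q[0])
    ry(theta, q[1])
    x.ctrl(q[1], q[0])

# Two-qubit Hamiltonian from CUDA-Q's documented VQE example.
hamiltonian = (5.907 - 2.1433 * spin.x(0) * spin.x(1)
               - 2.1433 * spin.y(0) * spin.y(1)
               + 0.21829 * spin.z(0) - 6.125 * spin.z(1))

# Classical sweep over the kernel parameter, quantum expectation per point.
energy, theta = min((cudaq.observe(ansatz, hamiltonian, t / 10.0).expectation(), t / 10.0)
                    for t in range(-30, 31))
print(f"minimum energy ~ {energy:.4f} at theta = {theta}")
```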

The increased reliance on GPU supercomputing demonstrated in this work is the latest example of NVIDIA’s growing involvement in developing useful quantum computers.

NVIDIA plans to further highlight its role in the future of quantum computing at the SC24 conference, Nov. 17-22 in Atlanta.
