Unveiling a New Era of Local AI With NVIDIA NIM Microservices and AI Blueprints

Over the past year, generative AI has transformed the way people live, work and play, enhancing everything from writing and content creation to gaming, learning and productivity. PC enthusiasts and developers are leading the charge in pushing the boundaries of this groundbreaking technology.

Countless times, industry-defining technological breakthroughs have been invented in one place — a garage. This week marks the start of the RTX AI Garage series, which will offer regular content for developers and enthusiasts looking to learn more about NVIDIA NIM microservices and AI Blueprints, and how to build AI agents, creative workflows, digital humans, productivity apps and more on AI PCs. Welcome to the RTX AI Garage.

This first installment spotlights announcements made earlier this week at CES, including new AI foundation models available on NVIDIA RTX AI PCs that take digital humans, content creation, productivity and development to the next level.

These models — offered as NVIDIA NIM microservices — are powered by the new GeForce RTX 50 Series GPUs. Built on the NVIDIA Blackwell architecture, RTX 50 Series GPUs deliver up to 3,352 trillion AI operations per second (TOPS), offer up to 32GB of VRAM and feature FP4 compute, doubling AI inference performance and enabling generative AI to run locally with a smaller memory footprint.

NVIDIA also introduced NVIDIA AI Blueprints — ready-to-use, preconfigured workflows, built on NIM microservices, for applications like digital humans and content creation.

NIM microservices and AI Blueprints empower enthusiasts and developers to build, iterate and deliver AI-powered experiences to the PC faster than ever. The result is a new wave of compelling, practical capabilities for PC users.

Fast-Track AI With NVIDIA NIM

There are two key challenges to bringing AI advancements to PCs. First, the pace of AI research is breakneck, with new models appearing daily on platforms like Hugging Face, which now hosts over a million models. As a result, breakthroughs quickly become outdated.

Second, adapting these models for PC use is a complex, resource-intensive process. Optimizing them for PC hardware, integrating them with AI software and connecting them to applications requires significant engineering effort.

NVIDIA NIM helps address these challenges by offering prepackaged, state-of-the-art AI models optimized for PCs. These NIM microservices span model domains, can be installed with a single click, feature application programming interfaces (APIs) for easy integration, and harness NVIDIA AI software and RTX GPUs for accelerated performance.

At CES, NVIDIA announced a pipeline of NIM microservices for RTX AI PCs, supporting use cases spanning large language models (LLMs), vision-language models, image generation, speech, retrieval-augmented generation (RAG), PDF extraction and computer vision.

The new Llama Nemotron family of open models provides high accuracy on a wide range of agentic tasks. The Llama Nemotron Nano model, which will be offered as a NIM microservice for RTX AI PCs and workstations, excels at agentic AI tasks like instruction following, function calling, chat, coding and math.

Soon, developers will be able to quickly download and run these microservices on Windows 11 PCs using Windows Subsystem for Linux (WSL).

To demonstrate how enthusiasts and developers can use NIM to build AI agents and assistants, NVIDIA previewed Project R2X, a vision-enabled PC avatar that can put information at a user’s fingertips, assist with desktop apps and video conference calls, read and summarize documents, and more. Sign up for Project R2X updates.

By using NIM microservices, AI enthusiasts can skip the complexities of model curation, optimization and backend integration and focus on creating and innovating with cutting-edge AI models.

What’s in an API?

An API is the way in which an application communicates with a software library. An API defines a set of “calls” that the application can make to the library and what the application can expect in return. Traditional AI APIs require a lot of setup and configuration, making AI capabilities harder to use and hampering innovation.

NIM microservices expose easy-to-use, intuitive APIs that an application can simply send requests to and get a response. In addition, they’re designed around the input and output media for different model types. For example, LLMs take text as input and produce text as output, image generators convert text to image, speech recognizers turn speech to text and so on.
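
To make this concrete, here is a minimal sketch of what calling such an API from Python might look like. It assumes a NIM-style, OpenAI-compatible chat endpoint running on the local machine; the port, path and model name are illustrative placeholders rather than documented values.

```python
# Minimal sketch: querying a locally running LLM microservice.
# The URL, path and model name are illustrative assumptions; check the
# documentation of the specific microservice for the actual values.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",   # assumed local endpoint
    json={
        "model": "llama-nemotron-nano",             # hypothetical model identifier
        "messages": [
            {"role": "user", "content": "Summarize this document in two sentences."}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```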

The microservices are designed to integrate seamlessly with leading AI development and agent frameworks such as AI Toolkit for VSCode, AnythingLLM, ComfyUI, Flowise AI, LangChain, Langflow and LM Studio. Developers can easily download and deploy them from build.nvidia.com.

By bringing these APIs to RTX, NVIDIA NIM will accelerate AI innovation on PCs.

Enthusiasts are expected to be able to experience a range of NIM microservices using an upcoming release of the NVIDIA ChatRTX tech demo.

A Blueprint for Innovation

By using state-of-the-art models, prepackaged and optimized for PCs, developers and enthusiasts can quickly create AI-powered projects. Taking things a step further, they can combine multiple AI models and other functionality to build complex applications like digital humans, podcast generators and application assistants.

NVIDIA AI Blueprints, built on NIM microservices, are reference implementations for complex AI workflows. They help developers connect several components, including libraries, software development kits and AI models, together in a single application.

AI Blueprints include everything a developer needs to build, run, customize and extend the reference workflow: the reference application and source code, sample data, and documentation for customizing and orchestrating the different components.

At CES, NVIDIA announced two AI Blueprints for RTX: one for PDF to podcast, which lets users generate a podcast from any PDF, and another for 3D-guided generative AI, based on FLUX.1 [dev] and expected to be offered as a NIM microservice, which gives artists greater control over text-based image generation.

With AI Blueprints, developers can quickly go from AI experimentation to AI development for cutting-edge workflows on RTX PCs and workstations.

Built for Generative AI

The new GeForce RTX 50 Series GPUs are purpose-built to tackle complex generative AI challenges, featuring fifth-generation Tensor Cores with FP4 support, faster G7 memory and an AI-management processor for efficient multitasking between AI and creative workflows.

The GeForce RTX 50 Series adds FP4 support to help bring better performance and more models to PCs. FP4 is a lower-precision quantization method, similar to file compression, that decreases model size. Compared with FP16 — the default precision most models use — FP4 uses less than half the memory, and 50 Series GPUs provide over 2x performance compared with the previous generation. Advanced quantization methods offered by NVIDIA TensorRT Model Optimizer make this possible with virtually no loss in quality.

For example, Black Forest Labs’ FLUX.1 [dev] model at FP16 requires over 23GB of VRAM, meaning it can only be supported by the GeForce RTX 4090 and professional GPUs. With FP4, FLUX.1 [dev] requires less than 10GB, so it can run locally on more GeForce RTX GPUs.
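
As a back-of-the-envelope illustration of why this matters, the sketch below estimates weight memory for a roughly 12-billion-parameter model such as FLUX.1 [dev] at different precisions. It ignores text encoders, activations and other overhead, which is why the real figures quoted above are somewhat higher.

```python
# Rough estimate of model weight memory at different precisions.
# Assumes roughly 12 billion parameters; activations, text encoders and
# other overhead are ignored, so actual VRAM use is higher.
params = 12e9
bytes_per_param = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

for precision, nbytes in bytes_per_param.items():
    gib = params * nbytes / 2**30
    print(f"{precision}: ~{gib:.1f} GiB of weights")
# FP16: ~22.4 GiB, FP4: ~5.6 GiB, consistent with the figures above
# once overhead is added.
```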

With a GeForce RTX 4090 with FP16, the FLUX.1 [dev] model can generate images in 15 seconds with 30 steps. With a GeForce RTX 5090 with FP4, images can be generated in just over five seconds.

Get Started With the New AI APIs for PCs

NVIDIA NIM microservices and AI Blueprints are expected to be available starting next month, with initial hardware support for GeForce RTX 50 Series, GeForce RTX 4090 and 4080, and NVIDIA RTX 6000 and 5000 professional GPUs. Additional GPUs will be supported in the future.

NIM-ready RTX AI PCs are expected to be available from Acer, ASUS, Dell, GIGABYTE, HP, Lenovo, MSI, Razer and Samsung, and from local system builders Corsair, Falcon Northwest, LDLC, Maingear, Mifcon, Origin PC, PCS and Scan.

GeForce RTX 50 Series GPUs and laptops deliver game-changing performance, power transformative AI experiences, and enable creators to complete workflows in record time. Rewatch NVIDIA CEO Jensen Huang’s keynote to learn more about NVIDIA’s AI news unveiled at CES.

See notice regarding software product information.

Why World Foundation Models Will Be Key to Advancing Physical AI

In the fast-evolving landscape of AI, it’s becoming increasingly important to develop models that can accurately simulate and predict outcomes in physical, real-world environments to enable the next generation of physical AI systems.

Ming-Yu Liu, vice president of research at NVIDIA and an IEEE Fellow, joined the NVIDIA AI Podcast to discuss the significance of world foundation models (WFMs) — powerful neural networks that can simulate physical environments. WFMs can generate detailed videos from text or image input data and predict how a scene evolves by combining its current state (image or video) with actions (such as prompts or control signals).
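
To illustrate the idea of combining a current state with actions to predict what comes next, here is a minimal, hypothetical interface sketch. The class and method names are illustrative only and do not correspond to the Cosmos APIs.

```python
# Hypothetical interface sketch of a world-model rollout loop.
# Names and signatures are illustrative; they are not the Cosmos API.
from dataclasses import dataclass

@dataclass
class WorldState:
    frames: list  # observed or generated video frames

class WorldModel:
    def predict(self, state: WorldState, action: str) -> WorldState:
        """Given the current state and an action (e.g. a text prompt or a
        control signal), generate the predicted next state as new frames."""
        raise NotImplementedError

def rollout(model: WorldModel, initial: WorldState, actions: list[str]) -> WorldState:
    """Simulate one possible future by applying a sequence of actions."""
    state = initial
    for action in actions:
        state = model.predict(state, action)
    return state
```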

“World foundation models are important to physical AI developers,” said Liu. “They can imagine many different environments and can simulate the future, so we can make good decisions based on this simulation.”

This is particularly valuable for physical AI systems, such as robots and self-driving cars, which must interact safely and efficiently with the real world.

Why Are World Foundation Models Important?

Building world models often requires vast amounts of data, which can be difficult and expensive to collect. WFMs can generate synthetic data, providing a rich, varied dataset that enhances the training process.

In addition, training and testing physical AI systems in the real world can be resource-intensive. WFMs provide virtual, 3D environments where developers can simulate and test these systems in a controlled setting without the risks and costs associated with real-world trials.

Open Access to World Foundation Models

At the CES trade show, NVIDIA announced NVIDIA Cosmos, a platform of generative WFMs that accelerate the development of physical AI systems such as robots and self-driving cars.

The platform is designed to be open and accessible, and includes pretrained WFMs based on diffusion and auto-regressive architectures, along with tokenizers that can compress videos into tokens for transformer models.

Liu explained that with these open models, enterprises and developers have all the ingredients they need to build large-scale models. The open platform also provides teams with the flexibility to explore various options for training and fine-tuning models, or build their own based on specific needs.

Enhancing AI Workflows Across Industries

WFMs are expected to enhance AI workflows and development in various industries. Liu sees particularly significant impacts in two areas:

“The self-driving car industry and the humanoid [robot] industry will benefit a lot from world model development,” said Liu. “[WFMs] can simulate different environments that will be difficult to have in the real world, to make sure the agent behaves respectively.”

For self-driving cars, these models can simulate environments that allow for comprehensive testing and optimization. For example, a self-driving car can be tested in various simulated weather conditions and traffic scenarios to help ensure it performs safely and efficiently before deployment on roads.

In robotics, WFMs can simulate and verify the behavior of robotic systems in different environments to make sure they perform tasks safely and efficiently before deployment.

NVIDIA is collaborating with companies like 1X, Huobi and XPENG to help address challenges in physical AI development and advance their systems.

“We are still in the infancy of world foundation model development — it’s useful, but we need to make it more useful,” Liu said. “We also need to study how to best integrate these world models into the physical AI systems in a way that can really benefit them.”

Listen to the podcast with Ming-Yu Liu, or read the transcript.

Learn more about NVIDIA Cosmos and the latest announcements in generative AI and robotics by watching the CES opening keynote by NVIDIA founder and CEO Jensen Huang, as well as joining NVIDIA sessions at the show.

Why Enterprises Need AI Query Engines to Fuel Agentic AI

Data is the fuel of AI applications, but the magnitude and scale of enterprise data often make it too expensive and time-consuming to use effectively.

According to IDC’s Global DataSphere1, enterprises will generate 317 zettabytes of data annually by 2028 — including the creation of 29 zettabytes of unique data — of which 78% will be unstructured data and 44% of that will be audio and video. Because of the extremely high volume and various data types, most generative AI applications use a fraction of the total amount of data being stored and generated.

For enterprises to thrive in the AI era, they must find a way to make use of all of their data. This isn’t possible using traditional computing and data processing techniques. Instead, enterprises need an AI query engine.

What Is an AI Query Engine?

Simply, an AI query engine is a system that connects AI applications, or AI agents, to data. It’s a critical component of agentic AI, as it serves as a bridge between an organization’s knowledge base and AI-powered applications, enabling more accurate, context-aware responses.

AI agents form the basis of an AI query engine: they gather information from many data sources, plan, reason and take action to assist human employees. AI agents can communicate directly with users, or they can work in the background, with human feedback and interaction available whenever needed.

In practice, an AI query engine is a sophisticated system that efficiently processes large amounts of data, extracts and stores knowledge, and performs semantic search on that knowledge, which can be quickly retrieved and used by AI.

An AI query engine processes, stores and retrieves data — connecting AI agents to insights.

AI Query Engines Unlock Intelligence in Unstructured Data

An enterprise’s AI query engine will have access to knowledge stored in many different formats, but being able to extract intelligence from unstructured data is one of the most significant advancements it enables.

To generate insights, traditional query engines rely on structured queries and data sources, such as relational databases. Users must formulate precise queries using languages like SQL, and results are limited to predefined data formats.

In contrast, AI query engines can process structured, semi-structured and unstructured data. Common unstructured formats include PDFs, log files, images and video, typically stored on object stores, file servers and parallel file systems. AI agents communicate with users and with each other using natural language, which lets them interpret user intent, even when it’s ambiguous, by drawing on diverse data sources. These agents can deliver results in a conversational format that users can readily interpret.

This capability makes it possible to derive more insights and intelligence from any type of data — not just data that fits neatly into rows and columns.

For example, companies like DataStax and NetApp are building AI data platforms that enable their customers to have an AI query engine for their next-generation applications.

Key Features of AI Query Engines

AI query engines possess several crucial capabilities:

  • Diverse data handling: AI query engines can access and process various data types, including structured, semi-structured and unstructured data from multiple sources, including text, PDF, image, video and specialty data types.
  • Scalability: AI query engines can efficiently handle petabyte-scale data, making all enterprise knowledge available to AI applications quickly.
  • Accurate retrieval: AI query engines provide high-accuracy, high-performance embedding, vector search and reranking of knowledge from multiple sources.
  • Continuous learning: AI query engines can store and incorporate feedback from AI-powered applications, creating an AI data flywheel in which the feedback is used to refine models and increase the effectiveness of the applications over time.

Retrieval-augmented generation is a component of AI query engines. RAG uses the power of generative AI models to act as a natural language interface to data, allowing models to access and incorporate relevant information from large datasets during the response generation process.

Using RAG, any business or other organization can turn its technical information, policy manuals, videos and other data into useful knowledge bases. An AI query engine can then rely on these sources to support such areas as customer relations, employee training and developer productivity.
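
As a rough illustration, the sketch below shows the retrieval step of a RAG pipeline. The embed() function is a toy stand-in for a real embedding model (for example, an embedding microservice), and the brute-force cosine-similarity search is a simplified substitute for a production vector database.

```python
# Simplified retrieval step of a RAG pipeline.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedding model: a hashed bag-of-words vector,
    good enough to demonstrate the flow."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def retrieve(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Return the documents most similar to the query by cosine similarity."""
    doc_vecs = np.stack([embed(d) for d in documents])
    q = embed(query)
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top]

docs = [
    "Warranty policy: hardware is covered for three years from purchase.",
    "Expense reports must be filed within 30 days of travel.",
    "GPU driver installation guide for Linux workstations.",
]
print(retrieve("How long is the hardware warranty?", docs, top_k=1))
# The retrieved passages are then added to the prompt sent to the language
# model, grounding its answer in enterprise data.
```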

Additional information-retrieval techniques and ways to store knowledge are in research and development, so the capabilities of an AI query engine are expected to rapidly evolve.

The Impact of AI Query Engines

Using AI query engines, enterprises can fully harness the power of AI agents to connect their workforces to vast amounts of enterprise knowledge, improve the accuracy and relevance of AI-generated responses, process and utilize previously untapped data sources, and create data-driven AI flywheels that continuously improve their AI applications.

Some examples include an AI virtual assistant that provides personalized, 24/7 customer service experiences, an AI agent for searching and summarizing video, an AI agent for analyzing software vulnerabilities or an AI research assistant.

Bridging the gap between raw data and AI-powered applications, AI query engines will grow to play a crucial role in helping organizations extract value from their data.

NVIDIA Blueprints can help enterprises get started connecting AI to their data. Learn more about NVIDIA Blueprints and try them in the NVIDIA API catalog.

  1.  IDC, Global DataSphere Forecast, 2024.

CES 2025: AI Advancing at ‘Incredible Pace,’ NVIDIA CEO Says

NVIDIA founder and CEO Jensen Huang kicked off CES 2025 with a 90-minute keynote that included new products to advance gaming, autonomous vehicles, robotics, and agentic AI.

AI has been “advancing at an incredible pace,” he said before an audience of more than 6,000 packed into the Michelob Ultra Arena in Las Vegas.

“It started with perception AI — understanding images, words, and sounds. Then generative AI — creating text, images and sound,” Huang said. Now, we’re entering the era of “physical AI, AI that can perceive, reason, plan and act.”

NVIDIA GPUs and platforms are at the heart of this transformation, Huang explained, enabling breakthroughs across industries, including gaming, robotics and autonomous vehicles (AVs).

Huang’s keynote showcased how NVIDIA’s latest innovations are enabling this new era of AI, with several groundbreaking announcements.

Huang started off his talk by reflecting on NVIDIA’s three-decade journey. In 1999, NVIDIA invented the programmable GPU. Since then, modern AI has fundamentally changed how computing works, he said. “Every single layer of the technology stack has been transformed, an incredible transformation, in just 12 years.”

Revolutionizing Graphics With GeForce RTX 50 Series

“GeForce enabled AI to reach the masses, and now AI is coming home to GeForce,” Huang said.

With that, he introduced the NVIDIA GeForce RTX 5090 GPU, the most powerful GeForce RTX GPU so far, with 92 billion transistors and delivering 3,352 trillion AI operations per second (TOPS).

“Here it is — our brand-new GeForce RTX 50 series, Blackwell architecture,” Huang said, holding the blacked-out GPU aloft and noting how it’s able to harness advanced AI to enable breakthrough graphics. “The GPU is just a beast.”

“Even the mechanical design is a miracle,” Huang said, noting that the graphics card has two cooling fans.

More variations in the GPU series are coming. The GeForce RTX 5090 and GeForce RTX 5080 desktop GPUs are scheduled to be available Jan. 30. The GeForce RTX 5070 Ti and the GeForce RTX 5070 desktops are slated to be available starting in February. Laptop GPUs are expected in March.

DLSS 4 introduces Multi Frame Generation, working in unison with the complete suite of DLSS technologies to boost performance by up to 8x. NVIDIA also unveiled NVIDIA Reflex 2, which can reduce PC latency by up to 75%.

The latest generation of DLSS can generate three additional frames for every frame we calculate, Huang explained. “As a result, we’re able to render at incredibly high performance, because AI does a lot less computation.”

RTX Neural Shaders use small neural networks to improve textures, materials and lighting in real-time gameplay. RTX Neural Faces and RTX Hair advance real-time face and hair rendering, using generative AI to animate the most realistic digital characters ever. RTX Mega Geometry increases the number of ray-traced triangles by up to 100x, providing more detail.

Advancing Physical AI With Cosmos

In addition to advancements in graphics, Huang introduced the NVIDIA Cosmos world foundation model platform, describing it as a game-changer for robotics and industrial AI.

The next frontier of AI is physical AI, Huang explained. He likened this moment to the transformative impact of large language models on generative AI.

“The ChatGPT moment for general robotics is just around the corner,” he explained.

Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own, Huang said.

Cosmos integrates generative models, tokenizers, and a video processing pipeline to power physical AI systems like AVs and robots.

Cosmos aims to bring the power of foresight and multiverse simulation to AI models, enabling them to simulate every possible future and select optimal actions.

Cosmos models ingest text, image or video prompts and generate virtual world states as videos, Huang explained. “Cosmos generations prioritize the unique requirements of AV and robotics use cases like real-world environments, lighting and object permanence.”

Leading robotics and automotive companies, including 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, Neura Robotics, Skild AI, Virtual Incision, Waabi and XPENG, along with ridesharing giant Uber, are among the first to adopt Cosmos.

In addition, Hyundai Motor Group is adopting NVIDIA AI and Omniverse to create safer, smarter vehicles, supercharge manufacturing and deploy cutting-edge robotics.

Cosmos is available under an open license on GitHub.

Empowering Developers With AI Foundation Models

Beyond robotics and autonomous vehicles, NVIDIA is empowering developers and creators with AI foundation models.

Huang introduced AI foundation models for RTX PCs that supercharge digital humans, content creation, productivity and development.

“These AI models run in every single cloud because NVIDIA GPUs are now available in every single cloud,” Huang said. “It’s available in every single OEM, so you could literally take these models, integrate them into your software packages, create AI agents and deploy them wherever the customers want to run the software.”

These models — offered as NVIDIA NIM microservices — are accelerated by the new GeForce RTX 50 Series GPUs.

The GPUs have what it takes to run these swiftly, adding support for FP4 computing, boosting AI inference by up to 2x and enabling generative AI models to run locally in a smaller memory footprint compared with previous-generation hardware.

Huang explained the potential of new tools for creators: “We’re creating a whole bunch of blueprints that our ecosystem could take advantage of. All of this is completely open source, so you could take it and modify the blueprints.”

Top PC manufacturers and system builders are launching NIM-ready RTX AI PCs with GeForce RTX 50 Series GPUs. “AI PCs are coming to a home near you,” Huang said.

While these tools bring AI capabilities to personal computing, NVIDIA is also advancing AI-driven solutions in the automotive industry, where safety and intelligence are paramount.

Innovations in Autonomous Vehicles

Huang announced the NVIDIA DRIVE Hyperion AV platform, built on the new NVIDIA AGX Thor system-on-a-chip (SoC), designed for generative AI models and delivering advanced functional safety and autonomous driving capabilities.

“The autonomous vehicle revolution is here,” Huang said. “Building autonomous vehicles, like all robots, requires three computers: NVIDIA DGX to train AI models, Omniverse to test drive and generate synthetic data, and DRIVE AGX, a supercomputer in the car.”

DRIVE Hyperion, the first end-to-end AV platform, integrates advanced SoCs, a full sensor suite, and an active safety and level 2 driving stack for next-generation vehicles, with adoption by automotive safety pioneers such as Mercedes-Benz, JLR and Volvo Cars.

Huang highlighted the critical role of synthetic data in advancing autonomous vehicles. Real-world data is limited, so synthetic data is essential for training the autonomous vehicle data factory, he explained.

Powered by NVIDIA Omniverse AI models and Cosmos, this approach “generates synthetic driving scenarios that enhance training data by orders of magnitude.”

Using Omniverse and Cosmos, NVIDIA’s AI data factory can scale “hundreds of drives into billions of effective miles,” Huang said, dramatically increasing the datasets needed for safe and advanced autonomous driving.

“We are going to have mountains of training data for autonomous vehicles,” he added.

Toyota, the world’s largest automaker, will build its next-generation vehicles on the NVIDIA DRIVE AGX Orin, running the safety-certified NVIDIA DriveOS operating system, Huang said.

“Just as computer graphics was revolutionized at such an incredible pace, you’re going to see the pace of AV development increasing tremendously over the next several years,” Huang said. These vehicles will offer functionally safe, advanced driving assistance capabilities.

Agentic AI and Digital Manufacturing

NVIDIA and its partners have launched AI Blueprints for agentic AI, including PDF-to-podcast for efficient research and video search and summarization for analyzing large quantities of video and images — enabling developers to build, test and run AI agents anywhere.

AI Blueprints empower developers to deploy custom agents for automating enterprise workflows. This new category of partner blueprints integrates NVIDIA AI Enterprise software, including NVIDIA NIM microservices and NVIDIA NeMo, with platforms from leading providers like CrewAI, Daily, LangChain, LlamaIndex and Weights & Biases.

Additionally, Huang announced the new Llama Nemotron family of open models.

Developers can use NVIDIA NIM microservices to build AI agents for tasks like customer support, fraud detection, and supply chain optimization.

Available as NVIDIA NIM microservices, the models can supercharge AI agents on any accelerated system.

NVIDIA NIM microservices streamline video content management, boosting efficiency and audience engagement in the media industry.

Moving beyond digital applications, NVIDIA’s innovations are paving the way for AI to revolutionize the physical world with robotics.

“All of the enabling technologies that I’ve been talking about are going to make it possible for us in the next several years to see very rapid breakthroughs, surprising breakthroughs, in general robotics,” Huang said.

In manufacturing, the NVIDIA Isaac GR00T Blueprint for synthetic motion generation will help developers generate exponentially large synthetic motion data to train their humanoids using imitation learning.

Huang emphasized the importance of training robots efficiently, using NVIDIA’s Omniverse to generate millions of synthetic motions for humanoid training.

The Mega blueprint enables large-scale simulation of robot fleets, adopted by leaders like Accenture and KION for warehouse automation.

These AI tools set the stage for NVIDIA’s latest innovation: a personal AI supercomputer called Project DIGITS.

NVIDIA Unveils Project Digits

Putting NVIDIA Grace Blackwell on every desk and at every AI developer’s fingertips, Huang unveiled NVIDIA Project DIGITS.

“I have one more thing that I want to show you,” Huang said. “None of this would be possible if not for this incredible project that we started about a decade ago. Inside the company, it was called Project DIGITS — deep learning GPU intelligence training system.”

Huang highlighted the legacy of NVIDIA’s AI supercomputing journey, telling the story of how in 2016 he delivered the first NVIDIA DGX system to OpenAI. “And obviously, it revolutionized artificial intelligence computing.”

The new Project DIGITS takes this mission further. “Every software engineer, every engineer, every creative artist — everybody who uses computers today as a tool — will need an AI supercomputer,” Huang said.

Huang revealed that Project DIGITS, powered by the GB10 Grace Blackwell Superchip, represents NVIDIA’s smallest yet most powerful AI supercomputer. “This is NVIDIA’s latest AI supercomputer,” Huang said, showcasing the device. “It runs the entire NVIDIA AI stack — all of NVIDIA software runs on this. DGX Cloud runs on this.”

The compact yet powerful Project DIGITS is expected to be available in May.

A Year of Breakthroughs

“It’s been an incredible year,” Huang said as he wrapped up the keynote, highlighting NVIDIA’s major achievements: Blackwell systems, physical AI foundation models, and breakthroughs in agentic AI and robotics.

“I want to thank all of you for your partnership,” Huang said.

See notice regarding software product information.

NVIDIA Unveils ‘Mega’ Omniverse Blueprint for Building Industrial Robot Fleet Digital Twins

According to Gartner, the worldwide end-user spending on all IT products for 2024 was $5 trillion. This industry is built on a computing fabric of electrons, is fully software-defined, accelerated — and now generative AI-enabled. While huge, it’s a fraction of the larger physical industrial market that relies on the movement of atoms.

Today’s 10 million factories, nearly 200,000 warehouses and 40 million miles of highways form the “computing” fabric of our physical world. But that vast network of production facilities and distribution centers is still laboriously and manually designed, operated and optimized.

In warehousing and distribution, operators face highly complex decision optimization problems — matrices of variables and interdependencies across human workers, robotic and agentic systems and equipment. Unlike the IT industry, the physical industrial market is still waiting for its own software-defined moment.

That moment is coming.

Virtual facility with people, machinery and robots all moving around the facility floor. Digital representations of the pathways and sensor inputs can be visualized with colorful arrays.
Choreographed integration of human workers, robotic and agentic systems and equipment in a facility digital twin. Image courtesy of Accenture, KION Group.

NVIDIA today at CES announced “Mega,” an Omniverse Blueprint for developing, testing and optimizing physical AI and robot fleets at scale in a digital twin before deployment into real-world facilities.

Advanced warehouses and factories use fleets of hundreds of autonomous mobile robots, robotic arm manipulators and humanoids working alongside people. As these systems of sensors and robot autonomy grow increasingly complex, they require coordinated training in simulation to optimize operations, help ensure safety and avoid disruptions.

Mega offers enterprises a reference architecture of NVIDIA accelerated computing, AI, NVIDIA Isaac and NVIDIA Omniverse technologies to develop digital twins for testing the AI-powered robot brains that drive robots, video analytics AI agents, equipment and more, handling enormous complexity and scale. The new framework brings software-defined capabilities to physical facilities, enabling continuous development, testing, optimization and deployment.

Developing AI Brains With World Simulator for Autonomous Orchestration

With Mega-driven digital twins, including a world simulator that coordinates all robot activities and sensor data, enterprises can continuously update facility robot brains for intelligent routes and tasks for operational efficiencies.

The blueprint uses Omniverse Cloud Sensor RTX APIs that enable robotics developers to render sensor data from any type of intelligent machine in the factory, simultaneously, for high-fidelity large-scale sensor simulation. This allows robots to be tested in an infinite number of scenarios within the digital twin, using synthetic data in a software-in-the-loop pipeline with NVIDIA Isaac ROS.

Operational efficiency is gained with sensor simulation. Image courtesy of Accenture, KION Group.

Supply chain solutions company KION Group is collaborating with Accenture and NVIDIA as the first to adopt Mega for optimizing operations in retail, consumer packaged goods, parcel services and more.

Jensen Huang, founder and CEO of NVIDIA, offered a glimpse into the future of this collaboration on stage at CES, demonstrating how enterprises can navigate a complex web of decisions using the Mega Omniverse Blueprint.

“At KION, we leverage AI-driven solutions as an integral part of our strategy to optimize our customers’ supply chains and increase their productivity,” said Rob Smith, CEO of KION GROUP AG. “With NVIDIA’s AI leadership and Accenture’s expertise in digital technologies, we are reinventing warehouse automation. Bringing these strong partners together, we are creating a vision for future warehouses that are part of a smart agile system, evolve with the world around them and can handle nearly any supply chain challenge.”

Creating Operational Efficiencies With Mega Omniverse Blueprint

Creating operational efficiencies, KION and Accenture are embracing the Mega Omniverse Blueprint to build next-generation supply chains for KION and its customers. KION can capture a warehouse and digitalize it into an Omniverse digital twin using computer-aided design files, video, lidar, image and AI-generated data.

KION uses the Omniverse digital twin as a virtual training and testing environment for its industrial AI’s robot brains, powered by NVIDIA Isaac, tapping into smart cameras, forklifts, robotic equipment and digital humans. Integrating the Omniverse digital twin, KION’s warehouse management software can create and assign missions for robot brains, like moving a load from one place to another.

Graphical data is easily introduced into the Omniverse viewport showcasing productivity and throughput among other desired metrics. Image courtesy of Accenture, KION Group.

These simulated robots can carry out tasks by perceiving and reasoning in their environments, planning their next motions and then taking actions that are simulated in the digital twin. The robot brains perceive the results and decide the next action, and this cycle continues, with Mega precisely tracking the state and position of all the assets in the digital twin.

Delivering Services With Mega for Facilities Everywhere

Accenture, global leader in professional services, is adopting Mega as part of its AI Refinery for Simulation and Robotics, built on NVIDIA AI and Omniverse, to help organizations use AI simulation to reinvent factory and warehouse design and ongoing operations.

With the blueprint, Accenture is delivering new services — including Custom Robotics and Manufacturing Foundation Model Training and Finetuning; Intelligent Humanoid Robotics; and AI-Powered Industrial Manufacturing and Logistics Simulation and Optimization — to expand the power of physical AI and simulation to the world’s factories and warehouse operators. Now, for example, an organization can explore numerous options for its warehouse before choosing and implementing the best one.

“As organizations enter the age of industrial AI, we are helping them use AI-powered simulation and autonomous robots to reinvent the process of designing new facilities and optimizing existing operations,” said Julie Sweet, chair and CEO of Accenture. “Our collaboration with NVIDIA and KION will help our clients plan their operations in digital twins, where they can run hundreds of options and quickly select the best for current or changing market conditions, such as seasonal market demand or workforce availability. This represents a new frontier of value for our clients to achieve using technology, data and AI.”

Join NVIDIA at CES

See notice regarding software product information.

Building Smarter Autonomous Machines: NVIDIA Announces Early Access for Omniverse Sensor RTX

Generative AI and foundation models let autonomous machines generalize beyond the operational design domains on which they’ve been trained. Using new AI techniques such as tokenization and large language and diffusion models, developers and researchers can now address longstanding hurdles to autonomy.

These larger models require massive amounts of diverse data for training, fine-tuning and validation. But collecting such data — including from rare edge cases and potentially hazardous scenarios, like a pedestrian crossing in front of an autonomous vehicle (AV) at night or a human entering a welding robot work cell — can be incredibly difficult and resource-intensive.

To help developers fill this gap, NVIDIA Omniverse Cloud Sensor RTX APIs enable physically accurate sensor simulation for generating datasets at scale. The application programming interfaces (APIs) are designed to support sensors commonly used for autonomy — including cameras, radar and lidar — and can integrate seamlessly into existing workflows to accelerate the development of autonomous vehicles and robots of every kind.

Omniverse Sensor RTX APIs are now available to select developers in early access. Organizations such as Accenture, Foretellix, MITRE and Mcity are integrating these APIs via domain-specific blueprints to provide end customers with the tools they need to deploy the next generation of industrial manufacturing robots and self-driving cars.

Powering Industrial AI With Omniverse Blueprints

In complex environments like factories and warehouses, robots must be orchestrated to safely and efficiently work alongside machinery and human workers. All those moving parts present a massive challenge when designing, testing or validating operations while avoiding disruptions.

Mega is an Omniverse Blueprint that offers enterprises a reference architecture of NVIDIA accelerated computing, AI, NVIDIA Isaac and NVIDIA Omniverse technologies. Enterprises can use it to develop digital twins and test AI-powered robot brains that drive robots, cameras, equipment and more to handle enormous complexity and scale.

Integrating Omniverse Sensor RTX, the blueprint lets robotics developers simultaneously render sensor data from any type of intelligent machine in a factory for high-fidelity, large-scale sensor simulation.

With the ability to test operations and workflows in simulation, manufacturers can save considerable time and investment, and improve efficiency in entirely new ways.

International supply chain solutions company KION Group and Accenture are using the Mega blueprint to build Omniverse digital twins that serve as virtual training and testing environments for industrial AI’s robot brains, tapping into data from smart cameras, forklifts, robotic equipment and digital humans.

The robot brains perceive the simulated environment with physically accurate sensor data rendered by the Omniverse Sensor RTX APIs. They use this data to plan and act, with each action precisely tracked with Mega, alongside the state and position of all the assets in the digital twin. With these capabilities, developers can continuously build and test new layouts before they’re implemented in the physical world.

Driving AV Development and Validation

Autonomous vehicles have been under development for over a decade, but barriers in acquiring the right training and validation data and slow iteration cycles have hindered large-scale deployment.

To address this need for sensor data, companies are harnessing the NVIDIA Omniverse Blueprint for AV simulation, a reference workflow that enables physically accurate sensor simulation. The workflow uses Omniverse Sensor RTX APIs to render the camera, radar and lidar data necessary for AV development and validation.

AV toolchain provider Foretellix has integrated the blueprint into its Foretify AV development toolchain to transform object-level simulation into physically accurate sensor simulation.

The Foretify toolchain can generate any number of testing scenarios simultaneously. By adding sensor simulation capabilities to these scenarios, Foretify can now enable developers to evaluate the completeness of their AV development, as well as train and test at the levels of fidelity and scale needed to achieve large-scale and safe deployment. In addition, Foretellix will use the newly announced NVIDIA Cosmos platform to generate an even greater diversity of scenarios for verification and validation.

Nuro, an autonomous driving technology provider with one of the largest level 4 deployments in the U.S., is using the Foretify toolchain to train, test and validate its self-driving vehicles before deployment.

In addition, research organization MITRE is collaborating with the University of Michigan’s Mcity testing facility to build a digital AV validation framework for regulatory use, including a digital twin of Mcity’s 32-acre proving ground for autonomous vehicles. The project uses the AV simulation blueprint to render physically accurate sensor data at scale in the virtual environment, boosting training effectiveness.

The future of robotics and autonomy is coming into sharp focus, thanks to the power of high-fidelity sensor simulation. Learn more about these solutions at CES by visiting Accenture at Ballroom F at the Venetian and Foretellix booth 4016 in the West Hall of Las Vegas Convention Center.

Learn more about the latest in automotive and generative AI technologies by joining NVIDIA at CES.

See notice regarding software product information.

Now See This: NVIDIA Launches Blueprint for AI Agents That Can Analyze Video

The next big moment in AI is in sight — literally.

Today, more than 1.5 billion enterprise-level cameras deployed worldwide are generating roughly 7 trillion hours of video per year. Yet only a fraction of it gets analyzed.
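
As a quick sanity check on that scale, the arithmetic below simply divides the yearly video volume across the installed cameras; it is only an illustration of magnitude, not a sourced statistic.

```python
# Back-of-the-envelope check of the scale described above.
cameras = 1.5e9           # enterprise-level cameras worldwide
hours_per_year = 7e12     # hours of video generated per year

hours_per_camera_per_day = hours_per_year / cameras / 365
print(f"~{hours_per_camera_per_day:.1f} hours of footage per camera per day")
# ~12.8 hours per day per camera, roughly what always-on or business-hours
# cameras would produce.
```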

It’s estimated that less than 1% of video from industrial cameras is watched live by humans, meaning critical operational incidents can go largely unnoticed.

This comes at a high cost. For example, manufacturers are losing trillions of dollars annually to poor product quality or defects that they could’ve spotted earlier, or even predicted, by using AI agents that can perceive, analyze and help humans take action.

Interactive AI agents with built-in visual perception capabilities can serve as always-on video analysts, helping factories run more efficiently, bolster worker safety, keep traffic running smoothly and even up an athlete’s game.

To accelerate the creation of such agents, NVIDIA today announced early access to a new version of the NVIDIA AI Blueprint for video search and summarization. Built on top of the NVIDIA Metropolis platform — and now supercharged by NVIDIA Cosmos Nemotron vision language models (VLMs), NVIDIA Llama Nemotron large language models (LLMs) and NVIDIA NeMo Retriever — the blueprint provides developers with the tools to build and deploy AI agents that can analyze large quantities of video and image content.

The blueprint integrates the NVIDIA AI Enterprise software platform — which includes NVIDIA NIM microservices for VLMs, LLMs and advanced AI frameworks for retrieval-augmented generation — to enable batch video processing that’s 30x faster than watching it in real time.

The blueprint contains several agentic AI features — such as chain-of-thought reasoning, task planning and tool calling — that can help developers streamline the creation of powerful and diverse visual agents to solve a range of problems.

AI agents with video analysis abilities can be combined with other agents with different skill sets to enable even more sophisticated agentic AI services. Enterprises have the flexibility to build and deploy their AI agents from the edge to the cloud.

How Video Analyst AI Agents Can Help Industrial Businesses 

AI agents with visual perception and analysis skills can be fine-tuned to help businesses with industrial operations by:

  • Increasing productivity and reducing waste: Agents can help ensure standard operating procedures are followed during complex industrial processes like product assembly. They can also be fine-tuned to carefully watch and understand nuanced actions, and the sequence in which they’re implemented.
  • Boosting asset management efficiency through better space utilization: Agents can help optimize inventory storage in warehouses by performing 3D volume estimation and centralizing understanding across various camera streams.
  • Improving safety through auto-generation of incident reports and summaries: Agents can process huge volumes of video and summarize it into contextually informative reports of accidents. They can also help ensure personal protective equipment compliance in factories, improving worker safety in industrial settings.
  • Preventing accidents and production problems: AI agents can identify atypical activity to quickly mitigate operational and safety risks, whether in a warehouse, factory or airport, or at a traffic intersection or other municipal setting.
  • Learning from the past: Agents can search through operations video archives, find relevant information from the past and use it to solve problems or create new processes.

Video Analysts for Sports, Entertainment and More

Another industry where video analysis AI agents stand to make a mark is sports — a $500 billion market worldwide, with hundreds of billions in projected growth over the next several years.

Coaches, teams and leagues — whether professional or amateur — rely on video analytics to evaluate and enhance player performance, prioritize safety and boost fan engagement through player analytics platforms and data visualization. With visually perceptive AI agents, athletes now have unprecedented access to deeper insights and opportunities for improvement.

During his CES opening keynote, NVIDIA founder and CEO Jensen Huang demonstrated an AI video analytics agent that assessed the fastball pitching skills of an amateur baseball player compared with a professional’s. Using video captured from the ceremonial first pitch that Huang threw for the San Francisco Giants baseball team, the video analytics AI agent was able to suggest areas for improvement.

The $3 trillion media and entertainment industry is also poised to benefit from video analyst AI agents. Through the NVIDIA Media2 initiative, these agents will help drive the creation of smarter, more tailored and more impactful content that can adapt to individual viewer preferences.

Worldwide Adoption and Availability 

Partners from around the world are integrating the blueprint for building AI agents for video analysis into their own developer workflows, including Accenture, Centific, Deloitte, EY, Infosys, Linker Vision, Pegatron, TATA Consultancy Services (TCS), Telit Cinterion and VAST.

Apply for early access to the NVIDIA Blueprint for video search and summarization.

See notice regarding software product information.

Editor’s note: Omdia is the source for 1.5 billion enterprise-level cameras deployed.   

New GeForce RTX 50 Series GPUs Double Creative Performance in 3D, Video and Generative AI

GeForce RTX 50 Series Desktop and Laptop GPUs, unveiled today at the CES trade show, are poised to power the next era of generative and agentic AI content creation — offering new tools and capabilities for video, livestreaming, 3D and more.

Built on the NVIDIA Blackwell architecture, GeForce RTX 50 Series GPUs can run creative generative AI models up to 2x faster in a smaller memory footprint compared with the previous generation. They feature ninth-generation NVIDIA encoders for advanced video editing and livestreaming, and come with NVIDIA DLSS 4 and up to 32GB of VRAM to tackle massive 3D projects.

These GPUs come with various software updates, including two new AI-powered NVIDIA Broadcast effects, updates to RTX Video and RTX Remix, and NVIDIA NIM microservices — prepackaged and optimized models built to jumpstart AI content creation workflows on RTX AI PCs.

Built for the Generative AI Era

Generative AI can create sensational results for creators, but with models growing in both complexity and scale, generative AI can be difficult to run even on the latest hardware.

The GeForce RTX 50 Series adds FP4 support to help address this issue. FP4 is a lower-precision quantization method, similar to file compression, that decreases model size. Compared with FP16 — the default precision most models use — FP4 uses less than half the memory, and 50 Series GPUs provide over 2x performance compared with the previous generation. Advanced quantization methods offered by NVIDIA TensorRT Model Optimizer make this possible with virtually no loss in quality.

For example, Black Forest Labs’ FLUX.1 [dev] model at FP16 requires over 23GB of VRAM, meaning it can only be supported by the GeForce RTX 4090 and professional GPUs. With FP4, FLUX.1 [dev] requires less than 10GB, so it can run locally on more GeForce RTX GPUs.

With a GeForce RTX 4090 with FP16, the FLUX.1 [dev] model can generate images in 15 seconds with 30 steps. With a GeForce RTX 5090 with FP4, images can be generated in just over five seconds.

A new NVIDIA AI Blueprint for 3D-guided generative AI based on FLUX.1 [dev], which will be offered as an NVIDIA NIM microservice, offers artists greater control over text-based image generation. With this blueprint, creators can use simple 3D objects — created by hand or generated with AI — and lay them out in a 3D renderer like Blender to guide AI image generation.

A prepackaged workflow powered by the FLUX NIM microservice and ComfyUI can then generate high-quality images that match the 3D scene’s composition.

The NVIDIA Blueprint for 3D-guided generative AI is expected to be available through GitHub using a one-click installer in February.

Stability AI announced that its Stable Point Aware 3D, or SPAR3D, model will be available this month on RTX AI PCs. Thanks to RTX acceleration, the new model from Stability AI will help transform 3D design, delivering exceptional control over 3D content creation by enabling real-time editing and the ability to generate an object in less than a second from a single image.

Professional-Grade Video for All

GeForce RTX 50 Series GPUs deliver a generational leap in NVIDIA encoders and decoders with support for the 4:2:2 pro-grade color format, multiview-HEVC (MV-HEVC) for 3D and virtual reality (VR) video, and the new AV1 Ultra High Quality mode.

Most consumer cameras are confined to 4:2:0 color compression, which reduces the amount of color information. 4:2:0 is typically sufficient for video playback on browsers, but it can’t provide the color depth needed for advanced video editors to color grade videos. The 4:2:2 format provides double the color information with just a 1.3x increase in RAW file size — offering an ideal balance for video editing workflows.
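
The relationship between "double the color information" and the roughly 1.3x size increase falls out of how chroma subsampling allocates samples per pixel, as the simplified sketch below shows; compression and container overhead are ignored.

```python
# Samples per pixel (one luma plane plus two chroma planes), ignoring
# compression: a simplified view of why 4:2:2 roughly doubles the color
# information for about a 1.33x increase in raw data over 4:2:0.
samples_per_pixel = {
    "4:4:4": 1 + 1 + 1,        # full-resolution chroma
    "4:2:2": 1 + 0.5 + 0.5,    # chroma halved horizontally
    "4:2:0": 1 + 0.25 + 0.25,  # chroma halved horizontally and vertically
}

chroma_ratio = 1.0 / 0.5  # 4:2:2 carries twice the chroma samples of 4:2:0
size_ratio = samples_per_pixel["4:2:2"] / samples_per_pixel["4:2:0"]
print(f"color information: {chroma_ratio:.0f}x, raw data: {size_ratio:.2f}x")
# color information: 2x, raw data: 1.33x
```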

Decoding 4:2:2 video can be challenging due to the increased file sizes. GeForce RTX 50 Series GPUs include 4:2:2 hardware support that can decode up to eight times the 4K 60 frames per second (fps) video sources per decoder, enabling smooth multi-camera video editing.

The GeForce RTX 5090 GPU is equipped with three encoders and two decoders, the GeForce RTX 5080 GPU includes two encoders and two decoders, the GeForce RTX 5070 Ti GPU has two encoders and a single decoder, and the GeForce RTX 5070 GPU includes a single encoder and decoder. These multi-encoder and decoder setups, paired with faster GPUs, enable the GeForce RTX 5090 to export video 60% faster than the GeForce RTX 4090 and at 4x the speed of the GeForce RTX 3090.

GeForce RTX 50 Series GPUs also feature the ninth-generation NVIDIA video encoder, NVENC, that offers a 5% improvement in video quality on HEVC and AV1 encoding (BD-BR), as well as a new AV1 Ultra Quality mode that achieves 5% more compression at the same quality. They also include the sixth-generation NVIDIA decoder, with 2x the decode speed for H.264 video.

NVIDIA is collaborating with Adobe Premiere Pro, Blackmagic Design’s DaVinci Resolve, CapCut and Wondershare Filmora to integrate these technologies, starting in February.

3D video is starting to catch on thanks to the growth of VR, AR and mixed reality headsets. The new RTX 50 Series GPUs also come with support for MV-HEVC codecs to unlock such formats in the near future.

Livestreaming Enhanced

Livestreaming is a juggling act, where the streamer has to entertain the audience, produce a show and play a video game — all at the same time. Top streamers can afford to hire producers and moderators to share the workload, but most have to manage these responsibilities on their own and often in long shifts — until now.

Streamlabs, a Logitech brand and leading provider of broadcasting software and tools for content creators, is collaborating with NVIDIA and Inworld AI to create the Streamlabs Intelligent Streaming Assistant.

Streamlabs Intelligent Streaming Assistant is an AI agent that can act as a sidekick, producer and technical support. The sidekick can join streams as a 3D avatar to answer questions, comment on gameplay or chats, or help initiate conversations during quiet periods. It can help produce streams, switching to the most relevant scenes and playing audio and video cues during interesting gameplay moments. It can even serve as an IT assistant that helps configure streams and troubleshoot issues.

Streamlabs Intelligent Streaming Assistant is powered by NVIDIA ACE technologies for creating digital humans and Inworld AI, an AI framework for agentic AI experiences. The assistant will be available later this year.

Millions have used the NVIDIA Broadcast app to turn offices and dorm rooms into home studios using AI-powered features that improve audio and video quality — without needing expensive, specialized equipment.

Two new AI-powered beta effects are being added to the NVIDIA Broadcast app.

The first, Studio Voice, enhances the sound of a user’s microphone to match that of a high-quality microphone. The other, Virtual Key Light, can relight a subject’s face to deliver even coverage as if it were well-lit by two lights.

Because they harness demanding AI models, these beta features are recommended for video conferencing or non-gaming livestreams using a GeForce RTX 5080 GPU or higher. NVIDIA is working to expand these features to more GeForce RTX GPUs in future updates.

The NVIDIA Broadcast upgrade also includes an updated user interface that allows users to apply more effects simultaneously, as well as improvements to the background noise removal, virtual background and eye contact effects.

The updated NVIDIA Broadcast app will be available in February.

Livestreamers can also benefit from NVENC’s 5% BD-BR video quality improvement for HEVC and AV1 in the latest beta of Twitch’s Enhanced Broadcast feature in OBS, as well as from the improved AV1 encoder when streaming to Discord or YouTube.

RTX Video — an AI feature that enhances video playback in popular browsers like Google Chrome and Microsoft Edge, as well as local video, with Video Super Resolution and HDR — is getting an update that decreases GPU usage by 30%, expanding the lineup of GeForce RTX GPUs that can run Video Super Resolution at higher quality.

The RTX Video update is slated for a future NVIDIA App release.

Unprecedented 3D Render Performance

The GeForce RTX 5090 GPU offers 32GB of GPU memory — the largest of any GeForce RTX GPU ever, marking a 33% increase over the GeForce RTX 4090 GPU. This lets 3D artists build larger, richer worlds while using multiple applications simultaneously. Plus, new RTX 50 Series fourth-generation RT Cores can run 3D applications 40% faster.

DLSS 4 debuts Multi Frame Generation to boost frame rates by using AI to generate up to three frames per rendered frame. This enables animators to smoothly navigate a scene with 4x as many frames, or render 3D content at 60 fps or more.
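The frame-rate math is straightforward: each traditionally rendered frame can be followed by up to three AI-generated frames, so the displayed frame rate can reach up to 4x the rendered rate. A minimal illustrative sketch in Python (actual gains vary by scene, application and GPU):

    # Displayed frame rate with Multi Frame Generation: each rendered frame is
    # followed by up to `generated_per_rendered` AI-generated frames.
    def displayed_fps(rendered_fps: float, generated_per_rendered: int = 3) -> float:
        return rendered_fps * (1 + generated_per_rendered)

    print(displayed_fps(15))  # 60 fps displayed from 15 rendered fps
    print(displayed_fps(30))  # 120 fps displayed from 30 rendered fps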

D5 Render and Chaos Vantage, two popular professional-grade 3D apps for architects and designers, will add support for DLSS 4 in February.

3D artists have adopted generative AI to boost productivity in generating draft 3D meshes, HDRI maps or even animations to prototype a scene. At CES, Stability AI announced SPAR3D, its new 3D model that can generate 3D meshes from images in seconds with RTX acceleration.

NVIDIA RTX Remix — a modding platform that lets modders capture game assets, automatically enhance materials with generative AI tools and create stunning RTX remasters with full ray tracing — supports DLSS 4, increasing graphical fidelity and frame rates to maximize realism and immersion during gameplay.

RTX Remix plans to soon support Neural Radiance Cache, a neural shader that trains on live game data to estimate accurate per-pixel indirect lighting. RTX Remix creators can also expect access to RTX Skin in their mods, the first ray-traced subsurface-scattering implementation in games. With RTX Skin, RTX Remix mods can feature characters with new levels of realism, as light will reflect off and propagate through their skin, grounding them in the worlds they inhabit.

GeForce RTX 5090 and 5080 GPUs will be available for purchase starting Jan. 30 — followed by GeForce RTX 5070 Ti and 5070 GPUs in February and RTX 50 Series laptops in March.

All systems equipped with GeForce RTX GPUs include the NVIDIA Studio platform optimizations, with over 130 GPU-accelerated content creation apps, as well as NVIDIA Studio Drivers, tested extensively and released monthly to enhance performance and maximize stability in popular creative applications.

Stay tuned for more updates on the GeForce RTX 50 Series. Learn more about how the GeForce RTX 50 Series supercharges gaming, and check out all of NVIDIA’s announcements at CES.

Every month brings new creative app updates and optimizations powered by the NVIDIA Studio platform.

Follow NVIDIA Studio on Instagram, X and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

See notice regarding software product information.

NVIDIA Announces Nemotron Model Families to Advance Agentic AI

Artificial intelligence is entering a new era — agentic AI — where teams of specialized agents can help people solve complex problems and automate repetitive tasks.

With custom AI agents, enterprises across industries can manufacture intelligence and achieve unprecedented productivity. These advanced AI agents require a system of multiple generative AI models optimized for agentic AI functions and capabilities. This complexity means that the need for powerful, efficient, enterprise-grade models has never been greater.

To provide a foundation for enterprise agentic AI, NVIDIA today announced the Llama Nemotron family of open large language models (LLMs). Built with Llama, the models can help developers create and deploy AI agents across a range of applications — including customer support, fraud detection, and product supply chain and inventory management optimization.

To be effective, many AI agents need both language skills and the ability to perceive the world and respond with the appropriate action.

With new NVIDIA Cosmos Nemotron vision language models (VLMs) and NVIDIA NIM microservices for video search and summarization, developers can build agents that analyze and respond to images and video from autonomous machines, hospitals, stores and warehouses, as well as sports events, movies and news. For developers seeking to generate physics-aware videos for robotics and autonomous vehicles, NVIDIA today separately announced NVIDIA Cosmos world foundation models.

Open Llama Nemotron Models Optimize Compute Efficiency, Accuracy for AI Agents

Built with Llama foundation models — one of the most popular commercially viable open-source model collections, downloaded over 650 million times — NVIDIA Llama Nemotron models provide optimized building blocks for AI agent development. This builds on NVIDIA’s commitment to developing state-of-the-art models, such as Llama 3.1 Nemotron 70B, now available through the NVIDIA API catalog.

Llama Nemotron models are pruned and trained with NVIDIA’s latest techniques and high-quality datasets for enhanced agentic capabilities. They excel at instruction following, chat, function calling, coding and math, while being size-optimized to run on a broad range of NVIDIA accelerated computing resources.

“Agentic AI is the next frontier of AI development, and delivering on this opportunity requires full-stack optimization across a system of LLMs to deliver efficient, accurate AI agents,” said Ahmad Al-Dahle, vice president and head of GenAI at Meta. “Through our collaboration with NVIDIA and our shared commitment to open models, the NVIDIA Llama Nemotron family built on Llama can help enterprises quickly create their own custom AI agents.”

Leading AI agent platform providers including SAP and ServiceNow are expected to be among the first to use the new Llama Nemotron models.

“AI agents that collaborate to solve complex tasks across multiple lines of the business will unlock a whole new level of enterprise productivity beyond today’s generative AI scenarios,” said Philipp Herzig, chief AI officer at SAP. “Through SAP’s Joule, hundreds of millions of enterprise users will interact with these agents to accomplish their goals faster than ever before. NVIDIA’s new open Llama Nemotron model family will foster the development of multiple specialized AI agents to transform business processes.”

“AI agents make it possible for organizations to achieve more with less effort, setting new standards for business transformation,” said Jeremy Barnes, vice president of platform AI at ServiceNow. “The improved performance and accuracy of NVIDIA’s open Llama Nemotron models can help build advanced AI agent services that solve complex problems across functions, in any industry.”

The NVIDIA Llama Nemotron models use NVIDIA NeMo for distilling, pruning and alignment. Using these techniques, the models are small enough to run on a variety of computing platforms while providing high accuracy as well as increased model throughput.

The Llama Nemotron model family will be available as downloadable models and as NVIDIA NIM microservices that can be easily deployed on clouds, data centers, PCs and workstations. They offer enterprises industry-leading performance with reliable, secure and seamless integration into their agentic AI application workflows.
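NIM microservices expose OpenAI-compatible APIs, so a deployed Llama Nemotron endpoint can be queried with standard client libraries. The following is a minimal sketch assuming a locally running NIM container on its default port; the base URL, API key handling and model identifier are illustrative assumptions, so check the documentation of the specific microservice you deploy.

    # Minimal sketch: querying a locally deployed Llama Nemotron NIM microservice
    # through its OpenAI-compatible endpoint. The base_url, port and model name
    # are assumptions for illustration only.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
        api_key="not-needed-for-local",       # local deployments typically don't require a key
    )

    response = client.chat.completions.create(
        model="nvidia/llama-3.1-nemotron-70b-instruct",  # illustrative model identifier
        messages=[
            {"role": "system", "content": "You are a concise customer-support agent."},
            {"role": "user", "content": "Summarize the return policy for damaged items."},
        ],
        temperature=0.2,
        max_tokens=256,
    )
    print(response.choices[0].message.content)

The same request pattern applies to hosted endpoints by swapping in the hosted base URL and a valid API key.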

Customize and Connect to Business Knowledge With NVIDIA NeMo

The Llama Nemotron and Cosmos Nemotron model families are coming in Nano, Super and Ultra sizes to provide options for deploying AI agents at every scale.

  • Nano: The most cost-effective model optimized for real-time applications with low latency, ideal for deployment on PCs and edge devices.
  • Super: A high-accuracy model offering exceptional throughput on a single GPU.
  • Ultra: The highest-accuracy model, designed for data-center-scale applications demanding the highest performance.

Enterprises can also customize the models for their specific use cases and domains with NVIDIA NeMo microservices to simplify data curation, accelerate model customization and evaluation, and apply guardrails to keep responses on track.

With NVIDIA NeMo Retriever, developers can also integrate retrieval-augmented generation capabilities to connect models to their enterprise data.
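To make the retrieval-augmented generation pattern concrete, here is a generic sketch of the flow: retrieve relevant enterprise passages, then ground the model's answer in them. NeMo Retriever's own interface is not detailed in this article, so the retrieval step below is a toy keyword-overlap stand-in, and the generation call reuses the assumed OpenAI-compatible NIM endpoint from the earlier sketch.

    # Generic RAG sketch: retrieve supporting passages, then answer with a
    # Llama Nemotron model grounded in that context. The retriever below is a
    # toy stand-in for a real retrieval service such as NeMo Retriever.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-for-local")

    def retrieve_passages(query: str, docs: list[str], top_k: int = 2) -> list[str]:
        """Toy keyword-overlap retrieval; replace with a real retrieval service."""
        q = set(query.lower().split())
        return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:top_k]

    def answer_with_rag(query: str, docs: list[str]) -> str:
        context = "\n\n".join(retrieve_passages(query, docs))
        response = client.chat.completions.create(
            model="nvidia/llama-3.1-nemotron-70b-instruct",  # illustrative model identifier
            messages=[
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
            ],
        )
        return response.choices[0].message.content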

And using NVIDIA Blueprints for agentic AI, enterprises can quickly create their own applications using NVIDIA’s advanced AI tools and end-to-end development expertise. In fact, NVIDIA Cosmos Nemotron, NVIDIA Llama Nemotron and NeMo Retriever supercharge the new NVIDIA Blueprint for video search and summarization, announced separately today.

NeMo, NeMo Retriever and NVIDIA Blueprints are all available with the NVIDIA AI Enterprise software platform.

Availability

Llama Nemotron and Cosmos Nemotron models will be available soon as hosted application programming interfaces and for download on build.nvidia.com and Hugging Face. Access for development, testing and research is free for members of the NVIDIA Developer Program.

Enterprises can run Llama Nemotron and Cosmos Nemotron NIM microservices in production with the NVIDIA AI Enterprise software platform on accelerated data center and cloud infrastructure.

Sign up to get notified about Llama Nemotron and Cosmos Nemotron models, and join NVIDIA at CES.

See notice regarding software product information.

NVIDIA Enhances Three Computer Solution for Autonomous Mobility With Cosmos World Foundation Models

Autonomous vehicle (AV) development is made possible by three distinct computers: NVIDIA DGX systems for training the AI-based stack in the data center, NVIDIA Omniverse running on NVIDIA OVX systems for simulation and synthetic data generation, and the NVIDIA AGX in-vehicle computer to process real-time sensor data for safety.

Together, these purpose-built, full-stack systems enable continuous development cycles, speeding improvements in performance and safety.

At the CES trade show, NVIDIA today announced a new part of the equation: NVIDIA Cosmos, a platform comprising state-of-the-art generative world foundation models (WFMs), advanced tokenizers, guardrails and an accelerated video processing pipeline built to advance the development of physical AI systems such as AVs and robots.

With Cosmos added to the three-computer solution, developers gain a data flywheel that can turn thousands of human-driven miles into billions of virtually driven miles — amplifying training data quality.

“The AV data factory flywheel consists of fleet data collection, accurate 4D reconstruction and AI to generate scenes and traffic variations for training and closed-loop evaluation,” said Sanja Fidler, vice president of AI research at NVIDIA. “Using the NVIDIA Omniverse platform, as well as Cosmos and supporting AI models, developers can generate synthetic driving scenarios to amplify training data by orders of magnitude.”

“Developing physical AI models has traditionally been resource-intensive and costly for developers, requiring acquisition of real-world datasets and filtering, curating and preparing data for training,” said Norm Marks, vice president of automotive at NVIDIA. “Cosmos accelerates this process with generative AI, enabling smarter, faster and more precise AI model development for autonomous vehicles and robotics.”

Transportation leaders are using Cosmos to build physical AI for AVs, including:

  • Waabi, a company pioneering generative AI for the physical world, will use Cosmos for the search and curation of video data for AV software development and simulation.
  • Wayve, which is developing AI foundation models for autonomous driving, is evaluating Cosmos as a tool to search for edge and corner case driving scenarios used for safety and validation.
  • AV toolchain provider Foretellix will use Cosmos, alongside NVIDIA Omniverse Sensor RTX APIs, to evaluate and generate high-fidelity testing scenarios and training data at scale.
  • In addition, ridesharing giant Uber is partnering with NVIDIA to accelerate autonomous mobility. Rich driving datasets from Uber, combined with the features of the Cosmos platform and NVIDIA DGX Cloud, will help AV partners build stronger AI models even more efficiently.

Availability

Cosmos WFMs are now available under an open model license on Hugging Face and the NVIDIA NGC catalog. Cosmos models will soon be available as fully optimized NVIDIA NIM microservices.

Get started with Cosmos and join NVIDIA at CES.

See notice regarding software product information.
