Research Galore From 2024: Recapping AI Advancements in 3D Simulation, Climate Science and Audio Engineering

The pace of technology innovation has accelerated in the past year, most dramatically in AI. And in 2024, there was no better place to be a part of creating those breakthroughs than NVIDIA Research.

NVIDIA Research comprises hundreds of exceptionally bright people pushing the frontiers of knowledge, not just in AI but across many areas of technology.

In the past year, NVIDIA Research laid the groundwork for future improvements in GPU performance with major research discoveries in circuits, memory architecture and sparse arithmetic. The team’s invention of novel graphics techniques continues to raise the bar for real-time rendering. And we developed new methods for improving the efficiency of AI — requiring less energy, taking fewer GPU cycles and delivering even better results.

But the most exciting developments of the year have been in generative AI.

We’re now able to generate not just images and text but also 3D models, music and sounds. We’re also developing better control over what is generated: for example, realistic humanoid motion and sequences of images with consistent subjects.

The application of generative AI to science has resulted in high-resolution weather forecasts that are more accurate than conventional numerical weather models. AI models have given us the ability to accurately predict how blood glucose levels respond to different foods. Embodied generative AI is being used to develop autonomous vehicles and robots.

And that was just this year. What follows is a deeper dive into some of NVIDIA Research’s greatest generative AI work in 2024. Of course, we continue to develop new models and methods for AI, and expect even more exciting results next year.

ConsiStory: AI-Generated Images With Main Character Energy

ConsiStory, a collaboration between researchers at NVIDIA and Tel Aviv University, makes it easier to generate multiple images with a consistent main character — an essential capability for storytelling use cases such as illustrating a comic strip or developing a storyboard.

The researchers’ approach introduced a technique called subject-driven shared attention, which reduces the time it takes to generate consistent imagery from 13 minutes to around 30 seconds.
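
The mechanism works inside the diffusion model’s self-attention layers, letting each image in a batch attend to the subject patches of every other image so the subject stays consistent. Below is a schematic sketch of that idea under simplifying assumptions (single-head attention, an equal number of subject tokens per image); it is an illustration of the concept, not the authors’ implementation.

```python
# Schematic sketch of subject-driven shared attention (illustrative only).
# Each image attends over its own tokens plus the subject-patch tokens of
# every image in the batch, nudging generation toward a consistent subject.
import torch
import torch.nn.functional as F

def shared_attention(q, k, v, subject_mask):
    """q, k, v: (batch, tokens, dim); subject_mask: (batch, tokens) bool."""
    b, t, d = k.shape
    # Collect subject-patch keys/values from the whole batch (assumes each
    # image contributes the same number of subject tokens).
    shared_k = k[subject_mask].reshape(1, -1, d).expand(b, -1, d)
    shared_v = v[subject_mask].reshape(1, -1, d).expand(b, -1, d)
    # Extend every image's attention context with the shared subject tokens.
    k_ext = torch.cat([k, shared_k], dim=1)
    v_ext = torch.cat([v, shared_v], dim=1)
    attn = F.softmax(q @ k_ext.transpose(1, 2) / d ** 0.5, dim=-1)
    return attn @ v_ext
```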

Read the ConsiStory paper.

ConsiStory is capable of generating a series of images featuring the same character.

Edify 3D: Generative AI Enters a New Dimension

NVIDIA Edify 3D is a foundation model that enables developers and content creators to quickly generate 3D objects that can be used to prototype ideas and populate virtual worlds.

Edify 3D helps creators quickly ideate, lay out and conceptualize immersive environments with AI-generated assets. Novice and experienced content creators can use text and image prompts to harness the model, which is now part of the NVIDIA Edify multimodal architecture for developing visual generative AI.

Read the Edify 3D paper and watch the video on YouTube.

Fugatto: Flexible AI Sound Machine for Music, Voices and More

A team of NVIDIA researchers recently unveiled Fugatto, a foundational generative AI model that can create or transform any mix of music, voices and sounds based on text or audio prompts.

The model can, for example, create music snippets based on text prompts, add or remove instruments from existing songs, modify the accent or emotion in a voice recording, or generate completely novel sounds. It could be used by music producers, ad agencies, video game developers or creators of language learning tools.

Read the Fugatto paper.

GluFormer: AI Predicts Blood Sugar Levels Four Years Out

Researchers from the Weizmann Institute of Science, Tel Aviv-based startup Pheno.AI and NVIDIA led the development of GluFormer, an AI model that can predict an individual’s future glucose levels and other health metrics based on past glucose monitoring data.

The researchers showed that, after adding dietary intake data into the model, GluFormer can also predict how a person’s glucose levels will respond to specific foods and dietary changes, enabling precision nutrition. The research team validated GluFormer across 15 other datasets and found it generalizes well to predict health outcomes for other groups, including those with prediabetes, type 1 and type 2 diabetes, gestational diabetes and obesity.

Read the GluFormer paper.

LATTE3D: Enabling Near-Instant Generation, From Text to 3D Shape 

Another 3D generator released by NVIDIA Research this year is LATTE3D, which converts text prompts into 3D representations within a second — like a speedy, virtual 3D printer. Crafted in a popular format used for standard rendering applications, the generated shapes can be easily served up in virtual environments for developing video games, ad campaigns, design projects or virtual training grounds for robotics.

Read the LATTE3D paper.

MaskedMimic: Reconstructing Realistic Movement for Humanoid Robots

To advance the development of humanoid robots, NVIDIA researchers introduced MaskedMimic, an AI framework that applies inpainting — the process of reconstructing complete data from an incomplete, or masked, view — to descriptions of motion.

Given partial information, such as a text description of movement, or head and hand position data from a virtual reality headset, MaskedMimic can fill in the blanks to infer full-body motion. It’s become part of NVIDIA Project GR00T, a research initiative to accelerate humanoid robot development.

Read the MaskedMimic paper.

StormCast: Boosting Weather Prediction, Climate Simulation 

In the field of climate science, NVIDIA Research announced StormCast, a generative AI model for emulating atmospheric dynamics. While other machine learning models trained on global data have a spatial resolution of about 30 kilometers and a temporal resolution of six hours, StormCast achieves a 3-kilometer, hourly scale.

The researchers trained StormCast on approximately three-and-a-half years of NOAA climate data from the central U.S. When applied with precipitation radars, StormCast offers forecasts with lead times of up to six hours that are up to 10% more accurate than the U.S. National Oceanic and Atmospheric Administration’s state-of-the-art 3-kilometer regional weather prediction model.

Read the StormCast paper, written in collaboration with researchers from Lawrence Berkeley National Laboratory and the University of Washington.

NVIDIA Research Sets Records in AI, Autonomous Vehicles, Robotics

Through 2024, models that originated in NVIDIA Research set records across benchmarks for AI training and inference, route optimization, autonomous driving and more.

NVIDIA cuOpt, an optimization AI microservice used for logistics improvements, has 23 world-record benchmarks. The NVIDIA Blackwell platform demonstrated world-class performance on MLPerf industry benchmarks for AI training and inference.

In the field of autonomous vehicles, Hydra-MDP, an end-to-end autonomous driving framework by NVIDIA Research, achieved first place on the End-To-End Driving at Scale track of the Autonomous Grand Challenge at CVPR 2024.

In robotics, FoundationPose, a unified foundation model for 6D object pose estimation and tracking, obtained first place on the BOP leaderboard for model-based pose estimation of unseen objects.

Learn more about NVIDIA Research, which has hundreds of scientists and engineers worldwide. NVIDIA Research teams are focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics.

Have You Heard? 5 AI Podcast Episodes Listeners Loved in 2024

NVIDIA’s AI Podcast gives listeners the inside scoop on the ways AI is transforming nearly every industry. Since the show’s debut in 2016, it’s garnered more than 6 million listens across 200-plus episodes, covering how generative AI is used to power applications including assistive technology for the visually impaired, wildfire alert systems and the Roblox online game platform.

Here are the top five episodes of 2024:

Driving Energy Efficiency, Sustainability

AI and accelerated computing are key tools in the push for sustainability. Joshua Parker, senior director of corporate sustainability at NVIDIA, discusses how these technologies are contributing to a more sustainable future by improving energy efficiency and helping address climate challenges.

Zooming In on AI for a Productivity Boost

Zoom helped change the way people work, playing a pivotal role for many during the COVID-19 pandemic. The company’s CTO, Xuedong Huang, shares how the company is reshaping productivity with AI.

Driving the Future of Computing

Alan Chalker, the director of strategic programs at the Ohio Supercomputer Center, shares how the center empowers Ohio higher education institutions and industries with accessible, reliable and secure computational services — and works with client companies like NASCAR, which is simulating race car designs virtually.

Supercharging Cinematic Content Creation

Generative AI can help anyone become a content creator by rapidly bringing ideas to life. Pinar Seyhan Demirdag, cofounder and CEO of Cuebric, discusses how the company’s AI-powered application makes high-quality production more accessible and affordable.

Bringing Clarity to Cardiology

Dr. Keith Channon, cofounder and chief medical officer at health tech startup Caristo Diagnostics, discusses an AI-powered solution for detecting coronary inflammation — a key indicator of heart disease — in cardiac CT scans. These insights could help physicians improve treatment plans and risk predictions.

Subscribe to the AI Podcast

Get the AI Podcast through Amazon Music, Apple Podcasts, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, SoundCloud, Spotify, Stitcher and TuneIn.

Cheers to 2024: GeForce NOW Recaps Year of Ultimate Cloud Gaming

This GFN Thursday wraps up another incredible year for cloud gaming. Take a look back at the top games and new features that made 2024 a standout for GeForce NOW members.

Enjoy it all with three new games to close the year.

Remember to mark the calendar for the CES opening keynote, to be delivered by NVIDIA founder and CEO Jensen Huang on Monday, Jan. 6.

That’s a Wrap

In another ultimate year of high-performance cloud gaming, GeForce NOW introduced new features for cloud gamers and reached a significant milestone by surpassing 2,000 games in its library, thanks to strong collaborations with celebrated publishers.

Don’t pass up the chance to get all the premium benefits of the cloud for 24 hours.

GeForce NOW also launched new data centers in Japan and Poland this year, bringing GeForce RTX 4080-powered servers to gamers in the regions. Day Passes were introduced to offer gamers more flexible ways to access the cloud, with the ability to enjoy premium benefits of Ultimate and Performance memberships for 24 hours at a time.

Members who wanted to stream their favorite PC games to Valve’s Steam Deck got a new beta installation method that automatically installs Google Chrome on the device, along with all the settings needed to log in to GeForce NOW. And GeForce NOW brought an upgraded streaming experience for Performance members, providing up to 1440p resolution — an increase from the original 1080p limit.

Conquering in the cloud never looked so good.

The rollout of Xbox automatic sign-in streamlined the gaming experience, enabling members to link their Xbox profile to their GeForce NOW account for seamless access to their game libraries. GeForce NOW also partnered with CurseForge to bring mods for World of Warcraft, enabling Ultimate and Priority members to easily enable and customize over 25 top WoW Addons in the cloud, enhancing their gameplay experience across various devices.

Everything is great in the cloud.

The highly anticipated Indiana Jones and the Great Circle made its debut in the cloud, offering players a thrilling globetrotting adventure with stunning ray-traced graphics and NVIDIA DLSS 3 support. Fans could uncover one of history’s greatest mysteries as they traveled from the pyramids of Egypt to the sunken temples of Sukhothai, all while enjoying the game’s immersive action and intriguing puzzles.

The cloud is the path of least resistance.

Early access for Path of Exile 2 arrived with deep customization options and improved visuals. Dragon Age: The Veilguard captivated players with BioWare’s rich fantasy world, while Black Myth: Wukong pushed cloud gaming graphics to new heights with its stunning take on Chinese mythology.

The long-awaited S.T.A.L.K.E.R. 2: Heart of Chornobyl brought its intense survival horror to the GeForce NOW cloud, and Call of Duty: Black Ops 6 delivered fast-paced multiplayer action to members on day one.

Raining terror on any device.

GeForce NOW also welcomed Activision Blizzard games and Battle.net integration. Members gained access to blockbuster titles like Diablo IV, Overwatch 2, Call of Duty: Warzone, Hearthstone and more, adding some of the most popular multiplayer titles to the cloud gaming library.

And it doesn’t stop there — check back in each week to see what’s in store for GeForce NOW in the new year.

Toast to New Adventures

Look for the following games available to stream in the cloud this week:

  • Headquarters: World War II (Steam)
  • Supermarket Together (Steam)
  • Ys X: Nordics (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

From Generative to Agentic AI, Wrapping the Year’s AI Advancements

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for GeForce RTX PC and NVIDIA RTX workstation users.

The AI Decoded series over the past year has broken down all things AI — from simplifying the complexities of large language models (LLMs) to highlighting the power of RTX AI PCs and workstations.

Recapping the latest AI advancements, this roundup highlights how the technology has changed the way people write, game, learn and connect with each other online.

NVIDIA GeForce RTX GPUs offer the power to deliver these experiences on PC laptops, desktops and workstations. They feature specialized AI Tensor Cores that can deliver more than 1,300 trillion operations per second (TOPS) of processing power for cutting-edge performance in gaming, creating, everyday productivity and more. For workstations, NVIDIA RTX GPUs deliver over 1,400 TOPS, enabling next-level AI acceleration and efficiency.

Unlocking Productivity and Creativity With AI-Powered Chatbots

AI Decoded earlier this year explored what LLMs are, why they matter and how to use them.

For many, tools like ChatGPT were their first introduction to AI. LLM-powered chatbots have transformed computing from basic, rule-based interactions to dynamic conversations. They can suggest vacation ideas, write customer service emails, spin up original poetry and even write code for users.

Introduced in March, ChatRTX is a demo app that lets users personalize a GPT LLM with their own content, such as documents, notes and images.

With features like retrieval-augmented generation (RAG), NVIDIA TensorRT-LLM and RTX acceleration, ChatRTX enables users to quickly search and ask questions about their own data. And since the app runs locally on RTX PCs or workstations, results are both fast and private.
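
Conceptually, RAG works by embedding the user’s documents, retrieving the ones closest to a query and prepending them to the LLM prompt. A minimal, self-contained sketch of the pattern follows; the toy hashing embedder and the generate() stub are placeholders standing in for ChatRTX’s RTX-accelerated models, not its actual API.

```python
# Minimal retrieval-augmented generation (RAG) sketch: embed documents,
# retrieve the closest ones to the question, prepend them to the prompt.
import hashlib
import numpy as np

def embed(text, dim=256):
    """Toy bag-of-words hashing embedder (stand-in for a real embedding model)."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[int(hashlib.md5(word.encode()).hexdigest(), 16) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def generate(prompt):
    """Stand-in for a local LLM call (TensorRT-LLM-accelerated in ChatRTX)."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

def rag_answer(question, docs, top_k=2):
    doc_vecs = np.stack([embed(d) for d in docs])
    sims = doc_vecs @ embed(question)            # cosine similarity (unit vectors)
    context = "\n".join(docs[i] for i in np.argsort(sims)[::-1][:top_k])
    return generate(f"Use only this context:\n{context}\n\nQuestion: {question}")

print(rag_answer("What does RAG do?", ["RAG retrieves documents.", "GPUs are fast."]))
```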

NVIDIA offers the broadest selection of foundation models for enthusiasts and developers, including Gemma 2, Mistral and Llama-3. These models can run locally on NVIDIA GeForce and RTX GPUs for fast, secure performance without needing to rely on cloud services.

Download ChatRTX today.

Introducing RTX-Accelerated Partner Applications

AI is being incorporated into more and more apps and use cases, including games, content creation apps, software development and productivity tools.

This expansion is fueled by the wide selection of RTX-accelerated developer and community tools, software development kits, models and frameworks that have made it easier than ever to run models locally in popular applications.

AI Decoded in October spotlighted how Brave Browser’s Leo AI, powered by NVIDIA RTX GPUs and the open-source Ollama platform, enables users to run local LLMs like Llama 3 directly on their RTX PCs or workstations.

This local setup offers fast, responsive AI performance while keeping user data private — without relying on the cloud. NVIDIA’s optimizations for tools like Ollama offer accelerated performance for tasks like summarizing articles, answering questions and extracting insights, all directly within the Brave browser. Users can switch between local and cloud models, providing flexibility and control over their AI experience.

For simple instructions on how to add local LLM support via Ollama, read Brave’s blog. Once configured to point to Ollama, Leo AI will use the locally hosted LLM for prompts and queries.
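
To see what Leo AI is calling behind the scenes, the local Ollama server can also be queried directly over its REST API. A minimal sketch, assuming Ollama is running on its default port and the llama3 model has been pulled via ollama pull llama3:

```python
# Query a locally running Ollama server over its REST API.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",       # Ollama's default endpoint
    data=json.dumps({
        "model": "llama3",
        "prompt": "Summarize this article in two sentences: ...",
        "stream": False,                         # one JSON object, not a stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

Everything in this exchange stays on the local machine, which is what keeps the workflow fast and private.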

Agentic AI — Enabling Complex Problem-Solving

Agentic AI is the next frontier of AI, capable of using sophisticated reasoning and iterative planning to autonomously solve complex, multi-step problems.

AI Decoded explored how the AI community is experimenting with the technology to create smarter, more capable AI systems.

Partner applications like AnythingLLM showcase how AI is moving beyond simple question-answering to boost productivity and creativity. Users can harness the application to deploy built-in agents that can tackle tasks like searching the web or scheduling meetings.

Example of a user invoking an AI agent in AnythingLLM to complete a web search query.

AnythingLLM lets users interact with documents through intuitive interfaces, automate complex tasks with AI agents and run advanced LLMs locally. Harnessing the power of RTX GPUs, it delivers faster, smarter and more responsive AI workflows — all within a single local desktop application. The application also works offline and is fast and private, capable of using local data and tools typically inaccessible with cloud-based solutions.

AnythingLLM’s Community Hub lets anyone easily access system prompts that can help them steer LLM behavior, discover productivity-boosting slash commands and build specialized AI agent skills for unique workflows and custom tools.

By enabling users to run agentic AI workflows on their own systems with full privacy, AnythingLLM is fueling innovation and making it easier to experiment with the latest technologies.

AI Decoded Wrapped

Over 600 Windows apps and games today are already running AI locally on more than 100 million GeForce RTX AI PCs and workstations worldwide, delivering fast, reliable and low-latency performance. Learn more about NVIDIA GeForce RTX AI PCs and NVIDIA RTX AI workstations.

Tune in to the CES keynote delivered by NVIDIA founder and CEO Jensen Huang on Jan. 6 to discover how the latest in AI is supercharging gaming, content creation and development.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

NieR Perfect: GeForce NOW Loops Square Enix’s ‘NieR:Automata’ and ‘NieR Replicant ver.1.22474487139…’ Into the Cloud

Stuck in a gaming rut? Get out of the loop this GFN Thursday with four new games joining the GeForce NOW library of over 2,000 supported games.

Dive into Square Enix’s mind-bending action role-playing games (RPGs) NieR:Automata and NieR Replicant ver.1.22474487139…, now streaming in the cloud. Plus, explore HoYoverse’s Zenless Zone Zero for an adrenaline-packed adventure, just in time for its 1.4 update.

Check out GeForce Greats, which offers a look back at the biggest and best moments of PC gaming, from the launch of the GeForce 256 graphics card to the modern era. Follow the GeForce, GeForce NOW, NVIDIA Studio and NVIDIA AI PC channels on X, as well as #GeForceGreats, to join in on the nostalgic journey. Plus, participate in the GeForce LAN Missions from the cloud with GeForce NOW starting on Saturday, Jan. 4, for a chance to win in-game rewards, first come, first served.

GeForce NOW members will also be able to launch a virtual stadium for a front-row seat to the CES opening keynote, to be delivered by NVIDIA founder and CEO Jensen Huang on Monday, Jan. 6. Stay tuned to GFN Thursday for more details.

A Tale of Two NieRs

NieR:Automata and NieR Replicant ver.1.22474487139… — two captivating action RPGs from Square Enix — delve into profound existential themes and are set in a distant, postapocalyptic future.

Existence is futile, except in the cloud.

Control androids 2B, 9S and A2 as they battle machine life-forms in a proxy war for human survival in NieR:Automata. The game explores complex philosophical concepts through its multiple endings and perspective shifts, blurring the lines between man and machine. It seamlessly mixes stylish and exhilarating combat with open-world exploration for a diverse gameplay experience.

The hero’s journey leads to the cloud.

NieR Replicant ver.1.22474487139…, an updated version of the original NieR game, follows a young man’s quest to save his sister from a mysterious illness called the Black Scrawl. Uncover dark secrets about their world while encountering a cast of unforgettable characters and making heart-wrenching decisions.

Unravel the layers of the emotionally charged world of NieR with each playthrough on GeForce NOW. Experience rich storytelling and intense combat without high-end hardware. Carefully explore every possible loop with extended gaming sessions for Performance and Ultimate members.

Find Zen in the Cloud

Dive into the Hollows.

Zenless Zone Zero, the free-to-play action role-playing game from HoYoverse, is set in the post-apocalyptic metropolis of New Eridu. Take on the role of a “Proxy” and guide others through dangerous alternate dimensions to confront an interdimensional threat. The game features a fast-paced, combo-oriented combat system and offers a mix of intense action, character-driven storytelling and exploration of a unique futuristic world.

The title comes to the cloud in time for the version 1.4 update, A Storm of Falling Stars, bringing additions to the game for new and experienced players alike. Joining the roster of playable characters are Frost Anomaly agent Hoshimi Miyabi and Electric Attack agent Asaba Harumasa. Plus, the revamped Decibel system allows individual characters to collect and use Decibels instead of sharing across the squad, offering a new layer of strategy. Explore two new areas, Port Elpis and Reverb Arena, and try out the new “Hollow Zero-Lost Void” mode.

Experience the adventure on GeForce NOW and dive deeper into New Eridu across devices with a Performance or Ultimate membership. Snag some in-game loot by following the GeForce NOW social channels (X, Facebook, Instagram, Threads) and be on the lookout for a limited-quantity redemption code for a free reward package — including 20,000 Dennies, three Official Investigator Logs and three W-Engine Power Supplies.

Fresh Arrivals

Look for the following games available to stream in the cloud this week:

  • NieR:Automata (Steam)
  • NieR Replicant ver.1.22474487139… (Steam)
  • Replikant Chat (Steam)
  • Zenless Zone Zero v1.4 (HoYoverse)

What are you planning to play this weekend? Let us know on X or in the comments below.

AI’s in Style: Ulta Beauty Helps Shoppers Virtually Try New Hairstyles

Shoppers pondering a new hairstyle can now try styles before committing to curls or a new color. An AI app by Ulta Beauty, the largest specialty beauty retailer in the U.S., uses selfies to show near-instant, highly realistic previews of desired hairstyles.

GLAMlab Hair Try On is a digital experience that lets users take a photo, upload a headshot or use a model’s picture to experiment with different hair colors and styles. Used by thousands of web and mobile app users daily, the experience is powered by the NVIDIA StyleGAN2 generative AI model.

Hair color try-ons feature links to Ulta Beauty products so shoppers can achieve the look in real life. The company, which has more than 1,400 stores across the U.S., has found that people who use the virtual tool are more likely to purchase a product than those who don’t.

“Shoppers need to try out hair and makeup styles before they purchase,” said Juan Cardelino, director of the computer vision and digital innovation department at Ulta Beauty. “As one of the first cosmetics companies to integrate makeup testers in stores, offering try-ons is part of Ulta Beauty’s DNA — whether in physical or digital retail environments.”

Adding Ulta Beauty’s Flair to StyleGAN2

GLAMlab is Ulta Beauty’s first generative AI application, developed by its digital innovation team.

To build its AI pipeline, the team turned to StyleGAN2, a style-based neural network architecture for generative adversarial networks, aka GANs. StyleGAN2, developed by NVIDIA Research, uses transfer learning to generate infinite images in a variety of styles.

“StyleGAN2 is one of the most well-regarded models in the tech community, and, since the source code was available for experimentation, it was the right choice for our application,” Cardelino said. “For our hairstyle try-on use case, we had to license the model for commercial use, retrain it and put guardrails around it to ensure the AI was only modifying pixels related to hair — not distorting any feature of the user’s face.”
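
Ulta Beauty hasn’t published how those guardrails work, but a common pattern is to composite the generated image back onto the original photo through a hair segmentation mask, so pixels outside the hair region pass through unchanged. A minimal sketch of that idea, with the mask assumed to come from a separate segmentation model:

```python
# Illustrative guardrail: blend the generated image with the original photo
# through a hair segmentation mask so only hair pixels can change. A generic
# compositing pattern, not Ulta Beauty's published implementation.
import numpy as np

def apply_hairstyle(original, generated, hair_mask):
    """original, generated: (H, W, 3) floats in [0, 1];
    hair_mask: (H, W) floats in [0, 1] from a segmentation model."""
    mask = hair_mask[..., None]           # broadcast the mask over RGB channels
    return mask * generated + (1.0 - mask) * original
```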

Available on the Ulta Beauty website and mobile app, the hairstyle and color try-ons rely on NVIDIA Tensor Core GPUs in the cloud to run AI inference, which takes around 5 seconds to compute the first style and about a second each for subsequent styles.

The company next plans to incorporate virtual trials for additional hair categories like wigs and is exploring how the virtual hairstyle try-ons could be connected to in-store styling services.

“Stylists could use the tool to show our guests how certain hairstyles will look on them, giving them more confidence to try new looks,” Cardelino said.

Beyond giving customers a new way to interact with Ulta Beauty’s products, these AI-powered virtual try-ons give users a chance to be creative and explore new possibilities for their personal styles.

“Hair and makeup are playful categories,” Cardelino said. “Virtual try-ons are a way to explore options that may be out of a customer’s comfort zone without needing to commit to a physical change.”

See the latest work from NVIDIA Research, which has hundreds of scientists and engineers worldwide, with teams focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics.

Imbue’s Kanjun Qiu Shares Insights on How to Build Smarter AI Agents

Imagine a future in which everyone is empowered to build and use their own AI agents. That future may not be far off, as new software is infused with intelligence through collaborative AI systems that work alongside users rather than merely automating tasks.

In this episode of the NVIDIA AI Podcast, Kanjun Qiu, CEO of Imbue, discusses the rise of AI agents, drawing parallels between the personal computer revolution of the late 1970s and 80s and today’s AI agent transformation. She details Imbue’s approach to building reasoning capabilities into its products, the challenges of verifying the correctness of AI outputs and how Imbue is focusing on post-training and fine-tuning to improve verification capabilities.

Learn more about Imbue, and read more about AI agents, including how virtual assistants can enhance customer service experiences.

And hear more about the future of AI and graphics by tuning in to the CES keynote, delivered by NVIDIA founder and CEO Jensen Huang live in Las Vegas on Monday, Jan. 6, at 6:30 p.m. PT.

Time Stamps

1:21 – What are AI agents? And Imbue’s approach to them.

9:00 – Where are AI agents being used the most today?

17:05 – Why building a good user experience around agents requires invention.

26:28 – How reasoning and verification capabilities factor into Imbue’s products.

You Might Also Like… 

Zoom CTO Xuedong “XD” Huang on How AI Revolutionizes Productivity 

Zoom is now transforming into an AI-first platform. CTO Xuedong Huang discusses Zoom’s AI Companion 2.0 and the company’s “federated AI” strategy, which aims to integrate multiple large language models to enhance productivity and collaboration.

How Roblox Uses Generative AI to Enhance User Experiences

Roblox is enhancing its colorful online platform with generative AI to improve user safety and inclusivity through features like automated chat filters and real-time text translation. Anupam Singh, VP of AI and growth engineering at Roblox, explores how AI coding assistants are helping creators focus more on creative expression.

Rendered.ai CEO Nathan Kundtz on Using AI to Build Better AI

Data is crucial for training AI and machine learning systems, and synthetic data offers a solution to the challenges of compiling real-world data. Nathan Kundtz, founder and CEO of Rendered.ai, discusses how his company’s platform generates synthetic data to enhance AI models.

Subscribe to the AI Podcast

Get the AI Podcast through Apple Podcasts, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, SoundCloud, Spotify, Stitcher and TuneIn.

AI at Your Service: Digital Avatars With Speech Capabilities Offer Interactive Customer Experiences

Editor’s note: This post is part of the AI On blog series, which explores the latest techniques and real-world applications of agentic AI, chatbots and copilots. The series will also highlight the NVIDIA software and hardware powering advanced AI agents, which form the foundation of AI query engines that gather insights and perform tasks to transform everyday experiences and reshape industries.

To enhance productivity and upskill workers, organizations worldwide are seeking ways to provide consistent, around-the-clock customer service with greater speed, accuracy and scale.

Intelligent AI agents offer one such solution. They deliver advanced problem-solving capabilities and integrate vast and disparate sources of data to understand and respond to natural language.

Powered by generative AI and agentic AI, digital avatars are boosting efficiency across industries like healthcare, telecom, manufacturing, retail and more. According to Gartner, by 2028, 45% of organizations with more than 500 employees will use employee AI avatars to expand the capacity of human capital.1

From educating prospects on policies to giving customers personalized solutions, AI is helping organizations optimize revenue streams and elevate employee knowledge and productivity.

Where Context-Aware AI Avatars Are Most Impactful

Staying ahead in a competitive, evolving market requires continuous learning and analysis. AI avatars — also referred to as digital humans — are addressing key concerns and enhancing operations across industries.

One key benefit of agentic digital human technology is the ability to offer consistent, multilingual support and personalized guidance for a variety of use cases.

For instance, a medical-based AI agent can provide 24/7 virtual intake and support telehealth services. Or, a virtual financial advisor can help enhance client security and financial literacy by alerting bank customers of potential fraud, or offering personalized offers and investment tips based on their unique portfolio.

These digital humans boost efficiency, cut costs and enhance customer loyalty. Some key ways digital humans can be applied include:

  • Personalized, On-Brand Customer Assistance: A digital human interface can provide a personal touch when educating new customers on a company’s products and service portfolios. They can provide ongoing customer support, offering immediate responses and solving problems without the need for a live operator.
  • Enhanced Employee Onboarding: Intelligent AI assistants can offer streamlined, adaptable, personalized employee onboarding, whether in hospitals or offices, by providing consistent access to updated institutional knowledge at scale. With pluggable, customizable retrieval-augmented generation (RAG), these assistants can deliver real-time answers to queries while maintaining a deep understanding of company-specific data.
  • Seamless Communication Across Languages: In global enterprises, communication barriers can slow down operations. AI-powered avatars with natural language processing capabilities can communicate effortlessly across languages. This is especially useful in customer service or employee training environments where multilingual support is crucial.

Learn more by listening to the NVIDIA AI Podcast episode with Kanjun Qiu, CEO of Imbue, who shares insights on how to build smarter AI agents.

Interactive AI Agents With Text-to-Speech and Speech-to-Text

With text-to-speech and speech-to-text capabilities, AI agents can offer enhanced interactivity and engagement in customer service interactions.

SoftServe, an IT consulting and digital services provider, has built several digital humans for a variety of use cases, highlighting the technology’s potential to enhance user experiences.

SoftServe’s Digital Concierge is accelerated by NVIDIA AI Blueprints and NVIDIA ACE technologies to rapidly deploy scalable, customizable digital humans across diverse infrastructures.

GEN, SoftServe’s virtual customer service assistant and digital concierge, makes customer service more engaging by providing lifelike interactions, continuous availability, personalized responses and simultaneous access to all necessary knowledge bases.

SoftServe also developed FINNA, an AI-powered virtual financial advisor that can provide financial guidance tailored to a client’s profile and simplify complex financial terminology. It helps streamline onboarding and due diligence, supporting goal-oriented financial planning and risk assessment.

AISHA is another AI-powered digital human developed by SoftServe with NVIDIA technology. Created for the UAE Ministry of Justice, the digital human significantly improves judicial processes by reducing case review times, enhancing the accuracy of rulings and providing rapid access to legal databases. It demonstrates how generative AI can bridge the gap between technology and meaningful user interaction to enhance customer service and operational efficiency in the judicial sector.

How to Design AI Agents With Avatar and Speech Features

Designing AI agents with avatar and speech features involves several key steps:

  1. Determine the use case: Choose between 2D or 3D avatars based on the required level of immersion and interaction.
  2. Avatar development:
    • For 3D avatars, use specialized software and technical expertise to create lifelike movements and photorealism.
    • For 2D avatars, opt for quicker development suitable for web-embedded solutions.
  3. Integrate speech technologies: Use NVIDIA Riva for world-class automatic speech recognition, along with text-to-speech, to enable verbal interactions (see the sketch after this list).
  4. Rendering options: Use NVIDIA Omniverse RTX Renderer technology or Unreal Engine tools for 3D avatars to achieve high-quality output and compute efficiency.
  5. Deployment: Tap cloud-native deployment for real-time output and scalability, particularly for interactive web or mobile applications.
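
As a rough illustration of step 3, the sketch below wires Riva speech recognition and synthesis around an agent callback using the nvidia-riva-client Python package. The endpoint, voice name and audio parameters are assumptions to adapt to a specific Riva deployment.

```python
# Hedged sketch: one voice turn through Riva ASR, an agent, then Riva TTS.
# Endpoint, voice and audio settings are illustrative.
import riva.client

auth = riva.client.Auth(uri="localhost:50051")        # Riva gRPC endpoint
asr = riva.client.ASRService(auth)
tts = riva.client.SpeechSynthesisService(auth)

def voice_turn(audio_bytes: bytes, agent) -> bytes:
    """Transcribe user audio, run the agent on the text, return reply audio."""
    config = riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    result = asr.offline_recognize(audio_bytes, config)
    text = result.results[0].alternatives[0].transcript
    reply = agent(text)                               # your agent/LLM call here
    synth = tts.synthesize(reply, voice_name="English-US.Female-1",
                           language_code="en-US", sample_rate_hz=44100)
    return synth.audio
```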

For an overview on how to design interactive customer service tools, read the technical blogs on how to “Build a Digital Human Interface for AI Apps With an NVIDIA AI Blueprint” and “Expanding AI Agent Interface Options With 2D and 3D Digital Human Avatars.”

NVIDIA AI Blueprint for Digital Humans

The latest release of the NVIDIA AI Blueprint for digital humans introduces several updates that enhance the interactivity and responsiveness of digital avatars, including dynamic switching between RAG models. Users can experience this directly in preview.

The integration of the Audio2Face-2D microservice in the blueprint means developers can create 2D digital humans, which require significantly less processing power compared with 3D models, for web- and mobile-based applications.

2D avatars are better suited for simpler interactions and platforms where photorealism isn’t necessary. This makes them ideal for scenarios like telemedicine, where quick loading times with lower bandwidth requirements are crucial.

Another significant update is the introduction of user attention detection through vision AI. This feature enables digital humans to detect when a user is present — even if they are idle or on mute — and initiate interaction, such as greeting the user. This capability is particularly beneficial in kiosk scenarios, where engaging users proactively can enhance the service experience.

Getting Started

NVIDIA AI Blueprints make it easy to start building and setting up virtual assistants by offering ready-made workflows and tools to accelerate deployment. Whether for a simple AI-powered chatbot or a fully animated digital human interface, the blueprints offer resources to create AI assistants that are scalable, aligned with an organization’s brand and deliver a responsive, efficient customer support experience.

1. Gartner®, Hype Cycle™ for the Future of Work, 2024, Tori Paulman, Emily Rose et al., July 2024

GARTNER is a registered trademark and service mark and Hype Cycle is a trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

NVIDIA Awards up to $60,000 Research Fellowships to PhD Students

For more than two decades, the NVIDIA Graduate Fellowship Program has supported graduate students doing outstanding work relevant to NVIDIA technologies. Today, the program announced the latest awards of up to $60,000 each to 10 Ph.D. students involved in research that spans all areas of computing innovation.

Selected from a highly competitive applicant pool, the awardees will participate in a summer internship preceding the fellowship year. Their work puts them at the forefront of accelerated computing — tackling projects in autonomous systems, computer architecture, computer graphics, deep learning, programming systems, robotics and security.

The NVIDIA Graduate Fellowship Program is open to applicants worldwide.

The 2025-2026 fellowship recipients are:

  • Anish Saxena, Georgia Institute of Technology — Rethinking data movement across the stack — spanning large language model architectures, system software and memory systems — to improve the efficiency of LLM training and inference.
  • Jiawei Yang, University of Southern California — Creating scalable, generalizable foundation models for autonomous systems through self-supervised learning, leveraging neural reconstruction to capture detailed environmental geometry and dynamic scene behaviors, and enhancing adaptability in robotics, digital twin technologies and autonomous driving.
  • Jiayi (Eris) Zhang, Stanford University — Developing intelligent algorithms, models and tools for enhancing user creativity and productivity in design, animation and simulation.
  • Ruisi Cai, University of Texas at Austin — Working on efficient training and inference for large foundation models as well as AI security and privacy.
  • Seul Lee, Korea Advanced Institute of Science and Technology — Developing generative models for molecules and exploration strategies in chemical space for drug discovery applications.
  • Sreyan Ghosh, University of Maryland, College Park — Advancing audio processing and reasoning by designing resource-efficient models and training techniques, improving audio representation learning and enhancing audio perception for AI systems.
  • Tairan He, Carnegie Mellon University — Researching the development of humanoid robots, with a focus on advancing whole-body loco-manipulation through large-scale simulation-to-real learning.
  • Xiaogeng Liu, University of Wisconsin–Madison — Developing robust and trustworthy AI systems, with an emphasis on evaluating and enhancing machine learning models to ensure consistent performance and resilience against diverse attacks and unforeseen inputs.
  • Yunze Man, University of Illinois Urbana-Champaign — Developing vision-centric reasoning models for multimodal and embodied AI agents, with a focus on object-centric perception systems in dynamic scenes, vision foundation models for open-world scene understanding and generation, and large multimodal models for embodied reasoning and robotics planning.
  • Zhiqiang Xie, Stanford University — Building infrastructures to enable more efficient, scalable and complex compound AI systems while enhancing the observability and reliability of such systems.

We also acknowledge the 2025-2026 fellowship finalists:

  • Bo Zhao, University of California, San Diego
  • Chenning Li, Massachusetts Institute of Technology
  • Dacheng Li, University of California, Berkeley
  • Jiankai Sun, Stanford University
  • Wenlong Huang, Stanford University

AI in Your Own Words: NVIDIA Debuts NeMo Retriever Microservices for Multilingual Generative AI Fueled by Data

In enterprise AI, understanding and working across multiple languages is no longer optional — it’s essential for meeting the needs of employees, customers and users worldwide.

Multilingual information retrieval — the ability to search, process and retrieve knowledge across languages — plays a key role in enabling AI to deliver more accurate and globally relevant outputs.

Enterprises can expand their generative AI efforts into accurate, multilingual systems using NVIDIA NeMo Retriever embedding and reranking NVIDIA NIM microservices, which are now available on the NVIDIA API catalog. These models can understand information across a wide range of languages and formats, such as documents, to deliver accurate, context-aware results at massive scale.

With NeMo Retriever, businesses can now:

  • Extract knowledge from large and diverse datasets for additional context to deliver more accurate responses.
  • Seamlessly connect generative AI to enterprise data in most major global languages to expand user audiences.
  • Deliver actionable intelligence at greater scale with 35x improved data storage efficiency through new techniques such as long context support and dynamic embedding sizing.

New NeMo Retriever microservices reduce storage volume needs by 35x, enabling enterprises to process more information at once and fit large knowledge bases on a single server. This makes AI solutions more accessible, cost-effective and easier to scale across organizations.
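
NVIDIA hasn’t published the arithmetic behind the 35x figure, but dynamic embedding sizing generally means storing a truncated, renormalized prefix of each embedding vector, trading a little retrieval accuracy for much smaller indexes. A toy sketch of the storage effect:

```python
# Toy illustration of dynamic embedding sizing: keep a truncated,
# renormalized prefix of each unit vector to shrink the index. Dimensions
# here are arbitrary examples, not NeMo Retriever's actual settings.
import numpy as np

def truncate_embeddings(vectors, keep_dims):
    """vectors: (n, d) unit vectors; keep only the first keep_dims dimensions."""
    small = vectors[:, :keep_dims]
    return small / np.linalg.norm(small, axis=1, keepdims=True)

full = np.random.randn(10_000, 4096).astype(np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)
small = truncate_embeddings(full, 384)
print(f"storage reduced {full.nbytes / small.nbytes:.1f}x")   # ~10.7x here
```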

Leading NVIDIA partners like DataStax, Cohesity, Cloudera, Nutanix, SAP, VAST Data and WEKA are already adopting these microservices to help organizations across industries securely connect custom models to diverse and large data sources. By using retrieval-augmented generation (RAG) techniques, NeMo Retriever enables AI systems to access richer, more relevant information and effectively bridge linguistic and contextual divides.

Wikidata Speeds Data Processing From 30 Days to Under Three Days 

In partnership with DataStax, Wikimedia has implemented NeMo Retriever to vector-embed the content of Wikipedia, serving billions of users. Vector embedding — or “vectorizing” — is a process that transforms data into a format that AI can process and understand to extract insights and drive intelligent decision-making.

Wikimedia used the NeMo Retriever embedding and reranking NIM microservices to vectorize over 10 million Wikidata entries into AI-ready formats in under three days, a process that used to take 30 days. That 10x speedup enables scalable, multilingual access to one of the world’s largest open-source knowledge graphs.
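
As a rough sketch of what such a vectorization pipeline looks like, the snippet below batch-embeds text records through a NeMo Retriever embedding NIM on the NVIDIA API catalog, which exposes an OpenAI-compatible endpoint. The model identifier and the input_type extension are assumptions to verify against the catalog listing.

```python
# Hedged sketch: embed records with a NeMo Retriever embedding NIM via the
# NVIDIA API catalog's OpenAI-compatible endpoint. Model name and the
# input_type hint are assumptions; check the catalog for exact values.
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                api_key="YOUR_NVIDIA_API_KEY")

records = ["Douglas Adams: English writer and humorist",
           "Paris: capital et plus grande ville de France"]
resp = client.embeddings.create(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",   # assumed multilingual model ID
    input=records,
    extra_body={"input_type": "passage"},        # "query" when embedding queries
)
vectors = [item.embedding for item in resp.data]
print(len(vectors), "vectors of dim", len(vectors[0]))
```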

This groundbreaking project ensures real-time updates for hundreds of thousands of entries that are being edited daily by thousands of contributors, enhancing global accessibility for developers and users alike. With Astra DB’s serverless model and NVIDIA AI technologies, the DataStax offering delivers near-zero latency and exceptional scalability to support the dynamic demands of the Wikimedia community.

DataStax is using NVIDIA AI Blueprints and integrating the NVIDIA NeMo Customizer, Curator, Evaluator and Guardrails microservices into the LangFlow AI code builder to enable the developer ecosystem to optimize AI models and pipelines for their unique use cases and help enterprises scale their AI applications.

Language-Inclusive AI Drives Global Business Impact

NeMo Retriever helps global enterprises overcome linguistic and contextual barriers and unlock the potential of their data. By deploying robust AI solutions, businesses can achieve accurate, scalable and high-impact results.

NVIDIA’s platform and consulting partners play a critical role in ensuring enterprises can efficiently adopt and integrate generative AI capabilities, such as the new multilingual NeMo Retriever microservices. These partners help align AI solutions to an organization’s unique needs and resources, making generative AI more accessible and effective. They include:

  • Cloudera plans to expand the integration of NVIDIA AI in the Cloudera AI Inference Service. Currently embedded with NVIDIA NIM, Cloudera AI Inference will include NVIDIA NeMo Retriever to improve the speed and quality of insights for multilingual use cases.
  • Cohesity introduced the industry’s first generative AI-powered conversational search assistant that uses backup data to deliver insightful responses. It uses the NVIDIA NeMo Retriever reranking microservice to improve retrieval accuracy and significantly enhance the speed and quality of insights for various applications.
  • SAP is using the grounding capabilities of NeMo Retriever to add context to its Joule copilot Q&A feature and information retrieved from custom documents.
  • VAST Data is deploying NeMo Retriever microservices on the VAST Data InsightEngine with NVIDIA to make new data instantly available for analysis. This accelerates the identification of business insights by capturing and organizing real-time information for AI-powered decisions.
  • WEKA is integrating its WEKA AI RAG Reference Platform (WARRP) architecture with NVIDIA NIM and NeMo Retriever into its low-latency data platform to deliver scalable, multimodal AI solutions, processing hundreds of thousands of tokens per second.

Breaking Language Barriers With Multilingual Information Retrieval

Multilingual information retrieval is vital for enterprise AI to meet real-world demands. NeMo Retriever supports efficient and accurate text retrieval across multiple languages and cross-lingual datasets. It’s designed for enterprise use cases such as search, question-answering, summarization and recommendation systems.

Additionally, it addresses a significant challenge in enterprise AI — handling large volumes of large documents. With long-context support, the new microservices can process lengthy contracts or detailed medical records while maintaining accuracy and consistency over extended interactions.

These capabilities help enterprises use their data more effectively, providing precise, reliable results for employees, customers and users while optimizing resources for scalability. Advanced multilingual retrieval tools like NeMo Retriever can make AI systems more adaptable, accessible and impactful in a globalized world.

Availability

Developers can access the multilingual NeMo Retriever microservices, and other NIM microservices for information retrieval, through the NVIDIA API catalog, or a no-cost, 90-day NVIDIA AI Enterprise developer license.

Learn more about the new NeMo Retriever microservices and how to use them to build efficient information retrieval systems.
