AI Decoded: Demystifying Large Language Models, the Brains Behind Chatbots

Editor’s note: This post is part of our AI Decoded series, which aims to demystify AI by making the technology more accessible, while showcasing new hardware, software, tools and accelerations for RTX PC and workstation users.

If AI is having its iPhone moment, then chatbots are one of its first popular apps.

They’re made possible thanks to large language models, deep learning algorithms pretrained on massive datasets — as expansive as the internet itself — that can recognize, summarize, translate, predict and generate text and other forms of content. They can run locally on PCs and workstations powered by NVIDIA GeForce and RTX GPUs.

LLMs excel at summarizing large volumes of text, classifying and mining data for insights, and generating new text in a user-specified style, tone or format. They can facilitate communication in any language, even beyond ones spoken by humans, such as computer code or protein and genetic sequences.

While the first LLMs dealt solely with text, later iterations were trained on other types of data. These multimodal LLMs can recognize and generate images, audio, videos and other content forms.

Chatbots like ChatGPT were among the first to bring LLMs to a consumer audience, with a familiar interface built to converse with and respond to natural-language prompts. LLMs have since been used to help developers write code and scientists to drive drug discovery and vaccine development.

But the AI models that power those functions are computationally intensive. Combining advanced optimization techniques and algorithms like quantization with RTX GPUs, which are purpose-built for AI, helps make LLMs compact enough and PCs powerful enough to run locally — no internet connection required. And a new breed of lightweight LLMs like Mistral — one of the LLMs powering Chat with RTX — sets the stage for state-of-the-art performance with lower power and storage demands.
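
Quantization shrinks a model by storing its weights at lower precision. As an illustrative sketch (not any specific library's implementation), symmetric int8 quantization maps each float32 weight to an 8-bit integer plus a single scale factor, cutting weight storage fourfold at the cost of a small rounding error:

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights to int8 plus one per-tensor scale (symmetric quantization)."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 values and scale."""
    return q.astype(np.float32) * scale

w = np.random.randn(4096).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes // q.nbytes)  # 4 -- int8 storage is one-quarter of float32
```

Real quantization schemes for LLMs (per-channel scales, 4-bit formats, activation quantization) are more involved, but the storage-versus-precision trade-off works the same way.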

Why Do LLMs Matter?

LLMs can be adapted for a wide range of use cases, industries and workflows. This versatility, combined with their speed, offers performance and efficiency gains across virtually all language-based tasks.

DeepL, running on NVIDIA GPUs in the cloud, uses advanced AI to provide accurate text translations.

LLMs are widely used in language translation apps such as DeepL, which uses AI and machine learning to provide accurate outputs.

Medical researchers are training LLMs on textbooks and other medical data to enhance patient care. Retailers are leveraging LLM-powered chatbots to deliver stellar customer support experiences. Financial analysts are tapping LLMs to transcribe and summarize earnings calls and other important meetings. And that’s just the tip of the iceberg.

Chatbots — like Chat with RTX — and writing assistants built atop LLMs are making their mark on every facet of knowledge work, from content marketing and copywriting to legal operations. Coding assistants were among the first LLM-powered applications to point toward the AI-assisted future of software development. Now, projects like ChatDev are combining LLMs with AI agents — smart bots that act autonomously to help answer questions or perform digital tasks — to spin up an on-demand, virtual software company. Just tell the system what kind of app is needed and watch it get to work.

Learn more about LLM agents on the NVIDIA developer blog.

Easy as Striking Up a Conversation 

Many people’s first encounter with generative AI came by way of a chatbot such as ChatGPT, which simplifies the use of LLMs through natural language, making user action as simple as telling the model what to do.

LLM-powered chatbots can help generate a draft of marketing copy, offer ideas for a vacation, craft an email to customer service and even spin up original poetry.

Advances in image generation and multimodal LLMs have extended the chatbot’s realm to include analyzing and generating imagery — all while maintaining the wonderfully simple user experience. Just describe an image to the bot or upload a photo and ask the system to analyze it. It’s chatting, but now with visual aids.

For more on how these bots are designed, check out the on-demand webinar on Building Intelligent AI Chatbots Using RAG.

Future advancements will help LLMs expand their capacity for logic, reasoning, math and more, giving them the ability to break complex requests into smaller subtasks.

Progress is also being made on AI agents, applications capable of taking a complex prompt, breaking it into smaller ones, and engaging autonomously with LLMs and other AI systems to complete them. ChatDev is an example of an AI agent framework, but agents aren’t limited to technical tasks.

For example, users could ask a personal AI travel agent to book a family vacation abroad. The agent would break that task into subtasks — itinerary planning, booking travel and lodging, creating packing lists, finding a dog walker — and independently execute them in order.
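
That plan-and-execute loop can be sketched in a few lines. This toy version stubs out the planner and executor, which in a real agent would each be calls to an LLM or an external tool; the goal and subtask strings are illustrative only:

```python
def plan(goal):
    # A real planner would prompt an LLM: "Break this goal into steps: ..."
    return {
        "book a family vacation": [
            "draft an itinerary",
            "book flights and lodging",
            "create packing lists",
            "find a dog walker",
        ],
    }.get(goal, [goal])

def execute(subtask):
    # A real agent would route each subtask to an LLM, API or other tool.
    return f"done: {subtask}"

def run_agent(goal):
    # Break the goal into subtasks, then execute them independently, in order.
    return [execute(t) for t in plan(goal)]

for result in run_agent("book a family vacation"):
    print(result)
```

Frameworks like ChatDev layer role-playing, memory and inter-agent messaging on top of this basic decompose-then-execute pattern.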

Unlock Personal Data With RAG

As powerful as LLMs and chatbots are for general use, they can become even more helpful when combined with an individual user’s data. By doing so, they can help analyze email inboxes to uncover trends, comb through dense user manuals to find the answer to a technical question about some hardware, or summarize years of bank and credit card statements.

Retrieval-augmented generation, or RAG, is one of the easiest and most effective ways to hone LLMs for a particular dataset.

An example of RAG on a PC.

RAG enhances the accuracy and reliability of generative AI models with facts fetched from external sources. By connecting an LLM with practically any external resource, RAG lets users chat with data repositories while also giving the LLM the ability to cite its sources. The user experience is as simple as pointing the chatbot toward a file or directory.
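
A minimal sketch of the idea: score the documents in a repository against the query, then prepend the best match to the prompt so the model can ground and cite its answer. Plain word overlap stands in here for the embedding similarity and vector index a production RAG system would use; the documents and query are made up for illustration:

```python
def retrieve(query, documents):
    """Return the document sharing the most words with the query."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return max(documents, key=overlap)

def build_prompt(query, documents):
    """Prepend the retrieved context so the LLM answers from it and can cite it."""
    context = retrieve(query, documents)
    return (f"Context: {context}\n\n"
            f"Question: {query}\n"
            f"Answer using only the context, and cite it.")

docs = [
    "The Model X router supports WPA3 encryption.",
    "Warranty claims must be filed within 90 days.",
]
print(build_prompt("Does the Model X router support WPA3?", docs))
```

The final prompt, not the model, carries the user's data, which is why RAG works with off-the-shelf LLMs and keeps the sources citable.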

For example, a standard LLM will have general knowledge about content strategy best practices, marketing tactics and basic insights into a particular industry or customer base. But connecting it via RAG to marketing assets supporting a product launch would allow it to analyze the content and help plan a tailored strategy.

RAG works with any LLM, as long as the application supports it. NVIDIA’s Chat with RTX tech demo is an example of RAG connecting an LLM to a personal dataset. It runs locally on systems with a GeForce RTX or NVIDIA RTX professional GPU.

To learn more about RAG and how it compares to fine-tuning an LLM, read the tech blog, RAG 101: Retrieval-Augmented Generation Questions Answered.

Experience the Speed and Privacy of Chat with RTX

Chat with RTX is a local, personalized chatbot demo that’s easy to use and free to download. It’s built with RAG functionality and TensorRT-LLM and RTX acceleration. It supports multiple open-source LLMs, including Meta’s Llama 2 and Mistral AI’s Mistral. Support for Google’s Gemma is coming in a future update.

Chat with RTX connects users to their personal data through RAG.

Users can easily connect local files on a PC to a supported LLM simply by dropping files into a folder and pointing the demo to that location. Doing so enables it to answer queries with quick, contextually relevant answers.

Since Chat with RTX runs locally on Windows with GeForce RTX PCs and NVIDIA RTX workstations, results are fast — and the user’s data stays on the device. Rather than relying on cloud-based services, Chat with RTX lets users process sensitive data on a local PC without the need to share it with a third party or have an internet connection.

To learn more about how AI is shaping the future, tune in to NVIDIA GTC, a global AI developer conference running March 18-21 in San Jose, Calif., and online.

Currents of Change: ITIF President Daniel Castro on Energy-Efficient AI and Climate Change

AI-driven change is in the air, as are concerns about the technology’s environmental impact. In this episode of NVIDIA’s AI Podcast, Daniel Castro, vice president of the Information Technology and Innovation Foundation and director of its Center for Data Innovation, speaks with host Noah Kravitz about the motivation behind his AI energy use report, which addresses misconceptions about the technology’s energy consumption. Castro also touches on the need for policies and frameworks that encourage the development of energy-efficient technology. Tune in to discover the crucial role of GPU acceleration in enhancing sustainability and how AI can help address climate change challenges.

Register for NVIDIA GTC, a global AI developer conference running March 18-21 in San Jose, Calif., to explore sessions on energy-efficient computing and using AI to combat climate change.

You Might Also Like…

Overjet on Bringing AI to Dentistry – Ep. 179

Dentists get a bad rap. Dentists also get more people out of more aggravating pain than just about anyone, which is why the more technology dentists have, the better. Overjet, a member of the NVIDIA Inception program for startups, is moving fast to bring AI to dentists’ offices.

DigitalPath’s Ethan Higgins on Using AI to Fight Wildfires – Ep. 211

DigitalPath is igniting change in the Golden State — using computer vision, generative adversarial networks and a network of thousands of cameras to detect signs of fire in real time.

Anima Anandkumar on Using Generative AI to Tackle Global Challenges – Ep. 204

Anima Anandkumar, Bren Professor at Caltech and senior director of AI research at NVIDIA, speaks to generative AI’s potential to make splashes in the scientific community, from accelerating drug and vaccine research to predicting extreme weather events like hurricanes or heat waves.

Doing the Best They Can: EverestLabs Ensures Fewer Recyclables Go to Landfills – Ep. 184

All of us recycle. Or, at least, all of us should. Now, AI is joining the effort. JD Ambati, founder and CEO of EverestLabs, developer of RecycleOS, discusses developing the first AI-enabled operating system for recycling.

Show Notes

1:41: Context on and findings from the AI energy use report
10:36: How GPU acceleration has transformed the energy efficiency of AI, particularly in weather and climate forecasting
12:31: Examples of how GPU acceleration has improved the energy efficiency of AI operations
15:51: Castro’s insights on sustainability and AI
20:01: Policies and frameworks to encourage energy-efficient AI
26:43: Castro’s outlook on the interplay among advancing AI technology, energy sustainability and climate change

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Podcasts, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Head of the Class: Explore AI’s Potential in Higher Education and Research at GTC

For students, researchers and educators eager to delve into AI, GTC — NVIDIA’s conference on AI and accelerated computing — is in a class of its own.

Taking place from March 18-21 at the San Jose Convention Center, GTC features over 900 talks presented by world-renowned experts in fields such as generative AI, high performance computing, healthcare, energy and environment, and robotics.

See some of the top sessions for attendees in higher education below. And don’t miss NVIDIA founder and CEO Jensen Huang’s GTC keynote on how AI is transforming industries, on Monday, March 18, at 1 p.m. PT.

For Researchers 

See more sessions for researchers.

For Educators

Find more sessions for educators.

For Students

Discover more sessions for students and apply to join the NVIDIA Student Network.

To gain hands-on experience, check out training labs and full-day technical workshops at GTC.

Eco-System Upgrade: AI Plants a Digital Forest at NVIDIA GTC

The ecosystem around NVIDIA’s technologies has always been verdant — but this is absurd.

After a stunning premiere at the World Economic Forum in Davos, immersive artworks based on Refik Anadol Studio’s Large Nature Model will come to the U.S. for the first time at NVIDIA GTC.

Offering a deep dive into the synergy between AI and the natural world, Anadol’s multisensory work, “Large Nature Model: A Living Archive,” will be situated prominently on the main concourse of the San Jose Convention Center, where the global AI event is taking place, from March 18-21.

Fueled by NVIDIA’s advanced AI technology, including powerful DGX A100 stations and high-performance GPUs, the exhibit offers a captivating journey through our planet’s ecosystems with stunning visuals, sounds and scents.

These scenes are rendered in breathtaking clarity across screens with a total output of 12.5 million pixels, immersing attendees in an unprecedented digital portrayal of Earth’s ecosystems.

Refik Anadol, recognized by The Economist as “the artist of the moment,” has emerged as a key figure in AI art. His work, notable for its use of data and machine learning, places him at the forefront of a generation pushing the boundaries between technology, interdisciplinary research and aesthetics. Anadol’s influence reflects a wider movement in the art world towards embracing digital innovation, setting new precedents in how art is created and experienced.

Exhibition Details

  • Location: Main concourse at the San Jose McEnery Convention Center, ensuring easy access for all GTC attendees.
  • Total experience hours: Available from 5-7 p.m., providing a curated window to engage with the installation fully.
  • Screen dimensions: The installation features two towering screens, each four meters high. The larger, four-by-12-meter screen displays the “Large Nature Model: Living Archive,” showcasing Anadol’s centerpiece. A second, four-by-six-meter screen offers a glimpse into the process of building the Large Nature Model.

A Gateway to Digital Nature

Large Nature Model is a generative AI model focused exclusively on nature.

This installation exemplifies AI’s unique potential to capture nature’s inherent intelligence, aiming to redefine our engagement with and appreciation of Earth’s ecosystems.

Anadol has been working with nature-based datasets throughout his career, and began working with rainforest data years ago.

The Large Nature Model, on which the work being shown at GTC is based, continues to evolve. It represents the work of a team of 29 data scientists, graphic designers and AI specialists from around the world, all working under the umbrella of the Refik Anadol Studio.

The Large Nature Model showcased at GTC is fine-tuned using the Getty Images foundation model built using the NVIDIA Edify architecture. The model is fine-tuned on an extensive dataset of approximately 750,000 images, comprising 274,947 images of flora, 358,713 images of fauna and 130,282 images of fungi — showcasing the rich biodiversity of the Amazonian rainforest.

Insights Into the Making

Alongside the visual feast, a panel discussion featuring Anadol and colleagues from the Refik Anadol Studio will provide insights into their research and design processes.

Moderated by Brian Dowdy, a senior technical marketing engineer at NVIDIA, the discussion will explore the collaborative efforts, technical challenges and creative processes that make such pioneering art possible.

The creation of the Large Nature Model represents six months of rigorous development and collaboration with NVIDIA researchers, underscoring the dedication and interdisciplinary effort required to bring this innovative vision to life.

Register for GTC today to join this immersive journey into the heart of nature, art and AI innovation.

AI Getting Green Light: City of Raleigh Taps NVIDIA Metropolis to Improve Traffic

You might say that James Alberque has a bird’s-eye view of the road congestion and challenges that come with a booming U.S. city.

Alberque analyzes traffic data for Raleigh, North Carolina, which has seen its population more than double in the past three decades. The city has been working with NVIDIA and its partners to analyze traffic on the roads and intersections to help reduce congestion and enhance pedestrian safety.

“We can now push traffic video into the NVIDIA DeepStream platform and can quantify in real time how many vehicles are entering and exiting intersections and visualize it for our engineers,” said Alberque, a geoinformation systems and emerging technology manager for the city.

Such information can be fed to vendors responsible for keeping traffic lights optimized, so population expansion doesn’t bring roadways to a crawl or increase the number of accidents.
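
The real-time counting Alberque describes can be sketched as virtual line crossing over tracked detections. This toy stand-in for a GPU video-analytics pipeline such as DeepStream takes per-frame positions from an upstream tracker (the coordinates and IDs below are made up) and counts crossings of a virtual line in each direction:

```python
ENTRY_LINE_Y = 100  # virtual line across the intersection approach, in pixels

def count_crossings(tracks):
    """tracks: {vehicle_id: [y0, y1, ...]} -- y positions over successive frames.

    A vehicle 'enters' when its position crosses the line moving down-frame,
    and 'exits' when it crosses moving up-frame.
    """
    entering = exiting = 0
    for ys in tracks.values():
        for prev, cur in zip(ys, ys[1:]):
            if prev < ENTRY_LINE_Y <= cur:
                entering += 1
            elif prev >= ENTRY_LINE_Y > cur:
                exiting += 1
    return entering, exiting

tracks = {
    "car_1": [80, 95, 110],   # crosses the line going down: entering
    "car_2": [120, 105, 90],  # crosses the line going up: exiting
    "car_3": [50, 60, 70],    # never crosses
}
print(count_crossings(tracks))  # (1, 1)
```

In production, the detector and tracker run on the GPU over live camera feeds, and the counts stream to dashboards for traffic engineers.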

Urban growth has slowed commutes as metropolitan regions across the nation turn to AI for assistance in optimizing traffic flow.

“We got a great accuracy level using NVIDIA pre-trained AI computer vision models for traffic cameras right out of the box,” said Alberque. “And our engineers worked with an NVIDIA Metropolis partner, Quantiphi, to refine those models and got them up to an incredible 95% accuracy.”

The Raleigh system uses hundreds of cameras to enhance its model training, and the city has its sights set on everything from road flooding, license plate tracking and parking utilization to bus stop wait times and sanitation management.

Federal Initiatives Support Intersection Safety

Advances from the city of Raleigh and others looking to smooth the flow of traffic come as the U.S. Department of Transportation continues to support AI efforts.

The DOT recently announced winners of the first phase of its Intersection Safety Challenge, which aims to support innovation in intersection safety. Three of the DOT’s winning entrants are harnessing NVIDIA Metropolis for smart intersections.

In this first stage, 15 of 120 entrants that submitted design concept proposals for intersection safety systems were each awarded $100,000 and invited to participate further.

The next stage will focus on system assessment and virtual testing, with teams expected to develop, train and improve algorithms used for detection, localization and classification of vehicles and road users in a controlled test intersection.

Enlisting NVIDIA AI for Smart Intersections

Deloitte Consulting is building a foundation for smart intersections, enlisting the NVIDIA Metropolis application framework, developer tools and partner ecosystem.

Derq USA is developing an intersection safety system that relies on NVIDIA Metropolis to help manage the deluge of sensor data for insights.

Metropolis partner Miovision, which has traffic light safety systems deployed across the U.S., uses the NVIDIA Jetson edge AI platform in its devices for processing video and TensorRT for inference.

“There are so many people moving into our city and surrounding areas, and our number one and number two concerns for citizens are around traffic — this is providing data to move the needle,” said Alberque, regarding Raleigh’s Metropolis and DeepStream development.

Register for GTC24 to discover how AI is transforming smart cities.

Explore more smart traffic solutions powered by NVIDIA Metropolis in this Smart Roadways eBook.

LLMs Land on Laptops: NVIDIA, HP CEOs Celebrate AI PCs

2024 will be the year generative AI gets personal, the CEOs of NVIDIA and HP said today in a fireside chat, unveiling new laptops that can build, test and run large language models.

“This is a renaissance of the personal computer,” said NVIDIA founder and CEO Jensen Huang at HP Amplify, a gathering in Las Vegas of about 1,500 resellers and distributors. “The work of creators, designers and data scientists is going to be revolutionized by these new workstations.”

“AI is the biggest thing to come to the PC in decades,” said HP’s Enrique Lores, in the runup to the announcement of what his company billed as “the industry’s largest portfolio of AI PCs and workstations.”

Greater Speed and Security

Compared to running their AI work in the cloud, the new systems will provide increased speed and security while reducing costs and energy, Lores said in a keynote at the event.

New HP ZBooks provide a portfolio of mobile AI workstations powered by a full range of NVIDIA RTX Ada Generation GPUs.

Entry-level systems with the NVIDIA RTX 500 Ada Generation Laptop GPU let users run generative AI apps and tools wherever they go.

High-end models pack the RTX 5000 to deliver up to 682 TOPS, letting users create and run LLMs locally, using retrieval-augmented generation (RAG) to connect to their content for results that are both personalized and private.

Access to Accelerated Software

The new workstations can tap into NVIDIA’s full-stack AI platform, including software that speeds the data science at the foundation of generative AI.

The systems’ Z by HP AI Studio platform — developed in collaboration with NVIDIA — links to NVIDIA NGC, a catalog of GPU-accelerated software for AI and data science. NGC includes NVIDIA NeMo, a framework to build, customize and deploy generative AI models.

In addition, HP and NVIDIA announced that NVIDIA CUDA-X libraries will be integrated with the systems to turbocharge the data preparation and processing that’s fundamental for generative AI.

Speedups for Data Scientists

The libraries include NVIDIA RAPIDS cuDF, which accelerates pandas, software used by nearly 10 million data scientists.

“It used to take them hours and sometimes days to process data that now they can do in minutes,” Huang said.

“This pandas library is insanely complex,” he added, noting NVIDIA engineers worked for more than five years on reformulating the code so it can be accelerated with GPUs.
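
A typical pandas workload of the kind cuDF accelerates looks like the following. With the cudf.pandas accelerator enabled (for example, by launching a script with `python -m cudf.pandas script.py` on a system with a supported NVIDIA GPU), code like this runs on the GPU unchanged; without it, the same code runs on standard CPU pandas. The data here is a made-up example:

```python
import pandas as pd

# A small stand-in for the large tabular datasets data scientists work with.
df = pd.DataFrame({
    "store": ["A", "B", "A", "B", "A"],
    "sales": [100, 200, 150, 250, 50],
})

# Group-by aggregations like this are among the operations cuDF speeds up
# most, since they parallelize well across GPU threads.
totals = df.groupby("store")["sales"].sum()
print(totals.to_dict())  # {'A': 300, 'B': 450}
```

The zero-code-change design is the point: the speedup comes from swapping the execution engine underneath the same pandas API.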

Entering a New Era

In tandem with the new systems, HP announced a partner training program developed in collaboration with NVIDIA. It will equip computer vendors to advise customers on the right AI products and solutions to meet their needs.

Such programs pave the way for an industry that’s entering an era where AI lets software write software.

“We’ve reinvented the computer. We’ve reinvented how software is written, and now we have to reinvent how software is used,” said Huang. “Large language models, connected into other LLMs, will help solve application problems — that’s the future.”

First Class: NVIDIA Introduces Generative AI Professional Certification

NVIDIA is offering a new professional certification in generative AI to enable developers to establish technical credibility in this important domain.

Generative AI is revolutionizing industries worldwide, yet there’s a critical skills gap and need to uplevel employees to more fully harness the technology.

Available for the first time from NVIDIA, the certification enables developers, career professionals and others to validate and showcase their generative AI skills and expertise. The program introduces two associate-level generative AI certifications, focusing on proficiency in large language models and multimodal workflow skills.

“Generative AI has moved to center stage as governments, industries and organizations everywhere look to harness its transformative capabilities,” NVIDIA founder and CEO Jensen Huang recently said.

The certification will become available starting at GTC, where in-person attendees can also access recommended training to prepare for a certification exam.

“Organizations in every industry need to increase their expertise in this transformative technology,” said Greg Estes, VP of developer programs at NVIDIA. “Our goals are to assist in upskilling workforces, sharpen the skills of qualified professionals, and enable individuals to demonstrate their proficiency in order to gain a competitive advantage in the job market.”

See AI’s Future. Learn How to Use It.  

GTC 2024 — running March 18-21 in San Jose, Calif. — is the first in-person GTC event in five years. More than 300,000 people are expected to register to attend in person or virtually, and there will be 900 sessions and more than 300 exhibitors showcasing how organizations are deploying NVIDIA platforms to achieve industry breakthroughs.

Attendees can choose from 20 full-day, hands-on technical workshops, with many sessions available virtually in EMEA and APAC time zones. Also, sign up for the GTC Conference + Training package for more than 40 complimentary onsite training labs.

Sign up for GTC. Learn more about the generative AI course here and here.

Don’t Pass This Up: Day Passes Now Available on GeForce NOW

Gamers can now seize the day with Day Passes, available to purchase for 24-hour continuous access to powerful cloud gaming with all the benefits of a GeForce NOW Ultimate or Priority membership — no commitment required.

Publisher Cygames brings its next triple-A title to the cloud. Granblue Fantasy: Relink leads eight new games joining the GeForce NOW library this week.

Plus, an update for GeForce NOW Windows and macOS adds support for G-SYNC in the cloud. By pairing it with new NVIDIA Reflex support for 60 and 120 frames per second streaming options, Ultimate members can experience ultra-low-latency streaming that’s nearly indistinguishable from using a local PC.

Seize the Day

Day Passes offer access to 24 hours of GeForce RTX-powered cloud gaming. Users can get all the benefits of Ultimate and Priority memberships for a day without committing to longer-term monthly memberships, and choose how and when they access the cloud.

Day Pass Matrix on GeForce NOW
Play for a day.

Ultimate Day Pass users can stream at 4K 120 fps, at up to 240 fps, or with ultrawide resolutions. Plus, they can get all the same benefits as gamers using NVIDIA GeForce RTX 40 Series GPUs, with access to NVIDIA DLSS 3 and NVIDIA Reflex technologies for the smoothest gameplay and lowest latency, even on underpowered devices. Both Ultimate and Priority Day Pass users can turn RTX ON in supported games for immersive, cinematic gameplay.

The Ultimate Day Pass is available for $7.99 and the Priority Day Pass for $3.99. Twenty-four hours of continuous play begins at purchase. Day Passes are available in limited quantities each day, so grab one before the opportunity passes.

Head in the Clouds

Granblue Fantasy: Relink on GeForce NOW
Going on a grand adventure.

Cygames, known for developing the popular online game Granblue Fantasy, brings its full-fledged action role-playing game to GeForce NOW. Granblue Fantasy: Relink is now available for fans to stream across devices.

Set in the same universe as the web browser and mobile version of the title, Granblue Fantasy: Relink is an ARPG that features many of the beloved characters from the franchise in an all-new original story. Step into the shoes of a captain leading a Skyfaring crew, alongside a scrappy dragon named Vyrn and a mysterious girl named Lyria, as they navigate the Sky Realm, a world of islands drifting in the clouds.

Slash, shoot and hex treacherous foes with up to three other gaming buddies. GeForce NOW Priority and Ultimate members can become Skyfarers in the cloud with longer game sessions and faster access to GeForce RTX-class servers.

Spring Into New Games

Undisputed on GeForce NOW
Pull no punches.

Step into the ring in Undisputed, an authentic boxing game from Steel City Interactive. Featuring bone-jarring action and more licensed boxers than ever, Undisputed, currently in early access, gives members unprecedented control to master every inch of the ring.

It’s available to stream from the cloud this week, along with the following games:

  • The Thaumaturge (New release on Steam, Mar. 4)
  • Classified: France ‘44 (New release on Steam, Mar. 5)
  • Expeditions: A MudRunner Game (New release on Steam, Mar. 5)
  • Winter Survival (New release on Steam, Mar. 6)
  • Taxi Life: A City Driving Simulator (New release on Steam, Mar. 7)
  • Zoria: Age of Shattering (New release on Steam, Mar. 7)
  • Granblue Fantasy: Relink (Steam)
  • Undisputed (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso

As visual generative AI matures from research to the enterprise domain, businesses are seeking responsible ways to integrate the technology into their products.

Bria, a startup based in Tel Aviv, is responding with an open platform for visual generative AI that emphasizes model transparency alongside fair attribution and copyright protections. Currently offering models that convert text prompts to images or transform existing images, the company will this year add text-to-video and image-to-video AI.

“Creating generative AI models requires time and expertise,” said Yair Adato, co-founder and CEO of Bria. “We do the heavy lifting so product teams can adopt our models to achieve a technical edge and go to market quickly, without investing as many resources.”

Advertising agencies and retailers can use Bria’s tools to quickly generate visuals for marketing campaigns. And creative studios can adopt the models to develop stock imagery or edit visuals. Dozens of enterprise clients have integrated the startup’s pretrained models or use its application programming interfaces.

Bria develops its models with the NVIDIA NeMo framework, which is available on NGC, NVIDIA’s hub for accelerated software. The company uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput, low-latency image generation. It’s also adopting NVIDIA Picasso, a foundry for visual generative AI models, to run inference.

“We were looking for a framework to train our models efficiently — one that would minimize compute cost while scaling AI training to more quickly reach model convergence,” said Misha Feinstein, vice president of research and development at Bria. “NeMo features optimization techniques that allow us to maximize the GPUs’ performance during both training and inference.”

Creative Solutions to Creative Challenges

Bria, founded in 2020, offers flexible options for enterprises adopting visual generative AI. By adopting Bria’s platform, its customers can gain a competitive edge by creating visual content at scale while retaining control of their data and technology. Developers can access its pretrained models through APIs or by directly licensing the source code and model weights for further fine-tuning.

“We want to build a company where we respect privacy, content ownership, data ownership and copyright,” said Adato. “To create a healthy, sustainable industry, it’s important to incentivize individuals to keep creating and innovating.”

Adato likens Bria’s attribution program to a music streaming service that pays artists each time one of their songs is played. It’s required for all customers who use Bria’s models — even if they further train and fine-tune the model on their own.
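
Bria hasn’t published its formula, but a streaming-style attribution scheme can be sketched as a proportional split of a per-generation fee across contributing data owners. All names, weights and fee amounts below are hypothetical, purely to illustrate the per-play analogy:

```python
FEE_PER_GENERATION = 0.01  # hypothetical flat fee per generated image

def split_royalties(attributions, fee=FEE_PER_GENERATION):
    """Split one generation's fee across data owners, proportional to their
    attribution weights (weights need not sum to 1)."""
    total = sum(attributions.values())
    return {owner: fee * w / total for owner, w in attributions.items()}

# One generated image attributed 3:1 to two (hypothetical) contributors.
payout = split_royalties({"photographer_a": 3, "stock_library_b": 1})
print(payout)
```

Like a streaming service paying per play, every generation triggers a payout, which is why the program applies even to customers who fine-tune the model themselves.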

Using licensed datasets provides additional benefits: the Bria team doesn’t need to spend time cleaning the data or sorting out inappropriate content and misinformation.

A Growing Suite of NVIDIA-Accelerated Models

Bria offers two versions of its text-to-image model. One is latency-optimized to rapidly accomplish tasks like image background generation. The other offers higher image resolution. Additional foundation models enable super-resolution, object removal, object generation, inpainting and outpainting.

The company is working to continuously increase the resolution of its generated images, further reduce latency and develop domain-specific models for industries such as ecommerce and stock imagery. Inference is accelerated by the NVIDIA Triton Inference Server software and the NVIDIA TensorRT software development kit.

“We’re running on NVIDIA frameworks, hardware and software,” said Feinstein. “NVIDIA experts have helped us optimize these tools for our needs — we would probably run much slower without their help.”

To keep up with the latest hardware and networking infrastructure, Bria uses cloud computing resources: NVIDIA H100 Tensor Core GPUs for AI training and a variety of NVIDIA Tensor Core GPUs for inference.

Bria is a member of NVIDIA Inception, a program that provides startups with technological support and AI platform guidance. Visit Bria in the Inception Pavilion at NVIDIA GTC, running March 18-21 in San Jose and online.

To train optimized text-to-image models, check out the NeMo Multimodal user guide and GitHub repository. NeMo Multimodal is also available as part of the NeMo container on NGC.

Read More

AI Decoded: Demystifying AI and the Hardware, Software and Tools That Power It


With the 2018 launch of RTX technologies and the first consumer GPU built for AI — GeForce RTX — NVIDIA accelerated the shift to AI computing. Since then, AI on RTX PCs and workstations has grown into a thriving ecosystem with more than 100 million users and 500 AI applications.

Generative AI is now ushering in a new wave of capabilities from PC to cloud. And NVIDIA’s rich history and expertise in AI are helping ensure all users have the performance to handle a wide range of AI features.

Users at home and in the office are already taking advantage of AI on RTX with productivity- and entertainment-enhancing software. Gamers feel the benefits of AI on GeForce RTX GPUs with higher frame rates at stunning resolutions in their favorite titles. Creators can focus on creativity, instead of watching spinning wheels or repeating mundane tasks. And developers can streamline workflows using generative AI for prototyping and to automate debugging.

The field of AI is moving fast. As research advances, AI will tackle more complex tasks. And the demanding performance needs will be handled by RTX.

What Is AI?

In its most fundamental form, artificial intelligence is a smarter type of computing. It’s the capability of a computer program or a machine to think, learn and take actions without being explicitly programmed with commands for every action, and without a user having to direct each step.

AI can be thought of as the ability for a device to perform tasks autonomously, by ingesting and analyzing enormous amounts of data, then recognizing patterns in that data — often referred to as being “trained.”

AI development is oriented around building systems that perform tasks that would otherwise require human intelligence, and often significant human effort, to complete — only at speeds beyond any individual’s or group’s capabilities. For this reason, AI is broadly seen as both disruptive and highly transformational.

A key benefit of AI systems is the ability to learn from experiences or patterns inside data, adjusting conclusions on their own when fed new inputs or data. This self-learning allows AI systems to accomplish a stunning variety of tasks, including image recognition, speech recognition, language translation, medical diagnostics, car navigation, image and video enhancement, and hundreds of other use cases.
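The core idea of “learning from data” can be shown with a deliberately tiny sketch: a model adjusts its own parameter to match examples instead of being explicitly programmed with the answer. This is a hypothetical toy, not any specific AI system, but it illustrates self-adjustment from new inputs.

```python
# Minimal illustration of "training": a single weight is adjusted
# from example data instead of being hard-coded by a programmer.

def train(examples, steps=500, lr=0.01):
    """Learn a weight w so that the prediction w * x matches the data."""
    w = 0.0
    for _ in range(steps):
        for x, y in examples:
            pred = w * x
            w -= lr * (pred - y) * x  # nudge w toward the observed pattern
    return w

# The pattern hidden in the data is y = 2x.
data = [(1, 2), (2, 4), (3, 6)]
w = train(data)
print(round(w, 2))  # learned weight close to 2.0
```

Real AI systems adjust millions or billions of such weights, but the principle — conclusions revised automatically as data comes in — is the same.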

The next step in the evolution of AI is content generation — referred to as generative AI. It enables users to quickly create new content, and iterate on it, based on a variety of inputs, which can include text, images, sounds, animation, 3D models or other types of data. It then generates new content in the same or a new form.

Popular language applications, like the cloud-based ChatGPT, allow users to generate long-form copy based on a short text request. Image generators like Stable Diffusion turn descriptive text inputs into the desired image. New applications are turning text into video and 2D images into 3D renderings.

GeForce RTX AI PCs and NVIDIA RTX Workstations

AI PCs are computers with dedicated hardware designed to help AI run faster. It’s the difference between sitting around waiting for a 3D image to load, and seeing it update instantaneously with an AI denoiser.

On RTX GPUs, these specialized AI accelerators are called Tensor Cores. And they dramatically speed up AI performance across the most demanding applications for work and play.

One way that AI performance is measured is in TOPS, or trillions of operations per second. Similar to an engine’s horsepower rating, TOPS can give users a sense of a PC’s AI performance with a single metric. The current generation of GeForce RTX GPUs offers performance options that range from roughly 200 AI TOPS to over 1,300 TOPS, with many options across laptops and desktops in between. Professionals get even higher AI performance with the NVIDIA RTX 6000 Ada Generation GPU.

To put this in perspective, the current generation of AI PCs without GPUs range from 10 to 45 TOPS.
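A quick back-of-envelope calculation shows why the TOPS gap matters. The workload size below is an illustrative assumption, not a benchmark, but the arithmetic holds for any fixed number of operations:

```python
# Time to run a fixed AI workload at different TOPS ratings.
# The 5-trillion-operation workload is a hypothetical example.

def seconds_for(ops, tops):
    """Time to execute `ops` operations at `tops` trillion ops/second."""
    return ops / (tops * 1e12)

workload = 5e12  # an assumed 5-trillion-operation inference task

for tops in (45, 200, 1300):
    ms = seconds_for(workload, tops) * 1000
    print(f"{tops:>5} TOPS -> {ms:.1f} ms")
```

At 45 TOPS the hypothetical task takes over 100 milliseconds; at 1,300 TOPS it drops to a few milliseconds — the difference between noticeable lag and real-time response.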

More and more types of AI applications will require the benefits of having a PC capable of performing certain AI tasks locally — meaning on the device rather than in the cloud. Benefits of running on an AI PC include computing that’s always available, even without an internet connection; low latency for high responsiveness; and increased privacy, so users don’t have to upload sensitive materials to an online database before they become usable by an AI.

AI for Everyone

RTX GPUs bring more than just performance. They introduce capabilities only possible with RTX technology. Many of these AI features are accessible — and impactful — to millions, regardless of the individual’s skill level.

From AI upscaling to improved video conferencing to intelligent, personalizable chatbots, there are tools to benefit all types of users.

RTX Video uses AI to upscale streaming video and display it in HDR, bringing lower-resolution, standard-dynamic-range video to vivid, up-to-4K high dynamic range. RTX users can enjoy the feature with one-time, one-click enablement on nearly any video streamed in a Chrome or Edge browser.

NVIDIA Broadcast, a free app for RTX users with a straightforward user interface, has a host of AI features that improve video conferencing and livestreaming. It removes unwanted background sounds like clicky keyboards, vacuum cleaners and screaming children with Noise and Echo Removal. It can replace or blur backgrounds with better edge detection using Virtual Background. It smooths low-quality camera images with Video Noise Removal. And it can keep the user centered on screen, with eyes looking at the camera no matter where they move, using Auto Frame and Eye Contact.

Chat with RTX is a local, personalized AI chatbot demo that’s easy to use and free to download.

The tech demo, originally released in January, will get an update with Google’s Gemma soon.

Users can easily connect local files on a PC to a supported large language model simply by dropping files into a single folder and pointing the demo to the location. It enables queries for quick, contextually relevant answers.

Since Chat with RTX runs locally on Windows with GeForce RTX PCs and NVIDIA RTX workstations, results are fast — and the user’s data stays on the device. Rather than relying on cloud-based services, Chat with RTX lets users process sensitive data on a local PC without the need to share it with a third party or have an internet connection.
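The retrieval idea behind pointing a chatbot at a folder of local files can be sketched in a few lines. Chat with RTX uses a far more capable LLM-based pipeline; the keyword-overlap scoring below is only an illustration of how a query gets matched to the most relevant local document, with hypothetical file names and contents:

```python
# Toy sketch of local document retrieval: score each file against the
# query and surface the best match. Real systems use embeddings and an
# LLM; simple word overlap stands in for that here.

def most_relevant(docs, query):
    """Return the name of the doc sharing the most words with the query."""
    q = set(query.lower().split())
    def score(item):
        name, text = item
        return len(q & set(text.lower().split()))
    return max(docs.items(), key=score)[0]

docs = {
    "trip.txt": "flight to Paris departs Friday at noon",
    "recipe.txt": "whisk eggs with flour and milk",
}
print(most_relevant(docs, "when does my flight depart"))  # trip.txt
```

In a full pipeline, the retrieved passage is then handed to the LLM as context, so answers stay grounded in the user’s own files — all without leaving the device.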

AI for Gamers

Over the past six years, game performance has seen the greatest leaps with AI acceleration. Gamers have been turning NVIDIA DLSS on since 2019, boosting frame rates and improving image quality. It’s a technique that uses AI to generate pixels in video games automatically. With ongoing improvements, it now increases frame rates by up to 4x.

And with the introduction of Ray Reconstruction in the latest version, DLSS 3.5, visual quality is further enhanced in some of the world’s top titles, setting a new standard for visually richer and more immersive gameplay.
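To see what “generating pixels” replaces, it helps to look at the naive non-AI baseline: nearest-neighbor upscaling, which simply repeats existing pixels and adds no new detail. DLSS instead uses a trained neural network to infer plausible new pixels; the sketch below is only the contrast case, not how DLSS works:

```python
# Naive non-AI upscaling: each pixel is repeated, so a 2x2 image
# becomes a blocky 4x4 image with no new detail. AI upscalers like
# DLSS generate new pixel values instead of duplicating old ones.

def upscale_nearest(image, factor):
    """Upscale a 2D grid of pixel values by repeating each pixel."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]

small = [[1, 2],
         [3, 4]]
big = upscale_nearest(small, 2)
for row in big:
    print(row)
```

Because the game only has to render the small image and the upscaler fills in the rest, frame rates climb — and the better the upscaler’s guesses, the less image quality is sacrificed.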

Over 500 games and applications now use ray tracing, DLSS and other AI-powered technologies, revolutionizing the ways people play and create.

Beyond frames, AI is set to improve the way gamers interact with characters and remaster classic games.

NVIDIA ACE microservices — including generative AI-powered speech and animation models — are enabling developers to add intelligent, dynamic digital avatars to games. Demonstrated at CES, ACE won multiple awards for its ability to bring game characters to life as a glimpse into the future of PC gaming.

NVIDIA RTX Remix, a platform for modders to create stunning RTX remasters of classic games, delivers generative AI tools that can transform basic textures from classic games into modern, 4K-resolution, physically based rendering materials. Several projects have already been released or are in the works, including Half-Life 2 RTX and Portal with RTX.

AI for Creators

AI is unlocking creative potential by reducing or automating tedious tasks, freeing up time for pure creativity. These features run fastest or solely on PCs with NVIDIA RTX or GeForce RTX GPUs.

Adobe Premiere Pro’s AI-powered Enhance Speech tool removes unwanted noise and improves dialogue quality.

Adobe Premiere Pro’s Enhance Speech tool is accelerated by RTX, using AI to remove unwanted noise and improve the quality of dialogue clips so they sound professionally recorded. It’s up to 4.5x faster on RTX vs. Mac. Another Premiere feature, Auto Reframe, uses GPU acceleration to identify and track the most relevant elements in a video and intelligently reframes video content for different aspect ratios.

Another time-saving AI feature for video editors is DaVinci Resolve’s Magic Mask. Previously, if editors needed to adjust the color/brightness of a subject in one shot or remove an unwanted object, they’d have to use a combination of rotoscoping techniques or basic power windows and masks to isolate the subject from the background.

Magic Mask has completely changed that workflow. With it, simply draw a line over the subject and the AI will process for a moment before revealing the selection. And GeForce RTX laptops can run the feature 2.5x faster than the fastest non-RTX laptops.

This is just a sample of the ways that AI is increasing the speed of creativity. There are now more than 125 AI applications accelerated by RTX.

AI for Developers

AI is enhancing the way developers build software applications through scalable environments, hardware and software optimizations, and new APIs.

NVIDIA AI Workbench helps developers quickly create, test and customize pretrained generative AI models and LLMs using PC-class performance and memory footprint. It’s a unified, easy-to-use toolkit that can scale from running locally on RTX PCs to virtually any data center, public cloud or NVIDIA DGX Cloud.

After building AI models for PC use cases, developers can optimize them using NVIDIA TensorRT — the software that helps developers take full advantage of the Tensor Cores in RTX GPUs.

TensorRT acceleration is now available in text-based applications with TensorRT-LLM for Windows. The open-source library increases LLM performance and includes pre-optimized checkpoints for popular models, including Google’s Gemma, Meta Llama 2, Mistral and Microsoft Phi-2.

Developers also have access to a TensorRT-LLM wrapper for the OpenAI Chat API. With just one line of code change, continue.dev — an open-source autopilot for VS Code and JetBrains that taps into an LLM — can use TensorRT-LLM on an RTX PC for fast, local inference.
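The “one line of code change” idea works because the wrapper speaks the same chat-completions protocol as the cloud API, so a client only needs to point at a different endpoint. The localhost address and model names below are assumptions for illustration; check the wrapper’s documentation for the address it actually serves:

```python
# Sketch: an OpenAI-style chat request is redirected from the cloud
# to a local server simply by changing the base URL. The request
# body itself is unchanged.

import json

def chat_request(base_url, model, user_message):
    """Build an OpenAI-compatible chat completion request."""
    return {
        "url": f"{base_url}/chat/completions",
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        }),
    }

# Cloud endpoint:
cloud = chat_request("https://api.openai.com/v1", "gpt-4", "Hello")
# The one-line change: point at an assumed local wrapper instead.
local = chat_request("http://localhost:8000/v1", "local-llm", "Hello")

print(local["url"])
```

Tools built against the OpenAI Chat API — editors, autopilots, scripts — then work unmodified against the local model.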

Every week, we’ll demystify AI by making the technology more accessible, and we’ll showcase new hardware, software, tools and accelerations for RTX AI PC users.

The iPhone moment of AI is here, and it’s just the beginning. Welcome to AI Decoded.

Get weekly updates directly in your inbox by subscribing to the AI Decoded newsletter.
