AI, Go Fetch! New NVIDIA NeMo Retriever Microservices Boost LLM Accuracy and Throughput

AI, Go Fetch! New NVIDIA NeMo Retriever Microservices Boost LLM Accuracy and Throughput

Generative AI applications have little, or sometimes negative, value without accuracy — and accuracy is rooted in data.

To help developers efficiently fetch the best proprietary data to generate knowledgeable responses for their AI applications, NVIDIA today announced four new NVIDIA NeMo Retriever NIM inference microservices.

Combined with NVIDIA NIM inference microservices for the Llama 3.1 model collection, also announced today, NeMo Retriever NIM microservices enable enterprises to scale to agentic AI workflows — where AI applications operate accurately with minimal intervention or supervision — while delivering the highest accuracy retrieval-augmented generation, or RAG.

NeMo Retriever allows organizations to seamlessly connect custom models to diverse business data and deliver highly accurate responses for AI applications using RAG. In essence, the production-ready microservices enable highly accurate information retrieval for building highly accurate AI applications.

For example, NeMo Retriever can boost model accuracy and throughput for developers creating AI agents and customer service chatbots, analyzing security vulnerabilities or extracting insights from complex supply chain information.

NIM inference microservices enable high-performance, easy-to-use, enterprise-grade inferencing. And with NeMo Retriever NIM microservices, developers can benefit from all of this — superpowered by their data.

These new NeMo Retriever embedding and reranking NIM microservices are now generally available:

  • NV-EmbedQA-E5-v5, a popular community base embedding model optimized for text question-answering retrieval
  • NV-EmbedQA-Mistral7B-v2, a popular multilingual community base model fine-tuned for text embedding for high-accuracy question answering
  • Snowflake-Arctic-Embed-L, an optimized community model, and
  • NV-RerankQA-Mistral4B-v3, a popular community base model fine-tuned for text reranking for high-accuracy question answering.

They join the collection of NIM microservices easily accessible through the NVIDIA API catalog.

Embedding and Reranking Models

NeMo Retriever NIM microservices comprise two model types — embedding and reranking — with open and commercial offerings that ensure transparency and reliability.

A diagram showing a user prompt inquiring about a bill, retrieving the most accurate response.
Example RAG pipeline using NVIDIA NIM microservices for Llama 3.1 and NeMo Retriever embedding and reranking NIM microservices for a customer service AI chatbot application.

An embedding model transforms diverse data — such as text, images, charts and video — into numerical vectors, stored in a vector database, while capturing their meaning and nuance. Embedding models are fast and computationally less expensive than traditional large language models, or LLMs.

A reranking model ingests data and a query, then scores the data according to its relevance to the query. Such models offer significant accuracy improvements while being computationally complex and slower than embedding models.

NeMo Retriever provides the best of both worlds. By casting a wide net of data to be retrieved with an embedding NIM, then using a reranking NIM to trim the results for relevancy, developers tapping NeMo Retriever can build a pipeline that ensures the most helpful, accurate results for their enterprise.

With NeMo Retriever, developers get access to state-of-the-art open, commercial models for building text Q&A retrieval pipelines that provide the highest accuracy. When compared with alternate models, NeMo Retriever NIM microservices provided 30% fewer inaccurate answers for enterprise question answering.

Bar chart showing lexical search (45%), alternative embedder (63%), compared with NeMo Retriever embedding NIM (73%) and NeMo Retriever embedding + reranking NIM microservices (75%).
Comparison of NeMo Retriever embedding NIM and embedding plus reranking NIM microservices performance versus lexical search and an alternative embedder.

Top Use Cases

From RAG and AI agent solutions to data-driven analytics and more, NeMo Retriever powers a wide range of AI applications.

The microservices can be used to build intelligent chatbots that provide accurate, context-aware responses. They can help analyze vast amounts of data to identify security vulnerabilities. They can assist in extracting insights from complex supply chain information. And they can boost AI-enabled retail shopping advisors that offer natural, personalized shopping experiences, among other tasks.

NVIDIA AI workflows for these use cases provide an easy, supported starting point for developing generative AI-powered technologies.

Dozens of NVIDIA data platform partners are working with NeMo Retriever NIM microservices to boost their AI models’ accuracy and throughput.

DataStax has integrated NeMo Retriever embedding NIM microservices in its Astra DB and Hyper-Converged platforms, enabling the company to bring accurate, generative AI-enhanced RAG capabilities to customers with faster time to market.

Cohesity will integrate NVIDIA NeMo Retriever microservices with its AI product, Cohesity Gaia, to help customers put their data to work to power insightful, transformative generative AI applications through RAG.

Kinetica will use NVIDIA NeMo Retriever to develop LLM agents that can interact with complex networks in natural language to respond more quickly to outages or breaches — turning insights into immediate action.

NetApp is collaborating with NVIDIA to connect NeMo Retriever microservices to exabytes of data on its intelligent data infrastructure. Every NetApp ONTAP customer will be able to seamlessly “talk to their data” to access proprietary business insights without having to compromise the security or privacy of their data.

NVIDIA global system integrator partners including Accenture, Deloitte, Infosys, LTTS, Tata Consultancy Services, Tech Mahindra and Wipro, as well as service delivery partners Data Monsters, EXLService (Ireland) Limited, Latentview, Quantiphi, Slalom, SoftServe and Tredence, are developing services to help enterprises add NeMo Retriever NIM microservices into their AI pipelines.

Use With Other NIM Microservices

NeMo Retriever NIM microservices can be used with NVIDIA Riva NIM microservices, which  supercharge speech AI applications across industries — enhancing customer service and enlivening digital humans.

New models that will soon be available as Riva NIM microservices include: FastPitch and HiFi-GAN for text-to-speech applications; Megatron for multilingual neural machine translation; and the record-breaking NVIDIA Parakeet family of models for automatic speech recognition.

NVIDIA NIM microservices can be used all together or separately, offering developers a modular approach to building AI applications. In addition, the microservices can be integrated with community models, NVIDIA models or users’ custom models — in the cloud, on premises or in hybrid environments — providing developers with further flexibility.

NVIDIA NIM microservices are available at ai.nvidia.com. Enterprises can deploy AI applications in production with NIM through the NVIDIA AI Enterprise software platform.

NIM microservices can run on customers’ preferred accelerated infrastructure, including cloud instances from Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure, as well as NVIDIA-Certified Systems from global server manufacturing partners including Cisco, Dell Technologies, Hewlett Packard Enterprise, Lenovo and Supermicro.

NVIDIA Developer Program members will soon be able to access NIM for free for research, development and testing on their preferred infrastructure.

Learn more about the latest in generative AI and accelerated computing by joining NVIDIA at SIGGRAPH, the premier computer graphics conference, running July 28-Aug. 1 in Denver. 

See notice regarding software product information.

Read More

NVIDIA’s AI Masters Sweep KDD Cup 2024 Data Science Competition

NVIDIA’s AI Masters Sweep KDD Cup 2024 Data Science Competition

Team NVIDIA has triumphed at the Amazon KDD Cup 2024, securing first place Friday across all five competition tracks.

The team — consisting of NVIDIANs Ahmet Erdem, Benedikt Schifferer, Chris Deotte, Gilberto Titericz, Ivan Sorokin and Simon Jegou — demonstrated its prowess in generative AI, winning in categories that included text generation, multiple-choice questions, name entity recognition, ranking, and retrieval.

The competition, themed “Multi-Task Online Shopping Challenge for LLMs,” asked participants to solve various challenges using limited datasets.

“The new trend in LLM competitions is that they don’t give you training data,” said Deotte, a senior data scientist at NVIDIA. “They give you 96 example questions — not enough to train a model — so we came up with 500,000 questions on our own.”

Deotte explained that the NVIDIA team generated a variety of questions by writing some themselves, using a large language model to create others, and transforming existing e-commerce datasets.

“Once we had our questions, it was straightforward to use existing frameworks to fine-tune a language model,” he said.

The competition organizers hid the test questions to ensure participants couldn’t exploit previously known answers. This approach encourages models that generalize well to any question about e-commerce, proving the model’s capability to handle real-world scenarios effectively.

Despite these constraints, Team NVIDIA’s innovative approach outperformed all competitors by using Qwen2-72B, a just-released LLM with 72 billion parameters, fine-tuned on eight NVIDIA A100 Tensor Core GPUs, and employing QLoRA, a technique for fine-tuning models with datasets.

About the KDD Cup 2024

The KDD Cup, organized by the Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining, or ACM SIGKDD, is a prestigious annual competition that promotes research and development in the field.

This year’s challenge, hosted by Amazon, focused on mimicking the complexities of online shopping with the goal of making it a more intuitive and satisfying experience using large language models. Organizers utilized the test dataset ShopBench — a benchmark that replicates the massive challenge for online shopping with 57 tasks and about 20,000 questions derived from real-world Amazon shopping data — to evaluate participants’ models.

The ShopBench benchmark focused on four key shopping skills, along with a fifth “all-in-one” challenge:

  1. Shopping Concept Understanding: Decoding complex shopping concepts and terminologies.
  2. Shopping Knowledge Reasoning: Making informed decisions with shopping knowledge.
  3. User Behavior Alignment: Understanding dynamic customer behavior.
  4. Multilingual Abilities: Shopping across languages.
  5. All-Around: Solving all tasks from the previous tracks in a unified solution.

NVIDIA’s Winning Solution

NVIDIA’s winning solution involved creating a single model for each track.

The team fine-tuned the just-released Qwen2-72B model using eight NVIDIA A100 Tensor Core GPUs for approximately 24 hours. The GPUs provided fast and efficient processing, significantly reducing the time required for fine-tuning.

First, the team generated training datasets based on the provided examples and synthesized additional data using Llama 3 70B hosted on build.nvidia.com.

Next, they employed QLoRA (Quantized Low-Rank Adaptation), a training process using the data created in step one. QLoRA modifies a smaller subset of the model’s weights, allowing efficient training and fine-tuning.

The model was then quantized — making it smaller and able to run on a system with a smaller hard drive and less memory — with AWQ 4-bit and used the vLLM inference library to predict the test datasets on four NVIDIA T4 Tensor Core GPUs within the time constraints.

This approach secured the top spot in each individual track and the overall first place in the competition—a clean sweep for NVIDIA for the second year in a row.

The team plans to submit a detailed paper on its solution next month and plans to present its findings at KDD 2024 in Barcelona.

Read More

Sustainable Strides: How AI and Accelerated Computing Are Driving Energy Efficiency

Sustainable Strides: How AI and Accelerated Computing Are Driving Energy Efficiency

AI and accelerated computing — twin engines NVIDIA continuously improves — are delivering energy efficiency for many industries.

It’s progress the wider community is starting to acknowledge.

“Even if the predictions that data centers will soon account for 4% of global energy consumption become a reality, AI is having a major impact on reducing the remaining 96% of energy consumption,” said a report from Lisbon Council Research, a nonprofit formed in 2003 that studies economic and social issues.

The article from the Brussels-based research group is among a handful of big-picture AI policy studies starting to emerge. It uses Italy’s Leonardo supercomputer, accelerated with nearly 14,000 NVIDIA GPUs, as an example of a system advancing work in fields from automobile design and drug discovery to weather forecasting.

Energy-efficiency gains over time for the most efficient supercomputer on the TOP500 list. Source: TOP500.org

Why Accelerated Computing Is Sustainable Computing

Accelerated computing uses the parallel processing of NVIDIA GPUs to do more work in less time. As a result, it consumes less energy than general-purpose servers that employ CPUs built to handle one task at a time.

That’s why accelerated computing is sustainable computing.

Accelerated systems use parallel processing on GPUs to do more work in less time, consuming less energy than CPUs.

The gains are even greater when accelerated systems apply AI, an inherently parallel form of computing that’s the most transformative technology of our time.

“When it comes to frontier applications like machine learning or deep learning, the performance of GPUs is an order of magnitude better than that of CPUs,” the report said.

NVIDIA offers a combination of GPUs, CPUs, and DPUs tailored to maximize energy efficiency with accelerated computing.

User Experiences With Accelerated AI

Users worldwide are documenting energy-efficiency gains with AI and accelerated computing.

In financial services, Murex — a Paris-based company with a trading and risk-management platform used daily by more than 60,000 people — tested the NVIDIA Grace Hopper Superchip. On its workloads, the CPU-GPU combo delivered a 4x reduction in energy consumption and a 7x reduction in time to completion compared with CPU-only systems (see chart below).

“On risk calculations, Grace is not only the fastest processor, but also far more power-efficient, making green IT a reality in the trading world,” said Pierre Spatz, head of quantitative research at Murex.

In manufacturing, Taiwan-based Wistron built a digital copy of a room where NVIDIA DGX systems undergo thermal stress tests to improve operations at the site. It used NVIDIA Omniverse, a platform for industrial digitization, with a surrogate model, a version of AI that emulates simulations.

The digital twin, linked to thousands of networked sensors, enabled Wistron to increase the facility’s overall energy efficiency by up to 10%. That amounts to reducing electricity consumption by 120,000 kWh per year and carbon emissions by a whopping 60,000 kilograms.

Up to 80% Fewer Carbon Emissions

The RAPIDS Accelerator for Apache Spark can reduce the carbon footprint for data analytics, a widely used form of machine learning, by as much as 80% while delivering 5x average speedups and 4x reductions in computing costs, according to a recent benchmark.

Thousands of companies — about 80% of the Fortune 500 — use Apache Spark to analyze their growing mountains of data. Companies using NVIDIA’s Spark accelerator include Adobe, AT&T and the U.S. Internal Revenue Service.

In healthcare, Insilico Medicine discovered and put into phase 2 clinical trials a drug candidate for a relatively rare respiratory disease, thanks to its NVIDIA-powered AI platform.

Using traditional methods, the work would have cost more than $400 million and taken up to six years. But with generative AI, Insilico hit the milestone for one-tenth of the cost in one-third of the time.

“This is a significant milestone not only for us, but for everyone in the field of AI-accelerated drug discovery,” said Alex Zhavoronkov, CEO of Insilico Medicine.

This is just a sampler of results that users of accelerated computing and AI are pursuing at companies such as Amgen, BMW, Foxconn, PayPal and many more.

Speeding Science With Accelerated AI 

In basic research, the National Energy Research Scientific Computing Center (NERSC), the U.S. Department of Energy’s lead facility for open science, measured results on a server with four NVIDIA A100 Tensor Core GPUs compared with dual-socket x86 CPU servers across four of its key high-performance computing and AI applications.

Researchers found that the apps, when accelerated with the NVIDIA A100 GPUs, saw energy efficiency rise 5x on average (see below). One application, for weather forecasting, logged gains of nearly 10x.

Scientists and researchers worldwide depend on AI and accelerated computing to achieve high performance and efficiency.

In a recent ranking of the world’s most energy-efficient supercomputers, known as the Green500, NVIDIA-powered systems swept the top six spots, and 40 of the top 50.

Underestimated Energy Savings

The many gains across industries and science are sometimes overlooked in forecasts that extrapolate only the energy consumption of training the largest AI models. That misses the benefits from most of an AI model’s life when it’s consuming relatively little energy, delivering the kinds of efficiencies users described above.

In an analysis citing dozens of sources, a recent study debunked as misleading and inflated projections based on training models.

“Just as the early predictions about the energy footprints of e-commerce and video streaming ultimately proved to be exaggerated, so too will those estimates about AI likely be wrong,” said the report from the Information Technology and Innovation Foundation (ITIF), a Washington-based think tank.

The report notes as much as 90% of the cost — and all the efficiency gains — of running an AI model are in deploying it in applications after it’s trained.

“Given the enormous opportunities to use AI to benefit the economy and society — including transitioning to a low-carbon future — it is imperative that policymakers and the media do a better job of vetting the claims they entertain about AI’s environmental impact,” said the report’s author, who described his findings in a recent podcast.

Others Cite AI’s Energy Benefits

Policy analysts from the R Street Institute, also in Washington, D.C., agreed.

“Rather than a pause, policymakers need to help realize the potential for gains from AI,” the group wrote in a 1,200-word article.

“Accelerated computing and the rise of AI hold great promise for the future, with significant societal benefits in terms of economic growth and social welfare,” it said, citing demonstrated benefits of AI in drug discovery, banking, stock trading and insurance.

AI can make the electric grid, manufacturing and transportation sectors more efficient, it added.

AI Supports Sustainability Efforts

The reports also cited the potential of accelerated AI to fight climate change and promote sustainability.

“AI can enhance the accuracy of weather modeling to improve public safety as well as generate more accurate predictions of crop yields. The power of AI can also contribute to … developing more precise climate models,” R Street said.

The Lisbon report added that AI plays “a crucial role in the innovation needed to address climate change” for work such as discovering more efficient battery materials.

How AI Can Help the Environment

ITIF called on governments to adopt AI as a tool in efforts to decarbonize their operations.

Public and private organizations are already applying NVIDIA AI to protect coral reefs, improve tracking of wildfires and extreme weather, and enhance sustainable agriculture.

For its part, NVIDIA is working with hundreds of startups employing AI to address climate issues. NVIDIA also announced plans for Earth-2, expected to be the world’s most powerful AI supercomputer dedicated to climate science.

Enhancing Energy Efficiency Across the Stack

Since its founding in 1993, NVIDIA has worked on energy efficiency across all its products — GPUs, CPUs, DPUs, networks, systems and software, as well as platforms such as Omniverse.

In AI, the brunt of an AI model’s life is in inference, delivering insights that help users achieve new efficiencies. The NVIDIA GB200 Grace Blackwell Superchip has demonstrated 25x energy efficiency over the prior NVIDIA Hopper GPU generation in AI inference.

Over the last eight years, NVIDIA GPUs have advanced a whopping 45,000x in their energy efficiency running large language models (see chart below).

Recent innovations in software include TensorRT-LLM. It can help GPUs reduce 3x the energy consumption of LLM inference.

Here’s an eye-popping stat: If the efficiency of cars improved as much as NVIDIA has advanced the efficiency of AI on its accelerated computing platform, cars would get 280,000 miles per gallon. That means you could drive to the moon on less than a gallon of gas.

The analysis applies to the fuel efficiency of cars NVIDIA’s whopping 10,000x efficiency gain in AI training and inference from 2016 to 2025 (see chart below).

How the big AI efficiency leap from the NVIDIA P100 GPU to the NVIDIA Grace Blackwell compares to car fuel-efficiency gains.

Driving Data Center Efficiency

NVIDIA delivers many optimizations through system-level innovations. For example, NVIDIA BlueField-3 DPUs can reduce power consumption up to 30% by offloading essential data center networking and infrastructure functions from less efficient CPUs.

Last year, NVIDIA received a $5 million grant from the U.S. Department of Energy — the largest of 15 grants from a pool of more than 100 applications — to design a new liquid-cooling technology for data centers. It will run 20% more efficiently than today’s air-cooled approaches and has a smaller carbon footprint.

These are just some of the ways NVIDIA contributes to the energy efficiency of data centers.

Data centers are among the most efficient users of energy and one of the largest consumers of renewable energy.

The ITIF report notes that between 2010 and 2018, global data centers experienced a 550% increase in compute instances and a 2,400% increase in storage capacity, but only a 6% increase in energy use, thanks to improvements across hardware and software.

NVIDIA continues to drive energy efficiency for accelerated AI, helping users in science, government and industry accelerate their journeys toward sustainable computing.

Try NVIDIA’s energy-efficiency calculator to find ways to improve energy efficiency. And check out NVIDIA’s sustainable computing site and corporate sustainability report for more information. 

Read More

Byte-Sized Courses: NVIDIA Offers Self-Paced Career Development in AI and Data Science

Byte-Sized Courses: NVIDIA Offers Self-Paced Career Development in AI and Data Science

AI has seen unprecedented growth — spurring the need for new training and education resources for students and industry professionals.

NVIDIA’s latest on-demand webinar, Essential Training and Tips to Accelerate Your Career in AI, featured a panel discussion with industry experts on fostering career growth and learning in AI and other advanced technologies.

Over 1,800 attendees gained insights on how to kick-start their careers and use NVIDIA’s technologies and resources to accelerate their professional development.

Opportunities in AI

AI’s impact is touching nearly every industry, presenting new career opportunities for professionals of all backgrounds.

Lauren Silveira, a university recruiting program manager at NVIDIA, challenged attendees to take their unique education and experience and apply it in the AI field.

“You don’t have to work directly in AI to impact the industry,” said Silveira. “I knew I wouldn’t be a doctor or engineer — that wasn’t in my career path — but I could create opportunities for those that wanted to pursue those dreams.”

Kevin McFall, a principal instructor for the NVIDIA Deep Learning Institute, offered some advice for those looking to navigate a career in AI and advanced technologies but finding themselves overwhelmed or unsure of where to start.

“Don’t try to do it all by yourself,” he said. “Don’t get focused on building everything from scratch — the best skill that you can have is being able to take pieces of code or inspiration from different resources and plug them together to make a whole.”

A main takeaway from the panelists was that students and industry professionals can significantly enhance their capabilities by leveraging tools and resources in addition to their networks.

Every individual can access a variety of free software development kits, community resources and specialized courses in areas like robotics, CUDA and OpenUSD through the NVIDIA Developer Program. Additionally, they can kick off projects with the CUDA code sample library and explore specialized guides such as “A Simple Guide to Deploying Generative AI With NVIDIA NIM”.

Spinning a Network

Staying up to date on the rapidly expanding technology industry involves more than just keeping up with the latest education and certifications.

Sabrina Koumoin, a senior software engineer at NVIDIA, spoke on the importance of networking. She believes people can find like-minded peers and mentors to gain inspiration from by sharing their personal learning journeys or projects on social platforms like LinkedIn.

A self-taught coder, Koumoin also advocates for active engagement and education accessibility. Outside of work, she hosted multiple coding bootcamps for people looking to break into tech.

“It’s a way to show that learning technical skills can be engaging, not intimidating,” she said.

David Ajoku, founder and CEO at Demystifyd and Aware.ai, also emphasized the importance of using LinkedIn to build connections, demonstrate key accomplishments and show passion.

He outlined a three-step strategy to enhance your LinkedIn presence, designed to help you stand out, gain deeper insights into your preferred companies and boldly share your aspirations and interests:

  1. Think about a company you’d like to work for and what draws you to it.
  2. Research thoroughly, focusing on its main activities, mission and goals.
  3. Be bold — create a series of posts informing your network about your career journey and what advancements interest you in the chosen company.

One attendee asked about how AI might evolve over the next decade and what skills professionals should focus on to stay relevant. Louis Stewart, head of strategic initiatives at NVIDIA, replied that crafting a personal narrative and growth journey is just as important as ensuring certifications and skills are up to date.

“Be intentional and purposeful — have an end in mind,” he said. “That’s how you connect with future potential companies and people — it’s a skill you have to develop to stay ahead.”

Deep Dive Into Learning

NVIDIA offers a variety of programs and resources to equip the next generation of AI professionals with the skills and training needed to excel in a career in AI.

NVIDIA’s AI Learning Essentials is designed to give individuals the knowledge, skills and certifications they need to be prepared for the workforce and the fast moving field of AI. It includes free access to self-paced introductory courses and webinars on topics such as generative AI, retrieval-augmented generation (RAG) and CUDA.

The NVIDIA Deep Learning Institute (DLI) provides a diverse range of resources, including learning materials, self-paced and live trainings, and educator programs spanning AI, accelerated computing and data science, graphics simulation and more. They also offer technical workshops for students currently enrolled in universities.

DLI provides comprehensive training for generative AI, RAG, NVIDIA NIM inference microservices and large language models. Offerings also include certifications for generative AI LLMs and generative AI multimodal that help learners showcase their expertise and stand out from the crowd.

Get started with AI Learning Essentials, the NVIDIA Deep Learning Institute and on-demand resources.

Read More

Magnetic Marvels: NVIDIA’s Supercomputers Spin a Quantum Tale

Magnetic Marvels: NVIDIA’s Supercomputers Spin a Quantum Tale

Research published earlier this month in the science journal Nature used NVIDIA-powered supercomputers to validate a pathway toward the commercialization of quantum computing.

The research, led by Nobel laureate Giorgio Parisi, focuses on quantum annealing, a method that may one day tackle complex optimization problems that are extraordinarily challenging to conventional computers.

To conduct their research, the team utilized 2 million GPU computing hours at the Leonardo facility (Cineca, in Bologna, Italy), nearly 160,000 GPU computing hours on the Meluxina-GPU cluster, in Luxembourg, and 10,000 GPU hours from the Spanish Supercomputing Network. Additionally, they accessed the Dariah cluster, in Lecce, Italy.

They used these state-of-the-art resources to simulate the behavior of a certain kind of quantum computing system known as a quantum annealer.

Quantum computers fundamentally rethink how information is computed to enable entirely new solutions.

Unlike classical computers, which process information in binary — 0s and 1s — quantum computers use quantum bits or qubits that can allow information to be processed in entirely new ways.

Quantum annealers are a special type of quantum computer that, though not universally useful, may have advantages for solving certain types of optimization problems.

The paper, “The Quantum Transition of the Two-Dimensional Ising Spin Glass,” represents a significant step in understanding the phase transition — a change in the properties of a quantum system — of Ising spin glass, a disordered magnetic material in a two-dimensional plane, a critical problem in computational physics.

The paper addresses the problem of how the properties of magnetic particles arranged in a two-dimensional plane can abruptly change their behavior.

The study also shows how GPU-powered systems play a key role in developing approaches to quantum computing.

GPU-accelerated simulations allow researchers to understand the complex systems’ behavior in developing quantum computers, illuminating the most promising paths forward.

Quantum annealers, like the systems developed by the pioneering quantum computing company D-Wave, operate by methodically decreasing a magnetic field that is applied to a set of magnetically susceptible particles.

When strong enough, the applied field will act to align the magnetic orientation of the particles — similar to how iron filings will uniformly stand to attention near a bar magnet.

If the strength of the field is varied slowly enough, the magnetic particles will arrange themselves to minimize the energy of the final arrangement.

Finding this stable, minimum-energy state is crucial in a particularly complex and disordered magnetic system known as a spin glass since quantum annealers can encode certain kinds of problems into the spin glass’s minimum-energy configuration.

Finding the stable arrangement of the spin glass then solves the problem.

Understanding these systems helps scientists develop better algorithms for solving difficult problems by mimicking how nature deals with complexity and disorder.

That’s crucial for advancing quantum annealing and its applications in solving extremely difficult computational problems that currently have no known efficient solution — problems that are pervasive in fields ranging from logistics to cryptography.

Unlike gate-model quantum computers, which operate by applying a sequence of quantum gates, quantum annealers allow a quantum system to evolve freely in time.

This is not a universal computer — a device capable of performing any computation given sufficient time and resources — but may have advantages for solving particular sets of optimization problems in application areas such as vehicle routing, portfolio optimization and protein folding.

Through extensive simulations performed on NVIDIA GPUs, the researchers learned how key parameters of the spin glasses making up quantum annealers change during their operation, allowing a better understanding of how to use these systems to achieve a quantum speedup on important problems.

Much of the work for this groundbreaking paper was first presented at NVIDIA’s GTC 2024 technology conference. Read the full paper and learn more about NVIDIA’s work in quantum computing.

Read More

Mistral AI and NVIDIA Unveil Mistral NeMo 12B, a Cutting-Edge Enterprise AI Model

Mistral AI and NVIDIA Unveil Mistral NeMo 12B, a Cutting-Edge Enterprise AI Model

Mistral AI and NVIDIA today released a new state-of-the-art language model, Mistral NeMo 12B, that developers can easily customize and deploy for enterprise applications supporting chatbots, multilingual tasks, coding and summarization.

By combining Mistral AI’s expertise in training data with NVIDIA’s optimized hardware and software ecosystem, the Mistral NeMo model offers high performance for diverse applications.

“We are fortunate to collaborate with the NVIDIA team, leveraging their top-tier hardware and software,” said Guillaume Lample, cofounder and chief scientist of Mistral AI. “Together, we have developed a model with unprecedented accuracy, flexibility, high-efficiency and enterprise-grade support and security thanks to NVIDIA AI Enterprise deployment.”

Mistral NeMo was trained on the NVIDIA DGX Cloud AI platform, which offers dedicated, scalable access to the latest NVIDIA architecture.

NVIDIA TensorRT-LLM for accelerated inference performance on large language models and the NVIDIA NeMo development platform for building custom generative AI models were also used to advance and optimize the process.

This collaboration underscores NVIDIA’s commitment to supporting the model-builder ecosystem.

Delivering Unprecedented Accuracy, Flexibility and Efficiency 

Excelling in multi-turn conversations, math, common sense reasoning, world knowledge and coding, this enterprise-grade AI model delivers precise, reliable performance across diverse tasks.

With a 128K context length, Mistral NeMo processes extensive and complex information more coherently and accurately, ensuring contextually relevant outputs.

Released under the Apache 2.0 license, which fosters innovation and supports the broader AI community, Mistral NeMo is a 12-billion-parameter model. Additionally, the model uses the FP8 data format for model inference, which reduces memory size and speeds deployment without any degradation to accuracy.

That means the model learns tasks better and handles diverse scenarios more effectively, making it ideal for enterprise use cases.

Mistral NeMo comes packaged as an NVIDIA NIM inference microservice, offering performance-optimized inference with NVIDIA TensorRT-LLM engines.

This containerized format allows for easy deployment anywhere, providing enhanced flexibility for various applications.

As a result, models can be deployed anywhere in minutes, rather than several days.

NIM features enterprise-grade software that’s part of NVIDIA AI Enterprise, with dedicated feature branches, rigorous validation processes, and enterprise-grade security and support.

It includes comprehensive support, direct access to an NVIDIA AI expert and defined service-level agreements, delivering reliable and consistent performance.

The open model license allows enterprises to integrate Mistral NeMo into commercial applications seamlessly.

Designed to fit on the memory of a single NVIDIA L40S, NVIDIA GeForce RTX 4090 or NVIDIA RTX 4500 GPU, the Mistral NeMo NIM offers high efficiency, low compute cost, and enhanced security and privacy.

Advanced Model Development and Customization 

The combined expertise of Mistral AI and NVIDIA engineers has optimized training and inference for Mistral NeMo.

Trained with Mistral AI’s expertise, especially on multilinguality, code and multi-turn content, the model benefits from accelerated training on NVIDIA’s full stack.

It’s designed for optimal performance, utilizing efficient model parallelism techniques, scalability and mixed precision with Megatron-LM.

The model was trained using Megatron-LM, part of NVIDIA NeMo, with 3,072 H100 80GB Tensor Core GPUs on DGX Cloud, composed of NVIDIA AI architecture, including accelerated computing, network fabric and software to increase training efficiency.

Availability and Deployment

With the flexibility to run anywhere — cloud, data center or RTX workstation — Mistral NeMo is ready to revolutionize AI applications across various platforms.

Experience Mistral NeMo as an NVIDIA NIM today via ai.nvidia.com, with a downloadable NIM coming soon.

See notice regarding software product information.

Read More

Hot Deal, Cool Prices: GeForce NOW Summer Sale Offers Priority and Ultimate Memberships Half Off

Hot Deal, Cool Prices: GeForce NOW Summer Sale Offers Priority and Ultimate Memberships Half Off

It’s time for a sweet treat — the GeForce NOW Summer Sale offers high-performance cloud gaming at half off for a limited time.

And starting today, gamers can directly access supported PC games on GeForce NOW via Xbox.com game pages, enabling them to get into their favorite Xbox PC games even faster.

It all comes with nine new games joining the cloud this week.

We Halve a Deal

Summer Sale on GeForce NOW
Unlock the power of cloud gaming with GeForce NOW’s sizzling summer sale.

Take advantage of a special new discount — one-month and six-month GeForce NOW Priority or Ultimate memberships are now 50% off until Aug. 18. It’s perfect for members wanting to level up their gaming experience or those looking to try GeForce NOW for the first time to access and stream an ever-growing library of over 1,900 games with top-notch performance.

Priority members enjoy more benefits over free users, including faster access to gaming servers and gaming sessions of up to six hours. They can also stream beautifully ray-traced graphics across multiple devices with RTX ON for the most immersive experience in supported games.

For those looking for top-notch performance, the Ultimate tier provides members with exclusive access to servers and the ability to stream at up to 4K resolution and 120 frames per second, or up to 240 fps — even without upgraded hardware. Ultimate members get all the same benefits as GeForce RTX 40 series GPU owners, including NVIDIA DLSS 3 for the smoothest frame rates and NVIDIA Reflex for the lowest-latency streaming from the cloud.

Strike while it’s hot — this scorching summer sale ends soon.

Path of the Goddess

Kunitsu-Gami: Path of the Goddess on GeForce NOW
Rinse and repeat.

Capcom’s latest release, Kunitsu-Gami: Path of the Goddess is a unique Japanese-inspired, single-player Kagura Action Strategy game.

The game takes place on a mountain covered in defilement. During the day, purify the villages and prepare for sundown. During the night, protect the Maiden against the hordes of the Seethe. Repeat the day-and-night cycle until the mountain has been cleansed of defilement and peace has returned to the land.

Walk the path of the goddess in the cloud with extended gaming sessions for Ultimate and Priority members. Ultimate members can also enjoy seeing supernatural and human worlds collide in ultrawide resolutions for an even more immersive experience.

Slay New Games

Dungeons of Hinterberg on GeForce NOW
Having a holiday in Hinterberg.

In Dungeons of Hinterberg from Microbird Games, play as Luisa, a burnt-out law trainee taking a break from her fast-paced corporate life. Explore the beautiful alpine village of Hinterberg armed with just a sword and a tourist guide, and uncover the magic hidden within its dungeons. Master magic, solve puzzles and slay monsters — all from the cloud.

Check out the list of new games this week:

  • The Crust (New release on Steam, July 15)
  • Gestalt: Steam & Cinder (New release on Steam, July 16)
  • Nobody Wants to Die (New release on Steam, July 17)
  • Dungeons of Hinterberg (New release on Steam and Xbox, available on PC Game Pass, July 18)
  • Flintlock: The Siege of Dawn  (New release on Steam and Xbox, available on PC Game Pass, July 18)
  • Norland (New release on Steam, July 18)
  • Kunitsu-Gami: Path of the Goddess (New release on Steam, July 19)
  • Content Warning (Steam)
  • Crime Boss: Rockay City (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Decoding How AI-Powered Upscaling on NVIDIA RTX Improves Video Quality

Decoding How AI-Powered Upscaling on NVIDIA RTX Improves Video Quality

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC and workstation users.

Video is everywhere — nearly 80% of internet bandwidth today is used to stream video from content providers and social networks. While screens have become bigger and support higher resolutions, nearly all video is only 1080p quality or lower.

Upscalers can help sharpen streamed video and, powered by AI on the NVIDIA RTX platform, significantly enhance image quality and detail.

What Is an Upscaler?

The larger file size of videos makes it harder to compress and transmit compared to images or text. Platforms like Netflix, Vimeo and YouTube work around this limitation by encoding video — the process of compressing the raw source of a video into a smaller container format.

The encoder first analyzes the video to decide what information it can remove to make it fit a target resolution and frame rate. If the target bitrate is insufficient, the video quality decreases, resulting in a loss of detail and sharpness and the presence of encoding artifacts. The smaller the file, the easier it is to share on the internet — but the worse it looks.

Typically, software on the viewer’s device will upscale the video file to fit the display’s native resolution. However, these upscalers are fairly simplistic, merely multiplying pixels to meet the desired resolution. They can help sharpen the outlines of objects and scenes, but the final video typically carries encoding artifacts and sometimes looks over-sharpened and unnatural.

AI Know a Better Way

The NVIDIA RTX platform uses AI to easily de-artifact and upscale videos.

Easily de-artifact and upscale videos with RTX.

The process of AI upscaling involves analyzing images and motion vectors to generate new details not present in the original video. Instead of merely multiplying pixels, it recognizes the patterns of the image and enhances them to provide greater detail and video quality.

Images must be first de-artifacted before any processing begins. Artifacts — or unwanted distortions and anomalies that appear in video and image files — occur due to overcompression or data loss during transmission and storage.

NVIDIA AI networks can de-artifact images, helping remove blocky areas sometimes seen in streamed video. Without this first step, AI upscalers might end up enhancing the artifacted image itself instead of the desired content.

Super-Sized Video

Just like putting on a pair of prescription glasses can instantly snap the world into focus, RTX Video Super Resolution, one of NVIDIA’s latest innovations in AI-enhanced video technology, gives users a clearer picture into the world of streamed video.

Click the image to see the differences between bicubic upscaling (left) and RTX Video Super Resolution (right).

Available on GeForce RTX 40 and 30 Series GPUs and RTX professional GPUs, it uses AI running on dedicated Tensor Cores to remove block compression artifacts and upscale lower-resolution content up to 4K, matching the user’s native display resolution.

RTX Video Super Resolution can be used to enhance all video watched on browsers. By combining de-artifacting with AI upscaling techniques, it can make even low-bitrate Twitch streams look stunningly clear. RTX Video Super Resolution is also supported in popular video apps like VLC so users can apply the same upscaling process to their offline videos.

Creators can soon use RTX Video Super Resolution in editing apps like Black Magic’s Davinci Resolve, making it easier than ever to upscale lower-quality video files to 4K resolution, as well as convert standard-dynamic range source files into high-dynamic range (HDR).

Say Hi to High-Dynamic Range

RTX Video now also supports AI HDR. HDR video supports a wider range of colors, lending greater detail especially to the darker and lighter areas of images. The problem is that there isn’t that much HDR content online yet.

Enter RTX Video HDR — by simply turning on the feature, the AI network will turn any standard or low-dynamic-range content into HDR, performing the correct tone mapping so the image still looks natural and retains its original colors.

AI Across the Board

RTX Video is just the latest implementation of AI upscaling powered by NVIDIA RTX.

Members of the GeForce NOW cloud streaming service can play their favorite PC games on nearly any device. GeForce RTX servers located all over the world first render the game video content, encode it and then stream it to the player’s local device — just like streaming video from other content providers.

Members on older NVIDIA GPU-powered devices can still use AI-enhanced upscaling to improve gameplay quality. This means they can enjoy the best of both worlds — gameplay rendered on servers powered by RTX 4080-class GPUs in the cloud and AI-enhanced streaming quality. Get more information on enabling AI-enhanced upscaling on GeForce NOW.

The NVIDIA SHIELD TV takes this one step further, processing AI neural networks directly on its NVIDIA Tegra system-on-a-chip to upscale 1080p-quality or lower content from nearly any streaming platform to a display’s native resolution. That means users can improve the video quality of content streamed from Netflix, Prime Video, Max, Disney+ and more at the push of a remote button.

SHIELD TV is currently available for up to $30 off in North America and £30 or 35€ off in Europe as part of Amazon’s Prime Day event running July 16-17. For Prime members in Europe, eligible SHIELD TV purchases also include one month of the GeForce NOW Ultimate membership for free, enabling GeForce RTX 4080-class PC gameplay streamed directly to the living room.

AI has enabled unprecedented improvements in video quality, helping set a new standard in streaming experiences.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Read More

Next-Gen Video Editing: Wondershare Filmora Adds NVIDIA RTX Video HDR Support, RTX-Accelerated AI Features

Next-Gen Video Editing: Wondershare Filmora Adds NVIDIA RTX Video HDR Support, RTX-Accelerated AI Features

Editor’s note: This post is part of our In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX GPU features, technologies and resources, and how they dramatically accelerate content creation.

Wondershare Filmora — a video editing app with AI-powered tools — now supports NVIDIA RTX Video HDR, joining editing software like Blackmagic Design’s DaVinci Resolve and Cyberlink PowerDirector.

RTX Video HDR significantly enhances video quality, ensuring the final output is suitable for the best monitors available today.

Livestreaming software OBS Studio and XSplit Broadcaster now support Twitch Enhanced Broadcasting, giving streamers more control over video quality through client-side encoding and automatic configurations. The feature, developed in collaboration between Twitch, OBS and NVIDIA, also paves the way for more advancements, including vertical live video and advanced codecs such as HEVC and AV1.

A summer’s worth of creative app updates are included in the July Studio Driver, ready for download today. Install the NVIDIA app beta — the essential companion for creators and gamers — to keep GeForce RTX PCs up to date with the latest NVIDIA drivers and technology.

Join NVIDIA at SIGGRAPH to learn about the latest breakthroughs in graphics and generative AI, and tune in to a fireside chat featuring NVIDIA founder and CEO Jensen Huang and Lauren Goode, senior writer at WIRED, on Monday, July 29 at 2:30 p.m. MT. Register now.

And this week’s featured In the NVIDIA Studio artist, Kevin Stratvert, shares all about AI-powered content creation in Wondershare Filmora.

(Wonder)share the Beauty of RTX Video

RTX Video HDR analyzes standard dynamic range video and transforms it into HDR10-quality video, expanding the color gamut to produce clearer, more vibrant frames and enhancing the sense of depth for greater immersion.

With RTX Video HDR, Filmora users can create high-quality content that’s ideal for gaming videos, travel vlogs or event filmmaking.

Combining RTX Video HDR with RTX Video Super Resolution — another AI-powered tool that uses trained models to sharpen edges, restore features and remove artifacts in video — further enhances visual quality. RTX Video HDR requires an NVIDIA RTX GPU connected to an HDR10-compatible monitor or TV. For more information, check out the RTX Video FAQ.

Those with a RTX GPU-powered PC can send files to the Filmora desktop app and continue to edit with local RTX acceleration, doubling the speed of the export process with dual encoders on GeForce RTX 4070 Ti or above GPUs.

Learn more about Wondershare Filmora’s AI-powered features.

Maximizing AI Features in Filmora

Kevin Stratvert has the heart of a teacher — he’s always loved to share his technical knowledge and tips with others.

One day, he thought, “Why not make a YouTube video to explain stuff directly to users?” His first big hit was a tutorial on how to get Microsoft Office for free through Office.com. The video garnered millions of views and tons of engagement — and he’s continued creating content ever since.

“The more content I created, the more questions and feedback I got from viewers, sparking this cycle of creativity and connection that I just couldn’t get enough of,” said Stratvert.

Explaining the benefits of AI has been an area of particular interest for Stratvert, especially as it relates to AI-powered features in Wondershare Filmora. In one YouTube video, Filmora Video Editor Tutorial for Beginners, he breaks down the AI effects video editors can use to accelerate their workflows.

Examples include:

  • Smart Edit: Edit footage-based transcripts generated automatically, including in multiple languages.
  • Smart Cutout: Remove unwanted objects or change the background in seconds.
  • Speech-to-Text: Automatically generate compelling descriptions, titles and captions.

“AI has become a crucial part of my creative toolkit, especially for refining details that really make a difference,” said Stratvert. “By handling these technical tasks, AI frees up my time to focus more on creating content, making the whole process smoother and more efficient.”

Stratvert has also been experimenting with NVIDIA ChatRTX, a technology that lets users interact with their local data, installing and configuring various AI models, effectively prompting AI for both text and image outputs using CLIP and more.

NVIDIA Broadcast has been instrumental in giving Stratvert a professional setup for web conferences and livestreams. The app’s features, including background noise removal and virtual background, help maintain a professional appearance on screen. It’s especially useful in home studio settings, where controlling variables in the environment can be challenging.

“NVIDIA Broadcast has been instrumental in professionalizing my setup for web conferences and livestreams.” — Kevin Stratvert

Stratvert stresses the importance of his GeForce RTX 4070 graphics card in the content creation process.

“With an RTX GPU, I’ve noticed a dramatic improvement in render times and the smoothness of playback, even in demanding scenarios,” he said. “Additionally, the advanced capabilities of RTX GPUs support more intensive tasks like real-time ray tracing and AI-driven editing features, which can open up new creative possibilities in my edits.”

Check out Stratvert’s video tutorials on his website.

Content creator Kevin Stratvert.

Follow NVIDIA Studio on Instagram, X and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter

Read More

Jensen Huang, Mark Zuckerberg to Discuss Future of Graphics and Virtual Worlds at SIGGRAPH 2024

Jensen Huang, Mark Zuckerberg to Discuss Future of Graphics and Virtual Worlds at SIGGRAPH 2024

NVIDIA founder and CEO Jensen Huang and Meta founder and CEO Mark Zuckerberg will hold a public fireside chat on Monday, July 29, at the 50th edition of the SIGGRAPH graphics conference in Denver.

The two leaders will discuss the future of AI and simulation and the pivotal role of research at SIGGRAPH, which focuses on the intersection of graphics and technology.

Before the discussion, Huang will also appear in a fireside chat with WIRED senior writer Lauren Goode to discuss AI and graphics for the new computing revolution.

Both conversations will be available live and on replay at NVIDIA.com.

The appearances at the conference, which runs July 28-Aug. 1, highlight SIGGRAPH’s continued role in technological innovation. Nearly 100 exhibitors will showcase how graphics are stepping into the future.

Attendees exploring the SIGGRAPH Innovation Zone will encounter startups at the forefront of computing and graphics while insights from industry leaders like Huang deliver a glimpse into the technological horizon.

Since the conference’s 1974 inception in Boulder, Colorado, SIGGRAPH has been at the forefront of innovation.

It introduced the world to demos such as the “Aspen Movie Map” — a precursor to Google Street View decades ahead of its time — and one of the first screenings of Pixar’s Luxo Jr., which redefined the art of animation.

The conference remains the leading venue for groundbreaking research in computer graphics.

Publications that redefined modern visual culture — including Ed Catmull’s 1974 paper on texture mapping, Turner Whitted’s 1980 paper on ray-tracing techniques, and James T. Kajiya’s 1986 “The Rendering Equation” — first made their debut at SIGGRAPH.

Innovations like these are now spilling out across the world’s industries.

Throughout the Innovation Zone, over a dozen startups are showcasing how they’re bringing advancements rooted in graphics into diverse fields — from robotics and manufacturing to autonomous vehicles and scientific research, including climate science.

Highlights include Tomorrow.io, which leverages NVIDIA Earth-2 to provide precise weather insights and offers early warning systems to help organizations adapt to climate changes.

Looking Glass is pioneering holographic technology that enables 3D content experiences without headsets. The company is using NVIDIA RTX 6000 Ada Generation GPUs and NVIDIA Maxine technology to enhance real-time audio, video and augmented-reality effects to make this possible.

Manufacturing startup nTop developed a computer-aided design tool using NVIDIA GPU-powered signed distance fields. The tool uses the NVIDIA OptiX rendering engine and a two-way NVIDIA Omniverse LiveLink connector to enable real-time, high-fidelity visualization and collaboration across design and simulation platforms.

Conference attendees can also explore how generative AI — a technology deeply rooted in visual computing — is remaking professional graphics.

On July 31, industry leaders and developers will gather in room 607 at the Colorado Convention Center for Generative AI Day, exploring cutting-edge solutions for visual effects, animation and game development with leaders from Bria AI, Cuebric, Getty Images, Replikant, Shutterstock and others.

The conference’s speaker lineup is equally compelling.

In addition to Huang and Zuckerberg, notable presenters include Dava Newman of MIT Media Lab and Mark Sagar from Soul Machines, who’ll delve into the intersections of bioengineering, design and digital humans.

Finally, as part of SIGGRAPH’s rich legacy, the inaugural Stephen Parker Award will be presented to honor the memory and contributions of Stephen Parker, vice president of professional graphics at NVIDIA. Renowned for his pioneering work in interactive ray tracing and computer graphics, Parker left a legacy that continues to inspire the field.

Join the global technology community in Denver later this month to discover why SIGGRAPH remains at the forefront of demonstrating, predicting and shaping the future of technology.

Read More