Better Molecules, Faster: NVIDIA NIM Agent Blueprint Redefines Hit Identification With Generative AI-Based Virtual Screening

Aiming to make drug discovery faster and smarter, NVIDIA on Wednesday released the NIM Agent Blueprint for generative AI-based virtual screening.

This innovative approach will reduce the time and cost of developing life-saving drugs, enabling quicker access to critical treatments for patients.

This NIM Agent Blueprint introduces a paradigm shift in the drug discovery process, particularly in the crucial “hit-to-lead” transition, by moving from traditional fixed database screening to generative AI-driven molecule design and pre-optimization, enabling researchers to design better molecules faster.

What’s a NIM? What’s a NIM Agent Blueprint?

NVIDIA NIM microservices are modular, cloud-native components that accelerate AI model deployment and execution. These microservices allow researchers to integrate and scale advanced AI models within their workflows, enabling faster and more efficient processing of complex data.

The NIM Agent Blueprint, a comprehensive guide, shows how these microservices can optimize key stages of drug discovery, such as hit identification and lead optimization.

How Are They Used?

Drug discovery is a complex process with three critical stages: target identification, hit identification and lead optimization. Target identification involves choosing the right biology to modify to treat the disease; hit identification is identifying potential molecules that will bind to that target; and lead optimization is improving the design of those molecules to be safer and more effective.

This NVIDIA NIM Agent Blueprint, called generative virtual screening for accelerated drug discovery, identifies and improves virtual hits in a smarter and more efficient way.

At its core are three essential AI models, each available as an NVIDIA NIM microservice:

  • AlphaFold2, renowned for its groundbreaking impact on protein structure prediction, is now available as an NVIDIA NIM.
  • MolMIM is a novel model developed by NVIDIA that generates molecules while simultaneously optimizing for multiple properties, such as high solubility and low toxicity.
  • DiffDock is an advanced tool for quickly modeling the binding of small molecules to their protein targets.

These models work in concert to improve the hit-to-lead process, making it more efficient and faster.

Each of these AI models is packaged within NVIDIA NIM microservices — portable containers designed to accelerate performance, shorten time to market and simplify the deployment of generative AI models anywhere.

The NIM Agent Blueprint integrates these microservices into a flexible, scalable, generative AI workflow that can help transform drug discovery.
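
To make the workflow concrete, here is a minimal sketch in Python of how the three microservices could be chained. The endpoint routes and payload fields below are illustrative assumptions, not the blueprint's published API; the blueprint's reference code defines the real interfaces.

    import requests

    BASE = "http://localhost:8000"  # assumed host for locally deployed NIM microservices

    # 1. Predict the target protein's structure from its sequence (AlphaFold2).
    structure = requests.post(f"{BASE}/alphafold2/predict",  # hypothetical route
                              json={"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"}).json()

    # 2. Generate candidate molecules optimized for desired properties (MolMIM).
    candidates = requests.post(f"{BASE}/molmim/generate",    # hypothetical route
                               json={"seed_smiles": "CC(=O)Oc1ccccc1C(=O)O",
                                     "num_molecules": 20}).json()

    # 3. Pose candidates against the predicted structure and rank them (DiffDock).
    poses = requests.post(f"{BASE}/diffdock/dock",           # hypothetical route
                          json={"protein": structure, "ligands": candidates}).json()

    for pose in sorted(poses["results"], key=lambda p: p["confidence"], reverse=True)[:5]:
        print(pose["smiles"], pose["confidence"])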

Leading computational drug discovery and biotechnology software providers already using NIM microservices — such as Schrödinger, Benchling, Dotmatics, Terray, TetraScience and Cadence Molecular Sciences (OpenEye) — are adopting NIM Agent Blueprints in their computer-aided drug discovery platforms.

These integrations aim to make the hit-to-lead process faster and more intelligent, leading to the identification of more viable drug candidates in less time and at lower cost.

Global professional services company Accenture is poised to tailor the NIM Agent Blueprint to the specific needs of drug development programs, optimizing the molecule generation step with input from pharmaceutical partners to inform the MolMIM NIM microservice.

In addition, the NIM microservices that comprise the NIM Agent Blueprint will soon be available on AWS HealthOmics, a purpose-built service that helps customers orchestrate biological analyses. This includes streamlining the integration of AI into existing drug discovery workflows.

Revolutionizing Drug Development With AI

The stakes in drug discovery are high.

Developing a new drug typically costs around $2.6 billion and can take 10-15 years, with a success rate of less than 10%.

By making molecular design smarter with NVIDIA’s AI-powered NIM Agent Blueprint, pharmaceutical companies can reduce these costs and shorten development timelines in the $1.5 trillion global pharmaceutical market.

This NIM Agent Blueprint represents a significant shift from traditional drug discovery methods, offering a generative AI approach that pre-optimizes molecules for desired therapeutic properties.

For example, MolMIM, the generative model for molecules within this NIM Agent Blueprint, uses scoring functions to steer generation toward molecules with optimized pharmacokinetic properties — such as absorption rate, protein binding and half-life — a marked advancement over previous methods.

This smarter approach to small molecule design enhances the potential for successful lead optimization, accelerating the overall drug discovery process.
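
To illustrate what steering toward desired properties can look like in practice, the sketch below ranks candidate SMILES strings with cheap property oracles from RDKit. The QED and logP scores are stand-ins for illustration; MolMIM's actual, configurable scoring functions are not shown here.

    from rdkit import Chem
    from rdkit.Chem import QED, Crippen

    def score(smiles: str) -> float:
        """Toy multi-property oracle: reward drug-likeness, penalize high lipophilicity."""
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:                    # unparsable candidates score worst
            return float("-inf")
        return QED.qed(mol) - 0.1 * max(0.0, Crippen.MolLogP(mol) - 3.0)

    candidates = ["CC(=O)Oc1ccccc1C(=O)O", "CCO", "c1ccccc1CCCCCCCCCC"]
    best = max(candidates, key=score)
    print(best, score(best))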

This leap in technology could lead to faster, more targeted treatments, addressing growing challenges in healthcare, from rising costs to an aging population.

NVIDIA’s commitment to supporting researchers with the latest advancements in accelerated computing underscores its role in solving the most complex problems in drug discovery.

Visit build.nvidia.com to download the NIM Agent Blueprint for generative AI-based virtual screening and take the first step toward faster, more efficient drug development.

From Prototype to Prompt: NVIDIA NIM Agent Blueprints Fast-Forward Next Wave of Enterprise Generative AI

The initial wave of generative AI was driven by its use in internet services that showed incredible new possibilities with tools that could help people write, research and imagine faster than ever.

The second wave of generative AI is now here, powered by the availability of advanced open-source foundation models, as well as advancements in agentic AI that are improving efficiency and autonomy of AI workflows. Enterprises across industries can use models like Google Gemma, Llama 3.1 405B, Microsoft Phi, Mixtral and Nemotron to develop their own AI applications to support business growth and boost productivity.

To accelerate business transformation, enterprises need blueprints for canonical generative AI workflows like digital human customer service chatbots, retrieval-augmented generation and drug discovery. While NVIDIA NIM microservices help make these models efficient and accessible for enterprise use, building enterprise generative AI applications is a complex, multistep process.

Launched today, NVIDIA NIM Agent Blueprints include everything an enterprise developer needs to build and deploy customized generative AI applications that make a transformative impact on business objectives.

Blueprints for Data-Driven Enterprise Flywheels

NIM Agent Blueprints are reference AI workflows tailored for specific use cases. They include sample applications built with NVIDIA NIM and partner microservices, reference code, customization documentation and a Helm chart for deployment.

With NIM Agent Blueprints, developers can gain a head start on creating their own applications using NVIDIA’s advanced AI tools and end-to-end development experience for each use case. The blueprints are designed to be modified and enhanced, and allow developers to leverage both information retrieval and agent-based workflows capable of performing complex tasks.
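
Because NIM microservices expose an OpenAI-compatible API, a blueprint's sample application can typically be exercised with a few lines of client code. Below is a minimal sketch, assuming a NIM serving a Llama 3 model at a local endpoint; the URL and model identifier are deployment-specific assumptions.

    from openai import OpenAI

    # A deployed NIM exposes an OpenAI-compatible API; URL and model id are assumptions.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

    response = client.chat.completions.create(
        model="meta/llama3-8b-instruct",  # illustrative model identifier
        messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
        max_tokens=256,
    )
    print(response.choices[0].message.content)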

NIM Agent Blueprints also help developers improve their applications throughout the AI lifecycle. As users interact with AI applications, new data is generated. This data can be used to refine and enhance the models in a continuous learning cycle, creating a data-driven generative AI flywheel.

NIM Agent Blueprints help enterprises build their own generative AI flywheels with applications that link models with their data. NVIDIA NeMo facilitates this process, while NVIDIA AI Foundry serves as the production environment for running the flywheel.

The first NIM Agent Blueprints available are:

  • digital human for customer service
  • generative virtual screening for accelerated drug discovery
  • multimodal PDF data extraction for enterprise RAG

ServiceNow, a leader in enterprise AI, has integrated advanced generative AI capabilities into its digital workflow platform. It’s already bringing NIM microservices into its Now Assist AI solutions and working with the technologies featured in the digital human for customer service NIM Agent Blueprint.

“AI is not just a tool, it’s the foundation of a fundamental shift in how companies can better equip employees and serve customers,” said Jon Sigler, senior vice president, Platform and AI at ServiceNow. “Through our collaboration with NVIDIA, we’ve built new generative AI products and features that are driving growth and powering AI transformation for ServiceNow customers.”

More NIM Agent Blueprints are in development for creating generative AI applications for customer service, content generation, software engineering, retail shopping advisors and R&D. NVIDIA plans to introduce new NIM Agent Blueprints monthly.

NVIDIA Ecosystem Supercharges Enterprise Adoption

NVIDIA’s partner ecosystem — including global systems integrators and service delivery partners Accenture, Deloitte, SoftServe, Quantiphi and World Wide Technology — is bringing NIM Agent Blueprints to the world’s enterprises.

NIM Agent Blueprints can be optimized using customer interaction data with tools from NVIDIA’s ecosystem of partners: Dataiku and DataRobot for model fine-tuning, governance and monitoring; deepset, LlamaIndex and LangChain for building workflows; Weights & Biases for generative AI application evaluation; and CrowdStrike, Datadog, Fiddler AI, New Relic and Trend Micro for additional safeguards. Infrastructure platform providers, including Nutanix, Red Hat and Broadcom, will support NIM Agent Blueprints on their enterprise solutions.

Customers can build and deploy NIM Agent Blueprints on NVIDIA-Certified Systems from manufacturers such as Cisco, Dell Technologies, Hewlett Packard Enterprise and Lenovo, as well as on NVIDIA-accelerated cloud instances from Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure.

To help enterprises put their data to work in generative AI applications, NIM Agent Blueprints can integrate with data and storage platforms from NVIDIA partners, such as Cohesity, Datastax, Dropbox, NetApp and VAST Data.

A Collaborative Future for Developers and Data Scientists

Generative AI is now fostering collaboration between developers and data scientists. Developers use NIM Agent Blueprints as a foundation to build their applications, while data scientists implement the data flywheel to continually improve their custom NIM microservices. As a NIM improves, so do the related applications, creating a cycle of continuous enhancement and data generation.

With NIM Agent Blueprints — and support from NVIDIA’s partners — virtually every enterprise can seamlessly integrate generative AI into its applications to drive efficiency and innovation across industries.

Enterprises can experience NVIDIA NIM Agent Blueprints today.

NVIDIA Launches NIM Microservices for Generative AI in Japan, Taiwan

Nations around the world are pursuing sovereign AI to produce artificial intelligence using their own computing infrastructure, data, workforce and business networks to ensure AI systems align with local values, laws and interests.

In support of these efforts, NVIDIA today announced the availability of four new NVIDIA NIM microservices that enable developers to more easily build and deploy high-performing generative AI applications.

The microservices support popular community models tailored to meet regional needs. They enhance user interactions through accurate understanding and improved responses based on local languages and cultural heritage.

In the Asia-Pacific region alone, generative AI software revenue is expected to reach $48 billion by 2030 — up from $5 billion this year, according to ABI Research.

Llama-3-Swallow-70B, trained on Japanese data, and Llama-3-Taiwan-70B, trained on Mandarin data, are regional language models that provide a deeper understanding of local laws, regulations and other customs.

The RakutenAI 7B family of models, built on Mistral-7B, was trained on English and Japanese datasets and is available as two different NIM microservices for Chat and Instruct. Rakuten’s foundation and instruct models have achieved leading scores among open Japanese large language models, earning the top average score in the LM Evaluation Harness benchmark carried out from January to March 2024.

Training a large language model (LLM) on regional languages enhances the effectiveness of its outputs by ensuring more accurate and nuanced communication, as it better understands and reflects cultural and linguistic subtleties.

The models offer leading performance for Japanese and Mandarin language understanding, regional legal tasks, question-answering, and language translation and summarization compared with base LLMs like Llama 3.

Nations worldwide — from Singapore, the United Arab Emirates, South Korea and Sweden to France, Italy and India — are investing in sovereign AI infrastructure.

The new NIM microservices allow businesses, government agencies and universities to host native LLMs in their own environments, enabling developers to build advanced copilots, chatbots and AI assistants.

Developing Applications With Sovereign AI NIM Microservices

Developers can easily deploy the sovereign AI models, packaged as NIM microservices, into production while achieving improved performance.

The microservices, available with NVIDIA AI Enterprise, are optimized for inference with the NVIDIA TensorRT-LLM open-source library.

NIM microservices for Llama 3 70B — which was used as the base model for the new Llama-3-Swallow-70B and Llama-3-Taiwan-70B NIM microservices — can provide up to 5x higher throughput. This lowers the total cost of running the models in production and provides better user experiences by decreasing latency.

The new NIM microservices are available today as hosted application programming interfaces (APIs).
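
For instance, here is a minimal sketch of calling one of the hosted APIs with the OpenAI-compatible Python client. The base URL follows NVIDIA’s API catalog convention, and the model identifier is an illustrative assumption; check build.nvidia.com for the published names.

    import os
    from openai import OpenAI

    # Hosted NIM APIs are OpenAI-compatible; the model id below is an assumption.
    client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",
                    api_key=os.environ["NVIDIA_API_KEY"])

    response = client.chat.completions.create(
        model="tokyotech-llm/llama-3-swallow-70b-instruct-v0.1",  # assumed identifier
        messages=[{"role": "user", "content": "日本の祝日について教えてください。"}],
        max_tokens=200,
    )
    print(response.choices[0].message.content)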

Tapping NVIDIA NIM for Faster, More Accurate Generative AI Outcomes

The NIM microservices accelerate deployments, enhance overall performance and provide the necessary security for organizations across global industries, including healthcare, finance, manufacturing, education and legal.

The Tokyo Institute of Technology fine-tuned Llama-3-Swallow-70B using Japanese-language data.

“LLMs are not mechanical tools that provide the same benefit for everyone. They are rather intellectual tools that interact with human culture and creativity. The influence is mutual where not only are the models affected by the data we train on, but also our culture and the data we generate will be influenced by LLMs,” said Rio Yokota, professor at the Global Scientific Information and Computing Center at the Tokyo Institute of Technology. “Therefore, it is of paramount importance to develop sovereign AI models that adhere to our cultural norms. The availability of Llama-3-Swallow as an NVIDIA NIM microservice will allow developers to easily access and deploy the model for Japanese applications across various industries.”

For instance, Japanese AI company Preferred Networks uses the model to develop Llama3-Preferred-MedSwallow-70B, a healthcare-specific model trained on a unique corpus of Japanese medical data that achieves top scores on the Japan National Examination for Physicians.

Chang Gung Memorial Hospital (CGMH), one of the leading hospitals in Taiwan, is building a custom-made AI Inference Service (AIIS) to centralize all LLM applications within the hospital system. Using Llama-3-Taiwan-70B, it is improving the efficiency of frontline medical staff with more nuanced medical language that patients can understand.

“By providing instant, context-appropriate guidance, AI applications built with local-language LLMs streamline workflows and serve as a continuous learning tool to support staff development and improve the quality of patient care,” said Dr. Changfu Kuo, director of the Center for Artificial Intelligence in Medicine at CGMH, Linko Branch. “NVIDIA NIM is simplifying the development of these applications, allowing for easy access and deployment of models trained on regional languages with minimal engineering expertise.”

Taiwan-based Pegatron, a maker of electronic devices, will adopt the Llama-3-Taiwan-70B NIM microservice for internal- and external-facing applications. The company has integrated it with its PEGAAi Agentic AI System to automate processes, boosting efficiency in manufacturing and operations.

The Llama-3-Taiwan-70B NIM microservice is also being used by global petrochemical manufacturer Chang Chun Group, world-leading printed circuit board company Unimicron, technology-focused media company TechOrange, online contract service company LegalSign.ai and generative AI startup APMIC. These companies are also collaborating on the open model.

Creating Custom Enterprise Models With NVIDIA AI Foundry

While regional AI models can provide culturally nuanced and localized responses, enterprises still need to fine-tune them for their business processes and domain expertise.

NVIDIA AI Foundry is a platform and service that includes popular foundation models, NVIDIA NeMo for fine-tuning, and dedicated capacity on NVIDIA DGX Cloud to provide developers a full-stack solution for creating a customized foundation model packaged as a NIM microservice.

Additionally, developers using NVIDIA AI Foundry have access to the NVIDIA AI Enterprise software platform, which provides security, stability and support for production deployments.

NVIDIA AI Foundry gives developers the necessary tools to more quickly and easily build and deploy their own custom, regional language NIM microservices to power AI applications, ensuring culturally and linguistically appropriate results for their users.

NVIDIA Launches Array of New CUDA Libraries to Expand Accelerated Computing and Deliver Order-of-Magnitude Speedup to Science and Industrial Applications

News summary: New libraries in accelerated computing deliver order-of-magnitude speedups and reduce energy consumption and costs in data processing, generative AI, recommender systems, AI data curation, 6G research, AI-physics and more. They include:

  • LLM applications: NeMo Curator, to create custom datasets, adds image curation and Nemotron-4 340B for high-quality synthetic data generation
  • Data processing: cuVS for vector search to build indexes in minutes instead of days and a new Polars GPU Engine in open beta
  • Physical AI: For physics simulation, Warp accelerates computations with a new Tile API. For wireless network simulation, Aerial adds more map formats for ray tracing and simulation. And for link-level wireless simulation, Sionna adds a new toolchain for real-time inference

Companies around the world are increasingly turning to NVIDIA accelerated computing to speed up applications they first ran on CPUs only. This has enabled them to achieve extreme speedups and benefit from incredible energy savings.

In Houston, CPFD makes computational fluid dynamics simulation software for industrial applications, like its Barracuda Virtual Reactor software that helps design next-generation recycling facilities. Plastic recycling facilities run CPFD software in cloud instances powered by NVIDIA accelerated computing. With a CUDA GPU-accelerated virtual machine, they can efficiently scale and run simulations 400x faster and 140x more energy efficiently than using a CPU-based workstation.

Bottles being loaded into a plastics recycling facility. AI-generated image.

A popular video conferencing application captions several hundred thousand virtual meetings an hour. When using CPUs to create live captions, the app could query a transformer-powered speech recognition AI model three times a second. After migrating to GPUs in the cloud, the application’s throughput increased to 200 queries per second — a 66x speedup and 25x energy-efficiency improvement.

In homes across the globe, an e-commerce website connects hundreds of millions of shoppers a day to the products they need using an advanced recommendation system powered by a deep learning model, running on its NVIDIA accelerated cloud computing system. After switching from CPUs to GPUs in the cloud, it achieved significantly lower latency with a 33x speedup and nearly 12x energy-efficiency improvement.

With the exponential growth of data, accelerated computing in the cloud is set to enable even more innovative use cases.

NVIDIA Accelerated Computing on CUDA GPUs Is Sustainable Computing

NVIDIA estimates that if all AI, HPC and data analytics workloads that are still running on CPU servers were CUDA GPU-accelerated, data centers would save 40 terawatt-hours of energy annually. That’s the equivalent energy consumption of 5 million U.S. homes per year.

Accelerated computing uses the parallel processing capabilities of CUDA GPUs to complete jobs orders of magnitude faster than CPUs, improving productivity while dramatically reducing cost and energy consumption.

Although adding GPUs to a CPU-only server increases peak power, GPU acceleration finishes tasks quickly and then enters a low-power state. The total energy consumed with GPU-accelerated computing is significantly lower than with general-purpose CPUs, while yielding superior performance.

GPUs achieve 20x greater energy efficiency than traditional CPU-only servers because they deliver greater performance per watt, completing more tasks in less time. These gains hold for on-premises, cloud-based and hybrid workloads alike.

In the past decade, NVIDIA AI computing has achieved approximately 100,000x more energy efficiency when processing large language models. To put that into perspective, if the efficiency of cars improved as much as NVIDIA has advanced the efficiency of AI on its accelerated computing platform, they’d get 500,000 miles per gallon. That’s enough to drive to the moon, and back, on less than a gallon of gasoline.

In addition to these dramatic boosts in efficiency on AI workloads, GPU computing can achieve incredible speedups over CPUs. Customers of the NVIDIA accelerated computing platform running workloads on cloud service providers saw speedups of 10-180x across a gamut of real-world tasks, from data processing to computer vision, as the chart below shows.

Speedups of 10-180x achieved in real-world implementations by cloud customers across workloads with the NVIDIA accelerated computing platform.

As workloads continue to demand exponentially more computing power, CPUs have struggled to provide the necessary performance, creating a growing performance gap and driving “compute inflation.” The chart below illustrates a multiyear trend of how data growth has far outpaced the growth in compute performance per watt of CPUs.

The widening gap between data growth and the lagging compute performance per watt of CPUs.

GPU acceleration frees up cost and energy that would otherwise be wasted.

With its massive energy-efficiency savings, accelerated computing is sustainable computing.

The Right Tools for Every Job 

GPUs cannot accelerate software written for general-purpose CPUs; specialized software libraries are needed to accelerate specific workloads. Just as a mechanic keeps an entire toolbox, from a screwdriver to a wrench, for different tasks, NVIDIA provides a diverse set of libraries that perform low-level functions like parsing data and executing calculations on it.

Each NVIDIA CUDA library is optimized to harness hardware features specific to NVIDIA GPUs. Combined, they encompass the power of the NVIDIA platform.

New updates continue to be added on the CUDA platform roadmap, expanding across diverse use cases:

LLM Applications

NeMo Curator gives developers the flexibility to quickly create custom datasets for large language model (LLM) use cases. We recently announced multimodal support beyond text, including image curation.

Synthetic data generation (SDG) augments existing datasets with high-quality, synthetically generated data to customize and fine-tune models and LLM applications. We announced Nemotron-4 340B, a new suite of models built specifically for SDG that enables businesses and developers to use model outputs to build custom models.

Data Processing Applications

cuVS is an open-source library for GPU-accelerated vector search and clustering that delivers incredible speed and efficiency across LLMs and semantic search. The latest cuVS allows large indexes to be built in minutes instead of hours or even days, and searches them at scale.

Polars is an open-source library that makes use of query optimizations and other techniques to process hundreds of millions of rows of data efficiently on a single machine. A new Polars GPU engine powered by NVIDIA’s cuDF library will be available in open beta. It delivers up to a 10x performance boost compared to CPU, bringing the energy savings of accelerated computing to data practitioners and their applications.
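
A minimal sketch of opting into the GPU engine from an existing Polars query, assuming the open beta's engine keyword on collect() (the file path and columns are illustrative):

    import polars as pl

    # Lazy query over a CSV; work is deferred until collect().
    lazy = (
        pl.scan_csv("transactions.csv")
          .filter(pl.col("amount") > 100)
          .group_by("customer_id")
          .agg(pl.col("amount").sum().alias("total"))
    )

    # Opt into the cuDF-backed GPU engine; unsupported operations fall back to CPU.
    print(lazy.collect(engine="gpu").head())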

Physical AI

Warp, for high-performance GPU simulation and graphics, helps accelerate spatial computing by making it easier to write differentiable programs for physics simulation, perception, robotics and geometry processing. The next release will have support for a new Tile API that allows developers to use Tensor Cores inside GPUs for matrix and Fourier computations.
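
For a feel of Warp's Python-native kernel model (the upcoming Tile API itself isn't sketched here), a minimal example of writing and launching a kernel:

    import warp as wp

    wp.init()

    @wp.kernel
    def saxpy(x: wp.array(dtype=float), y: wp.array(dtype=float), a: float):
        i = wp.tid()              # one thread per array element
        y[i] = a * x[i] + y[i]

    n = 1024
    x = wp.full(n, 1.0, dtype=float)
    y = wp.zeros(n, dtype=float)
    wp.launch(saxpy, dim=n, inputs=[x, y, 2.0])
    print(y.numpy()[:4])          # -> [2. 2. 2. 2.]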

Aerial is a suite of accelerated computing platforms that includes Aerial CUDA-Accelerated RAN and Aerial Omniverse Digital Twin for designing, simulating and operating wireless networks for commercial applications and industry research. The next release will include a new expansion of Aerial with more map formats for ray tracing and simulations with higher accuracy.

Sionna is a GPU-accelerated open-source library for link-level simulations of wireless and optical communication systems. With GPUs, Sionna achieves orders-of-magnitude faster simulation, enabling interactive exploration of these systems and paving the way for next-generation physical layer research. The next release will include the entire toolchain required to design, train and evaluate neural network-based receivers, including support for real-time inference of such neural receivers using NVIDIA TensorRT.

NVIDIA provides over 400 libraries. Some, like CV-CUDA, excel at pre- and post-processing of computer vision tasks common in user-generated video, recommender systems, mapping and video conferencing. Others, like cuDF, accelerate data frames and tables central to SQL databases and pandas in data science.
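
For pandas users, cuDF's accelerator mode can be enabled without rewriting code; a minimal sketch (the CSV path and column names are illustrative):

    import cudf.pandas
    cudf.pandas.install()      # route subsequent pandas calls through cuDF

    import pandas as pd        # now GPU-accelerated where supported

    df = pd.read_csv("events.csv")
    print(df.groupby("user_id")["latency_ms"].mean())  # falls back to CPU if needed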


Many of these libraries are versatile — for example, cuBLAS for linear algebra acceleration — and can be used across multiple workloads, while others are highly specialized to focus on a specific use case, like cuLitho for silicon computational lithography.

For researchers who don’t want to build their own pipelines with NVIDIA CUDA-X libraries, NVIDIA NIM provides a streamlined path to production deployment by packaging multiple libraries and AI models into optimized containers. The containerized microservices deliver improved throughput out of the box.

Augmenting these libraries’ performance are an expanding number of hardware-based acceleration features that deliver speedups with the highest energy efficiencies. The NVIDIA Blackwell platform, for example, includes a decompression engine that unpacks compressed data files inline up to 18x faster than CPUs. This dramatically accelerates data processing applications that need to frequently access compressed files in storage like SQL, Apache Spark and pandas, and decompress them for runtime computation.

The integration of NVIDIA’s specialized CUDA GPU-accelerated libraries into cloud computing platforms delivers remarkable speed and energy efficiency across a wide range of workloads. This combination drives significant cost savings for businesses and plays a crucial role in advancing sustainable computing, helping billions of users who rely on cloud-based workloads benefit from a more sustainable and cost-effective digital ecosystem.

Learn more about NVIDIA’s sustainable computing efforts and check out the Energy Efficiency Calculator to discover potential energy and emissions savings.

NVIDIA to Present Innovations at Hot Chips That Boost Data Center Performance and Energy Efficiency

A deep technology conference for processor and system architects from industry and academia has become a key forum for the trillion-dollar data center computing market.

At Hot Chips 2024 next week, senior NVIDIA engineers will present the latest advancements powering the NVIDIA Blackwell platform, plus research on liquid cooling for data centers and AI agents for chip design.

They’ll share how:

  • NVIDIA Blackwell brings together multiple chips, systems and NVIDIA CUDA software to power the next generation of AI across use cases, industries and countries.
  • NVIDIA GB200 NVL72 — a multi-node, liquid-cooled, rack-scale solution that connects 72 Blackwell GPUs and 36 Grace CPUs — raises the bar for AI system design.
  • NVLink interconnect technology provides all-to-all GPU communication, enabling record high throughput and low-latency inference for generative AI.
  • The NVIDIA Quasar Quantization System pushes the limits of physics to accelerate AI computing.
  • NVIDIA researchers are building AI models that help build processors for AI.

An NVIDIA Blackwell talk, taking place Monday, Aug. 26, will also spotlight new architectural details and examples of generative AI models running on Blackwell silicon.

It’s preceded by three tutorials on Sunday, Aug. 25, that will cover how hybrid liquid-cooling solutions can help data centers transition to more energy-efficient infrastructure and how AI models, including large language model (LLM)-powered agents, can help engineers design the next generation of processors.

Together, these presentations showcase the ways NVIDIA engineers are innovating across every area of data center computing and design to deliver unprecedented performance, efficiency and optimization.

Be Ready for Blackwell

NVIDIA Blackwell is the ultimate full-stack computing challenge. It comprises multiple NVIDIA chips, including the Blackwell GPU, Grace CPU, BlueField data processing unit, ConnectX network interface card, NVLink Switch, Spectrum Ethernet switch and Quantum InfiniBand switch.

Ajay Tirumala and Raymond Wong, directors of architecture at NVIDIA, will provide a first look at the platform and explain how these technologies work together to deliver a new standard for AI and accelerated computing performance while advancing energy efficiency.

The multi-node NVIDIA GB200 NVL72 solution is a perfect example. LLM inference requires low-latency, high-throughput token generation. GB200 NVL72 acts as a unified system to deliver up to 30x faster inference for LLM workloads, unlocking the ability to run trillion-parameter models in real time.

Tirumala and Wong will also discuss how the NVIDIA Quasar Quantization System — which brings together algorithmic innovations, NVIDIA software libraries and tools, and Blackwell’s second-generation Transformer Engine — supports high accuracy on low-precision models, highlighting examples using LLMs and visual generative AI.

Keeping Data Centers Cool

The traditional hum of air-cooled data centers may become a relic of the past as researchers develop more efficient and sustainable solutions that use hybrid cooling, a combination of air and liquid cooling.

Liquid-cooling techniques move heat away from systems more efficiently than air, making it easier for computing systems to stay cool even while processing large workloads. The equipment for liquid cooling also takes up less space and consumes less power than air-cooling systems, allowing data centers to add more server racks — and therefore more compute power — in their facilities.

Ali Heydari, director of data center cooling and infrastructure at NVIDIA, will present several designs for hybrid-cooled data centers.

Some designs retrofit existing air-cooled data centers with liquid-cooling units, offering a quick and easy solution to add liquid-cooling capabilities to existing racks. Other designs require the installation of piping for direct-to-chip liquid cooling using cooling distribution units or by entirely submerging servers in immersion cooling tanks. Although these options demand a larger upfront investment, they lead to substantial savings in both energy consumption and operational costs.

Heydari will also share his team’s work as part of COOLERCHIPS, a U.S. Department of Energy program to develop advanced data center cooling technologies. As part of the project, the team is using the NVIDIA Omniverse platform to create physics-informed digital twins that will help them model energy consumption and cooling efficiency to optimize their data center designs.

AI Agents Chip In for Processor Design

Semiconductor design is a mammoth challenge at microscopic scale. Engineers developing cutting-edge processors work to fit as much computing power as they can onto a piece of silicon a few inches across, testing the limits of what’s physically possible.

AI models are supporting their work by improving design quality and productivity, boosting the efficiency of manual processes and automating some time-consuming tasks. The models include prediction and optimization tools to help engineers rapidly analyze and improve designs, as well as LLMs that can assist engineers with answering questions, generating code, debugging design problems and more.

Mark Ren, director of design automation research at NVIDIA, will provide an overview of these models and their uses in a tutorial. In a second session, he’ll focus on agent-based AI systems for chip design.

AI agents powered by LLMs can be directed to complete tasks autonomously, unlocking broad applications across industries. In microprocessor design, NVIDIA researchers are developing agent-based systems that can reason and take action using customized circuit design tools, interact with experienced designers, and learn from a database of human and agent experiences.

NVIDIA experts aren’t just building this technology — they’re using it. Ren will share examples of how engineers can use AI agents for timing report analysis, cell cluster optimization processes and code generation. The cell cluster optimization work recently won best paper at the first IEEE International Workshop on LLM-Aided Design.

Register for Hot Chips, taking place Aug. 25-27 at Stanford University and online.

Straight Out of Gamescom and Into Xbox PC Games, GeForce NOW Newly Supports Automatic Xbox Sign-In

Straight out of Gamescom, NVIDIA introduced GeForce NOW support for Xbox automatic sign-in, as well as Black Myth: Wukong from Game Science and a demo for the PC launch of FINAL FANTASY XVI from Square Enix — all available in the cloud today.

More triple-A games are coming to the cloud this GFN Thursday: Civilization VI, Civilization V and Civilization: Beyond Earth — some of the first games from publisher 2K — are available today for members to stream with GeForce quality.

And members can look forward to playing the highly anticipated Indiana Jones and the Great Circle from Bethesda when it joins the cloud later this year.

Plus, GeForce NOW has added a data center in Warsaw, Poland, expanding low-latency, high-performance cloud gaming to members in the region.

It’s an action-packed GFN Thursday, with 25 new titles joining the cloud this week.

Instant Play, Every Day

Auto sign-in, auto win.

GeForce NOW is streamlining gaming convenience with Xbox account integration. Starting today, members can link their Xbox profile directly to the cloud service. After initial setup, members will be logged in automatically across all devices for future cloud gaming sessions, enabling them to dive straight into their favorite PC games.

The new feature builds on existing support for Epic Games and Ubisoft automatic sign-in — and complements Xbox game sync, which adds supported PC Game Pass and Microsoft Store titles to members’ cloud libraries. Gamers can enjoy a cohesive experience accessing over 140 PC Game Pass titles across devices without the need for repeated logins.

Go Bananas

No monkey business in the cloud — just high-performance gameplay.

Black Myth: Wukong, the highly anticipated action role-playing game (RPG) based on Chinese mythology, is now available to stream from the cloud.

Embark on the Monkey King’s epic journey, wielding magical abilities and battling fierce monsters and gods across the breathtaking landscapes of ancient China.

GeForce NOW Ultimate members can experience the game’s stunning visuals and fluid combat — enhanced by NVIDIA RTX technologies such as full ray tracing and DLSS 3 — at up to 4K resolution and 120 frames per second, bringing the mystical world of Black Myth: Wukong to life.

Fantasy Becomes Reality

Eikon-ic battles, epic tales.

The latest mainline numbered entry in Square Enix’s renowned RPG series, FINAL FANTASY XVI will join the cloud when it launches on PC later this month. Members can try a demo of the highly anticipated game today.

Take a journey through the epic, dark-fantasy world of Valisthea, a land dominated by colossal Mothercrystals and divided among six powerful nations teetering on the brink of conflict. Follow the story of Clive Rosfield, a young knight on a quest for vengeance after a tragic betrayal. Dive into the high-octane action with real-time combat for fast-paced, dynamic battles that emphasize strategy and skill.

The demo offers a taste of the game’s stunning visuals, intricate storyline and innovative combat options. With GeForce NOW, gamers can experience the breathtaking world of Valisthea and stream it at up to 4K and 120 frames per second with an Ultimate membership.

Everybody Wants to Rule the World

Guide a rising nation to glory, and expand through diplomacy and other tactics in “Sid Meier’s Civilization VI.”

Becoming history’s greatest leader has never been easier — the Sid Meier’s Civilization franchise from 2K is now available on GeForce NOW.

Since 1991, the award-winning Civilization series of turn-based strategy games has challenged players to build an empire to stand the test of time. Players assume the role of a famous historical leader, making crucial economic, political and military decisions to pursue prosperity and secure a path to victory.

Members can lead, expand and conquer from the cloud in the latest entries from the franchise, including Sid Meier’s Civilization VI, Civilization V, Civilization IV and Civilization: Beyond Earth. Manage a budding nation with support for ultrawide resolutions, and build empires on the go using low-powered devices like Chromebooks, Macs and more.

Adventure Calls, Dr. Jones

Gameplay so good, it belongs in a museum.

Uncover one of history’s greatest mysteries in Indiana Jones and the Great Circle. Members can stream the cinematic action-adventure game from the award-winning producers Bethesda Softworks, Lucasfilm and MachineGames at GeForce NOW Ultimate quality from the cloud when the title launches later this year.

In 1937, sinister forces are scouring the globe for the secret to an ancient power connected to the Great Circle, and only Indiana Jones can stop them. Become the legendary archaeologist and venture from the hallowed halls of the Vatican and the sunken temples of Sukhothai to the pyramids of Egypt and snowy Himalayan peaks.

Ultimate members can stream the game at up to 4K resolution and 120 fps, even on low-powered devices — as well as experience the adventure with support for full ray tracing, accelerated and enhanced by NVIDIA DLSS 3.5 with Ray Reconstruction.

Let’s Play Today

The latest ‘Skull and Bones’ is available to play from the cloud without waiting for updates.

In the newest season of Skull and Bones, gear up to face imminent dangers on scorched seas — from the formidable Li Tian Ning and Commander Zhang, to a ferocious dragon that descends from the heavens. Join the battle in season 3 to earn exclusive new rewards through time-limited events such as Mooncake Regatta and Requiem of the Lost. Discover new quality-of-life improvements including a new third-person camera while at sea, new endgame features and an expanded Black Market.

Members can look for the following games available to stream in the cloud this week:

  • Black Myth: Wukong (New release on Steam and Epic Games Store, Aug. 19)
  • Final Fantasy XVI Demo (New release on Steam, Aug. 19)
  • GIGANTIC: RAMPAGE EDITION (Available on Epic Games Store, free Aug. 22)
  • Skull & Bones (New release on Steam, Aug. 22)
  • Alan Wake’s American Nightmare (Xbox, available on Microsoft Store)
  • Commandos 3 – HD Remaster (Xbox, available on Microsoft Store)
  • Desperados III (Xbox, available on Microsoft Store)
  • The Dungeon Of Naheulbeuk: The Amulet Of Chaos (Xbox, available on Microsoft Store)
  • The Flame in the Flood (Xbox, available on Microsoft Store)
  • FTL: Faster Than Light (Xbox, available on Microsoft Store)
  • Genesis Noir (Xbox, available on PC Game Pass)
  • House Flipper (Xbox, available on PC Game Pass)
  • Medieval Dynasty (Xbox, available on PC Game Pass)
  • My Time At Portia (Xbox, available on PC Game Pass)
  • Night in the Woods (Xbox, available on Microsoft Store)
  • Offworld Trading Company (Xbox, available on PC Game Pass)
  • Orwell: Keeping an Eye On You (Xbox, available on Microsoft Store)
  • Project Winter (Xbox, available on Microsoft Store)
  • Shadow Tactics: Blades of the Shogun (Xbox, available on Microsoft Store)
  • Sid Meier’s Civilization VI (Steam, Epic Games Store and Xbox, available on Microsoft Store)
  • Sid Meier’s Civilization V (Steam)
  • Sid Meier’s Civilization IV (Steam)
  • Sid Meier’s Civilization: Beyond Earth (Steam)
  • Spirit of the North (Xbox, available on PC Game Pass)
  • Wreckfest (Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.

Lightweight Champ: NVIDIA Releases Small Language Model With State-of-the-Art Accuracy

Developers of generative AI typically face a tradeoff between model size and accuracy. But a new language model released by NVIDIA delivers the best of both, providing state-of-the-art accuracy in a compact form factor.

Mistral-NeMo-Minitron 8B — a miniaturized version of the open Mistral NeMo 12B model released by Mistral AI and NVIDIA last month — is small enough to run on an NVIDIA RTX-powered workstation while still excelling across multiple benchmarks for AI-powered chatbots, virtual assistants, content generators and educational tools. Minitron models are distilled by NVIDIA using NVIDIA NeMo, an end-to-end platform for developing custom generative AI.

“We combined two different AI optimization methods — pruning to shrink Mistral NeMo’s 12 billion parameters into 8 billion, and distillation to improve accuracy,” said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. “By doing so, Mistral-NeMo-Minitron 8B delivers comparable accuracy to the original model at lower computational cost.”

Unlike their larger counterparts, small language models can run in real time on workstations and laptops. This makes it easier for organizations with limited resources to deploy generative AI capabilities across their infrastructure while optimizing for cost, operational efficiency and energy use. Running language models locally on edge devices also delivers security benefits, since data doesn’t need to be passed to a server from an edge device.

Developers can get started with Mistral-NeMo-Minitron 8B packaged as an NVIDIA NIM microservice with a standard application programming interface (API) — or they can download the model from Hugging Face. A downloadable NVIDIA NIM, which can be deployed on any GPU-accelerated system in minutes, will be available soon.

State-of-the-Art for 8 Billion Parameters

For a model of its size, Mistral-NeMo-Minitron 8B leads on nine popular benchmarks for language models. These benchmarks cover a variety of tasks including language understanding, common sense reasoning, mathematical reasoning, summarization, coding and ability to generate truthful answers.

Packaged as an NVIDIA NIM microservice, the model is optimized for low latency, which means faster responses for users, and high throughput, which corresponds to higher computational efficiency in production.

In some cases, developers may want an even smaller version of the model to run on a smartphone or an embedded device like a robot. To do so, they can download the 8-billion-parameter model and, using NVIDIA AI Foundry, prune and distill it into a smaller, optimized neural network customized for enterprise-specific applications.

The AI Foundry platform and service offers developers a full-stack solution for creating a customized foundation model packaged as a NIM microservice. It includes popular foundation models, the NVIDIA NeMo platform and dedicated capacity on NVIDIA DGX Cloud. Developers using NVIDIA AI Foundry can also access NVIDIA AI Enterprise, a software platform that provides security, stability and support for production deployments.

Since the original Mistral-NeMo-Minitron 8B model starts with a baseline of state-of-the-art accuracy, versions downsized using AI Foundry would still offer users high accuracy with a fraction of the training data and compute infrastructure.

Harnessing the Perks of Pruning and Distillation 

To achieve high accuracy with a smaller model, the team used a process that combines pruning and distillation. Pruning downsizes a neural network by removing model weights that contribute the least to accuracy. During distillation, the team retrained this pruned model on a small dataset to significantly boost accuracy, which had decreased through the pruning process.

The end result is a smaller, more efficient model with the predictive accuracy of its larger counterpart.

This technique means that a fraction of the original dataset is required to train each additional model within a family of related models, saving up to 40x the compute cost when pruning and distilling a larger model compared to training a smaller model from scratch.
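
For intuition, here is a toy PyTorch sketch of the two ideas. It is illustrative only: the Minitron work uses structured pruning and large-scale distillation, not the unstructured magnitude pruning and generic soft-target loss shown here.

    import torch
    import torch.nn.functional as F

    def magnitude_prune(model: torch.nn.Module, fraction: float = 0.3) -> None:
        """Zero the smallest-magnitude weights in each matrix (toy unstructured pruning)."""
        for p in model.parameters():
            if p.dim() < 2:
                continue
            k = max(1, int(p.numel() * fraction))
            threshold = p.abs().flatten().kthvalue(k).values
            p.data[p.abs() <= threshold] = 0.0

    def distill_step(student, teacher, batch, optimizer, temperature: float = 2.0):
        """One distillation step: the pruned student mimics the teacher's soft targets."""
        with torch.no_grad():
            teacher_logits = teacher(batch)
        student_logits = student(batch)
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature * temperature
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()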

Read the NVIDIA Technical Blog and a technical report for details.

NVIDIA also announced this week Nemotron-Mini-4B-Instruct, another small language model optimized for low memory usage and faster response times on NVIDIA GeForce RTX AI PCs and laptops. The model is available as an NVIDIA NIM microservice for cloud and on-device deployment and is part of NVIDIA ACE, a suite of digital human technologies that provide speech, intelligence and animation powered by generative AI.

Experience both models as NIM microservices from a browser or an API at ai.nvidia.com.

SLMming Down Latency: How NVIDIA’s First On-Device Small Language Model Makes Digital Humans More Lifelike

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC and workstation users.

At Gamescom this week, NVIDIA announced that NVIDIA ACE — a suite of technologies for bringing digital humans to life with generative AI — now includes the company’s first on-device small language model (SLM), powered locally by RTX AI.

The model, called Nemotron-4 4B Instruct, provides better role-play, retrieval-augmented generation and function-calling capabilities, so game characters can more intuitively comprehend player instructions, respond to gamers, and perform more accurate and relevant actions.

Available as an NVIDIA NIM microservice for cloud and on-device deployment by game developers, the model is optimized for low memory usage, offering faster response times and providing developers a way to take advantage of over 100 million GeForce RTX-powered PCs and laptops and NVIDIA RTX-powered workstations.

The SLM Advantage

An AI model’s accuracy and performance depend on the size and quality of the dataset used for training. Large language models are trained on vast amounts of data, but are typically general-purpose and contain excess information for most uses.

SLMs, on the other hand, focus on specific use cases. So even with less data, they’re capable of delivering more accurate responses, more quickly — critical elements for conversing naturally with digital humans.

Nemotron-4 4B was first distilled from the larger Nemotron-4 15B LLM. This process requires the smaller model, called a “student,” to mimic the outputs of the larger model, appropriately called a “teacher.” During this process, noncritical outputs of the student model are pruned or removed to reduce the parameter size of the model. Then, the SLM is quantized, which reduces the precision of the model’s weights.

With fewer parameters and less precision, Nemotron-4 4B has a lower memory footprint and faster time to first token — how quickly a response begins — than the larger Nemotron-4 LLM while still maintaining a high level of accuracy due to distillation. Its smaller memory footprint also means games and apps that integrate the NIM microservice can run locally on more of the GeForce RTX AI PCs and laptops and NVIDIA RTX AI workstations that consumers own today.
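
The quantization step can be sketched with PyTorch's built-in dynamic quantization, which stores Linear-layer weights in int8. This is a stand-in for illustration; production SLMs typically use more aggressive schemes, such as INT4 with calibration.

    import torch

    model = torch.nn.Sequential(          # stand-in for a distilled student network
        torch.nn.Linear(512, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, 512),
    )

    # Convert Linear weights to int8; activations are quantized dynamically at runtime.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 512)
    print(quantized(x).shape)             # same interface, smaller weight footprint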

This new, optimized SLM is also purpose-built with instruction tuning, a technique for fine-tuning models on instructional prompts to better perform specific tasks. This can be seen in Mecha BREAK, a video game in which players can converse with a mechanic game character and instruct it to switch and customize mechs.

ACEs Up

ACE NIM microservices allow developers to deploy state-of-the-art generative AI models through the cloud or on RTX AI PCs and workstations to bring AI to their games and applications. With ACE NIM microservices, non-playable characters (NPCs) can dynamically interact and converse with players in the game in real time.

ACE consists of key AI models for speech-to-text, language, text-to-speech and facial animation. It’s also modular, allowing developers to choose the NIM microservice needed for each element in their particular process.

NVIDIA Riva automatic speech recognition (ASR) processes a user’s spoken language and uses AI to deliver a highly accurate transcription in real time. The technology builds fully customizable conversational AI pipelines using GPU-accelerated multilingual speech and translation microservices. Other supported ASRs include OpenAI’s Whisper, an open-source neural net that approaches human-level robustness and accuracy on English speech recognition.

Once translated to digital text, the transcription goes into an LLM — such as Google’s Gemma, Meta’s Llama 3 or now NVIDIA Nemotron-4 4B — to start generating a response to the user’s original voice input.
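
A minimal sketch of these first two stages, using the open-source whisper package in place of Riva ASR and an OpenAI-compatible endpoint for the language model; the endpoint URL, model identifier and audio file are illustrative assumptions.

    import whisper
    from openai import OpenAI

    # 1. Speech-to-text with open-source Whisper (Riva ASR fills this role in ACE).
    asr = whisper.load_model("base")
    text = asr.transcribe("player_voice.wav")["text"]

    # 2. Generate the character's reply via an OpenAI-compatible LLM endpoint (assumed).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
    reply = client.chat.completions.create(
        model="nvidia/nemotron-4-4b-instruct",  # illustrative model identifier
        messages=[
            {"role": "system", "content": "You are a mech mechanic game character."},
            {"role": "user", "content": text},
        ],
    ).choices[0].message.content
    print(reply)  # next stages: text-to-speech, then Audio2Face animation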

Next, another piece of Riva technology — text-to-speech — generates an audio response. ElevenLabs’ proprietary AI speech and voice technology is also supported and has been demoed as part of ACE.

Finally, NVIDIA Audio2Face (A2F) generates facial expressions that can be synced to dialogue in many languages. With the microservice, digital avatars can display dynamic, realistic emotions streamed live or baked in during post-processing.

The AI network automatically animates face, eyes, mouth, tongue and head motions to match the selected emotional range and level of intensity. And A2F can automatically infer emotion directly from an audio clip.

Finally, the full character or digital human is animated in a renderer, like Unreal Engine or the NVIDIA Omniverse platform.

AI That’s NIMble

In addition to its modular support for various NVIDIA-powered and third-party AI models, ACE allows developers to run inference for each model in the cloud or locally on RTX AI PCs and workstations.

The NVIDIA AI Inference Manager software development kit allows for hybrid inference based on various needs such as experience, workload and costs. It streamlines AI model deployment and integration for PC application developers by preconfiguring the PC with the necessary AI models, engines and dependencies. Apps and games can then orchestrate inference seamlessly across a PC or workstation to the cloud.

ACE NIM microservices run locally on RTX AI PCs and workstations, as well as in the cloud. Current microservices running locally include Audio2Face, in the Covert Protocol tech demo, and the new Nemotron-4 4B Instruct and Whisper ASR in Mecha BREAK.

To Infinity and Beyond

Digital humans go far beyond NPCs in games. At last month’s SIGGRAPH conference, NVIDIA previewed “James,” an interactive digital human that can connect with people using emotions, humor and more. James is based on a customer-service workflow using ACE.

Interact with James at ai.nvidia.com.

Changes in communication methods between humans and technology over the decades eventually led to the creation of digital humans. The future of the human-computer interface will have a friendly face and require no physical inputs.

Digital humans drive more engaging and natural interactions. According to Gartner, 80% of conversational offerings will embed generative AI by 2025, and 75% of customer-facing applications will have conversational AI with emotion. Digital humans will transform multiple industries and use cases beyond gaming, including customer service, healthcare, retail, telepresence and robotics.

Users can get a glimpse of this future now by interacting with James in real time at ai.nvidia.com.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

How Snowflake Is Unlocking the Value of Data With Large Language Models

Snowflake is using AI to help enterprises transform data into insights and applications. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz and Baris Gultekin, head of AI at Snowflake, discuss how the company’s AI Data Cloud platform enables customers to access and manage data at scale. By separating the storage of data from compute, Snowflake has allowed organizations across the world to connect via cloud technology and work on a unified platform — eliminating data silos and streamlining collaborative workflows.

Time Stamps

1:45: What does Snowflake do?
3:18: Snowflake’s AI and data strategies — building a platform with natural language analysis
5:30: How to efficiently access large language models with Snowflake Cortex
11:49: Snowflake’s open-source LLM: Arctic
16:18: Gultekin’s journey in AI and data science
23:05: The AI industry in three to five years — real-world applications of Snowflake technology
27:54: Gultekin’s advice for professionals interested in AI

You Might Also Like:

How Roblox Uses Generative AI to Enhance User Experiences – Ep. 227

Roblox is a colorful online platform reimagining the way people come together. Anupam Singh, vice president of AI and growth engineering at Roblox, discusses how the company uses generative AI to enhance virtual experiences and bolster inclusivity and user safety.

NVIDIA’s Jim Fan Delves Into Large Language Models and Their Industry Impact – Ep. 204

Most know Minecraft as the popular blocky sandbox game, but for Jim Fan, senior AI scientist at NVIDIA, Minecraft was the perfect place to test the decision-making agency of AI models. Fan discusses how he used large language models to research open-ended AI agents and create Voyager, an AI bot built with GPT-4 that can autonomously play Minecraft.

Media.Monks’ Lewis Smithingham on Enhancing Media and Marketing With AI – Ep. 222

Media.Monks’ platform Wormhole streamlines marketing and content creation workflows with AI-powered insights. Lewis Smithingham, senior vice president of innovation and special operations at Media.Monks, addresses AI’s potential in the entertainment and advertisement industries.

NVIDIA’s Annamalai Chockalingam on the Rise of LLMs – Ep. 206

LLMs are in the spotlight, capable of tasks like generation, summarization, translation, instruction and chatting. Annamalai Chockalingam, senior product manager of developer marketing at NVIDIA, discusses how a combination of these modalities and actions can build applications to solve any problem.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

High-Tech Highways: India Uses NVIDIA Accelerated Computing to Ease Tollbooth Traffic

India is home to the globe’s second-largest road network, spanning nearly 4 million miles, and has over a thousand tollbooths, most of them run manually.

Traditional booths like these, wherever in the world they’re deployed, can contribute to massive traffic delays, long commute times and serious road congestion.

To help automate tollbooths across India, Calsoft, an Indian-American technology company, helped a client implement a broad range of NVIDIA technologies integrated with the country’s dominant payment system, known as the unified payments interface, or UPI.

Manual tollbooths demand more time and labor compared to automated ones. However, automating India’s toll systems faces an extra complication: the diverse range of license plates.

India’s non-standardized plates pose a significant challenge to the accuracy of automatic number plate recognition (ANPR) systems. Any implementation would need to address these plate variations, which include divergent color, sizing, font styles and placement upon vehicles, as well as many different languages.

The solution Calsoft helped build automatically reads passing vehicle plates and charges the associated driver’s UPI account. This approach reduces the need for manual toll collection and is a massive step toward addressing traffic in the region.

Automation in Action

As part of a pilot program, this solution has been deployed in several leading metropolitan cities. It reads plates with about 95% accuracy, using an ANPR pipeline that detects and classifies license plates as vehicles roll through tollbooths.

NVIDIA’s technology has been crucial in this effort, according to Vipin Shankar, senior vice president of technology at Calsoft. “Particularly challenging was night-time detection,” he said. “Another challenge was model accuracy improvement on pixel distortions due to environmental impacts like fog, heavy rains, reflections due to bright sunshine, dusty winds and more.”

The solution uses NVIDIA Metropolis to track and detect vehicles throughout the process. Metropolis is an application framework, a set of developer tools and a partner ecosystem that brings visual data and AI together to improve operational efficiency and safety across a range of industries.

Calsoft engineers used NVIDIA Triton Inference Server software to deploy and manage their AI models. The team also used the NVIDIA DeepStream software development kit to build a real-time streaming platform. This was key for processing and analyzing data streams efficiently, incorporating advanced capabilities such as real-time object detection and classification.
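
As a rough illustration of the serving side, here is a minimal sketch of a client querying a plate-detection model behind Triton over HTTP. The model name, tensor names and shapes are assumptions for illustration, not Calsoft's actual deployment.

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # One 960x544 RGB frame from a tollbooth camera (random stand-in values here).
    frame = np.random.rand(1, 3, 544, 960).astype(np.float32)

    inputs = [httpclient.InferInput("input_1", frame.shape, "FP32")]  # assumed name
    inputs[0].set_data_from_numpy(frame)

    result = client.infer(model_name="anpr_plate_detector", inputs=inputs)  # assumed
    print(result.as_numpy("output_bboxes").shape)                          # assumed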

Calsoft uses NVIDIA hardware, including NVIDIA Jetson edge AI modules and NVIDIA A100 Tensor Core GPUs in its AI solutions. Calsoft’s tollbooth solution is also scalable, meaning it’s designed to accommodate future growth and expansion needs, and can better ensure sustained performance and adaptability as traffic conditions evolve.

Learn how NVIDIA Metropolis has helped other municipalities, like Raleigh, North Carolina, better manage traffic flow and enhance pedestrian safety. 
