Gemma, Meet NIM: NVIDIA Teams Up With Google DeepMind to Drive Large Language Model Innovation

Large language models that power generative AI are seeing intense innovation, with models that handle multiple types of data, such as text, images and sound, becoming increasingly common.

However, building and deploying these models remains challenging. Developers need a way to quickly try out and evaluate models to find the best fit for their use case, and then to optimize those models for cost-effective, best-in-class performance.

To make it easier for developers to create AI-powered applications with world-class performance, NVIDIA and Google today announced three new collaborations at Google I/O ‘24. 

Gemma + NIM

Using TensorRT-LLM, NVIDIA is working with Google to optimize two new models Google introduced at the event: Gemma 2 and PaliGemma. These models are built from the same research and technology used to create the Gemini models, and each is focused on a specific area:

  • Gemma 2 is the next generation of Gemma models for a broad range of use cases and features a brand new architecture designed for breakthrough performance and efficiency.
  • PaliGemma is an open vision language model (VLM) inspired by PaLI-3. Built on open components including the SigLIP vision model and the Gemma language model, PaliGemma is designed for vision-language tasks such as image and short video captioning, visual question answering, understanding text in images, object detection and object segmentation. PaliGemma is designed for class-leading fine-tuning performance on a wide range of vision-language tasks and is also supported by NVIDIA JAX-Toolbox.

Gemma 2 and PaliGemma will be offered with NVIDIA NIM inference microservices, part of the NVIDIA AI Enterprise software platform, which simplifies the deployment of AI models at scale. NIM support for the two new models is available from the API catalog, starting with PaliGemma today; both will soon be released as containers on NVIDIA NGC and GitHub.
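For developers who want to experiment right away, a minimal sketch of a request to the API catalog’s OpenAI-compatible endpoint might look like the following. The model identifier and parameters shown are illustrative assumptions rather than confirmed catalog IDs:

```python
# Hedged sketch: calling a Gemma-family model hosted in the NVIDIA API catalog
# through its OpenAI-compatible endpoint. The model ID below is an assumption;
# check the catalog listing for the exact Gemma 2 / PaliGemma identifiers.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # key generated from the API catalog
)

completion = client.chat.completions.create(
    model="google/gemma-2-9b-it",  # illustrative; PaliGemma requests also carry image input
    messages=[{"role": "user", "content": "In one sentence, what is a NIM microservice?"}],
    temperature=0.2,
    max_tokens=128,
)
print(completion.choices[0].message.content)
```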

Bringing Accelerated Data Analytics to Colab

Google also announced that RAPIDS cuDF, an open-source GPU dataframe library, is now supported by default on Google Colab, one of the most popular developer platforms for data scientists. It now takes just a few seconds for Google Colab’s 10 million monthly users to accelerate pandas-based Python workflows by up to 50x using NVIDIA L4 Tensor Core GPUs, with zero code changes.

With RAPIDS cuDF, developers using Google Colab can speed up both exploratory analysis and production data pipelines. While pandas is one of the world’s most popular data processing tools thanks to its intuitive API, applications often struggle as data sizes grow: with even 5-10GB of data, many simple operations can take minutes to finish on a CPU.

RAPIDS cuDF is designed to solve this problem by seamlessly accelerating pandas code on GPUs where applicable and falling back to CPU pandas where not. With RAPIDS cuDF available by default on Colab, developers everywhere can take advantage of accelerated data analytics.
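As a rough illustration (the dataset and column names below are placeholders), enabling the accelerator in a Colab notebook with a GPU runtime can be as simple as loading the cudf.pandas extension before importing pandas:

```python
# Notebook sketch: turn on GPU acceleration for pandas in Colab (GPU runtime),
# then run ordinary pandas code. Supported operations run on the GPU via cuDF;
# anything unsupported falls back to CPU pandas automatically.
%load_ext cudf.pandas

import pandas as pd

df = pd.read_parquet("transactions.parquet")   # placeholder dataset
summary = (
    df.groupby("merchant_id")["amount"]        # placeholder columns
      .agg(["count", "mean", "sum"])
      .sort_values("sum", ascending=False)
)
print(summary.head())
```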

Taking AI on the Road 

Google and NVIDIA also announced a Firebase Genkit collaboration that makes it easy for app developers to integrate generative AI models, like the new family of Gemma models, into their web and mobile applications to deliver custom content, provide semantic search and answer questions. Developers can start workstreams on AI PCs with local NVIDIA RTX GPUs before moving their work seamlessly to Google Cloud infrastructure.

To make this even easier, developers can build apps with Genkit using JavaScript, a programming language mobile developers commonly use to build their apps.

The Innovation Beat Goes On

NVIDIA and Google Cloud are collaborating in multiple domains to propel AI forward. From the upcoming Grace Blackwell-powered DGX Cloud platform and JAX framework support, to bringing the NVIDIA NeMo framework to Google Kubernetes Engine, the companies’ full-stack partnership expands the possibilities of what customers can do with AI using NVIDIA technologies on Google Cloud.

Read More

CaLLM, Cool and Connected: Cerence Uses Generative AI to Transform the In-Car Experience

The integration of AI has become pivotal in shaping the future of driving experiences. As vehicles transition into smart, connected entities, the demand for intuitive human-machine interfaces and advanced driver assistance systems has surged.

In this journey toward automotive intelligence, Cerence, a global leader in AI-powered mobility solutions, is tapping NVIDIA’s core expertise in automotive cloud and edge technologies to redefine the in-car user experience.

In a recent video, Iqbal Arshad, chief technology officer of Cerence, emphasized the point, stating: “Generative AI is the single biggest change that’s happening in the tech industry overall.”

The cornerstone of Cerence’s vision lies in the development of its automotive-specific Cerence Automotive Large Language Model, or CaLLM. It serves as the foundation for the company’s next-gen in-car computing platform, running on NVIDIA DRIVE.

The platform, unveiled in December, showcases the future of in-car interaction, with an automotive- and mobility-specific assistant that provides an integrated in-cabin experience.

“We have datasets from the last 20 years of experience working in the automotive space,” Arshad said. “And we’re able to take that data and make that an automotive-ready LLM.”

Generative AI a Game-Changer for the Automotive Industry

Generative AI enables vehicles to comprehend and respond to human language with remarkable accuracy, revolutionizing the way drivers interact with their cars.

Whether it’s initiating voice commands for navigation, controlling infotainment systems or even engaging in natural language conversations, generative AI opens a realm of possibilities for creating more convenient and enjoyable driving experiences.

Cerence is striving to empower vehicles with the cognitive capabilities necessary to seamlessly assist drivers in navigating their daily routines.

The company uses NVIDIA DGX Cloud on Microsoft Azure, which provides dedicated, scalable access to the latest NVIDIA architecture, co-engineered with Azure at every layer and optimized for training AI workloads at peak performance. NVIDIA inferencing technology helps Cerence deliver real-time performance for seamless user experiences.

As Cerence sees it, the future is one of intelligent driving, where vehicles aren’t just modes of transportation, but trusted companions on the road ahead.

“Generative computing is going to change your in-car experience,” said Arshad.

With generative AI at its core, driving will evolve into a personalized, connected and, ultimately, safer experience for all.

Read More

NVIDIA to Help Elevate Japan’s Sovereign AI Efforts Through Generative AI Infrastructure Build-Out

Following an announcement by Japan’s Ministry of Economy, Trade and Industry, NVIDIA will play a central role in developing the nation’s generative AI infrastructure as Japan seeks to capitalize on the technology’s economic potential and further develop its workforce.

NVIDIA is collaborating with key digital infrastructure providers, including GMO Internet Group, Highreso, KDDI Corporation, RUTILEA, SAKURA internet Inc. and SoftBank Corp., which the ministry has certified to spearhead the development of cloud infrastructure crucial for AI applications.

Over the last two months, the ministry announced plans to allocate $740 million, approximately ¥114.6 billion, to assist six local firms in this initiative. Building on last year’s program, the Japanese government is significantly expanding its effort to subsidize AI computing resources by increasing the number of companies involved.

With this move, Japan becomes the latest nation to embrace the concept of sovereign AI, aiming to fortify its local startups, enterprises and research efforts with advanced AI technologies.

Across the globe, nations are building up domestic computing capacity through various models. Some procure and operate sovereign AI clouds with state-owned telecommunications providers or utilities. Others are sponsoring local cloud partners to provide a shared AI computing platform for public and private sector use.

Japan’s initiative follows NVIDIA founder and CEO Jensen Huang’s visit last year, where he met with political and business leaders — including Japanese Prime Minister Fumio Kishida — to discuss the future of AI.

During his trip, Huang emphasized that “AI factories” — next-generation data centers designed to handle the most computationally intensive AI tasks — are crucial for turning vast amounts of data into intelligence. “The AI factory will become the bedrock of modern economies across the world,” Huang said during a meeting with the Japanese press in December.

The Japanese government plans to subsidize a significant portion of the costs for building AI supercomputers, which will facilitate AI adoption, enhance workforce skills, support Japanese language model development and bolster resilience against natural disasters and climate change.

Under the country’s Economic Security Promotion Act, the ministry aims to secure a stable supply of local cloud services, reducing the time and cost of developing next-generation AI technologies.

Japan’s technology powerhouses are already moving fast to embrace AI. Last week, SoftBank Corp. announced that it will invest ¥150 billion, approximately $960 million, for its plan to expand the infrastructure needed to develop Japan’s top-class AI, including purchases of NVIDIA accelerated computing.

The news follows Huang’s meetings with leaders in Canada, France, India, Japan, Malaysia, Singapore and Vietnam over the past year, as countries seek to supercharge their regional economies and embrace challenges such as climate change with AI.

Read More

Drug Discovery, STAT! NVIDIA, Recursion Speed Pharma R&D With AI Supercomputer

Described as the largest system in the pharmaceutical industry, BioHive-2 at the Salt Lake City headquarters of Recursion debuts today at No. 35, up more than 100 spots from its predecessor on the latest TOP500 list of the world’s fastest supercomputers.

The advance represents the company’s most recent effort to accelerate drug discovery with NVIDIA technologies.

“Just as with large language models, we see AI models in the biology domain improve performance substantially as we scale our training with more data and compute horsepower, which ultimately leads to greater impacts on patients’ lives,” said Recursion’s CTO, Ben Mabey, who’s been applying machine learning to healthcare for more than a decade.

BioHive-2 packs 504 NVIDIA H100 Tensor Core GPUs linked on an NVIDIA Quantum-2 InfiniBand network to deliver 2 exaflops of AI performance. The resulting NVIDIA DGX SuperPOD is nearly 5x faster than Recursion’s first-generation system, BioHive-1.

Performance Powers Through Complexity

That performance is key to rapid progress because “biology is insanely complex,” Mabey said.

Finding a new drug candidate can take scientists years performing millions of wet-lab experiments.

That work is vital; Recursion’s scientists run more than 2 million such experiments a week. But going forward, they’ll use AI models on BioHive-2 to direct their platform to the most promising biology areas to run their experiments.

“With AI in the loop today, we can get 80% of the value with 40% of the wet lab work, and that ratio will improve going forward,” he said.

Biological Data Propels Healthcare AI

Recursion is collaborating with biopharma companies such as Bayer AG, Roche and Genentech. Over time, it has also amassed a database of more than 50 petabytes of biological, chemical and patient data, helping it build powerful AI models that accelerate drug discovery.

“We believe it’s one of the largest biological datasets on Earth — it was built with AI training in mind, intentionally spanning biology and chemistry,” said Mabey, who joined the company more than seven years ago in part due to its commitment to building such a dataset.

Creating an AI Phenomenon

Processing that data on BioHive-1, Recursion developed a family of foundation models called Phenom. They turn a series of microscopic cellular images into meaningful representations for understanding the underlying biology.

A member of that family, Phenom-Beta, is now available as a cloud API and the first third-party model on NVIDIA BioNeMo, a generative AI platform for drug discovery.

Over several months of research and iteration, BioHive-1 trained Phenom-1 using more than 3.5 billion cellular images. Recursion’s expanded system enables training even more powerful models with larger datasets in less time.

The company also used NVIDIA DGX Cloud, hosted by Oracle Cloud Infrastructure, to provide additional supercomputing resources to power its work.

Much like how LLMs are trained to generate missing words in a sentence, Phenom models are trained by asking them to generate the masked out pixels in images of cells.
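For readers curious about the mechanics, here is a minimal NumPy sketch of the masked-patch idea. It is not Recursion’s actual pipeline, and the patch size and mask ratio are illustrative assumptions:

```python
# Minimal sketch of masked-patch pretraining data prep, assuming a square
# grayscale cell image as a NumPy array. Patch size and mask ratio are
# illustrative; a real pipeline would batch this and feed a vision model.
import numpy as np

def mask_patches(image: np.ndarray, patch: int = 16, mask_ratio: float = 0.75,
                 seed: int = 0):
    rng = np.random.default_rng(seed)
    h, w = image.shape
    grid = np.zeros((h // patch, w // patch), dtype=bool)
    hidden = rng.choice(grid.size, int(grid.size * mask_ratio), replace=False)
    grid.flat[hidden] = True
    masked = image.copy()
    for i, j in zip(*np.nonzero(grid)):
        masked[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0
    return masked, grid  # the model learns to reconstruct the zeroed patches

masked_img, mask_grid = mask_patches(np.random.rand(256, 256))
print(f"{mask_grid.mean():.0%} of patches hidden")
```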

The Phenom-1 model serves Recursion and its partners in several ways, including finding and optimizing molecules to treat a variety of diseases and cancers. Earlier models helped Recursion predict drug candidates for COVID-19 nine out of 10 times.

The company announced its collaboration with NVIDIA in July. Less than 30 days later, the combination of BioHive-1 and DGX Cloud screened and analyzed a massive chemical library to predict protein targets for approximately 36 billion chemical compounds.

In January, the company demonstrated LOWE, an AI workflow engine with a natural-language interface to help make its tools more accessible to scientists. And in April it described a billion-parameter AI model it built to provide a new way to predict the properties of key molecules of interest in healthcare.

Recursion uses NVIDIA software to optimize its systems.

“We love CUDA and NVIDIA AI Enterprise, and we’re looking to see if NVIDIA NIM can help us distribute our models more easily, both internally and to partners,” Mabey said.

A Shared Vision for Healthcare

The efforts are part of a broad vision that Jensen Huang, NVIDIA founder and CEO, described in a fireside chat with Recursion’s chairman: moving toward simulating biology.

“You can now recognize and learn the language of almost anything with structure, and you can translate it to anything with structure … This is the generative AI revolution,” Huang said.

“We share a similar view,” said Mabey.

“We are in the early stages of a very interesting time where just as computers accelerated chip design, AI can speed up drug design. Biology is much more complex, so it will take years to play out, but looking back, people will see this was a real turning point in healthcare,” he added.

Learn about NVIDIA’s AI platform for healthcare and life sciences and subscribe to NVIDIA healthcare news.

Pictured at top: BioHive-2 with a few members of the Recursion team (from left) Paige Despain, John Durkin, Joshua Fryer, Jesse Dean, Ganesh Jagannathan, Chris Gibson, Lindsay Ellinger, Michael Secora, Alex Timofeyev, and Ben Mabey. 

Read More

NVIDIA Blackwell Platform Pushes the Boundaries of Scientific Computing

Quantum computing. Drug discovery. Fusion energy. Scientific computing and physics-based simulations are poised to make giant steps across domains that benefit humanity as advances in accelerated computing and AI drive the world’s next big breakthroughs.

At GTC in March, NVIDIA unveiled the NVIDIA Blackwell platform, which promises to run generative AI on trillion-parameter large language models (LLMs) at up to 25x less cost and energy consumption than the NVIDIA Hopper architecture.

Blackwell has powerful implications for AI workloads, and its technologies can also help deliver discoveries across all types of scientific computing applications, including traditional numerical simulation.

By reducing energy costs, accelerated computing and AI drive sustainable computing, and many scientific computing applications already benefit. Weather can be simulated at 200x lower cost and with 300x less energy, while digital twin simulations run at 65x lower cost and with 58x less energy consumption than traditional CPU-based systems.

Multiplying Scientific Computing Simulations With Blackwell

Scientific computing and physics-based simulation often rely on double-precision, or FP64, floating-point formats to solve problems. Blackwell GPUs deliver 30% more FP64 and FP32 FMA (fused multiply-add) performance than Hopper.

Physics-based simulations are critical to product design and development. From planes and trains to bridges, silicon chips and pharmaceuticals — testing and improving products in simulation saves researchers and developers billions of dollars.

Today, application-specific integrated circuits (ASICs) are designed almost exclusively on CPUs in a long and complex workflow that includes analog analysis to identify voltages and currents.

But that’s changing. The Cadence SpectreX simulator is one example of an analog circuit design solver. SpectreX circuit simulations are projected to run 13x quicker on a GB200 Grace Blackwell Superchip — which connects Blackwell GPUs and Grace CPUs — than on a traditional CPU.

Also, GPU-accelerated computational fluid dynamics, or CFD, has become a key tool. Engineers and equipment designers use it to predict the behavior of designs. Cadence Fidelity runs CFD simulations that are projected to run as much as 22x faster on GB200 systems than on traditional CPU-powered systems. With parallel scalability and 30TB of memory per GB200 NVL72 rack, it’s possible to capture flow details like never before.

In another application, Cadence Reality’s digital twin software can be used to create a virtual replica of a physical data center, including all its components — servers, cooling systems and power supplies. Such a virtual model allows engineers to test different configurations and scenarios before implementing them in the real world, saving time and costs.

Cadence Reality’s magic comes from physics-based algorithms that simulate how heat, airflow and power usage affect data centers. This helps engineers and data center operators more effectively manage capacity, predict potential operational problems and make informed decisions to optimize the layout and operation of the data center for improved efficiency and capacity utilization. With Blackwell GPUs, these simulations are projected to run up to 30x faster than with CPUs, offering accelerated timelines and higher energy efficiency.

AI for Scientific Computing

New Blackwell accelerators and networking will deliver leaps in performance for advanced simulation.

The NVIDIA GB200 kicks off a new era for high-performance computing (HPC). Its architecture sports a second-generation transformer engine optimized to accelerate inference workloads for LLMs.

This delivers a 30x speedup on resource-intensive applications like the 1.8-trillion-parameter GPT-MoE (generative pretrained transformer-mixture of experts) model compared to the H100 generation, unlocking new possibilities for HPC. By enabling LLMs to process and decipher vast amounts of scientific data, HPC applications can sooner reach valuable insights that can accelerate scientific discovery.

Sandia National Laboratories is building an LLM copilot for parallel programming. Traditional AI can generate basic serial computing code efficiently, but when it comes to parallel computing code for HPC applications, LLMs can falter. Sandia researchers are tackling this issue head-on with an ambitious project: automatically generating parallel code in Kokkos, a specialized C++-based programming model developed by multiple national labs for running tasks across tens of thousands of processors in the world’s most powerful supercomputers.

Sandia is using an AI technique known as retrieval-augmented generation, or RAG, which combines information-retrieval capabilities with language generation models. The team is creating a Kokkos database and integrating it with AI models using RAG.

Initial results are promising. Sandia’s different RAG approaches have autonomously generated Kokkos code for parallel computing applications. By overcoming hurdles in AI-based parallel code generation, Sandia aims to unlock new possibilities in HPC at leading supercomputing facilities worldwide, with other candidate areas including renewables research, climate science and drug discovery.
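The general pattern, sketched loosely below with a toy snippet corpus and generic tooling rather than Sandia’s actual system, is to retrieve the Kokkos examples most relevant to a task and hand them to an LLM as context. The corpus entries, endpoint and model name are placeholders:

```python
# Toy RAG sketch (not Sandia's implementation): TF-IDF retrieval over a tiny
# Kokkos snippet corpus, with generation delegated to any OpenAI-compatible
# endpoint. Corpus, endpoint and model ID are placeholders.
import os
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

CORPUS = [
    "Kokkos::parallel_for over a RangePolicy with a lambda body ...",
    "Kokkos::View allocation plus create_mirror_view for host/device copies ...",
    "Kokkos::parallel_reduce computing a dot product into a scalar ...",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    vec = TfidfVectorizer().fit(CORPUS + [query])
    scores = cosine_similarity(vec.transform([query]), vec.transform(CORPUS)).ravel()
    return [CORPUS[i] for i in scores.argsort()[::-1][:k]]

def generate_kokkos(task: str) -> str:
    context = "\n\n".join(retrieve(task))
    client = OpenAI(base_url="https://integrate.api.nvidia.com/v1",  # placeholder endpoint
                    api_key=os.environ["NVIDIA_API_KEY"])
    resp = client.chat.completions.create(
        model="meta/llama3-70b-instruct",  # placeholder model ID
        messages=[
            {"role": "system", "content": "You write Kokkos C++ code using the given context."},
            {"role": "user", "content": f"Context:\n{context}\n\nTask: {task}"},
        ],
    )
    return resp.choices[0].message.content

print(generate_kokkos("Parallelize y = a*x + y over one million elements"))
```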

Driving Quantum Computing Advances

Quantum computing promises leaps forward for fusion energy, climate research, drug discovery and many other areas. So researchers are hard at work simulating future quantum computers on NVIDIA GPU-based systems and software to develop and test quantum algorithms faster than ever.

The NVIDIA CUDA-Q platform enables both simulation of quantum computers and hybrid application development with a unified programming model for CPUs, GPUs and QPUs (quantum processing units) working together.
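As a small illustration of that unified model, a CUDA-Q Python kernel can run on a GPU-accelerated simulator today and be retargeted later without changing the kernel itself. The target name and shot count below are example settings, assuming the cudaq package is installed:

```python
# Minimal CUDA-Q sketch: a Bell-state kernel sampled on a GPU-accelerated
# state-vector simulator. Swapping the target (e.g., to a QPU backend) does
# not require changing the kernel code.
import cudaq

cudaq.set_target("nvidia")  # GPU-accelerated simulation target

@cudaq.kernel
def bell():
    qubits = cudaq.qvector(2)
    h(qubits[0])
    x.ctrl(qubits[0], qubits[1])
    mz(qubits)

counts = cudaq.sample(bell, shots_count=1000)
print(counts)  # expect roughly even '00' and '11' outcomes
```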

CUDA-Q is speeding simulations in chemistry workflows for BASF, high-energy and nuclear physics for Stony Brook and quantum chemistry for NERSC.

The NVIDIA Blackwell architecture will help drive quantum simulations to new heights, with the latest NVIDIA NVLink multi-node interconnect technology shuttling data faster to speed up large simulations.

Accelerating Data Analytics for Scientific Breakthroughs 

Data processing with RAPIDS is popular for scientific computing. Blackwell introduces a hardware decompression engine that unpacks compressed data to speed up analytics in RAPIDS.

The decompression engine delivers throughput of up to 800GB/s, enabling Grace Blackwell to run query benchmarks 18x faster than CPUs (Sapphire Rapids) and 6x faster than NVIDIA H100 Tensor Core GPUs.

With 8TB/s of high-bandwidth memory and the Grace CPU’s high-speed NVLink Chip-to-Chip (C2C) interconnect rocketing data transfers, the engine speeds up the entire database query process. Delivering top performance across data analytics and data science use cases, Blackwell accelerates time to insight and reduces costs.

Driving Extreme Performance for Scientific Computing with NVIDIA Networking

The NVIDIA Quantum-X800 InfiniBand networking platform offers the highest throughput for scientific computing infrastructure.

It includes NVIDIA Quantum Q3400 and Q3200 switches and the NVIDIA ConnectX-8 SuperNIC, together hitting twice the bandwidth of the prior generation. The Q3400 platform offers 5x higher bandwidth capacity and 14.4Tflops of in-network computing with NVIDIA’s scalable hierarchical aggregation and reduction protocol (SHARPv4), providing a 9x increase compared with the prior generation.

The performance leap and power efficiency translate to significant reductions in workload completion time and energy consumption for scientific computing.

Learn more about NVIDIA Blackwell.

Read More

Generating Science: NVIDIA AI Accelerates HPC Research

Generative AI is taking root at national and corporate labs, accelerating high-performance computing for business and science.

Researchers at Sandia National Laboratories aim to automatically generate code in Kokkos, a C++-based parallel programming model designed for use across many of the world’s largest supercomputers.

It’s an ambitious effort. The specialized programming model, developed by researchers from several national labs, handles the nuances of running tasks across tens of thousands of processors.

Sandia is employing retrieval-augmented generation (RAG) to create and link a Kokkos database with AI models. As researchers experiment with different RAG approaches, initial tests show promising results.

Cloud-based services like NeMo Retriever are among the RAG options the scientists will evaluate.

“NVIDIA provides a rich set of tools to help us significantly accelerate the work of our HPC software developers,” said Robert Hoekstra, a senior manager of extreme scale computing at Sandia.

Building copilots via model tuning and RAG is just a start. Researchers eventually aim to employ foundation models trained with scientific data from fields such as climate, biology and material science.

Getting Ahead of the Storm

Researchers and companies in weather forecasting are embracing CorrDiff, a generative AI model that’s part of NVIDIA Earth-2, a set of services and software for weather and climate research.

CorrDiff can downscale the 25-kilometer resolution of traditional atmospheric models to 2 kilometers and expand by more than 100x the number of forecasts that can be combined to improve confidence in predictions.

“It’s a promising innovation … We plan to leverage such models in our global and regional AI forecasts for richer insights,” said Tom Gowan, machine learning and modeling lead for Spire, a company in Vienna, Va., that collects data from its own network of tiny satellites.

Generative AI enables faster, more accurate forecasts, he said in a recent interview.

“It really feels like a big jump in meteorology,” he added. “And by partnering with NVIDIA, we have access to the world’s best GPUs that are the most reliable, fastest and most efficient ones for both training and inference.”

Switzerland-based Meteomatics recently announced it also plans to use NVIDIA’s generative AI platform for its weather forecasting business.

“Our work with NVIDIA will help energy companies maximize their renewable energy operations and increase their profitability with quick and accurate insight into weather fluctuations,” said Martin Fengler, founder and CEO of Meteomatics.

Generating Genes to Improve Healthcare

At Argonne National Laboratory, scientists are using the technology to generate gene sequences that help them better understand the virus behind COVID-19. Their award-winning models, called GenSLMs, spawned simulations that closely resemble real-world variants of SARS-CoV-2.

“Understanding how different parts of the genome are co-evolving gives us clues about how the virus may develop new vulnerabilities or new forms of resistance,” Arvind Ramanathan, a lead researcher, said in a blog.

GenSLMs were trained on more than 110 million genome sequences with NVIDIA A100 Tensor Core GPU-powered supercomputers, including Argonne’s Polaris system, the U.S. Department of Energy’s Perlmutter and NVIDIA’s Selene.

Microsoft Proposes Novel Materials

Microsoft Research showed how generative AI can accelerate work in materials science.

Their MatterGen model generates novel, stable materials that exhibit desired properties. The approach enables specifying chemical, magnetic, electronic, mechanical and other desired properties.

“We believe MatterGen is an important step forward in AI for materials design,” the Microsoft Research team wrote of the model they trained on Azure AI infrastructure with NVIDIA A100 GPUs.

Companies such as Carbon3D are already finding opportunities, applying generative AI to materials science in commercial 3D printing operations.

It’s just the beginning of what researchers will be able to do for HPC and science with generative AI. NVIDIA H200 Tensor Core GPUs, available now, and upcoming NVIDIA Blackwell architecture GPUs will take their work to new levels.

Learn more about tools like NVIDIA Modulus, a key component in the Earth-2 platform for building AI models that obey the laws of physics, and NVIDIA Megatron-Core, a NeMo library to tune and train large language models.

Read More

Dial It In: Data Centers Need New Metric for Energy Efficiency

Data centers need an upgraded dashboard to guide their journey to greater energy efficiency, one that shows progress running real-world applications.

The formula for energy efficiency is simple: work done divided by energy used. Applying it to data centers calls for unpacking some details.

Today’s most widely used gauge — power usage effectiveness (PUE) — compares the total energy a facility consumes to the amount its computing infrastructure uses. Over the last 17 years, PUE has driven the most efficient operators closer to an ideal where almost no energy is wasted on processes like power conversion and cooling.

Finding the Next Metrics

PUE served data centers well during the rise of cloud computing, and it will continue to be useful. But it’s insufficient in today’s generative AI era, when workloads and the systems running them have changed dramatically.

That’s because PUE doesn’t measure the useful output of a data center, only the energy that it consumes. That’d be like measuring the amount of gas an engine uses without noticing how far the car has gone.

Many standards exist for data center efficiency. A 2017 paper lists nearly three dozen of them, several focused on specific targets such as cooling, water use, security and cost.

Understanding What’s Watts

When it comes to energy efficiency, the computer industry has a long and somewhat unfortunate history of describing systems and the processors they use in terms of power, typically in watts. It’s a worthwhile metric, but many fail to realize that watts only measure input power at a point in time, not the actual energy computers use or how efficiently they use it.

So, when modern systems and processors report rising input power levels in watts, that doesn’t mean they’re less energy efficient. In fact, they’re often much more efficient in the amount of work they do with the amount of energy they use.

Modern data center metrics should focus on energy, which the engineering community measures in kilowatt-hours or joules. The key is how much useful work a data center does with that energy.
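As a toy illustration of that energy-centric view, useful work divided by energy consumed could be expressed as tokens served per joule; the throughput and power figures below are placeholders, not measurements:

```python
# Minimal sketch of an energy-centric metric: useful work (tokens served)
# divided by energy consumed (joules), rather than instantaneous watts.
def tokens_per_joule(tokens_per_second: float, avg_power_watts: float) -> float:
    # W = J/s, so (tokens/s) / (J/s) = tokens/J
    return tokens_per_second / avg_power_watts

throughput = 12_000.0   # tokens/s served by an inference node (placeholder)
avg_power = 6_500.0     # average node power draw in watts (placeholder)

print(f"{tokens_per_joule(throughput, avg_power):.2f} tokens/joule")
# For a fixed job, energy = watts x seconds, so a faster, higher-power system
# can still complete the workload using less total energy.
```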

Reworking What We Call Work

Here again, the industry has a practice of measuring in abstract terms, like processor instructions or math calculations. So, MIPS (millions of instructions per second) and FLOPS (floating point operations per second) are widely quoted.

Only computer scientists care how many of these low-level jobs their system can handle. Users would prefer to know how much real work their systems put out, but defining useful work is somewhat subjective.

Data centers focused on AI may rely on the MLPerf benchmarks. Supercomputing centers tackling scientific research typically use additional measures of work. Commercial data centers focused on streaming media may want others.

The resulting suite of applications must be allowed to evolve over time to reflect the state of the art and the most relevant use cases. For example, the last MLPerf round added tests using two generative AI models that didn’t even exist five years ago.

A Gauge for Accelerated Computing

Ideally, any new benchmarks should measure advances in accelerated computing. This combination of parallel processing hardware, software and methods is running applications dramatically faster and more efficiently than CPUs across many modern workloads.

For example, on scientific applications, the Perlmutter supercomputer at the National Energy Research Scientific Computing Center demonstrated average energy-efficiency gains of 5x using accelerated computing. That’s one reason why 39 of the top 50 supercomputers on the Green500 list, including the No. 1 system, use NVIDIA GPUs.

Because they execute lots of tasks in parallel, GPUs execute more work in less time than CPUs, saving energy.

Companies across many industries share similar results. For example, PayPal improved real-time fraud detection by 10% and lowered server energy consumption nearly 8x with accelerated computing.

The gains are growing with each new generation of GPU hardware and software.

In a recent report, Stanford University’s Human-Centered AI group estimated GPU performance “has increased roughly 7,000 times” since 2003, and price per performance is “5,600 times greater.”

Data centers need a suite of benchmarks to track energy efficiency across their major workloads.

Two Experts Weigh In

Experts see the need for a new energy-efficiency metric, too.

With today’s data centers achieving scores around 1.2 PUE, the metric “has run its course,” said Christian Belady, a data center engineer who had the original idea for PUE. “It improved data center efficiency when things were bad, but two decades later, they’re better, and we need to focus on other metrics more relevant to today’s problems.”

Looking forward, “the holy grail is a performance metric. You can’t compare different workloads directly, but if you segment by workloads, I think there is a better likelihood for success,” said Belady, who continues to work on initiatives driving data center sustainability.

Jonathan Koomey, a researcher and author on computer efficiency and sustainability, agreed.

“To make good decisions about efficiency, data center operators need a suite of benchmarks that measure the energy implications of today’s most widely used AI workloads,” said Koomey.

“Tokens per joule is a great example of what one element of such a suite might be,” Koomey added. “Companies will need to engage in open discussions, share information on the nuances of their own workloads and experiments, and agree to realistic test procedures to ensure these metrics accurately characterize energy use for hardware running real-world applications.”

“Finally, we need an open public forum to conduct this important work,” he said.

It Takes a Village

Thanks to metrics like PUE and rankings like the Green500, data centers and supercomputing centers have made enormous progress in energy efficiency.

More can and must be done to extend efficiency advances in the age of generative AI. Metrics of energy consumed doing useful work on today’s top applications can take supercomputing and data centers to a new level of energy efficiency.

To learn more about available energy-efficiency solutions, explore NVIDIA sustainable computing.

Read More

Through the Wormhole: Media.Monks’ Vision for Enhancing Media and Marketing With AI

Meet Media.Monks’ Wormhole, an alien-like, conversational robot with a quirky personality and the ability to offer keen marketing expertise.

Lewis Smithingham, senior vice president of innovation and special ops at Media.Monks, a global marketing and advertising company, discusses the creation of Wormhole and AI’s potential to enhance media and entertainment with host Noah Kravitz in this AI Podcast episode recorded live at the NVIDIA GTC global AI conference.

Wormhole was designed to showcase Monks.Flow, an AI-powered platform that streamlines marketing and content creation workflows. Smithingham delves into Media.Monks’ platforms for media, entertainment and advertising and speaks to its vision for a future where AI enhances creativity and allows for more personalized, scalable content creation.

Stay tuned for more episodes recorded live from GTC, and hear more from Smithingham in this GTC interview.

Time Stamps

1:45: What is Media.Monks?
6:23: Description of Wormhole
8:49: Possible use cases for Wormhole
10:21: Takeaways from developing Wormhole
12:02: What is Monks.Flow?
16:54: Response from creatives on using AI in their work
21:23: Smithingham’s outlook on hyperpersonalized content
34:24: What’s next for the future of AI-powered media?

You Might Also Like…

Exploring Filmmaking With Cuebric’s AI: Insights From Pinar Seyhan Demirdag – Ep. 214

In today’s episode of NVIDIA’s AI Podcast, host Noah Kravitz talks with Pinar Seyhan Demirdag, co-founder and CEO of Cuebric. Cuebric is on a mission to offer new solutions in filmmaking and content creation through immersive, two-and-a-half-dimensional cinematic environments.

Deepdub’s Ofir Krakowski on Redefining Dubbing From Hollywood to Bollywood – Ep. 202

On the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Deepdub’s cofounder and CEO, Ofir Krakowski. Deepdub uses AI-driven dubbing to help entertainment companies boost efficiency and cut costs while increasing accessibility.

WSC Sports’ Amos Bercovich on How AI Keeps the Sports Highlights Coming – Ep. 183

On this episode of the AI Podcast, host Noah Kravitz spoke with Amos Bercovich, algorithm group leader at WSC Sports, makers of an AI cloud platform that enables over 200 sports organizations worldwide to generate personalized and customized sports videos automatically and in real time.

Maya Ackerman on LyricStudio, an AI-Based Writing Songwriting Assistant – Ep. 153

Lennon and McCartney. Ashford and Simpson. Many of our all-time favorite tunes have come from songwriting duos. Now, anyone can find a snazzy compositional partner in AI. In this episode of the AI Podcast, Maya Ackerman, CEO of WaveAI, spoke with host Noah Kravitz about WaveAI’s LyricStudio software, an AI-based lyric and poetry writing assistant.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Podcasts, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

 

Read More

‘Honkai: Star Rail’ Blasts Off on GeForce NOW

Gear up, Trailblazers — Honkai: Star Rail lands on GeForce NOW this week, along with an in-game reward for members to celebrate the title’s launch in the cloud.

Stream it today, along with five new games joining the GeForce NOW library of more than 1,900 titles this week.

Five Stars

Take a galactic journey in the cloud with Honkai: Star Rail, a new Cosmic Adventure Strategy role-playing game from HoYoverse, the company behind Genshin Impact. The title seamlessly blends intricate storytelling with immersive gameplay mechanics for an epic journey through the cosmos.

Meet a cast of unique characters and explore diverse planets, each with its own mysteries to uncover. Assemble formidable teams, strategically deploying skills and resources to overcome mighty adversaries and unravel the mysteries of the Honkai phenomenon. Encounter new civilizations and face off against threats that endanger the Astral Express, working together to overcome the struggles caused by the Stellarons, powerful artifacts that hold the keys to the universe’s fate.

Begin the trailblazing journey without needing to wait for downloads or game updates with GeForce NOW. Members who’ve opted into GeForce NOW’s Rewards program will receive an email with a code for a Honkai: Star Rail starter kit, containing 30,000 credits, three Refined Aethers and three Traveler’s Guides. All aboard the Astral Express for adventures and thrills!

A Big Cloud for New Games 

Stream it on GeForce MEOW.

Do what cats do best in Little Kitty, Big City, the open-world adventure game from Double Dagger Studios. Explore the city as a curious little kitty with a big personality, make new friends with stray animals, and wear delightful little hats. Create a little bit of chaos while finding the way back home through the big city.

Here’s the full list of new games this week:

  • Little Kitty, Big City (New release on Steam and Xbox, available on PC Game Pass, May 9)
  • Farmer’s Life (Steam)
  • Honkai: Star Rail (Epic Games Store)
  • Supermarket Simulator (Steam)
  • Tomb Raider: Definitive Edition (Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

‘Get On the Train,’ NVIDIA CEO Says at ServiceNow’s Knowledge 2024

Now’s the time to hop aboard AI, NVIDIA founder and CEO Jensen Huang declared Wednesday as ServiceNow unveiled a demo of futuristic AI avatars together with NVIDIA during a keynote at the Knowledge 24 conference in Las Vegas.

“If something is moving a million times faster every 10 years, what should you do?” Huang asked, citing rapid advancements in AI capabilities. “The first thing you should do, instead of looking at the train from the side, is … get on the train, because on the train, it’s not moving that fast.”

The demo — built on NVIDIA NIM inference microservices and NVIDIA Avatar Cloud Engine, or ACE, speech and animation generative AI technologies, all available with NVIDIA AI Enterprise software — highlighted how AI advancements support cutting-edge digital avatar communications and have the potential to revolutionize customer service interactions.

The demo showed a customer struggling with a slow internet connection interacting with a digital avatar. The AI customer service avatar came to the rescue, swiftly diagnosing the problem, offering an option for a faster internet connection, confirming the customer’s credit card number and upgrading the connection immediately.

The futuristic demonstration took place in front of thousands of conference attendees who were eager to learn about the latest enterprise generative AI technology advancements, which promise to empower workers across the globe.

“We’ve transitioned from instruction-driven computer coding, which very few people can do, to intention-driven computing, which is connecting with somebody through intention,” Huang said during an on-stage conversation at the conference with ServiceNow Chief Operating Officer Chirantan “CJ” Desai.

The moment is another compelling example of the ongoing collaboration between ServiceNow and NVIDIA to explore more engaging, personal service experiences across various functions, including IT services, human resources, customer support and more.

The demonstration builds upon the companies’ plan to collaborate on robust, generative AI capabilities within enterprise operations and incorporates NVIDIA ACE and NVIDIA NIM microservices.

These avatars are designed to add a human-like touch to digital interactions, improving customer experience by providing empathetic and efficient support.

The underlying technologies include NVIDIA Riva for automatic speech recognition and text-to-speech, NVIDIA Audio2Face for facial animation and NVIDIA Omniverse Renderer for high-quality visual output.

ServiceNow and NVIDIA are further exploring the use of AI avatars to provide another communication option for users who prefer visual interactions.

 

Watch a recording of Huang and Desai presenting the digital avatar demo at the Knowledge 24 keynote.


Read More