December 2024 – Page 10

Introducing Gemini 2.0: our new AI model for the agentic era

Today, we’re announcing Gemini 2.0, our most capable multimodal AI model yet.

Into the Omniverse: How OpenUSD-Based Simulation and Synthetic Data Generation Advance Robot Learning

Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners, and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

Scalable simulation technologies are driving the future of autonomous robotics by reducing development time and costs.

Universal Scene Description (OpenUSD) provides a scalable and interoperable data framework for developing virtual worlds where robots can learn how to be robots. With SimReady OpenUSD-based simulations, developers can create limitless scenarios based on the physical world.

And NVIDIA Isaac Sim is advancing perception AI-based robotics simulation. Isaac Sim is a reference application built on the NVIDIA Omniverse platform for developers to simulate and test AI-driven robots in physically based virtual environments.

At AWS re:Invent, NVIDIA announced that Isaac Sim is now available on Amazon EC2 G6e instances powered by NVIDIA L40S GPUs. These powerful instances enhance the performance and accessibility of Isaac Sim, making high-quality robotics simulations more scalable and efficient.

These advancements in Isaac Sim mark a significant leap for robotics development. By enabling realistic testing and AI model training in virtual environments, companies can reduce time to deployment and improve robot performance across a variety of use cases.

Advancing Robotics Simulation With Synthetic Data Generation

Robotics companies like Cobot, Field AI and Vention are using Isaac Sim to simulate and validate robot performance while others, such as SoftServe and Tata Consultancy Services, use synthetic data to bootstrap AI models for diverse robotics applications.

The evolution of robot learning has been deeply intertwined with simulation technology. Early experiments in robotics relied heavily on labor-intensive, resource-heavy trials. Simulation is a crucial tool for the creation of physically accurate environments where robots can learn through trial and error, refine algorithms and even train AI models using synthetic data.

Physical AI describes AI models that can understand and interact with the physical world. It embodies the next wave of autonomous machines and robots, such as self-driving cars, industrial manipulators, mobile robots, humanoids and even robot-run infrastructure like factories and warehouses.

Robotics simulation, the second computer in the three-computer solution, is a cornerstone of physical AI development: it lets engineers and researchers design, test and refine systems in a controlled virtual environment.

A simulation-first approach significantly reduces the cost and time associated with physical prototyping while enhancing safety by allowing robots to be tested in scenarios that might otherwise be impractical or hazardous in real life.

With a new reference workflow, developers can accelerate the generation of synthetic 3D datasets with generative AI using OpenUSD NIM microservices. This integration streamlines the pipeline from scene creation to data augmentation, enabling faster and more accurate training of perception AI models.

Synthetic data can help address the challenge of limited, restricted or unavailable data needed to train various types of AI models, especially in computer vision. Developing action recognition models is a common use case that can benefit from synthetic data generation.

To learn how to create a human action recognition video dataset with Isaac Sim, check out the technical blog on Scaling Action Recognition Models With Synthetic Data. 3D simulations offer developers precise control over image generation, eliminating hallucinations.
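To make the idea concrete, here is a minimal sketch of the randomization step behind synthetic data generation, written in plain Python. It does not use Isaac Sim or any NVIDIA API; the `SceneSample` fields, the action list and the parameter ranges are all hypothetical. What it illustrates is that labels come for free, because the generator picks the action before anything is rendered.

```python
import random
from dataclasses import dataclass

# Hypothetical label set and parameter ranges, chosen only for illustration.
ACTIONS = ["walk", "sit", "wave", "lift"]


@dataclass(frozen=True)
class SceneSample:
    action: str             # ground-truth label, known by construction
    camera_height_m: float
    camera_yaw_deg: float
    light_intensity: float


def sample_scenes(n: int, seed: int = 0) -> list[SceneSample]:
    """Draw n randomized scene descriptions together with their labels."""
    rng = random.Random(seed)
    return [
        SceneSample(
            action=rng.choice(ACTIONS),
            camera_height_m=rng.uniform(1.0, 3.0),
            camera_yaw_deg=rng.uniform(0.0, 360.0),
            light_intensity=rng.uniform(0.2, 1.0),
        )
        for _ in range(n)
    ]


scenes = sample_scenes(100)
```

In a real pipeline each `SceneSample` would drive a simulator render; here it only records the parameters and the label that would accompany the rendered clip.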

Robotic Simulation for Humanoids

Humanoid robots are the next wave of embodied AI, but they present a challenge at the intersection of mechatronics, control theory and AI. Simulation is crucial to solving this challenge by providing a safe, cost-effective and versatile platform for training and testing humanoids.

With NVIDIA Isaac Lab, an open-source unified framework for robot learning built on top of Isaac Sim, developers can train humanoid robot policies at scale via simulations. Leading commercial robot makers are adopting Isaac Lab to handle increasingly complex movements and interactions.

NVIDIA Project GR00T, an active research initiative to enable the humanoid robot ecosystem of builders, is pioneering workflows such as GR00T-Gen to generate robot tasks and simulation-ready environments in OpenUSD. These can be used for training generalist robots to perform manipulation, locomotion and navigation.

Recently published research from Project GR00T also shows how advanced simulation can be used to train interactive humanoids. Using Isaac Sim, the researchers developed a single unified controller for physically simulated humanoids called MaskedMimic. The system is capable of generating a wide range of motions across diverse terrains from intuitive user-defined intents.

Physics-Based Digital Twins Simplify AI Training

Partners across industries are using Isaac Sim, Isaac Lab, Omniverse, and OpenUSD to design, simulate and deploy smarter, more capable autonomous machines:

Agility uses Isaac Lab to create simulations that let simulated robot behaviors transfer directly to the robot, making it more intelligent, agile and robust when deployed in the real world.
Cobot uses Isaac Sim with its AI-powered cobot, Proxie, to optimize logistics in warehouses, hospitals, manufacturing sites and more.
Cohesive Robotics has integrated Isaac Sim into its software framework called Argus OS for developing and deploying robotic workcells used in high-mix manufacturing environments.
Field AI, a builder of robot foundation models, uses Isaac Sim and Isaac Lab to evaluate the performance of its models in complex, unstructured environments across industries such as construction, manufacturing, oil and gas, mining, and more.
Fourier uses NVIDIA Isaac Gym and Isaac Lab to train its GR-2 humanoid robot, using reinforcement learning and advanced simulations to accelerate development, enhance adaptability and improve real-world performance.
Foxglove integrates Isaac Sim and Omniverse to enable efficient robot testing, training and sensor data analysis in realistic 3D environments.
Galbot used Isaac Sim to verify the data generation of DexGraspNet, a large-scale dataset of 1.32 million ShadowHand grasps, advancing robotic hand functionality by enabling scalable validation of diverse object interactions across 5,355 objects and 133 categories.
Standard Bots is simulating and validating the performance of its R01 robot used in manufacturing and machining setups.
Wandelbots integrates its NOVA platform with Isaac Sim to create physics-based digital twins and intuitive training environments, simplifying robot interaction and enabling seamless testing, validation and deployment of robotic systems in real-world scenarios.

Learn more about how Wandelbots is advancing robot learning with NVIDIA technology in this livestream recording:

Get Plugged Into the World of OpenUSD

NVIDIA experts and Omniverse Ambassadors are hosting livestream office hours and study groups to provide robotics developers with technical guidance and troubleshooting support for Isaac Sim and Isaac Lab. Learn how to get started simulating robots in Isaac Sim with this new, free course on NVIDIA Deep Learning Institute (DLI).

For more on optimizing OpenUSD workflows, explore the new self-paced Learn OpenUSD training curriculum that includes free DLI courses for 3D practitioners and developers. For more resources on OpenUSD, explore the Alliance for OpenUSD forum and the AOUSD website.

Don’t miss the CES keynote delivered by NVIDIA founder and CEO Jensen Huang live in Las Vegas on Monday, Jan. 6, at 6:30 p.m. PT for more on the future of AI and graphics.

Stay up to date by subscribing to NVIDIA news, joining the community, and following NVIDIA Omniverse on Instagram, LinkedIn, Medium and X.

Featured image courtesy of Fourier.

torchcodec: Easy and Efficient Video Decoding for PyTorch

We are pleased to officially announce torchcodec, a library for decoding videos into PyTorch tensors. It is fast, accurate, and easy to use. When running PyTorch models on videos, torchcodec is our recommended way to turn those videos into data your model can use.

Highlights of torchcodec include:

An intuitive decoding API that treats a video file as a Python sequence of frames. We support both index-based and presentation-time-based frame retrieval.
An emphasis on accuracy: we ensure you get the frames you requested, even if your video has variable frame rates.
A rich sampling API that makes it easy and efficient to retrieve batches of frames.
Best-in-class CPU decoding performance.
CUDA accelerated decoding that enables high throughput when decoding many videos at once.
Support for all codecs available in your installed version of FFmpeg.
Simple binary installs for Linux and Mac.

Easy to Use

A simple, intuitive API was one of our main design principles. We start with simple decoding and extracting specific frames of a video:

from torchcodec.decoders import VideoDecoder
from torch import Tensor

decoder = VideoDecoder("my_video.mp4")

# Index based frame retrieval.
first_ten_frames: Tensor = decoder[:10]
last_ten_frames: Tensor = decoder[-10:]

# Multi-frame retrieval, index and time based.
frames = decoder.get_frames_at(indices=[10, 0, 15])
frames = decoder.get_frames_played_at(seconds=[0.2, 3, 4.5])

All decoded frames are already PyTorch tensors, ready to be fed into models for training.
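To illustrate what time-based retrieval means, the sketch below maps a timestamp to the frame that is on screen at that moment. It is plain Python, not torchcodec's implementation; the function name and the timestamp list are hypothetical. It assumes each frame is displayed from its presentation timestamp until the next frame's timestamp.

```python
from bisect import bisect_right


def frame_index_played_at(pts_seconds: list[float], t: float) -> int:
    """Return the index of the frame on screen at time t.

    pts_seconds must be sorted ascending; frame i is displayed from
    pts_seconds[i] until pts_seconds[i + 1]. This sketch does no
    end-of-stream check, so any t past the last timestamp maps to
    the last frame.
    """
    if not pts_seconds or t < pts_seconds[0]:
        raise ValueError("t is before the first frame")
    return bisect_right(pts_seconds, t) - 1


# Hypothetical variable-frame-rate video: gaps between frames are uneven.
pts = [0.0, 0.04, 0.08, 0.5, 0.54, 1.2]

index = frame_index_played_at(pts, 0.2)  # 2: t falls in the long gap after frame 2
```

With a constant frame rate the same answer could be computed by multiplying by the frame rate; with uneven gaps, as here, a lookup against the real timestamps is needed.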

Of course, more common in ML training pipelines is sampling multiple clips from videos. A clip is just a sequence of frames in presentation order—but the frames are often not consecutive. Our sampling API makes this easy:

from torchcodec.samplers import clips_at_regular_timestamps

clips = clips_at_regular_timestamps(
  decoder,
  seconds_between_clip_starts=10,
  num_frames_per_clip=5,
  seconds_between_frames=0.2,
)

The above call yields a batch of clips where each clip starts 10 seconds apart, each clip has 5 frames, and those frames are 0.2 seconds apart. See our tutorials on decoding and sampling for more!
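As a rough illustration of which timestamps such a call touches, the sketch below lists them clip by clip. It is plain Python and does not call torchcodec; `clip_timestamps` is a hypothetical helper, and dropping clips that would run past the end of the video is an assumption of this sketch, not necessarily what the library does.

```python
def clip_timestamps(
    duration_seconds: float,
    seconds_between_clip_starts: float,
    num_frames_per_clip: int,
    seconds_between_frames: float,
) -> list[list[float]]:
    """List the timestamps each clip would sample, clip by clip."""
    if seconds_between_clip_starts <= 0:
        raise ValueError("seconds_between_clip_starts must be positive")
    clips = []
    clip_index = 0
    while True:
        start = clip_index * seconds_between_clip_starts
        last = start + (num_frames_per_clip - 1) * seconds_between_frames
        if last >= duration_seconds:
            break  # assumption: skip clips that would run past the end
        clips.append(
            [round(start + i * seconds_between_frames, 6)
             for i in range(num_frames_per_clip)]
        )
        clip_index += 1
    return clips


# A hypothetical 35-second video with the parameters used above.
clips = clip_timestamps(35.0, 10, 5, 0.2)
```

For this 35-second example the helper returns four clips, starting at 0, 10, 20 and 30 seconds, each covering 0.8 seconds of video.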

Fast Performance

Performance was our other main design principle. Decoding videos for ML training has different performance requirements than decoding videos for playback. A typical ML video training pipeline will process many different videos (sometimes in the millions!), but only sample a small number of frames (dozens to hundreds) from each video.

For this reason, we’ve paid particular attention to our decoder’s performance when seeking multiple times in a video, decoding a small number of frames after each seek. We present experiments with the following four scenarios:

1. Decoding and transforming frames from multiple videos at once, inspired by what we have seen in data loading for large-scale training pipelines:
   a. Ten threads decode batches of 50 videos in parallel.
   b. For each video, decode 10 frames at evenly spaced times.
   c. For each frame, resize it to a 256×256 resolution.
2. Decoding 10 frames at random locations in a single video.
3. Decoding 10 frames at evenly spaced times of a single video.
4. Decoding the first 100 frames of a single video.
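The shape of the first scenario can be sketched with the standard library alone. This is not the experimental script from the repo: `decode_and_resize` is a stand-in that only records what a real pipeline would decode and resize, the file names are hypothetical, and the definition of "evenly spaced" used here (ten equal segments, sampled at their starts) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def evenly_spaced_times(duration_seconds: float, num_frames: int) -> list[float]:
    """Timestamps at the start of num_frames equal segments of the video."""
    step = duration_seconds / num_frames
    return [i * step for i in range(num_frames)]


def decode_and_resize(video: tuple[str, float]) -> list[tuple[str, float, tuple[int, int]]]:
    """Stand-in for real work: a real pipeline would decode and resize here."""
    path, duration = video
    return [(path, t, (256, 256)) for t in evenly_spaced_times(duration, 10)]


# Hypothetical batch: 50 videos, each 10 seconds long.
batch = [(f"video_{i:02d}.mp4", 10.0) for i in range(50)]

# Ten worker threads share the batch of 50 videos.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(decode_and_resize, batch))
```

Swapping the stand-in for a real decoder call would turn this into a small throughput test; the threading structure would stay the same.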

We compare the following video decoders:

Torchaudio, CPU decoding only.
Torchvision, using the video_reader backend which is CPU decoding only.
Torchcodec, GPU decoding with CUDA.
Torchcodec, CPU decoding only.

Using the following three videos:

A synthetically generated video using FFmpeg’s mandelbrot generation pattern. The video is 10 seconds long, 60 frames per second and 1920×1080.
Same as above, except the video is 120 seconds long.
A promotional video from NASA that is 206 seconds long, 29.7 frames per second and 960×540.

The experimental script is in our repo. Our experiments run on a Linux system with an Intel processor that has 22 available cores and an NVIDIA GPU. For CPU decoding, all libraries were instructed to automatically determine the best number of threads to use.

From our experiments, we draw several conclusions:

Torchcodec is consistently the best-performing library for the primary use case we designed it for: decoding many videos at once as a part of a training data loading pipeline. In particular, high-resolution videos see great gains with CUDA where decoding and transforms both happen on the GPU.
Torchcodec is competitive on the CPU with seek-heavy use cases such as random and uniform sampling. Currently, torchcodec performs better on shorter videos with smaller file sizes. This is because its emphasis on seek accuracy requires an initial linear scan of the video, which takes longer on larger files.
Torchcodec is not as competitive when there is no seeking; that is, opening a video file and decoding from the beginning. This is again due to our emphasis on seek-accuracy and the initial linear scan.

Implementing an approximate seeking mode in torchcodec should resolve these performance gaps, and it’s our highest priority feature for video decoding.
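The trade-off between exact and approximate seeking can be sketched in a few lines of plain Python. This is not torchcodec code, and the approximate strategy shown (assume evenly spaced frames at the average frame rate) is only one plausible reading of what such a mode might do; both function names and the timestamps are hypothetical.

```python
from bisect import bisect_right


def exact_index(pts_seconds: list[float], t: float) -> int:
    """Exact lookup: needs every frame's timestamp, hence a full scan."""
    return max(bisect_right(pts_seconds, t) - 1, 0)


def approximate_index(average_fps: float, num_frames: int, t: float) -> int:
    """Approximate lookup: assumes frames are evenly spaced in time."""
    return min(int(t * average_fps), num_frames - 1)


# Hypothetical variable-frame-rate video: 6 frames over 1.5 seconds.
pts = [0.0, 0.04, 0.08, 0.5, 0.54, 1.2]
average_fps = len(pts) / 1.5  # 4 frames per second on average

exact = exact_index(pts, 0.2)                           # 2
approximate = approximate_index(average_fps, len(pts), 0.2)  # 0
```

The exact lookup needs the full timestamp list, which is what the initial scan collects. The approximate lookup needs only two numbers from the file header, but on a variable-frame-rate video it can land on the wrong frame, as it does here.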

What’s Next?

As the name implies, the long-term future for torchcodec is more than just video decoding. Our next big feature is audio support: decoding audio streams both from video and from audio-only media. In the long term, we want torchcodec to be the media decoding library for PyTorch. That means as we implement functionality in torchcodec, we will deprecate and eventually remove the overlapping features from torchaudio and torchvision.

We also have video decoding improvements lined up, such as the previously mentioned approximate seeking mode for those who are willing to sacrifice accuracy for performance.

Most importantly, we’re looking for feedback from the community! We’re most interested in working on features that the community finds valuable. Come share your needs and influence our future direction!

BayesCNS: A Unified Bayesian Approach to Address Cold Start and Non-Stationarity in Search Systems at Scale

Information Retrieval (IR) systems used in search and recommendation platforms frequently employ Learning-to-Rank (LTR) models to rank items in response to user queries. These models heavily rely on features derived from user interactions, such as clicks and engagement data. This dependence introduces cold start issues for items lacking user engagement and poses challenges in adapting to non-stationary shifts in user behavior over time. We address both challenges holistically as an online learning problem and propose BayesCNS, a Bayesian approach designed to handle cold start and…

Apple Machine Learning Research
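Because the abstract is cut off, the following is not BayesCNS itself. It is a generic Beta-Bernoulli Thompson sampling sketch, a standard Bayesian online-learning technique, included only to show how a prior lets an item with no clicks still be explored and how discounting old evidence can track drifting behavior. The class name, the `decay` parameter and the example items are all assumptions.

```python
import random


class BetaItem:
    """Click-through belief for one item, as a Beta(alpha, beta) posterior."""

    def __init__(self, prior_alpha: float = 1.0, prior_beta: float = 1.0):
        self.alpha = prior_alpha
        self.beta = prior_beta

    def update(self, clicked: bool, decay: float = 1.0) -> None:
        # decay < 1 shrinks old evidence so the belief can follow drift.
        self.alpha = decay * self.alpha + (1.0 if clicked else 0.0)
        self.beta = decay * self.beta + (0.0 if clicked else 1.0)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.alpha, self.beta)


def rank(items: dict[str, BetaItem], rng: random.Random) -> list[str]:
    """Thompson sampling: order items by one draw from each posterior."""
    draws = {name: item.sample(rng) for name, item in items.items()}
    return sorted(draws, key=draws.get, reverse=True)


rng = random.Random(0)
items = {"established": BetaItem(30.0, 70.0), "new": BetaItem()}
order = rank(items, rng)
```

The new item starts from a flat prior, so its draws vary widely and it will sometimes outrank the established item; that is the exploration a cold-start item needs.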

AI Pioneers Win Nobel Prizes for Physics and Chemistry

Artificial intelligence, once the realm of science fiction, claimed its place at the pinnacle of scientific achievement Monday in Sweden.

In a historic ceremony at Stockholm’s iconic Konserthuset, John Hopfield and Geoffrey Hinton received the Nobel Prize in Physics for their pioneering work on neural networks — systems that mimic the brain’s architecture and form the bedrock of modern AI.

Meanwhile, Demis Hassabis and John Jumper accepted the Nobel Prize in Chemistry for Google DeepMind’s AlphaFold, a system that solved biology’s “impossible” problem: predicting the structure of proteins, a feat with profound implications for medicine and biotechnology.

These achievements go beyond academic prestige. They mark the start of an era where GPU-powered AI systems tackle problems once deemed unsolvable, revolutionizing multitrillion-dollar industries from healthcare to finance.

Hopfield’s Legacy and the Foundations of Neural Networks

In the 1980s, Hopfield, a physicist with a knack for asking big questions, brought a new perspective to neural networks.

He introduced energy landscapes — borrowed from physics — to explain how neural networks solve problems by finding stable, low-energy states. His ideas, abstract yet elegant, laid the foundation for AI by showing how complex systems optimize themselves.
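The idea of settling into a stable, low-energy state can be shown with a tiny Hopfield-style network in plain Python. This is a textbook-style sketch, not code from the article: it stores one pattern with the Hebbian rule, starts from a corrupted copy and updates one unit at a time. The pattern and the function names are chosen only for illustration.

```python
def train(patterns: list[list[int]]) -> list[list[float]]:
    """Hebbian weights for patterns whose entries are +1 or -1."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def energy(weights: list[list[float]], state: list[int]) -> float:
    """Energy of a state; stored patterns sit at low values."""
    n = len(state)
    return -0.5 * sum(
        weights[i][j] * state[i] * state[j] for i in range(n) for j in range(n)
    )


def recall(weights: list[list[float]], state: list[int], sweeps: int = 5) -> list[int]:
    """Update one unit at a time; an update never raises the energy."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(weights[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if field >= 0 else -1
    return state


stored = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([stored])
noisy = [1, -1, 1, -1, 1, -1, -1, 1]  # last two units flipped
recovered = recall(weights, noisy)
```

Starting from the noisy state, the updates restore the stored pattern, and `energy(weights, recovered)` is lower than `energy(weights, noisy)`: the network has rolled downhill to the memory.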

Geoffrey Hinton, a British cognitive psychologist with a penchant for radical ideas, picked up the baton. Hinton believed neural networks could revolutionize AI.

In 1983, Hinton and Sejnowski built on Hopfield’s work and invented the Boltzmann machine, which used stochastic binary neurons to jump out of local minima. They discovered an elegant and very simple learning procedure, based on statistical mechanics, that was an alternative to backpropagation.

In 2006, a simplified version of this learning procedure proved to be very effective at initializing deep neural networks before training them with backpropagation. However, training these systems still required enormous computational power.

AlphaFold: Biology’s AI Revolution

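A decade after AlexNet, the GPU-trained image recognition network from Hinton’s group that won the 2012 ImageNet competition, AI moved to biology. Hassabis and Jumper led the development of AlphaFold to solve a problem that had stumped scientists for years: predicting the shape of proteins.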

Proteins are life’s building blocks. Their shapes determine what they can do. Understanding these shapes is the key to fighting diseases and developing new medicines. But finding them was slow, costly and unreliable.

AlphaFold changed that. Building on the neural network foundations that Hopfield and Hinton helped lay, it predicts protein shapes with stunning accuracy. Powered by GPUs, it mapped almost every known protein. Now, scientists use AlphaFold to fight drug resistance, make better antibiotics and treat diseases once thought to be incurable.

What was once biology’s Gordian knot has been untangled — by AI.

The GPU Factor: Enabling AI’s Potential

GPUs, the indispensable engines of modern AI, are at the heart of these achievements. Originally designed to make video games look good, GPUs were perfect for the massive parallel processing demands of neural networks.

NVIDIA GPUs, in particular, became the engine driving breakthroughs like AlexNet and AlphaFold. Their ability to process vast datasets with extraordinary speed allowed AI to tackle problems on a scale and complexity never before possible.

Redefining Science and Industry

The Nobel-winning breakthroughs of 2024 aren’t just rewriting textbooks — they’re optimizing global supply chains, accelerating drug development and helping farmers adapt to changing climates.

Hopfield’s energy-based optimization principles now inform AI-powered logistics systems. Hinton’s architectures underpin self-driving cars and language models like ChatGPT. AlphaFold’s success is inspiring AI-driven approaches to climate modeling, sustainable agriculture and even materials science.

The recognition of AI in physics and chemistry signals a shift in how we think about science. These tools are no longer confined to the digital realm. They’re reshaping the physical and biological worlds.
