Artificial intelligence and accelerated computing are being used to help solve the world’s greatest challenges.
NVIDIA has reinvented the computing stack — spanning GPUs, CPUs, DPUs, networking and software. Our platform drives the AI revolution, powering hundreds of millions of devices in every cloud and fueling 75% of the world’s TOP500 supercomputers.
Put in the hands of entrepreneurs and enterprises, developers and scientists, that platform becomes a system for invention, and a force for good across industries and geographies.
Here are five examples from the past year of how these technologies are being put to work:
Supporting Surgeons
Illinois-based startup SimBioSys has created TumorSight Viz, a technology that converts MRI images into 3D models of breast tissue. This helps surgeons better treat breast cancers by providing detailed visualizations of tumors and surrounding tissue.
Saving Lives and Energy
Researchers at the Wellcome Sanger Institute, a key player in the Human Genome Project, analyze tens of thousands of cancer genomes annually, providing insights into cancer formation and treatment effectiveness. NVIDIA accelerated computing and software drastically reduce the institute’s analysis runtime and energy consumption per genome.
Cleaning Up Our Waters
Clearbot, developed by University of Hong Kong grads, is an AI-driven sea-cleaning boat that autonomously collects trash from the water. Enabled by the NVIDIA Jetson platform, Clearbot is making a splash in Hong Kong and India, helping keep tourist regions clean.
Greening Recycling Plants
Greyparrot, a UK-based startup, has developed the Greyparrot Analyzer, an AI-powered device that offers “waste intelligence” to recycling plants. Using embedded cameras and machine learning, the analyzer identifies and differentiates materials on conveyor belts, significantly improving recycling efficiency.
Driving Technological Advancement in Africa
A new AI innovation hub has launched in Tunisia, part of NVIDIA’s efforts to train 100,000 developers across Africa. Built in collaboration with the NVIDIA Deep Learning Institute, the hub offers training, technologies and business networks to drive AI adoption across the continent.
All of these initiatives — whether equipping surgeons with new tools or making recycling plants greener — rely on the ingenuity of human beings across the globe, humans increasingly supercharged by AI.
Find more examples of how AI is helping people across industries and around the globe make a difference and drive positive social impact.
TL;DR: The brain may have evolved a modular architecture for daily tasks, with circuits featuring functionally specialized modules that match the task structure. We hypothesize that this architecture enables better learning and generalization than architectures with less specialized modules. To test this, we trained reinforcement learning agents with various neural architectures on a naturalistic navigation task. We found that the modular agent, with an architecture that segregates computations of state representation, value, and action into specialized modules, achieved better learning and generalization. Our results shed light on the possible rationale for the brain’s modularity and suggest that artificial systems can use this insight from neuroscience to improve learning and generalization in natural tasks.
Motivation
Despite the tremendous success of AI in recent years, it remains true that even when trained on the same data, the brain outperforms AI in many tasks, particularly in terms of fast in-distribution learning and zero-shot generalization to unseen data. In the emerging field of neuroAI (Zador et al., 2023), we are particularly interested in uncovering the principles underlying the brain’s extraordinary capabilities so that these principles can be leveraged to develop more versatile and general-purpose AI systems.
Given the same training data, the differing abilities of learning systems—biological or artificial—stem from their distinct assumptions about the data, known as inductive biases. For instance, if the underlying data distribution is linear, a linear model that assumes linearity can learn very quickly—by observing only a few points without needing to fit the entire dataset—and generalize effectively to unseen data. In contrast, another model with a different assumption, such as quadratic, cannot achieve the same performance. Even if it were a powerful universal function approximator, it would not achieve the same efficiency. The brain may have evolved inductive biases that align with the underlying structure of natural tasks, which explains its high efficiency and generalization abilities in such tasks.
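The linear-versus-quadratic point can be made concrete with a toy fit. The data-generating rule, numbers, and names below are illustrative assumptions, not from the paper:

```python
import numpy as np

# Toy illustration of a matched inductive bias: if the data-generating
# rule is linear (here y = 3x + 1, an invented example), a linear model
# fit to only two points recovers the rule exactly and generalizes to
# unseen inputs far outside the training range.
x_train = np.array([0.0, 1.0])
y_train = 3.0 * x_train + 1.0          # two observed points

slope, intercept = np.polyfit(x_train, y_train, 1)  # degree-1 (linear) fit

x_test = np.linspace(-5.0, 5.0, 11)    # unseen inputs
linear_pred = slope * x_test + intercept
```

A model with a mismatched assumption (e.g., a high-degree polynomial) fit to the same two points is underdetermined and need not generalize this way.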
What are the brain’s useful inductive biases? One perspective suggests that the brain may have evolved an inductive bias for a modular architecture featuring functionally specialized modules (Bertolero et al., 2015). Each module specializes in a specific aspect or subset of task variables, collectively covering all of the task’s demanding computations. We hypothesize that this architecture enables more efficient learning of the structure of natural tasks and better generalization to tasks with a similar structure, compared with architectures with less specialized modules.
Previous works (Goyal et al., 2022; Mittal et al., 2022) have outlined the potential rationale for this architecture: Data generated from natural tasks typically stem from the latent distribution of multiple task variables. Decomposing the task and learning these variables in distinct modules allow a better understanding of the relationships among these variables and therefore the data generation process. This modularization also promotes hierarchical computation, where independent variables are initially computed and then forwarded to other modules specialized in computing dependent variables. Note that “modular” may take on different meanings in different contexts. Here, it specifically refers to architectures with multiple modules, each specializing in one or a subset of the desired task variables. Architectures with multiple modules lacking enforced specialization in computing variables do not meet the criteria for modular in our context.
To test our hypothesis, it is essential to select a natural task and compare a modular architecture designed for the task with alternative architectures.
Task
We chose a naturalistic virtual navigation task (Figure 1) previously used to investigate the neural computations underlying animals’ flexible behaviors (Lakshminarasimhan et al., 2020). At the beginning of each trial, the subject is situated at the center of the ground plane facing forward; a target is presented at a random location within the field of view (distance: \(100\) to \(400\) cm, angle: \(-35\) to \(+35^{\circ}\)) on the ground plane and disappears after \(300\) ms. The subject can freely control its linear and angular velocities with a joystick (maximum: \(200\) cm/s and \(90^{\circ}\)/s, referred to as the joystick gain) to move along its heading in the virtual environment. The objective is to navigate toward the memorized target location, then stop inside the reward zone, a circular region centered at the target location with a radius of \(65\) cm. A reward is given only if the subject stops inside the reward zone.
The subject’s self-location is not directly observable because there are no stable landmarks; instead, the subject needs to use optic flow cues on the ground plane to perceive self-motion and perform path integration. Each textural element of the optic flow, an isosceles triangle, appears at a random location and orientation, disappearing after only a short lifetime (\(\sim 250\) ms), making it impossible to use as a stable landmark. A new trial starts after the subject stops moving.
Task modeling
We formulate this task as a Partially Observable Markov Decision Process (POMDP; Kaelbling et al., 1998) in discrete time, with continuous state and action spaces (Figure 2). At each time step \(t\), the environment is in the state \(\boldsymbol{s}_t\) (including the agent’s position and velocity, and the target’s position). The agent takes an action \(\boldsymbol{a}_t\) (controlling its linear and angular velocities) to update \(\boldsymbol{s}_t\) to the next state \(\boldsymbol{s}_{t+1}\) following the environmental dynamics given by the transition probability \(T(\boldsymbol{s}_{t+1}|\boldsymbol{s}_t,\boldsymbol{a}_t)\), and receives a reward \(r_t\) from the environment following the reward function \(R(\boldsymbol{s}_t,\boldsymbol{a}_t)\) (\(1\) if the agent stops inside the reward zone, \(0\) otherwise).
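The task structure above can be sketched as a minimal environment. This is an illustrative simplification with invented class and method names, not the authors' implementation (the true dynamics, stopping criterion, and noise model are richer):

```python
import numpy as np

# Minimal sketch of the navigation POMDP: state = (position, heading,
# target); action = joystick deflections mapped to velocities by the
# 1x gain; reward = 1 only for stopping inside the 65 cm reward zone.
class NavigationEnv:
    DT = 0.1                                      # time step (s)
    GAIN = np.array([200.0, np.deg2rad(90.0)])    # 1x linear/angular gain
    REWARD_RADIUS = 65.0                          # cm

    def reset(self, rng):
        # Target sampled in the stated range: 100-400 cm, +/-35 degrees.
        r = rng.uniform(100.0, 400.0)
        th = np.deg2rad(rng.uniform(-35.0, 35.0))
        self.target = np.array([r * np.sin(th), r * np.cos(th)])
        self.pos = np.zeros(2)                    # start at the origin
        self.heading = 0.0                        # facing forward
        return self.pos.copy()

    def step(self, action):
        # Dimensionless action in [-1, 1]^2 scaled to velocities by the gain.
        v, w = np.clip(action, -1.0, 1.0) * self.GAIN
        self.heading += w * self.DT
        self.pos += v * self.DT * np.array([np.sin(self.heading),
                                            np.cos(self.heading)])
        stopped = np.allclose(action, 0.0)        # trial ends on stopping
        dist = np.linalg.norm(self.pos - self.target)
        reward = 1.0 if (stopped and dist < self.REWARD_RADIUS) else 0.0
        return self.pos.copy(), reward, stopped
```

In the actual task the agent observes only optic-flow-derived velocity cues (plus the target for the first 300 ms), not this position state directly.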
We use a model-free actor-critic approach to learning, with the actor and critic implemented as distinct neural networks. At each \(t\), the actor receives two sources of inputs \(\boldsymbol{i}_t\) about the state: the observation \(\boldsymbol{o}_t\) and the last action \(\boldsymbol{a}_{t-1}\). It then outputs an action \(\boldsymbol{a}_t\), aiming to maximize the state-action value \(Q_t\). This value is a function of the state and action, representing the expected discounted rewards when an action is taken at a state and future rewards are then accumulated from \(t\) until the trial’s last step. Since the ground-truth value is unknown, the critic is used to approximate it. In addition to receiving the same inputs \(\boldsymbol{i}_t\) as the actor to infer the state, the critic also takes as input the action \(\boldsymbol{a}_t\) taken by the actor in this state. It then outputs the estimated \(Q_t\) for this action, trained through the temporal-difference error (TD error, \(|r_t+\gamma Q_{t+1}-Q_t|\), where \(\gamma\) denotes the temporal discount factor) after receiving the reward \(r_t\). In practice, our algorithm is off-policy and incorporates mechanisms such as twin critic networks and target networks as in TD3 (Fujimoto et al., 2018) to enhance training (see Materials and Methods in Zhang et al., 2024).
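As a minimal sketch, the TD error driving the critic's training can be written as follows. This is the simplified one-step form only; the actual agent uses TD3's twin critics, target networks, and off-policy replay:

```python
# Illustrative one-step TD error for the critic (simplified; names are
# ours). The critic is trained to shrink this error, i.e., to make its
# value estimate consistent with the reward plus the discounted value
# of the next step.
def td_error(r_t, q_t, q_next, gamma=0.99, terminal=False):
    # No bootstrapping past the trial's last step.
    target = r_t + (0.0 if terminal else gamma * q_next)
    return target - q_t

# Example: a rewarded final step where the critic slightly
# under-estimated the value yields a positive error.
delta = td_error(r_t=1.0, q_t=0.8, q_next=0.0, terminal=True)
```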
The state \(\boldsymbol{s}_t\) is not fully observable, so the agent must maintain an internal state representation (belief \(b_t\)) for deciding \(\boldsymbol{a}_t\) and \(Q_t\). Both actor and critic undergo end-to-end training through back-propagation without explicit objectives for shaping \(b_t\). Consequently, networks are free to learn diverse forms of \(b_t\), encoded in their neural activities, that aid them in achieving their learning objectives. Ideally, networks may develop an effective belief update rule, e.g., recursive Bayesian estimation, using the two sources of evidence in the inputs \(\boldsymbol{i}_t=\{\boldsymbol{o}_t, \boldsymbol{a}_{t-1}\}\). The first source, the last self-action \(\boldsymbol{a}_{t-1}\), lets networks predict the state \(\boldsymbol{s}_t\) based on an internal model of the dynamics and the previous belief \(b_{t-1}\). The second source is a partial and noisy observation \(\boldsymbol{o}_t\) of \(\boldsymbol{s}_t\) drawn from the observation probability \(O(\boldsymbol{o}_t|\boldsymbol{s}_t)\). Note that the actual \(O\) in the brain for this task is unknown. For simplicity, we model \(\boldsymbol{o}_t\) as a low-dimensional vector, including the target’s location when visible (the first \(300\) ms, i.e., the first three steps given \(\Delta t=0.1\) s), and the agent’s observation of its velocities through optic flow, with velocities subject to additive Gaussian noise.
Actor-critic RL agent
Each RL agent requires an actor and a critic network, and actor and critic networks can have a variety of architectures. Our goal here is to investigate whether functionally specialized modules provide advantages for our task. Therefore, we designed architectures incorporating modules with distinct levels of specialization for comparison. The first architecture is a holistic actor/critic, comprising a single module where all neurons jointly compute the belief \(b_t\) and the action \(\boldsymbol{a}_t\)/value \(Q_t\). In contrast, the second architecture is a modular actor/critic, featuring modules specialized in computing different variables (Figure 3).
The specialization of each module is determined as follows.
First, we can confine the computation of beliefs. Since computing beliefs about the evolving state requires integrating evidence over time, a network capable of computing belief must possess some form of memory. Recurrent neural networks (RNNs) satisfy this requirement by using a hidden state that evolves over time. In contrast, computations of value and action do not need additional memory when the belief is provided, making memoryless multi-layer perceptrons (MLPs) sufficient. Consequently, adopting an architecture with an RNN followed by a memoryless MLP (modular actor/critic in Figure 3) ensures that the computation of belief is exclusively confined to the RNN.
Second, we can confine the computation of the state-action value \(Q_t\) for the critic. Since a critic is trained end-to-end to compute \(Q_t\), stacking two modules between all inputs and outputs does not limit the computation of \(Q_t\) to a specific module. However, since \(Q_t\) is a function of the action \(\boldsymbol{a}_t\), we can confine the computation of \(Q_t\) to the second module of the modular critic in Figure 3 by supplying \(\boldsymbol{a}_t\) only to the second module. This ensures that the first module, lacking access to the action, cannot accurately compute \(Q_t\). Therefore, the modular critic’s RNN is dedicated to computing \(b_t\) and sends it to the MLP dedicated to computing \(Q_t\). This architecture enforces modularity.
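The two confinement steps above can be sketched as a forward pass: a recurrent belief module that never sees the current action, followed by a memoryless value head that alone receives it. Shapes, initialization, and names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Sketch of the modular critic's forward pass. The RNN carries memory
# (hidden state), so belief computation is confined to it; the current
# action a_t enters only at the MLP head, so Q computation is confined
# there. Sizes and random init are illustrative.
rng = np.random.default_rng(0)
N_IN, N_H, N_ACT = 4, 32, 2
W_in  = rng.standard_normal((N_H, N_IN)) * 0.1
W_rec = rng.standard_normal((N_H, N_H)) * 0.1
W_q1  = rng.standard_normal((N_H, N_H + N_ACT)) * 0.1
w_q2  = rng.standard_normal(N_H) * 0.1

def rnn_step(h, i_t):
    # Belief module: integrates inputs (observation + last action) over
    # time via the hidden state -- the only memory in the network.
    return np.tanh(W_in @ i_t + W_rec @ h)

def q_head(b_t, a_t):
    # Value module: memoryless MLP; only here does a_t enter, so Q
    # cannot be accurately computed upstream in the RNN.
    return w_q2 @ np.tanh(W_q1 @ np.concatenate([b_t, a_t]))

h = np.zeros(N_H)
for i_t in [rng.standard_normal(N_IN) for _ in range(3)]:
    h = rnn_step(h, i_t)              # belief b_t lives in the hidden state
q = q_head(h, np.array([0.5, -0.2]))  # scalar value estimate for this action
```

A holistic critic would instead feed the action into the recurrent module along with everything else, leaving no module's computation confined.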
Besides the critic, the modular actor also has higher specialization than the holistic actor, which lacks a confined \(b_t\) computation. Thought bubbles in Figure 3 denote the variables whose computation is confined to each module by the architecture, rather than indicating that those variables are encoded only in that module. For example, \(b_t\) in modular architectures is passed to the second module, but an accurate \(b_t\) can only be computed in the first, RNN module.
Behavioral accuracy
We trained agents using all four combinations of these two actor and critic architectures. We refer to an agent whose actor and critic are both holistic or both modular as a holistic agent or a modular agent, respectively. Agents with modular critics demonstrated greater consistency across various random seeds and achieved near-perfect accuracy more efficiently than agents with holistic critics (Figure 4).
Agents’ behavior was compared with that of two monkeys (Figure 5 left) for a representative set of targets uniformly sampled on the ground plane (Figure 5 right).
We used a Receiver Operating Characteristic (ROC) analysis (Lakshminarasimhan et al., 2020) to systematically quantify behavioral accuracy. A psychometric curve for stopping accuracy is constructed from a large representative dataset by counting the fraction of rewarded trials as a function of a hypothetical reward-boundary size (Figure 6 left, solid; the true radius is \(65\) cm; an infinitely small/large reward boundary yields no/all rewarded trials). A shuffled curve is constructed similarly after shuffling targets across trials (Figure 6 left, dashed). An ROC curve is then obtained by plotting the psychometric curve against the shuffled curve (Figure 6 right). An ROC curve with a slope of \(1\) denotes chance level (true \(=\) shuffled), with the area under the curve (AUC) equal to \(0.5\). High AUC values indicate that all agents reached good accuracy after training (Figure 6 right, inset).
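The ROC construction above can be sketched in a few lines. This is our own minimal reading of the analysis, with invented names and a plain trapezoidal AUC:

```python
import numpy as np

# Sketch of the ROC analysis: sweep a hypothetical reward-boundary
# radius, compute the fraction of "rewarded" trials for the true
# stop-target pairing and for a target-shuffled pairing, then plot
# true against shuffled and integrate. AUC near 0.5 = chance.
def roc_auc(stops, targets, radii, rng):
    errs = np.linalg.norm(stops - targets, axis=1)
    shuf_errs = np.linalg.norm(stops - rng.permutation(targets), axis=1)
    true_curve = np.array([(errs < r).mean() for r in radii])
    shuf_curve = np.array([(shuf_errs < r).mean() for r in radii])
    # Trapezoidal area under true-vs-shuffled fraction.
    return float(np.sum(np.diff(shuf_curve)
                        * (true_curve[1:] + true_curve[:-1]) / 2.0))
```

Accurate stopping pushes the true curve to 1 long before the shuffled curve rises, driving the AUC toward 1; random stopping makes the two curves match, giving roughly 0.5.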
Although all agents exhibited high stop location accuracy, we have noticed distinct characteristics in their trajectories (Figure 5 left). To quantify these differences, we examined two crucial trajectory properties: curvature and length. When tested on the same series of targets as the monkeys experienced, the difference between trajectories generated by agents with modular critics and those of monkey B was comparable to the variation between trajectories of two monkeys (Figure 7). In contrast, when agents used holistic critics, the difference in trajectories from monkey B was much larger, suggesting that modular critics facilitated more animal-like behaviors.
Behavioral efficiency
Agents are expected to develop efficient behaviors, as the value of their actions gets discounted over time. Therefore, we assess their efficiency throughout the training process by measuring the reward rate, which refers to the number of rewarded trials per second. We found that agents with modular critics achieved much higher reward rates, which explains their more animal-like efficient trajectories (Figure 8).
Together, these results suggest that modular critics provide a superior training signal compared to holistic critics, allowing actors to learn more optimal beliefs and actions. With a poor training signal from the holistic critic, the modularization of actors may not enhance performance. Next, we will evaluate the generalization capabilities of the trained agents.
An unseen task
One crucial aspect of the sensorimotor mapping is the joystick gain, which linearly maps motor actions on the joystick (dimensionless, bounded in \([-1,1]\)) to corresponding velocities in the environment. During training, the gain remains fixed at \(200\) cm/s and \(90^{\circ}\)/s for the linear and angular components, referred to as the \(1\times\) gain. By increasing the gain to values not previously experienced, we create a novel gain task.
To assess generalization abilities, monkeys and agents were tested with novel gains of \(1.5\times\) and \(2\times\) (Figure 9).
Blindly following the same action sequence as in the training task would cause the agents to overshoot (no-generalization hypothesis: Figure 10 dashed lines). Instead, the agents displayed varying degrees of adaptive behavior (Figure 10 solid lines).
To quantitatively evaluate behavioral accuracy while also considering over-/under-shooting effects, we defined radial error as the Euclidean distance between the stop and target locations in each trial, with positive/negative sign denoting over-/under-shooting. Under the novel gains, agents with modular critics consistently exhibited smaller radial errors than agents with holistic critics (Figure 11), with the modular agent demonstrating the smallest errors, comparable to those observed in monkeys.
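The signed radial error defined above can be sketched as follows; the sign convention here (overshoot = stopping farther from the origin than the target) is our reading of the definition, and the function name is ours:

```python
import numpy as np

# Signed radial error: Euclidean stop-to-target distance, positive when
# the agent stops beyond the target's radial distance from the start
# (overshooting) and negative when it stops short (undershooting).
def radial_error(stop, target):
    dist = np.linalg.norm(stop - target)
    overshoot = np.linalg.norm(stop) > np.linalg.norm(target)
    return dist if overshoot else -dist
```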
Neural analysis
Although we have confirmed that agents with distinct neural architectures exhibit varying levels of generalization in the gain task, the underlying mechanism remains unclear. We hypothesized that agents with superior generalization abilities should generate actions based on more accurate internal beliefs within their actor networks. Therefore, the goal next is to quantify the accuracy of beliefs across agents tested on novel gains, and to examine the impact of this accuracy on their generalization performance.
During the gain task, we recorded the activities of RNN neurons in the agents’ actors, as these neurons are responsible for computing the beliefs that underlie actions. To systematically quantify the accuracy of these beliefs, we used linear regression (with \(\ell_2\) regularization) to decode agents’ locations from the recorded RNN activities for each gain condition (Figure 12).
We defined the decoding error, the Euclidean distance between the true and decoded locations, as an indicator of belief accuracy. While all agents showed small decoding errors under the training gain, the more holistic agents, which struggled to generalize under increased gains, also displayed reduced accuracy in determining their own location (Figure 13 left). In fact, agents’ behavioral performance correlates with their belief accuracy (Figure 13 right).
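The decoding analysis can be sketched with a closed-form ridge regression; the closed form, regularization strength, and names are our illustrative choices (any l2-regularized linear regression would do):

```python
import numpy as np

# Sketch of belief decoding: l2-regularized (ridge) linear regression
# mapping recorded RNN activity H (samples x neurons) to true 2D
# locations Y (samples x 2), then scoring per-sample Euclidean error.
def ridge_fit(H, Y, lam=1.0):
    n = H.shape[1]
    # Closed-form ridge solution: (H'H + lam*I)^-1 H'Y.
    return np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ Y)

def decoding_error(H, Y, W):
    # Euclidean distance between true and decoded locations, per sample.
    return np.linalg.norm(Y - H @ W, axis=1)
```

If the RNN activity linearly encodes location, the fitted decoder recovers it with near-zero error; degraded beliefs under novel gains show up as larger errors.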
Conclusion
The brain has evolved advantageous modular architectures for mastering daily tasks. Here, we investigated the impact of architectural inductive biases on learning and generalization using deep RL agents. We posited that an architecture with functionally specialized modules would allow agents to more efficiently learn essential task variables and their dependencies during training, and then use this knowledge to support generalization in novel tasks with a similar structure. To test this, we trained agents with architectures featuring distinct module specializations on a partially observable navigation task. We found that the agent using a modular architecture exhibited superior learning of belief and control actions compared to agents with weaker modular specialization.
Furthermore, for readers interested in the full paper, we also demonstrated that the modular agent’s beliefs closely resemble an Extended Kalman Filter, appropriately weighting information sources based on their relative reliability. Additionally, we presented several more architectures with varying levels of modularity and confirmed that greater modularity leads to better performance.
GeForce NOW is kicking off 2025 by delivering 14 games to the cloud this month, with two available to stream this week so members can get started on their New Year’s gaming resolutions.
This year’s CES trade show will open with a keynote from NVIDIA founder and CEO Jensen Huang on Monday, Jan. 6. GeForce NOW is offering members front-row seats in a virtual stadium, so they can hear the latest announcements and get hyped with livestreams — no downloads or installations required.
It’s all powered by GeForce NOW cloud streaming and hosted by ZENOS, an innovative virtual stadium platform. Members can enter the virtual stadium starting at 3 p.m. PT on Monday, Jan. 6.
In addition, gear up to participate in NVIDIA GeForce LAN 50 gaming missions starting on Saturday, Jan. 4, at 4:30 p.m. PT. Stream #GeForceGreats games to unlock incredible in-game rewards with GeForce NOW.
Mission Possible
It’s rewarding to be a GeForce NOW member. Unlock exclusive rewards during CES by doing what gamers do best — playing the game. Members can participate in GeForce LAN 50 gaming missions even without a game-ready rig.
To participate, stream the following featured GeForce Greats games on GeForce NOW and earn exclusive in-game items:
Diablo IV: Creeping Shadows Mount Armor Bundle
The Elder Scrolls Online: Pineblossom Vale Elk Mount
The Finals: Legendary Corrugatosaurus Mask
World of Warcraft: Armored Bloodwing Mount
Fallout 76: Settler Work Chief Outfit and Raider Nomad Outfit
Complete each game’s mission to become eligible for the associated reward. Rewards will be available to redeem on Thursday, Jan. 9. They’re available on a first-come, first-served basis, so make sure to jump in right away.
With GeForce NOW, members can participate in the event on any supported device, whether a PC, Mac, SHIELD TV or mobile device, with access to GeForce RTX gaming rigs in the cloud for maximum performance.
Front-Row Seat to NVIDIA at CES 2025
Join others around the world as NVIDIA celebrates the latest advancements in gaming, technology and generative AI at CES, starting with livestreams from the GeForce LAN 50 online event that’ll lead up to the show’s opening NVIDIA keynote — all from the comfort of home, no trip to Las Vegas needed.
Enter the virtual stadium using the GeForce NOW app on PC, Mac or through a web browser for those without a GeForce NOW membership. The virtual stadium is hosted by ZENOS, which uses NVIDIA’s cloud gaming infrastructure to deliver high-fidelity, low-latency streaming experiences directly via web browsers, making live events more accessible worldwide.
Enhance the experience by signing in with a GeForce NOW membership, and create a “Ready Player Me” avatar and account to save digital characters for future visits. Members can link their Twitch accounts to chat, emote with other viewers in the stadium and collect NVIDIA-branded digital items, including NVIDIA foam fingers and jerseys, to customize their avatars.
New Year, New Games
Look for the following games available to stream in the cloud this week: