Unifying learning from preferences and demonstration via a ranking game for imitation learning

Rank Game diagram

For many people, opening a door handle or moving a pen between their fingers are movements that happen many times a day, often without much thought. For a robot, however, these movements aren’t always so easy.

In reinforcement learning, robots learn to perform tasks by exploring their environments, receiving signals along the way that indicate how good their behavior is compared to the desired outcome, or state. For the described movements, for example, we can specify a reward function that is +1 when the door is successfully opened or the pen is at the desired orientation and 0 otherwise. But this makes the learning task complicated for the robot since it has to try out various motions before stumbling on the successful outcome, or a reward of +1.

The imitation learning (IL) paradigm was introduced to mitigate the amount of trial and error. In IL, the robot is provided with demonstrations of a given task performed by an expert from which it can try to learn the task and possibly gain information about the expert’s reward function, or the expert’s intent, similar to how people pick up various skills. Yet, learning remains difficult in instances where we only have access to the change enacted by the expert in the world, known as the expert observation, and not the precise actions the expert took to achieve the change. Another difficulty the robot faces is that even if it sees infinite expert demonstrations, it can’t fully reason about the intent of the expert—that is, compare whether one of its own learned behaviors is closer to the expert’s than another behavior—as it only knows the best behavior and has no notion of ordering over other behaviors.

In our paper “A Ranking Game for Imitation Learning,” being presented at Transactions on Machine Learning Research 2023 (TMLR), we propose a simple and intuitive framework, \(\texttt{rank-game}\), that unifies learning from expert demonstrations and preferences by generalizing a key approach to imitation learning. Giving robots the ability to learn from preferences, obtained by having an expert rank which behavior aligns better with their objectives, allows the learning of more informative reward functions. Our approach, which enabled us to propose a new objective for training over behavior preferences, makes the learning process easier for a robot and achieves state-of-the-art results in imitation learning. It also enabled the training of a robot that can solve the tasks of opening a door and moving a pen between its fingers in simulation, a first in imitation learning with expert observations alone. The incorporation of preferences has also seen success in language modeling, where chatbots such as ChatGPT are improving themselves by learning a reward function inferred via preferences over several samples of model responses in addition to learning from desired human conversational data.

Robotics has found a place in controlled environments where the tasks at hand are well-defined and repeatable, such as on a factory floor. Our framework has the potential to help enable robot learning of tasks in more dynamic environments, such as helping people with daily chores around the home.

With \(\texttt{rank-game}\), which combines learning from preferences and demonstrations via a two-player ranking-based game, robots in simulation were trained to manipulate a pen with a dexterous hand (left) and open a door with a parallel jaw gripper (right). The successful completion of these tasks marked a first in imitation learning with expert observations alone.

A ranking game for imitation learning

Inverse reinforcement learning (IRL) is a popular and effective method for imitation learning. IRL learns by inferring the reward function, also referred to as the intent of the expert, and a policy, which specifies what actions the agent—or, in our case, the robot—should take in a given state to successfully mimic the expert.

Notation: We use \(\pi\) and \(\pi^E\) to denote the policy of the agent and the expert, respectively, and \(R_{gt}\) to be the reward function of the expert, which is unknown to the agent/robot. \(\rho^\pi\) denotes the state-action/state visitation distribution of policy \(\pi\) in the environment—the probabilistic collection of states the policy visits in the environment. We use \(J(R;\pi)\) to denote the \(\textit{cumulative reward}\), or the performance of policy \(\pi\) under a reward function \(R\). We assume policy \(\pi\) belongs to function class \(\Pi\) and reward function \(R\) belongs to function class \(\mathcal{R}\).

The goal of imitation learning is to make the agent have the same performance as the expert under the expert’s unknown reward function \(R_{gt}\). The classical IRL formulation tackles this by minimizing the imitation gap under a reward function that makes the performance gap the largest. We denote this framework by \(\texttt{imit-game}\) and write it below formally:

\(\texttt{imit-game}(\pi,\pi^E): \text{argmin}_{\pi\in\Pi}\text{max}_{R\in\mathcal{R}} [\mathbb{E}_{\rho^E(s,a)}[R(s,a)]-\mathbb{E}_{\rho^\pi(s,a)}[R(s,a)]]\)

Simply stated, the \(\texttt{imit-game}\) tries to find a policy that has the lowest worst-case performance difference with the expert policy. This classical IRL formulation learns from expert demonstrations but provides no mechanism to incorporate learning from preferences. In our work, we ask, does IRL really need to consider the worst-case performance difference? We find that relaxing this requirement allows us to incorporate preferences.

Our proposed method treats imitation as a two-player ranking-based game between a policy and a reward. In this game, the reward agent learns to map more preferred behaviors to a higher total reward for each of the pairwise preferences, while the policy agent learns to maximize the performance on this reward function by interacting with the environment. Contrary to the classical IRL framework, the reward function now has to get only the rankings correct and not optimize for the worst case (see Figure 1).

A flow chart with, clockwise from top left, a green box labeled “policy agent,” a blue box labeled “reward agent,” and an orange box labeled “Dataset D,” which contains pairwise behavior rankings obtained from three sources. An arrow points from the policy agent to the dataset, indicating the policy’s contribution of rankings. An arrow pointing from the policy agent to the reward is labeled with the optimization strategy. An arrow pointing from the reward agent to the dataset is labeled with the ranking loss function.
Figure 1: The proposed \(\texttt{rank-game}\) method treats imitation learning as a two-player ranking-based game between a policy and a reward. The policy agent maximizes the reward function by interacting with the environment. The reward agent satisfies a set of behavior rankings obtained from various sources: generated by the policy agent, automatically generated via data augmentation, or expert-annotated rankings obtained from a human or offline dataset.

To incorporate preferences, we need to quantify the behaviors in order to compare them. In this work, we choose the behaviors (\(\rho\)) to be the state-action or state-only visitation distribution of the agent. A ranking between behaviors is used to specify that the expert would prefer one behavior over the other. A reward function that satisfies the behavior rankings ensures that the average return under a lower-ranked behavior is smaller than the higher-ranked behavior. More formally, the ranking game is defined as a game where the policy agent \(\pi\) maximizes the expected return \(J(R;\pi)\) of the policy under reward function \(R\) when deployed in the environment. The reward player takes the dataset of pairwise rankings \(D^p\) (rankings are denoted as \(\rho^i\preceq\rho^j\)) as an input and attempts to learn a reward function that satisfies those rankings using a ranking loss (denoted by \(L(D^p;R)\)).

\(\underbrace{\text{argmax}_{\pi\in\Pi}J(R;\pi)}_{\text{Policy Agent}}~~~~~~~~~~~~~~~\underbrace{\text{argmin}_{R\in\mathcal{R}}L(D^p;R)}_{\text{Reward Agent}}\)

The ranking loss induces a reward function \(R\) that attempts to satisfy each pairwise preference in the dataset as follows:

\(\mathbb{E}_{\rho^i}[R(s,a)]\le\mathbb{E}_{\rho^j}[R(s,a)]~~,~~\forall~\rho^i\preceq\rho^j \in D^p\)

Generalizing prior imitation learning approaches with \(\texttt{rank-game}\)

The \(\texttt{rank-game}\) framework neatly encapsulates both prior work in IRL and prior work in learning from preferences. First, let’s see how classical IRL is a part of this framework. Recall that the classical IRL/\(\texttt{imit-game}\) optimization can be written as:

\(\text{argmin}_{\pi\in\Pi}\text{max}_{R\in\mathcal{R}} [\mathbb{E}_{\rho^E(s,a)}[R(s,a)]-\mathbb{E}_{\rho^\pi(s,a)}[R(s,a)]]\)

The inner optimization learns a reward function that ensures that the return gap under the reward function is maximized between the current policy’s behavior and the expert behavior. Thus, \(\texttt{imit-game}\) can be seen to be a special case of \(\texttt{rank-game}\) with: (1) a ranking dataset that prefers expert behavior over the current agent behavior and (2) a form of ranking loss that maximizes the performance gap (termed the \(\textit{supremum loss}\)). A number of prior methods in the imitation learning domain can be understood as special cases of \(\texttt{rank-game}\) under various ranking losses, classes of reward functions, and abilities to incorporate preferences (see Figure 2).

A summary of imitation learning (IL) methods, showing the data modalities they can handle (expert data and/or preferences), their ranking-loss functions, the assumptions they make on the reward function, and whether they require an external agent to provide preferences during training:

  • MaxEntIRL, AdRIL, GAN-GCL, GAIL, f-MAX, AIRL: learning from demonstration (LfD); supremum ranking loss; non-linear reward function; no offline preferences; no active human query.
  • BCO, GAIfO, DACfO, OPOLO, f-IRL: learning from observation (LfO); supremum ranking loss; non-linear reward function; no offline preferences; no active human query.
  • TREX, DREX: offline preferences only (no LfD or LfO); Bradley-Terry ranking loss; non-linear reward function; no active human query.
  • BREX: offline preferences only (no LfD or LfO); Bradley-Terry ranking loss; linear reward function; no active human query.
  • DemPref: LfD and LfO with offline preferences; Bradley-Terry ranking loss; linear reward function; active human query.
  • Ibarz et al. (2018): LfD with offline preferences; Bradley-Terry ranking loss; non-linear reward function; active human query.
  • Rank-game: LfD and LfO with offline preferences; a new principled ranking loss that naturally incorporates rankings from diverse sources; non-linear reward function; no active human query.
Figure 2: Previous methods that learn from expert demonstrations or preferences form a special case of \(\texttt{rank-game}\) under a specific choice of ranking loss and a reward function class. Also noted in the table is whether a method enables learning from demonstration (LfD)—that is, learning from both expert states and actions—or learning from observations (LfO), where an agent learns from expert states alone.

Setting up the ranking game

To develop a framework that successfully combines learning from demonstrations and learning from preferences, we addressed several questions:

  1. What is the ranking loss function that allows for the reward to satisfy the preferences in the dataset?
  2. Where do we get the dataset of pairwise preferences?
  3. How can we effectively optimize this two-player game?

Step 1: A new ranking loss function for reward learning

Our proposed framework requires learning a reward function such that the rankings in the dataset are satisfied. While several loss functions exist in the prior literature to enable this, such as the Luce-Shepard rule, Lovász-Bregman divergences, and the supremum loss discussed earlier, we introduce a new loss function:

\(L_k(\mathcal{D}^p;R) = \mathbb{E}_{(\rho^{\pi^i},\rho^{\pi^j})\sim \mathcal{D}^p} \Big[\mathbb{E}_{s,a\sim\rho^{\pi^i}}[(R(s,a)-0)^2] + \mathbb{E}_{s,a\sim\rho^{\pi^j}}[(R(s,a)-k)^2]\Big]\)

The loss function is simple and intuitive: For all the preference pairs in the dataset, the less preferred behavior is regressed to a return of 0 and the more preferred behavior is regressed to a return of a user-defined parameter \(k\). This loss function allows us to learn a reward function with user-defined scale \(k\), which plays an important role in enabling better policy optimization; it’s principled and facilitates near-optimal imitation learning; and by design, it allows us to incorporate preferences.
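
As a rough illustration (a sketch, not the authors’ released code; the reward_net callable, the sampled batches, and their shapes are assumptions), the loss amounts to two squared-error regressions per preference pair:

import torch

def ranking_loss(reward_net, sa_less, sa_more, k):
    # Sketch of the L_k ranking loss for one preference pair.
    # sa_less / sa_more: batches of (state, action) features sampled from the
    # less / more preferred behavior; k: user-defined reward scale.
    r_less = reward_net(sa_less)   # predicted rewards on less preferred samples
    r_more = reward_net(sa_more)   # predicted rewards on more preferred samples
    # Regress the less preferred behavior to a return of 0 and the more
    # preferred behavior to a return of k.
    return (r_less - 0.0).pow(2).mean() + (r_more - k).pow(2).mean()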

Step 2: Getting the ranking dataset

Besides giving more information about the expert’s intent and being easy to obtain, another benefit of preferences is that they can also help learn a more informative, or shaped, reward function. This form of reward shaping can provide better guidance for policy optimization, reducing the burden of exploring the environment to find the optimal policy and increasing sample efficiency for IRL. Our initial ranking dataset is generated by the policy agent from its interactions with the environment; we always rank the expert’s behavior as better than or equal to the current policy’s behavior. To further leverage the benefits of preferences, we consider two methods for augmenting this ranking dataset:

  • Expert-annotated rankings: In situations where we have access to additional rankings, provided by humans or obtained from reward-annotated datasets, we can simply add them to our ranking dataset.
  • Automatically generated rankings: It turns out we can improve learning efficiency for imitation by using the rankings already present in the dataset of pairwise preferences to generate more preferences in a procedure similar to Mixup regularization in trajectory space.
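
One way to picture the automatic augmentation (a hedged sketch of the flavor of the idea, not necessarily the paper’s exact procedure; reward_net, the feature batches, and the Beta-distributed mixing coefficient are all assumptions) is to regress a convex mixture of a ranked pair to a correspondingly mixed return target:

import torch

def mixup_ranking_augmentation(reward_net, x_less, x_more, k, alpha=1.0):
    # Hypothetical Mixup-style augmentation over a ranked pair of behaviors.
    # x_less / x_more: feature batches (e.g., state-action pairs) drawn from the
    # less / more preferred behavior; k: reward scale used by the ranking loss.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    x_mix = lam * x_less + (1 - lam) * x_more            # interpolate the behaviors
    target = lam * 0.0 + (1 - lam) * k                   # interpolate the return targets
    return (reward_net(x_mix) - target).pow(2).mean()    # extra regression term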

Step 3: Improving optimization stability with a Stackelberg game

Prior work has found the Stackelberg game framework to be a strong candidate for optimizing two-player games in various applications. A Stackelberg game is a bi-level optimization problem:

\(\text{max}_x f(x,y_x),~~~~\text{s.t.}~~y_x\in \text{argmin}_y g(x,y)\)

In this optimization, we have two players—leader \(x\) and follower \(y\)—that are trying to maximize and minimize their own payoffs \(f\) and \(g\), respectively. We cast \(\texttt{rank-game}\) as a Stackelberg game and propose two algorithms depending on which player is set to be the leader (a schematic training loop for both variants is sketched after the list below):

  • Policy as Leader (PAL): \(\text{max}_\pi J(R;\pi)~~~~~\text{s.t.}~~ R=\text{argmin}_R~L(D^p;R)\)
  • Reward as Leader (RAL): \(\text{min}_R L(D^p;R)~~~\text{s.t.}~~\pi = \text{argmax}_\pi~J(R;\pi)\)
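
The two variants differ only in which player takes a single outer step while the other is (approximately) optimized in an inner loop. A schematic sketch (illustrative only; policy.rl_step, reward_net.fit_rankings, and buffer.add are hypothetical stand-ins for an RL update, the ranking-loss update, and dataset bookkeeping):

def train_rank_game(policy, reward_net, env, buffer, leader="PAL",
                    n_iters=1000, follower_steps=50):
    # Hypothetical Stackelberg training loop: the leader takes one gradient step
    # per iteration while the follower is optimized with several inner steps.
    for _ in range(n_iters):
        if leader == "PAL":                      # Policy as Leader
            for _ in range(follower_steps):
                reward_net.fit_rankings(buffer)  # follower: satisfy the rankings
            policy.rl_step(env, reward_net)      # leader: one policy improvement step
        else:                                    # Reward as Leader (RAL)
            for _ in range(follower_steps):
                policy.rl_step(env, reward_net)  # follower: maximize the current reward
            reward_net.fit_rankings(buffer)      # leader: one reward update
        # New policy rollouts enter the ranking dataset, always ranked at or
        # below the expert's behavior.
        buffer.add(policy.rollout(env))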

Aside from improving training stability, both methods have complementary benefits in the non-stationary imitation learning setting. PAL can adjust more quickly when the intent of the expert changes, while RAL can handle environmental changes better.

How well does \(\texttt{rank-game}\) perform in practice?

In testing the capabilities of \(\texttt{rank-game}\), one of the scenarios we consider is the learning from observations alone (LfO) setting, in which only expert observations are provided with no expert actions. This more challenging setting better reflects the learning conditions robots will operate under if we want them to be more widely deployed in both controlled and dynamic environments. People can more naturally provide demonstrations by performing tasks themselves (observations only) versus performing the task indirectly by operating a robot (observations and precise actions). We investigate the LfO performance of \(\texttt{rank-game}\) on simulated locomotion tasks like hopping, walking, and running and benchmark it with respect to representative baselines. \(\texttt{Rank-game}\) approaches require fewer environment interactions to succeed and outperform recent methods in final performance and training stability.

Additionally, our experiments reveal that none of the prior LfO methods can solve complex manipulation tasks such as door opening with a parallel jaw gripper and pen manipulation with a dexterous hand. This failure is potentially a result of the exploration requirements of LfO, which are high because of the unavailability of expert actions coupled with the fact that in these tasks observing successes is rare.

In this setting, we show that using only a handful of expert-annotated preferences in the \(\texttt{rank-game}\) framework can allow us to solve these tasks. We cannot solve these tasks using only expert data—adding preferences is key.

Next steps

Equipping agents to learn from different sources of information present in the world is a promising direction toward more capable agents that can better assist people in the dynamic environments in which they live and work. The \(\texttt{rank-game}\) framework has the potential to be extended directly to the setting where humans present their preferences interactively as the robot is learning. There are some promising future directions and open questions for researchers interested in this work. First, preferences obtained in the real world are usually noisy, and one limitation of \(\texttt{rank-game}\) is that it does not suggest a way to handle noisy preferences. Second, \(\texttt{rank-game}\) proposes modifications to learn a reward function amenable to policy optimization, but the associated hyperparameters, such as the reward scale \(k\), are set manually. Future work can explore methods to automate such learning of reward functions. Third, despite learning effective policies, we observed that \(\texttt{rank-game}\) did not learn reusable, robust reward functions.

For additional details, including experiments in the learning from demonstration (LfD) setting, non-stationary imitation setting, and further framework analysis, check out the paper, project page, code, and video presentation.

Acknowledgments

This research was supported in part by the National Science Foundation, Air Force Office of Scientific Research, and Army Research Office.


Read More

NVIDIA Studio Creators Take Collaboration to Bone-Chilling New Heights

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

NVIDIA Omniverse, a pillar of the NVIDIA Studio suite of tools and apps built for creators, helps interconnect 3D artists’ workflows by replacing linear pipelines with live-sync creation.

This week’s In the NVIDIA Studio artists specializing in 3D, Gianluca Squillace and Pasquale Scionti, benefitted from just that — in their individual work and in collaborating to construct the final scene for their project, Cold Inside Diorama.

 

The artists set out to build a Viking scene that conveyed a cold, glacial mood — uniting Squillace’s character with Scionti’s environment while maintaining their individual styles. Such a workflow, which used to require endless exports and imports, was made simple with Omniverse and Universal Scene Description (USD), the open and extensible ecosystem for describing, composing, simulating and collaborating within 3D worlds.

It Takes Character

Powered by a GeForce RTX 4080 GPU, Squillace started with rich reference research to create the character. He took several realistic and stylized images, concepts and photos of characters and weapons to define all the details.

After defining the character, Squillace built a quick blockout in the ZBrush tool, before sculpting all the details to reach the definitive high-poly model. The character’s hairstyle and other aspects underwent several tests in this step of the process before arriving at the final version.

Squillace then moved on to the retopology and UV phase in Autodesk Maya, optimizing models for movements and deformations, which he said allows excellent control of the polygons with quad draw and simple management of the UVs. It also enables mirroring different UV shells, saving space and improving the final quality.

High-poly models take time to develop.

The artists brought the completed high- and low-poly models into Adobe Substance 3D Painter for the bake and texturing phase. Squillace had already defined the main colors with polypaint in ZBrush, so he used the materials present in Painter to stylize different metals, leathers and woods and achieve the final textures.

 

After completing a quick rig and an idle animation in Maya—the animation in video games that plays when the main character remains still—the artist took advantage of the USD file format to instantly pull the animation into NVIDIA Omniverse USD Composer (formerly known as Create) to see the results in real time.

“With the connectors plug-in installed in Substance 3D Painter and Maya, I could edit textures and animation in real time and immediately see the results in Omniverse USD Composer, without exporting extra files or having to close and reopen the software,” Squillace said. “It was really a great improvement in terms of workflow and speed.”

He next felt the benefit of working with USD when seamlessly importing files into Marmoset Toolbag for character-only renders.

 

Squillace took full advantage of his RTX GPU in nearly every step of production. “I was really impressed by the fast calculation of ray-traced lighting in the Marmoset Toolbag scene, which lets me make a lot of real-time renders and videos with an outstanding final result,” he said.

A Hostile Environment Built in a Friendly One

Another artist worked in parallel. Scionti built the cold, harsh environment using Omniverse’s collaboration capabilities and USD files. This combination enabled him to integrate his files with Squillace’s — and see edits in real time.

Scionti said he always starts his work by establishing its mood. With the cold, glacial tone set for this piece, he modeled some of the scene elements using Autodesk 3ds Max before importing them into Adobe Substance 3D Painter, where he created unique materials.

For some additional materials, he used Quixel software and completed the composition and design using Quixel Megascans. As with all his work, in this piece Scionti intentionally left room for interpretation, letting the audience imagine their own story.

The scene was then finalized with composition, mood and lighting. Accelerated by his GeForce RTX 3090 GPU, Unreal Engine 5.1 and Lumen with hardware ray tracing helped Scionti achieve a higher level of realism with intricate details. Nanite meshes with improved virtualized geometry were useful to generate high-polygon models for close-up details. In the lighting phase, the artist used sun and sky with volumetric fog and high dynamic range.

 

“My GPU gives me so many realistic details in real time,” Scionti said.

The duo then brought Squillace’s work into Scionti’s Unreal Engine scene to integrate the character into the snowy environment. With their scene complete, the artists enjoyed the final render and reflected on its creation.

A stunning, emotion-evoking scene, built in Omniverse USD Composer.

“NVIDIA Omniverse was the center of experimentation for this project — it allowed me to work with different software at the same time, increasing the production speed of the final character,” Squillace said. “I think the system provides enormous potential, especially if combined with the standard production workflow of a 3D asset.”

 

Scionti added that Omniverse is “a great software to collaborate with other people around the globe and interact in real time on the same project.”

3D artists Gianluca Squillace and Pasquale Scionti.


Read More

Accelerating Large Language Models with Accelerated Transformers

TL;DR. We show how to use Accelerated PyTorch 2.0 Transformers and the newly introduced torch.compile() method to accelerate Large Language Models on the example of nanoGPT, a compact open-source implementation of the GPT model from Andrej Karpathy. Using the new scaled dot product attention operator introduced with Accelerated PT2 Transformers, we select the flash_attention custom kernel and achieve faster training time per batch (measured with Nvidia A100 GPUs), going from a ~143ms/batch baseline to ~113 ms/batch. In addition, the enhanced implementation using the SDPA operator offers better numerical stability. Finally, further optimizations are achieved using padded inputs, which when combined with flash attention lead to ~87ms/batch.

Recent times have seen exponential adoption of large language models (LLMs) and Generative AI in everyday life. Tightly coupled with these ever-growing models is the ever-growing training cost – in terms of both time and hardware utilization. The PyTorch team has tackled these challenges head on with Accelerated PyTorch 2 Transformers (previously known as “Better Transformer”) and JIT Compilation in PyTorch 2.0.

In this blog post, we explore training optimizations gained by utilizing custom kernel implementations of SDPA – also known as scaled dot product attention – a critical layer in transformer models. The custom kernel for SDPA replaces several discrete sequential operations with one globally optimized kernel which avoids allocating a large amount of intermediate CUDA memory. This approach offers a number of advantages, including but not limited to: higher performance computation of SDPA by reducing memory bandwidth bottleneck, reduced memory footprint to support larger batch sizes, and finally added numerical stability by prescaling input tensors. These optimizations are demonstrated on nanoGPT, an open-source implementation of GPT from Andrej Karpathy.

Background

Scaled dot product attention is the fundamental building block of multihead attention, as introduced in “Attention is All You Need”, and has a wide range of applications in LLM and Generative AI models.

The Transformer model architecture

Figure 1: The Transformer model architecture based on “Attention is All You Need”. With the new PyTorch SDPA operator, Multi-Head Attention is efficiently implemented by a linear layer for the in-projection, the SDPA operator, and a linear layer for the out-projection.

With the new scaled_dot_product_attention operator, multihead attention can be implemented in just 3 steps: in projection with a linear layer, SDPA, and out projection with a linear layer.

# In Projection
# variable descriptions:
# q,k,v = Query, Key, Value tensors
# bsz = batch size
# num_heads = Number of heads for Multihead Attention
# tgt_len = Target length
# src_len = Source Length
# head_dim: Head Dimension
    q, k, v = _in_projection(query, key, value, q_proj_weight, k_proj_weight, v_proj_weight, b_q, b_k, b_v)
    q = q.view(bsz, num_heads, tgt_len, head_dim)
    k = k.view(bsz, num_heads, src_len, head_dim)
    v = v.view(bsz, num_heads, src_len, head_dim)

    # Scaled Dot Product Attention
    attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)

    # Out Projection
    attn_output = attn_output.permute(2, 0, 1, 3).contiguous().view(bsz * tgt_len, embed_dim)
    attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
    attn_output = attn_output.view(tgt_len, bsz, attn_output.size(1))

PyTorch 2.0 supports multiple different kernels optimized for specific use cases, with specific requirements. A kernel picker picks the best kernel for a particular combination of input parameters. If no optimized “custom kernel” for a particular combination of input parameters can be identified, the kernel picker selects a general kernel that can handle all input combinations.

While future releases may extend this set of operators, PyTorch 2.0 launches with 3 implementations for the SDPA operator:

  1. A generic kernel which implements the mathematical equation of SDPA in the function sdpa_math()
  2. An optimized kernel based on the paper “Flash Attention”, which supports evaluation of SDPA with 16-bit floating point data types on compute architecture SM80 (A100).
  3. An optimized kernel based on the paper “Self-Attention Does Not Need O(n^2) Memory” and implemented in xFormers, which supports both 32- and 16-bit floating point data types on a wider range of architectures (SM40 and later). This blog post refers to this kernel as the mem_efficient kernel.

Note that both optimized kernels (two and three listed above) support a key padding mask and limit the supported attention mask to causal attention. Accelerated PyTorch 2.0 Transformers today only support the causal mask when it is specified using the is_causal boolean. When a mask is specified, the general-purpose kernel will be selected because it is too expensive to analyze the contents of a provided mask to determine if it is the causal mask. Additional explanations on the constraints for each kernel can be found in the Accelerated PT2 Transformer blog.

Enabling Accelerated Transformers with nanoGPT

The SDPA operator being a critical component of the GPT model, we identified the open source nanoGPT model as an excellent candidate for both demonstrating the ease of implementation and benefits of PyTorch 2.0’s Accelerated Transformers. The following demonstrates the exact process by which Accelerated Transformers was enabled on nanoGPT.

This process largely revolves around replacing the existing SDPA implementation with the newly added F.scaled_dot_product_attention operator from functional.py. This process can be easily adapted to enable the operator in many other LLMs. Alternatively, users can instead choose to call F.multi_head_attention_forward() or utilize the nn.MultiHeadAttention module directly where applicable. The following code snippets are adapted from Karpathy’s nanoGPT repository.

Step 1: Identify the existing SDPA implementation

In the case of nanoGPT, SDPA is implemented in the model’s CausalSelfAttention class. The original implementation at time of writing is adapted below for this post.

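An abridged sketch of that implementation, adapted here from nanoGPT’s CausalSelfAttention.forward (the line numbers cited in Step 2 refer to the original model.py file, not to this excerpt):

# Manual ("math") implementation of causal self-attention in nanoGPT.
# q, k, v have shape (B, nh, T, hs); self.bias holds the registered causal-mask buffer.
att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float('-inf'))
att = F.softmax(att, dim=-1)
att = self.attn_dropout(att)
y = att @ v  # (B, nh, T, T) x (B, nh, T, hs) -> (B, nh, T, hs)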

Step 2: Replace with Torch’s scaled_dot_product_attention

At this point we can note the following:

  • Lines 36 – 42 define the mathematical implementation of SDPA which we are replacing
  • The mask applied on line 39 is no longer relevant since we are using scaled_dot_product_attention’s is_causal flag.
  • The dropout layer used in line 41 is also now unnecessary.

Swapping out the SDPA implementation for torch’s scaled_dot_product_attention and removing the now redundant code yields the following implementation.

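A minimal sketch of the swapped-in computation (the exact replacement lives in the nanoGPT repository; is_causal=True supplies the causal structure previously provided by the explicit mask, and attention dropout is folded into the operator):

# Fused causal self-attention via the PyTorch 2.0 SDPA operator (sketch).
y = torch.nn.functional.scaled_dot_product_attention(
    q, k, v,
    attn_mask=None,
    dropout_p=self.dropout if self.training else 0.0,
    is_causal=True,
)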

Alternatively, the original mask can be passed into the attn_mask field; however, due to the kernel constraints mentioned above, doing so would limit the implementation to supporting only the generic sdpa_math kernel.

Step 3 (Bonus): Faster matmuls with padding

On top of the performance improvements from SDPA, our analysis yielded a nice ancillary win. In Andrej’s words “The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase the vocab size from 50257 to 50304 (nearest multiple of 64).”

Tweet by Andrej Karpathy

The vocab size determines the dimensions of matmuls in the output layer of GPT, and these are so large that they were taking a majority of the time for the entire training loop! We discovered that they were achieving performance significantly below the peak throughput achievable on the A100 GPU, and guessed from NVIDIA’s matmul documentation that 64-element alignment would yield better results. Indeed, padding these matmuls achieves nearly a 3x speedup! The underlying cause is that unaligned memory accesses significantly reduce efficiency. A deeper analysis can be found in this Twitter thread.
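
As a concrete illustration of the arithmetic (50304 is simply 50257 rounded up to the next multiple of 64):

# Round the vocab size up to the nearest multiple of 64 so the output-layer
# matmuls map onto well-aligned, faster GEMM tiles on the A100.
orig_vocab_size = 50257
padded_vocab_size = ((orig_vocab_size + 63) // 64) * 64
print(padded_vocab_size)  # 50304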

With this optimization we were able to further reduce training time from ~113 ms (using flash attention) to ~87 ms per batch.

Results

The figure below demonstrates the performance gained using Pytorch custom kernels. Here are the exact figures:

  • baseline (nanoGPT implementation): ~143ms
  • sdpa_math (generic): ~134ms (6.71% faster)
  • mem_efficient kernel: ~119ms (20.16% faster)
  • flash_attention kernel: ~113ms (26.54% faster)
  • flash_attention + padded vocab: ~87ms (64.37% faster)

All code was run on an 8 x NVIDIA Corporation A100 server with 80 GB HBM [A100 SXM4 80GB], and for the purpose of this experiment dropout was set to 0.

Using scaled dot product attention with custom kernels and torch.compile delivers significant speedups for training large language models

Figure 2: Using scaled dot product attention with custom kernels and torch.compile delivers significant speedups for training large language models, such as for nanoGPT shown here.

Enhancing Numerical Model Stability

In addition to being faster, PyTorch’s implementation offers increased numerical stability by avoiding loss of precision in many execution scenarios. There is a great explanation here, but essentially the PyTorch implementation scales the Query and Key matrices before multiplication, which is said to be more stable and avoid loss of precision. Because of the merged custom kernel architecture of SDPA, this scaling does not introduce additional overhead in the computation of the attention result. In comparison, an implementation from the individual computational components would require separate pre-scaling at additional cost. For an additional explanation, see Appendix A.

Improved Memory Consumption

Yet another large advantage of using the torch SDPA kernels is the reduced memory footprint, which allows for the utilization of larger batch sizes. The following chart compares the best validation loss after one hour of training for both flash attention and the baseline implementations of causal attention. As can be seen, the maximum batch size achieved with the baseline causal attention implementation (on an 8 x NVIDIA Corporation A100 server with 80 GB HBM) was 24, significantly less than the maximum achieved with flash attention, which was 39.

Using Flash Attention enables the usage of larger batch sizes

Figure 3: Using Flash Attention enables the usage of larger batch sizes, allowing users to achieve lower validation loss after one hour of training (smaller is better).

Conclusion

Accelerated PyTorch 2 Transformers were designed to make the training and production deployment of state-of-the-art transformer models affordable and integrated with PyTorch 2.0 model JIT compilation. The newly introduced PyTorch SDPA operator provides improved performance for training Transformer models and is particularly valuable for the expensive Large Language Model training. In this post we demonstrate a number of optimizations on the exemplary nanoGPT model including:

  • Over 26% training speedup, when compared against the baseline with constant batch size
  • An additional speedup achieved with padded vocabulary, bringing the total optimization to approximately 64% compared to the baseline
  • Additional numerical stability

Appendix A: Analyzing Attention Numeric Stability

In this section we provide a more in depth explanation of the previously mentioned enhanced numerical stability which is gained by prescaling SDPA’s input vectors. The following is a simplified version of nanoGPT’s mathematical implementation of SDPA. The important thing to note here is that the query undergoes matrix multiplication without being scaled.

# nanoGPT implementation of SDPA
# notice q (our query vector) is not scaled !
att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
att = att.masked_fill(self.bias[:,:,:T,:T] == 0, float('-inf'))
att = F.softmax(att, dim=-1)

# Dropout is set to 0, so we can safely ignore this line in the implementation
# att = self.attn_dropout(att)

y_nanogpt = att @ v # (B, nh, T, T) x (B, nh, T, hs) -> (B, nh, T, hs)

The following is the equivalent mathematical implementation in torch’s scaled_dot_product_attention.

# PyTorch implementation of SDPA
embed_size = q.size(-1)
scaling_factor = math.sqrt(math.sqrt(embed_size))
q = q / scaling_factor 	# notice q _is_ scaled here !

# same as above, but with scaling factor
att = q @ (k.transpose(-2, -1) / scaling_factor)
att = att.masked_fill(self.bias[:,:,:T,:T] == 0, float('-inf'))
att = F.softmax(att, dim=-1)

# Dropout is set to 0, so we can safely ignore this line in the implementation
# att = self.attn_dropout(att)

y_scale_before = att @ v

Mathematically, both approaches should be equivalent; however, our experimentation shows that in practice we receive different results from each approach.

Using the approach above, we verified y_scale_before matches the expected output from using the scaled_dot_product_attention method while y_nanogpt does not.

The torch.allclose method was used to test equivalence. Specifically, we showed that:

y_sdpa = torch.nn.functional._scaled_dot_product_attention(
	q,
	k,
	v,
	attn_mask=self.bias[:,:,:T,:T] != 0,
	dropout_p=0.0,
	need_attn_weights=False,
	is_causal=False,
)

torch.allclose(y_sdpa, y_nanogpt) # False, indicating fp issues
torch.allclose(y_sdpa, y_scale_before) # True, as expected

Appendix B: Reproducing Experiment Results

Researchers seeking to reproduce these results should start with the following commit from Andrej’s nanoGPT repository – b3c17c6c6a363357623f223aaa4a8b1e89d0a465. This commit was used as the baseline when measuring the per batch speed improvements. For results which include padded vocabulary optimizations (which yielded the most significant improvements to batch speed), use the following commit – 77e7e04c2657846ddf30c1ca2dd9f7cbb93ddeab. From either checkout, selecting kernels for experimentation is made trivial with the use of the torch.backends API.

The desired kernel can be selected via a context manager:

with torch.backends.cuda.sdp_kernel(
    enable_math = False,
    enable_flash = False,
    enable_mem_efficient = True
):
    train(model)
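For example, to restrict execution to the Flash Attention kernel only, the same flags can simply be flipped (a minimal variation of the snippet above, under the same assumptions about the surrounding training script):

with torch.backends.cuda.sdp_kernel(
    enable_math = False,
    enable_flash = True,
    enable_mem_efficient = False
):
    train(model)
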


Differentially private heatmaps


Recently, differential privacy (DP) has emerged as a mathematically robust notion of user privacy for data aggregation and machine learning (ML), with practical deployments including the 2022 US Census and in industry. Over the last few years, we have open-sourced libraries for privacy-preserving analytics and ML and have been constantly enhancing their capabilities. Meanwhile, new algorithms have been developed by the research community for several analytic tasks involving private aggregation of data.

One such important data aggregation method is the heatmap. Heatmaps are popular for visualizing aggregated data in two or more dimensions. They are widely used in many fields including computer vision, image processing, spatial data analysis, bioinformatics, and more. Protecting the privacy of user data is critical for many applications of heatmaps. For example, heatmaps for gene microdata are based on private data from individuals. Similarly, a heatmap of popular locations in a geographic area is based on user location check-ins that need to be kept private.

Motivated by such applications, in “Differentially Private Heatmaps” (presented at AAAI 2023), we describe an efficient DP algorithm for computing heatmaps with provable guarantees and evaluate it empirically. At the core of our DP algorithm for heatmaps is a solution to the basic problem of how to privately aggregate sparse input vectors (i.e., input vectors with a small number of non-zero coordinates) with a small error as measured by the Earth Mover’s Distance (EMD). Using a hierarchical partitioning procedure, our algorithm views each input vector, as well as the output heatmap, as a probability distribution over a number of items equal to the dimension of the data. For the problem of sparse aggregation under EMD, we give an efficient algorithm with error asymptotically close to the best possible.

Algorithm description

Our algorithm works by privatizing the aggregated distribution (obtained by averaging over all user inputs), which is sufficient for computing a final heatmap that is private due to the post-processing property of DP. This property ensures that any transformation of the output of a DP algorithm remains differentially private. Our main contribution is a new privatization algorithm for the aggregated distribution, which we will describe next.

The EMD measure, which is a distance-like measure of dissimilarity between two probability distributions originally proposed for computer vision tasks, is well-suited for heatmaps since it takes the underlying metric space into account and considers “neighboring” bins. EMD is used in a variety of applications including deep learning, spatial analysis, human mobility, image retrieval, face recognition, visual tracking, shape matching, and more.
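As a reminder (this is the standard optimal-transport formulation, rather than anything specific to our algorithm), for two distributions p and q over grid cells with ground distance d(i, j) between cells i and j, EMD is the minimum total cost of moving the probability mass of p so that it matches q:

\mathrm{EMD}(p, q) = \min_{\gamma \ge 0} \sum_{i,j} \gamma_{ij}\, d(i,j) \quad \text{subject to} \quad \sum_j \gamma_{ij} = p_i \ \text{ and } \ \sum_i \gamma_{ij} = q_j
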

To achieve DP, we need to add noise to the aggregated distribution. We would also like to preserve statistics at different scales of the grid to minimize the EMD error. So, we create a hierarchical partitioning of the grid, add noise at each level, and then recombine into the final DP aggregated distribution. In particular, the algorithm has the following steps (a simplified code sketch follows the list):

  1. Quadtree construction: Our hierarchical partitioning procedure first divides the grid into four cells, then divides each cell into four subcells; it recursively continues this process until each cell is a single pixel. This procedure creates a quadtree over the subcells where the root represents the entire grid and each leaf represents a pixel. The algorithm then calculates the total probability mass for each tree node (obtained by adding up the aggregated distribution’s probabilities of all leaves in the subtree rooted at this node). This step is illustrated below.
    In the first step, we take the (non-private) aggregated distribution (top left) and repeatedly divide it to create a quadtree. Then, we compute the total probability mass of each cell (bottom).
  2. Noise addition: To each tree node’s mass we then add Laplace noise calibrated to the use case.
  3. Truncation: To help reduce the final amount of noise in our DP aggregated distribution, the algorithm traverses the tree starting from the root and, at each level, it discards all but the top w nodes with highest (noisy) masses together with their descendants.
  4. Reconstruction: Finally, the algorithm solves a linear program to recover the aggregated distribution. This linear program is inspired by the sparse recovery literature where the noisy masses are viewed as (noisy) measurements of the data.
In step 2, noise is added to each cell’s probability mass. Then in step 3, only the top-w cells are kept (green) whereas the remaining cells are truncated (red). Finally, in the last step, we write a linear program on these top cells to reconstruct the aggregated distribution, which is now differentially private.
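The following is a minimal Python sketch of steps 1–3 for a grid with a power-of-two side length. The function and variable names are hypothetical, the noise scale is shown schematically, and the top-w truncation is applied independently at each level for brevity (the actual algorithm only considers children of surviving nodes); step 4, the linear-program reconstruction, is omitted.

import numpy as np

def noisy_quadtree_masses(agg, eps_per_level, w):
    """Steps 1-3: compute per-level node masses of a quadtree over the grid,
    add Laplace noise at each level, and flag the top-w noisy masses per level."""
    levels = []
    masses = np.asarray(agg, dtype=float)    # leaf level: one mass per pixel
    while True:
        noisy = masses + np.random.laplace(scale=1.0 / eps_per_level, size=masses.shape)
        flat = noisy.ravel()
        keep = np.zeros(flat.shape, dtype=bool)
        keep[np.argsort(flat)[-w:]] = True   # keep the w largest noisy masses at this level
        levels.append((noisy, keep.reshape(noisy.shape)))
        if masses.shape[0] == 1:             # reached the root of the quadtree
            break
        n = masses.shape[0]
        # parent mass = sum of its 2 x 2 block of children
        masses = masses.reshape(n // 2, 2, n // 2, 2).sum(axis=(1, 3))
    return levels  # step 4 reconstructs the DP distribution from these noisy measurements
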

Experimental results

We evaluate the performance of our algorithm in two different domains: real-world location check-in data and image saliency data. We consider as a baseline the ubiquitous Laplace mechanism, where we add Laplace noise to each cell, zero out any negative cells, and produce the heatmap from this noisy aggregate. We also consider a “thresholding” variant of this baseline that is more suited to sparse data: only keep top t% of the cell values (based on the probability mass in each cell) after noising while zeroing out the rest. To evaluate the quality of an output heatmap compared to the true heatmap, we use Pearson coefficient, KL-divergence, and EMD. Note that when the heatmaps are more similar, the first metric increases but the latter two decrease.
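As an illustration, a minimal sketch of this Laplace-mechanism baseline and its thresholding variant might look as follows. The function name is hypothetical and the per-cell noise scale is shown schematically; the actual calibration depends on the sensitivity of the aggregate.

import numpy as np

def laplace_baseline(agg, eps, threshold_pct=None):
    """Add Laplace noise to each cell, zero out negative cells, optionally keep
    only the top threshold_pct% of noisy cells, then renormalize into a heatmap."""
    noisy = agg + np.random.laplace(scale=1.0 / eps, size=agg.shape)
    noisy = np.clip(noisy, 0.0, None)                    # zero out negative cells
    if threshold_pct is not None:                        # "thresholding" variant
        cutoff = np.percentile(noisy, 100.0 - threshold_pct)
        noisy = np.where(noisy >= cutoff, noisy, 0.0)
    total = noisy.sum()
    return noisy / total if total > 0 else noisy
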

The locations dataset is obtained by combining two datasets, Gowalla and Brightkite, both of which contain check-ins by users of location-based social networks. We pre-processed this dataset to consider only check-ins in the continental US resulting in a final dataset consisting of ~500,000 check-ins by ~20,000 users. Considering the top cells (from an initial partitioning of the entire space into a 300 x 300 grid) that have check-ins from at least 200 unique users, we partition each such cell into subgrids with a resolution of ∆ × ∆ and assign each check-in to one of these subgrids.

In the first set of experiments, we fix ∆ = 256. We test the performance of our algorithm for different values of ε (the privacy parameter, where smaller ε means stronger DP guarantees), ranging from 0.1 to 10, by running our algorithms together with the baseline and its variants on all cells, randomly sampling a set of 200 users in each trial, and then computing the distance metrics between the true heatmap and the DP heatmap. The average of these metrics is presented below. Our algorithm (the red line) performs better than all versions of the baseline across all metrics, with improvements that are especially significant when ε is not too large or small (i.e., 0.2 ≤ ε ≤ 5).

Metrics averaged over 60 runs when varying ε for the location dataset. Shaded areas indicate 95% confidence interval.

Next, we study the effect of varying the number n of users. By fixing a single cell (with > 500 users) and ε, we vary n from 50 to 500 users. As predicted by theory, our algorithms and the baseline perform better as n increases. However, the behavior of the thresholding variants of the baseline is less predictable.

We also run another experiment where we fix a single cell and ε, and vary the resolution ∆ from 64 to 256. In agreement with theory, our algorithm’s performance remains nearly constant for the entire range of ∆. However, the baseline suffers across all metrics as ∆ increases while the thresholding variants occasionally improve as ∆ increases.

Effect of the number of users and grid resolution on EMD.

We also experiment on the Salicon image saliency dataset (SALICON). This dataset is a collection of saliency annotations on the Microsoft Common Objects in Context image database. We downsized the images to a fixed resolution of 320 × 240 and each [user, image] pair consists of a sequence of coordinates in the image where the user looked. We repeat the experiments described previously on 38 randomly sampled images (with ≥ 50 users each) from SALICON. As we can see from the examples below, the heatmap obtained by our algorithm is very close to the ground truth.

Example visualization of different algorithms for two different natural images from SALICON for ε = 10 and n = 50 users. The algorithms from left to right are: original heatmap (no privacy), baseline, and ours.

Additional experimental results, including those on other datasets, metrics, privacy parameters and DP models, can be found in the paper.

Conclusion

We presented a privatization algorithm for sparse distribution aggregation under the EMD metric, which in turn yields an algorithm for producing privacy-preserving heatmaps. Our algorithm extends naturally to distributed models that can implement the Laplace mechanism, including the secure aggregation model and the shuffle model. This does not apply to the more stringent local DP model, and it remains an interesting open question to devise practical local DP heatmap/EMD aggregation algorithms for “moderate” number of users and privacy parameters.

Acknowledgments

This work was done jointly with Junfeng He, Kai Kohlhoff, Ravi Kumar, Pasin Manurangsi, and Vidhya Navalpakkam.


Financial text generation using a domain-adapted fine-tuned large language model in Amazon SageMaker JumpStart


Large language models (LLMs) with billions of parameters are currently at the forefront of natural language processing (NLP). These models are shaking up the field with their incredible abilities to generate text, analyze sentiment, translate languages, and much more. With access to massive amounts of data, LLMs have the potential to revolutionize the way we interact with language. Although LLMs are capable of performing various NLP tasks, they are considered generalists and not specialists. In order to train an LLM to become an expert in a particular domain, fine-tuning is usually required.

One of the major challenges in training and deploying LLMs with billions of parameters is their size, which can make it difficult to fit them into single GPUs, the hardware commonly used for deep learning. The sheer scale of these models requires high-performance computing resources, such as specialized GPUs with large amounts of memory. Additionally, the size of these models can make them computationally expensive, which can significantly increase training and inference times.

In this post, we demonstrate how we can use Amazon SageMaker JumpStart to easily fine-tune a large language text generation model on a domain-specific dataset in the same way you would train and deploy any model on Amazon SageMaker. In particular, we show how you can fine-tune the GPT-J 6B language model for financial text generation using both the JumpStart SDK and Amazon SageMaker Studio UI on a publicly available dataset of SEC filings.

JumpStart helps you quickly and easily get started with machine learning (ML) and provides a set of solutions for the most common use cases that can be trained and deployed readily with just a few steps. All the steps in this demo are available in the accompanying notebook Fine-tuning text generation GPT-J 6B model on a domain specific dataset.

Solution overview

In the following sections, we provide a step-by-step demonstration for fine-tuning an LLM for text generation tasks via both the JumpStart Studio UI and Python SDK. In particular, we discuss the following topics:

  • An overview of the SEC filing data in the financial domain that the model is fine-tuned on
  • An overview of the LLM GPT-J 6B model we have chosen to fine-tune
  • A demonstration of two different ways we can fine-tune the LLM using JumpStart:
    • Use JumpStart programmatically with the SageMaker Python SDK
    • Access JumpStart using the Studio UI
  • An evaluation of the fine-tuned model by comparing it with the pre-trained model without fine-tuning

Fine-tuning refers to the process of taking a pre-trained language model and training it for a different but related task using specific data. This approach is also known as transfer learning, which involves transferring the knowledge learned from one task to another. LLMs like GPT-J 6B are trained on massive amounts of unlabeled data and can be fine-tuned on smaller datasets, making the model perform better in a specific domain.

As an example of how performance improves when the model is fine-tuned, consider asking it the following question:

“What drives sales growth at Amazon?”

Without fine-tuning, the response would be:

“Amazon is the world’s largest online retailer. It is also the world’s largest online marketplace. It is also the world”

With fine tuning, the response is:

“Sales growth at Amazon is driven primarily by increased customer usage, including increased selection, lower prices, and increased convenience, and increased sales by other sellers on our websites.”

The improvement from fine-tuning is evident.

We use financial text from SEC filings to fine-tune a GPT-J 6B LLM for financial applications. In the next sections, we introduce the data and the LLM that will be fine-tuned.

SEC filing dataset

SEC filings are critical for regulation and disclosure in finance. Filings notify the investor community about companies’ business conditions and the future outlook of the companies. The text in SEC filings covers the entire gamut of a company’s operations and business conditions. Because of their potential predictive value, these filings are good sources of information for investors. Although these SEC filings are publicly available to anyone, downloading parsed filings and constructing a clean dataset with added features is a time-consuming exercise. We make this possible in a few API calls in the JumpStart Industry SDK.

Using the SageMaker API, we downloaded annual reports (10-K filings; see How to Read a 10-K for more information) for a large number of companies. We select Amazon’s SEC filing reports for years 2021–2022 as the training data to fine-tune the GPT-J 6B model. In particular, we concatenate the SEC filing reports of the company in different years into a single text file, except for the “Management Discussion and Analysis” section, which contains forward-looking statements by the company’s management and is used as the validation data.

The expectation is that after fine-tuning the GPT-J 6B text generation model on the financial SEC documents, the model is able to generate insightful, finance-related text output, and can therefore be used to solve multiple domain-specific NLP tasks.

GPT-J 6B large language model

GPT-J 6B is an open-source, 6-billion-parameter model released by EleutherAI. GPT-J 6B has been trained on a large corpus of text data and is capable of performing various NLP tasks such as text generation, text classification, and text summarization. Although this model is impressive on a number of NLP tasks without the need for any fine-tuning, in many cases you will need to fine-tune the model on a dataset specific to the NLP task you are trying to solve. Use cases include custom chatbots, idea generation, entity extraction, classification, and sentiment analysis.

Access LLMs on SageMaker

Now that we have identified the dataset and the model we are going to fine-tune on, JumpStart provides two avenues to get started using text generation fine-tuning: the SageMaker SDK and Studio.

Use JumpStart programmatically with the SageMaker SDK

We now go over an example of how you can use the SageMaker JumpStart SDK to access an LLM (GPT-J 6B) and fine-tune it on the SEC filing dataset. Upon completion of fine-tuning, we will deploy the fine-tuned model and run inference against it. All the steps in this post are available in the accompanying notebook: Fine-tuning text generation GPT-J 6B model on domain specific dataset.

In this example, JumpStart uses the SageMaker Hugging Face Deep Learning Container (DLC) and DeepSpeed library to fine-tune the model. The DeepSpeed library is designed to reduce computing power and memory use and to train large distributed models with better parallelism on existing computer hardware. It supports single node distributed training, utilizing gradient checkpointing and model parallelism to train large models on a single SageMaker training instance with multiple GPUs. With JumpStart, we integrate the DeepSpeed library with the SageMaker Hugging Face DLC for you and take care of everything under the hood. You can easily fine-tune the model on your domain-specific dataset without manually setting it up.

Fine-tune the pre-trained model on domain-specific data

To fine-tune a selected model, we need to get that model’s URI, as well as the training script and the container image used for training. To make things easy, these three inputs depend solely on the model name, version (for a list of the available models, see Built-in Algorithms with pre-trained Model Table), and the type of instance you want to train on. This is demonstrated in the following code snippet:

from sagemaker import image_uris, model_uris, script_uris, hyperparameters

model_id, model_version = "huggingface-textgeneration1-gpt-j-6b", "*"
training_instance_type = "ml.g5.12xlarge"

# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)

# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)

# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)

We retrieve the model_id corresponding to the model we want to use. In this case, we fine-tune huggingface-textgeneration1-gpt-j-6b.

Defining hyperparameters involves setting the values for various parameters used during the training process of an ML model. These parameters can affect the model’s performance and accuracy. In the following step, we establish the hyperparameters by utilizing the default settings and specifying custom values for parameters such as epochs and learning_rate:

from sagemaker import hyperparameters

# Retrieve the default hyper-parameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["epochs"] = "6"

hyperparameters["learning_rate"] = "2e-04"
print(hyperparameters)

JumpStart provides an extensive list of hyperparameters available to tune. The following list provides an overview of some of the key hyperparameters utilized in fine-tuning the model. For a full list of hyperparameters, see the notebook Fine-tuning text generation GPT-J 6B model on domain specific dataset.

  • epochs – Specifies the maximum number of epochs (complete passes over the training dataset) to run.
  • learning_rate – Controls the step size or learning rate of the optimization algorithm during training.
  • eval_steps – Specifies how many steps to run before evaluating the model on the validation set during training. The validation set is a subset of the data that is not used for training, but instead is used to check the performance of the model on unseen data.
  • weight_decay – Controls the regularization strength during model training. Regularization is a technique that helps prevent the model from overfitting the training data, which can result in better performance on unseen data.
  • fp16 – Controls whether to use fp16 16-bit (mixed) precision training instead of 32-bit training.
  • evaluation_strategy – The evaluation strategy used during training.
  • gradient_accumulation_steps – The number of update steps to accumulate gradients for before performing a backward/update pass.

For further details regarding hyperparameters, refer to the official Hugging Face Trainer documentation.

You can now fine-tune this JumpStart model on your own custom dataset using the SageMaker SDK. We use the SEC filing data we described earlier. The train and validation data is hosted under train_dataset_s3_path and validation_dataset_s3_path. The supported format of the data includes CSV, JSON, and TXT. For the CSV and JSON data, the text data is used from the column called text or the first column if no column called text is found. Because this is for text generation fine-tuning, no ground truth labels are required. The following code is an SDK example of how to fine-tune the model:

from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base
from sagemaker.tuner import HyperparameterTuner
from sagemaker.huggingface import HuggingFace

train_dataset_s3_path = "s3://jumpstart-cache-prod-us-west-2/training-datasets/tc/data.csv"
validation_dataset_s3_path = "s3://jumpstart-cache-prod-us-west-2/training-datasets/tc/data.csv"

training_job_name = name_from_base(f"jumpstart-example-{model_id}")

metric_definitions=[
    {'Name': 'train:loss', 'Regex': "'loss': ([0-9]+.[0-9]+)"},
    {'Name': 'eval:loss', 'Regex': "'eval_loss': ([0-9]+.[0-9]+)"},
    {'Name': 'eval:runtime', 'Regex': "'eval_runtime': ([0-9]+.[0-9]+)"},
    {'Name': 'eval:samples_per_second', 'Regex': "'eval_samples_per_second': ([0-9]+.[0-9]+)"},
    {'Name': 'eval:eval_steps_per_second', 'Regex': "'eval_steps_per_second': ([0-9]+.[0-9]+)"},
]

# Create a SageMaker Estimator instance (aws_role and s3_output_location are assumed to be defined earlier in the notebook)
tg_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
    base_job_name=training_job_name,
    enable_network_isolation=True,
    metric_definitions=metric_definitions
)

# Launch a SageMaker Training job by passing s3 path of the training data
tg_estimator.fit({"train": train_dataset_s3_path, "validation": validation_dataset_s3_path}, logs=True)

After we have set up the SageMaker Estimator with the required hyperparameters, we call the .fit method to start fine-tuning our model, passing it the Amazon Simple Storage Service (Amazon S3) URIs for our training and validation data. As you can see, the entry_point script provided is named transfer_learning.py (the same for other tasks and models), and the input data channels passed to .fit must be named train and validation.

JumpStart also supports hyperparameter optimization with SageMaker automatic model tuning. For details, see the example notebook.
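As an illustration only (the tunable range and tuning budget shown here are assumptions; the example notebook lists the supported configuration), automatic model tuning could be wired to the estimator above roughly as follows:

from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

# Illustrative search space over the learning rate
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-5, 1e-3, scaling_type="Logarithmic"),
}

tuner = HyperparameterTuner(
    estimator=tg_estimator,
    objective_metric_name="eval:loss",        # matches the metric_definitions above
    objective_type="Minimize",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=metric_definitions,
    max_jobs=6,
    max_parallel_jobs=2,
)

tuner.fit({"train": train_dataset_s3_path, "validation": validation_dataset_s3_path})
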

Deploy the fine-tuned model

When training is complete, you can deploy your fine-tuned model. To do so, all we need to obtain is the inference script URI (the code that determines how the model is used for inference once deployed) and the inference container image URI, which includes an appropriate model server to host the model we chose. See the following code:

import boto3
import sagemaker
from sagemaker import image_uris, script_uris
from sagemaker.utils import name_from_base

sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name="us-west-2"))

inference_instance_type = "ml.g5.12xlarge"

# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the inference script uri
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

# Use the estimator from the previous step to deploy to a SageMaker endpoint
finetuned_predictor = tg_estimator.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    entry_point="inference.py",   # standard entry point name for JumpStart inference scripts
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    endpoint_name=endpoint_name,
)

After a few minutes, our model is deployed and we can get predictions from it in real time!
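For example, a request to the endpoint might look like the following sketch. The text_inputs payload key and the exact response format follow the convention of JumpStart text generation models and should be verified against the accompanying notebook; the generation parameters are the ones used later in the text generation experiment.

import json
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-west-2")

payload = {
    "text_inputs": "This Form 10-K report shows that",  # example prompt
    "max_length": 400,
    "top_k": 250,
    "top_p": 0.8,
    "do_sample": True,
    "temperature": 1,
}

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,               # endpoint created in the previous step
    ContentType="application/json",
    Body=json.dumps(payload).encode("utf-8"),
)
print(json.loads(response["Body"].read()))
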

Access JumpStart through the Studio UI

Another way to fine-tune and deploy JumpStart models is through the Studio UI. This UI provides a low-code/no-code solution to fine-tuning LLMs.

On the Studio console, choose Models, notebooks, solutions under SageMaker JumpStart in the navigation pane.

In the search bar, search for the model you want to fine-tune and deploy.

In our case, we chose the GPT-J 6B model card. Here we can directly fine-tune or deploy the LLM.

Model evaluation

When evaluating an LLM, we can use perplexity (PPL). PPL is a common measure of how well a language model is able to predict the next word in a sequence. In simpler terms, it’s a way to measure how well the model can understand and generate human-like language.

A lower perplexity score means that the model is shown to perform better at predicting the next word. In practical terms, we can use perplexity to compare different language models and determine which one performs better on a given task. We can also use it to track the performance of a single model over time. For more details, refer to Perplexity of fixed-length models.
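Concretely, for a tokenized sequence x_1, …, x_N, perplexity is the exponentiated average negative log-likelihood the model assigns to each token:

\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)
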

We evaluate the model’s performance through a comparison of its pre- and post-fine-tuning performance. PPL is emitted in the training job’s Amazon CloudWatch logs. In addition, we look at the output generated by the model in response to specific test prompts.

On the validation data, perplexity (PPL) improved from 8.147 before fine-tuning to 1.437 after fine-tuning.

The model’s enhanced performance is demonstrated by two experiments:

  • Text generation – We show that the fine-tuned GPT-J generates text that resembles the SEC dataset, indicating that it has learned to emulate the language patterns and conventions of the financial regulatory domain
  • Zero-shot learning – We demonstrate that the fine-tuned GPT-J shows significantly better performance than the non-fine-tuned GPT-J on zero-shot question answering

For the text generation experiment, we use the following inference parameters to achieve our results: max_length of 400 (number of tokens returned), num_return_sequences of 1 (number of response text examples), top_k of 250 (pick from one of the top 250 possible continuations), top_p of 0.8 (select only from continuations that are at least 80% probable), do_sample set to True (sampling), and temperature of 1 (a higher temperature yields more randomness in responses). For more details about the parameters, refer to GenerationConfig. The following examples compare the fine-tuned GPT-J and the non-fine-tuned model on a text completion task.

Inference example 1 (prompt): “This Form 10-K report shows that”

Output from GPT-J 6B before fine-tuning:

This Form 10-K report shows that:

The Companys net income attributable to the Company for the year ended December 31, 2016 was $3,923,000, or $0.21 per diluted share, compared to $3,818,000, or $0.21 per diluted share, for the year ended December 31, 2015.

The Companys net income attributable to the Company for the year ended December 31, 2016 was $4,816,000, or $0.28 per diluted share, compared to $4,726,000, or $0.27 per diluted share, for the year ended December 31, 2015.

The Companys net income attributable to the Company for the year ended December 31, 2016 was $5,828,000, or $0.34 per diluted share, compared to $5,716,000, or $0.33 per diluted share, for the year ended December 31, 2015.

On August 21, 2009, the Company filed a Form 10-Q for the quarter ended December 31, 2008.

On August 21, 2009, the Company filed a Form 10-Q for the quarter ended September 30, 2008.

On August 21, 2009, the Company filed a Form 10-Q for the quarter ended March 31, 2009.

Output from GPT-J 6B after fine-tuning:

This Form 10-K report shows that:(1) The registrant’s name, address, and telephone number, are as follows:Amazon.com, Inc. c/o Ernst & Young LLPSeattle, Washington Attention: Corporate Secretary (2) For the purpose of Article III of the Securities Exchange Act of 1934, the registrant’s name and address are as follows:(3) The registrant’s Exchange Act reportable time period is from and includingJanuary 1, 2020 to the present.(4) The registrant’s subjective assessment is that the financial statements, and other financial information included in this report, fairlypresent in all material respects the financial condition, results of operations and cashflows of the registrant as of, and for, the periods presented in this report.(5) The registrant’s management’s assessment is that the financial statements, and other financial information included in this report, fairlypresent in all material respects the financial condition, results of operations and cashflows of the registrant as of, and for, the periods presented in this report./s/ Ernst & Young LLPSeattle, WashingtonJanuary 29, 2020EX-31.1 5 amzn-20201231xex311.htm EX-31.1 DocumentExhibit 31.1CERTIFICATIONSI, Andrew R. Jassy, certify that:1. I have reviewed this Form 10-K of Amazon.com, Inc.;2. Based on my knowledge, this report does not contain any untrue statementof a material fact or omit to state a material fact necessary to make thestatements made, in light of the circumstances under which such statementswere made, not misleading with respect to the period covered by this report;
Inference example 2 (prompt): “We serve consumers through”

Output from GPT-J 6B before fine-tuning:

We serve consumers through:

Our Blog

Are you a business owner looking to grow your business? Do you want to create a more effective and efficient business? Do you want to learn how to grow your business? If you answered “yes” to any of these questions, then you need to read this article.

The first step to growing your business is to understand your business. You must be able to answer the following questions about your business:

Who are your customers?

What is your product or service?

What is your unique selling proposition?

What is your marketing strategy?

What is your sales forecast?

What is your sales forecast?

What is your sales forecast?

What is your sales forecast?

What is your sales forecast?

Output from GPT-J 6B after fine-tuning:

We serve consumers through: Amazon.com, a retailer of physical and digital products; Amazon Web Services, which provides technology infrastructure services that enable virtually any type of business; Amazon Enterprise Solutions, which provides companies with custom technology and services to enable better business processes and improve efficiency and productivity; and Audible, a digital audiobook publisher. We serve developers and enterprises of all sizes through AWS, which offers a broad set of global compute, storage, database, and other service offerings. We also manufacture and sell electronic devices. We have organized our operations into two principal segments: North America and International. See Item 8 of Part II, “”Financial Statements and Supplementary Data-Note 12-Segment Information.”” See Item 7 of Part II, “”Management’s Discussion and Analysis of Financial Condition and Results ofOperations-Results of Operations-Supplemental Information”” for supplemental informationabout our net sales.ConsumersWe serve consumers through our online and physical stores and focus on selection,price, and convenience. We design our stores to enable hundreds of millions ofunique products to be sold by us and by third parties across dozens of productcategories. Customers access our offerings through our websites, mobile apps,Alexa, and physically visiting our stores. We also manufacture and sell Kindledevices. We strive to offer our customers the lowest prices possible throughlow everyday product pricing and shipping offers, and to improve ouroperating efficiencies so that we can continue to lower prices for ourcustomers. We also provide easy-to-use functionality, fast and reliablefulfillment, and timely customer service.In addition, we offer Amazon Prime, a membership program that includes unlimitedfree shipping on over 100 million items, access to unlimited streaming of tens ofthousands of movies and TV episodes, and other benefits.
Inference example 3 (prompt): “This year we particularly focus on”

Output from GPT-J 6B before fine-tuning:

This year we particularly focus on:

The role of the health care system in the prevention of obesity and diabetes

The role of the health care system in the prevention of obesity and diabetes

The role of the health care system in the prevention of obesity and diabetes

The role of the health care system in the prevention of obesity and diabetes

The role of the health care system in the prevention of obesity and diabetes

The role of the health care system in the prevention of obesity and diabetes

The role of the health care system in the prevention of obesity and diabetes

Output from GPT-J 6B after fine-tuning:

This year we particularly focus on: -Continued international expansion, including the development of newbusinesses in existing international market segments, as well as thedevelopment of new international market segments; -Expanding our customer base, including increasing sales to existingcustomers and attracting new customers; -Developing new business opportunities and entering into commercialagreements, including through licensing and other arrangements, with othercompanies, or through the acquisition of other companies; -Developing technology that provides for enhanced customer service andefficiently delivers our products and services; -Developing new product and service offerings and incorporating these intoexisting and new product and service offerings; -Developing efficient and effective systems for supporting our business;and -Other business and strategic initiatives. We believe that offering low prices to our customers is fundamental to ourfuture success. One way we offer lower prices is through free-shipping offersthat result in a net cost to us in delivering products, and through membershipin Amazon Prime, which provides free-shipping on millions of items andaccess to movies and other content. We also offer other promotions that enableus to turn a lower net cost of sales.We have organized our operations into two principal segments: North Americaand International. See Item 8 of Part II, “”Financial Statements andSupplementary Data-Note 12-Segment Information.”” See Item 7 of Part II,””Management’s Discussion and Analysis of Financial Condition and Results ofOperations-Results of Operations-Supplemental Information”” for supplementalinformation about our net sales.ConsumersWe serve consumers through our retail websites and physical stores and focuson selection, price, and convenience.

For the second experiment, a zero-shot question answering task, we set max_length to a different value (40, 200, and 40, respectively) based on the output we expect from each question. The parameters top_k, top_p, do_sample, and temperature are set to 250, 0.8, False, and 0, respectively. We set temperature close to 0 to control the randomness of the output, whereas in the text generation task we set it close to 1 to make the model more creative. The following examples compare the results. For each question, the output from the fine-tuned GPT-J is more accurate than that of the non-fine-tuned one.

Input 1: “What is Item 7 of the 10-K SEC filing about?”

Output from GPT-J 6B before fine-tuning:

What is Item 7 of the 10-K SEC filing about?:

Item 7 of the 10-K SEC filing is the following:

“The Company has not received any material adverse

Output from GPT-J 6B after fine-tuning:

Item 7. Management’s Discussion and Analysis of Financial Condition and Results ofOperations Forward-Looking Statements
Input 2: “How did the COVID-19 pandemic impact Amazon’s business?”

Output from GPT-J 6B before fine-tuning:

How did the COVID-19 pandemic impact Amazon’s business?:

Amazon’s business is being impacted by the COVID-19 pandemic in a number of ways.

The company is seeing a decline in the number of Prime memberships, which is a big part of its business.

The company is also seeing a decline in the number of Prime memberships.

Amazon is also seeing a decline in the number of orders placed on its website.

The company is also seeing a decline in the number of people who are shopping on its website.

Amazon is also seeing a decline in the number of people who are shopping on its website.

Amazon is also seeing a decline in the number of people who are shopping on its website.

Amazon is also seeing a decline in the number of people who are shopping on its website.

Amazon is also seeing a decline in the number of people who are shopping

Output from GPT-J 6B after fine-tuning:

How did the COVID-19 pandemic impact Amazon’s business?:The COVID-19 pandemic has impacted our business in several ways. First, it has disrupted our supply chain, and it has affected customer demand, which in turn has affected our sales and operating results. Second, it has increased our spending on advertising and marketing, which may not be effective in the long run. Third, it has increased our spending on technology infrastructure and computing capacity, which may not be effective in the long run. Fourth, it has increased our spending on fulfillment and customer service, which may not be effective in the long run. Finally, it has increased our spending on content, which may not be effective in the long run. See Item 8 of Part II, “Financial Statements and Supplementary Data — Note 1 — Description of Business and Accounting Policies.
Input 3: “What drives sales growth at Amazon?”

Output from GPT-J 6B before fine-tuning:

Amazon is the world’s largest online retailer. It is also the world’s largest online marketplace. It is also the world’

Output from GPT-J 6B after fine-tuning:

Sales growth at Amazon is driven primarily by increased customer usage, including increased selection, lower prices, and increased convenience, and increased sales by other sellers on our websites.

Clean up

To avoid ongoing charges, delete the SageMaker inference endpoints. You can delete the endpoints via the SageMaker console or from the notebook using the following commands:

# Delete the SageMaker endpoint and the attached resources
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()

Conclusion

JumpStart is a capability in SageMaker that allows you to quickly get started with ML. JumpStart uses open-source, pre-trained models to solve common ML problems like image classification, object detection, text classification, sentence pair classification, and question answering.

In this post, we showed you how to fine-tune and deploy a pre-trained LLM (GPT-J 6B) for text generation based on the SEC filing dataset. We demonstrated how the model transformed into a finance domain expert by undergoing the fine-tuning process on just two annual reports of the company. This fine-tuning enabled the model to generate content with an understanding of financial topics and greater precision. Try out the solution on your own and let us know how it goes in the comments.

Important: This post is for demonstrative purposes only. It is not financial advice and should not be relied on as financial or investment advice. The post used models pre-trained on data obtained from the SEC EDGAR database. You are responsible for complying with EDGAR’s access terms and conditions if you use SEC data.



About the Authors

Dr. Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A.

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Dr. Sanjiv Das is an Amazon Scholar and the Terry Professor of Finance and Data Science at Santa Clara University. He holds post-graduate degrees in Finance (M.Phil and PhD from New York University) and Computer Science (MS from UC Berkeley), and an MBA from the Indian Institute of Management, Ahmedabad. Prior to being an academic, he worked in the derivatives business in the Asia-Pacific region as a Vice President at Citibank. He works on multimodal machine learning in the area of financial applications.

Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker Service team. He focuses on helping customers build, train, and migrate ML production workloads to SageMaker at scale. He specializes in deep learning, especially in the area of NLP and CV. Outside of work, he enjoys running and hiking.


Announcing the updated Microsoft OneDrive connector (V2) for Amazon Kendra


Amazon Kendra is an intelligent search service powered by machine learning (ML), enabling organizations to provide relevant information to customers and employees, when they need it.

Amazon Kendra uses ML algorithms to enable users to use natural language queries to search for information scattered across multiple data sources in an enterprise, including commonly used document storage systems like Microsoft OneDrive.

OneDrive is an online cloud storage service that allows you to host your content and have it automatically sync across multiple devices. Amazon Kendra can index document formats like Microsoft OneNote, HTML, PDF, Microsoft Word, Microsoft PowerPoint, Microsoft Excel, Rich Text, JSON, XML, CSV, XSLT, and plain text.

We’re excited to announce that we have updated the OneDrive connector for Amazon Kendra to add even more capabilities. For example, we have added support to search OneNote documents. Additionally, you can now choose to use identity or ACL information to make your searches more granular.

The connector indexes documents along with their access control information, so that search results can be limited to only those documents the user is allowed to access. To filter search results based on user access rights, the connector provides an identity crawler that loads principal information, such as user and group mappings, into a principal store.

In this post, we demonstrate how to configure multiple data sources in Amazon Kendra to provide a central place to search across your document repository.

Solution overview

For our solution, we demonstrate how to index a OneDrive repository or folder using the Amazon Kendra connector for OneDrive. The solution consists of the following steps:

  1. Create and configure an app on Microsoft Azure Portal and get the authentication credentials.
  2. Create a OneDrive data source via the Amazon Kendra console.
  3. Index the data in the OneDrive repository.
  4. Run a sample query to get the information.
  5. Filter the query by users or groups.

Prerequisites

To try out the Amazon Kendra connector for OneDrive, you need an AWS account with access to Amazon Kendra and AWS Identity and Access Management (IAM), and a Microsoft Azure account with a OneDrive subscription and permissions to register applications in the Azure portal.

Configure an Azure application and assign connection permissions

Before we set up the OneDrive data source, we need a few details about the OneDrive repository. Complete the following steps:

  1. Log in to Azure.
  2. After logging in with your account credentials, choose App registrations, then choose New registration.
  3. Give an appropriate name to your application and register the application.
  4. Collect the information about the client ID, tenant ID, and other details of the application.
  5. To get a client secret, choose Add a certificate or secret under Client credentials.
  6. Choose New client secret and provide the proper description and expiry.
  7. Note the client-id, tenant-id, and secret-id values. We use these for authenticating the OAuth2 application.
  8. Navigate to App, choose API permissions in the navigation pane, and choose Add a permission.
  9. Choose Microsoft Graph.
  10. Under Application permissions, enter File in the search bar and under Files, select Files.Read.All.
  11. Choose Add permissions.
  12. Similarly, add the following permissions on the Microsoft Graph option for the application you created:
    1. Group.Read.All
    2. Notes.Read.All

On completion, the API permissions will look like the following screenshot.

Configure the Amazon Kendra connector for OneDrive

To configure the Amazon Kendra connector, complete the following steps:

  1. On the Amazon Kendra console, choose Create an Index.
  2. For Index name, enter a name for the index (for example, my-onedrive-index).
  3. Enter an optional description.
  4. Choose Create a new role.
  5. For Role name, enter an IAM role name.
  6. Configure optional encryption settings and tags.
  7. Choose Next.
  8. In the Configure user access control section, select Yes under Access control settings.
  9. For Token type, choose JSON on the drop-down menu.
  10. Leave the remaining values as their default values.
  11. Choose Next.

Before we move to the next configuration step, we need to provide Amazon Kendra with a role that has the permissions necessary for connecting to the site. These include permission to get and decrypt the AWS Secrets Manager secret that contains the application ID and secret key necessary to connect to the OneDrive site.

  1. Open another tab for the AWS account, and on the IAM console, navigate to the role that you created earlier (for example, AmazonKendra-us-west-2-onedrive).
  2. Choose Add permissions and Create inline policy.
  3. For Service, choose Kendra.
  4. For Actions, choose Write and specify BatchPutDocument.
  5. For Resources, choose All resources.
  6. Choose Review policy.
  7. For Name, enter a name (for example, BatchPutPolicy).
  8. Choose Create policy.
  9. Add this policy to the role you created.
  10. Additionally, attach the SecretsManagerReadWrite AWS managed policy to the role.
  11. Return to the Amazon Kendra tab.
  12. Select Developer edition and choose Create.

This creates and propagates the IAM role and then creates the Amazon Kendra index, which can take up to 30 minutes.

  1. Return to the Amazon Kendra console, choose Data sources in the navigation pane, and choose Add data source.
  2. Under OneDrive connector V2.0, choose Add connector.
  3. For Data source name, enter a name (for example, my-onedrive).
  4. Enter an optional description.
  5. Choose Next.
  6. For OneDrive Tenant ID, enter the tenant ID you gathered earlier.
  7. For Configure VPC and security group, leave the default (No VPC).
  8. Keep the Identity crawler is on setting selected. This imports identity information into the index.
  9. For IAM role, choose Create a new role.
  10. Enter a role name, such as AmazonKendra-us-west-2-onedrive, then choose Next.
  11. In the Authentication section, choose Create and add a secret.
  12. Create a secret with clientId and clientSecret as keys.
  13. Add their respective values with the information you collected earlier.
  14. Choose Next.
  15. In the Configure sync settings section, add the OneDrive users whose documents you want to index.
  16. Select the sync mode for the index. For this post, we select New, modified or deleted content sync.
  17. Choose the frequency of indexing as Run on demand, then choose Next.

Field mappings allow you to set the searchability and relevance of fields. For example, the lastUpdatedAt field can be used to sort or boost the ranking of documents based on how recently they were updated.

  1. Keep all the defaults in the Set field mappings section and choose Next.
  2. On the review page, choose Add data source.

  3. Choose Sync now.

The sync can take up to 30 minutes to complete.

Test the solution

Now that you have indexed the content from OneDrive, you can test it by querying the index.

  1. Go to your index on the Amazon Kendra console and choose Search indexed content in the navigation pane.
  2. Enter a search term and press Enter.

Notice that without a token, the ACLs prevent a search result from being returned.

  1. Expand Test query with an access token and choose Apply token.
  2. Enter the appropriate token with a user who has permissions to read the file and choose Apply.
  3. Search for information present in OneDrive again.

You can verify that Amazon Kendra presents the ranked results as expected.
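The same check can also be performed programmatically. The following sketch (the index ID, token, and search term are placeholders) uses the Amazon Kendra Query API with a user context token:

import boto3

kendra = boto3.client("kendra", region_name="us-west-2")

response = kendra.query(
    IndexId="<your-index-id>",                     # placeholder
    QueryText="quarterly report",                  # example search term
    UserContext={"Token": "<user-access-token>"},  # token for a user with read access
)

for item in response["ResultItems"]:
    print(item["DocumentTitle"]["Text"])
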

Congratulations, you have configured Amazon Kendra to index and search documents in OneDrive and control access to them using ACL.

Conclusion

With the Microsoft OneDrive V2 connector for Amazon Kendra, organizations can tap into commonly used enterprise document stores, securely using intelligent search powered by Amazon Kendra. You can enhance the search experience by integrating the data source with the Custom Document Enrichment (CDE) capability in Amazon Kendra to perform additional attribute mapping logic and even custom content transformation during ingestion.


About the authors

Pravinchandra Varma is a Senior Customer Delivery Architect with the AWS Professional Services team and is passionate about applications of machine learning and artificial intelligence services.

Supratim Barat is a Software Development Engineer with the AWS Kendra Yellowbadge Team and is a blockchain and cybersecurity enthusiast.


How RallyPoint and AWS are personalizing job recommendations to help military veterans and service providers transition back into civilian life using Amazon Personalize


This post was co-written with Dave Gowel, CEO of RallyPoint. In his own words, “RallyPoint is an online social and professional network for veterans, service members, family members, caregivers, and other civilian supporters of the US armed forces. With two million members on the platform, the company provides a comfortable place for this deserving population to connect with each other and programs designed to support them.”

All those who serve – and those who support them – often face a variety of employment challenges when a servicemember transitions back into civilian life. RallyPoint has identified the transition period to a civilian career as a major opportunity to improve the quality of life for this population by creating automated and compelling job recommendations. However, the team historically employed a rule-based curation method to recommend jobs throughout its user experience, which doesn’t allow members to get job recommendations personalized to their individual experience, expertise, and interests.

“To improve this experience for its members, we at RallyPoint wanted to explore how machine learning (ML) could help. We don’t want our servicemembers, veterans, and their loved ones to waste time searching for a fulfilling civilian career path when they decide to leave the military. It should be an easy process. We want our members to tell us about their military experiences, any schools they’ve attended, and their personal preferences. Then by leveraging what we know from our millions of military and veteran members, relevant open jobs should be easily surfaced instead of laboriously searched. This free service for our members is also expected to drive revenue by at least seven figures from employers seeking the right military and veteran talent, allowing us to build more free capabilities for our members.”

This blog post summarizes how the Amazon Machine Learning Solutions Lab (MLSL) partnered with RallyPoint to drive a 35% improvement in personalized career recommendations and a 66x increase in coverage, among other improvements for RallyPoint members over the previous rule-based implementation.

“MLSL helped RallyPoint save and improve the lives of the US military community. Fortunate to work on multiple complex and impactful projects with MLSL to support the most deserving of populations, RallyPoint accelerated growth in multiple core organizational metrics in the process. MLSL’s high caliber talent, culture, and focus on aiding our realization of measurable and compelling results from machine learning investments enabled us to reduce suicide risk, improve career transition, and speed up important connections for our service members, veterans, and their families.”

Screenshot of the RallyPoint Website

*Photo provided by the RallyPoint team.

The following sections cover the business and technical challenges, the approach taken by the AWS and RallyPoint teams, and the performance of the implemented solution, which leverages Amazon Personalize.

Amazon Personalize makes it easy for developers to build applications capable of delivering a wide array of personalization experiences, including specific product recommendations, personalized product re-ranking, and customized direct marketing. Amazon Personalize is a fully managed ML service that goes beyond rigid, static rule-based recommendation systems by training, tuning, and deploying custom ML models to deliver highly customized recommendations to customers across industries such as retail and media and entertainment.

Business and Technical challenges

Multiple business challenges inspired this partnership. The most pertinent was the clickthrough rate on the top 10 recommended jobs on the RallyPoint website. RallyPoint analyzed user engagement within their platform and discovered that they needed to increase the number of relevant jobs that users are clicking. The idea is that the more relevant a recommended job is, the higher the likelihood of members applying to those jobs, leading to improved employment outcomes.

The next challenge was to increase member engagement with the job services offered on the site. RallyPoint offers the opportunity for people to “Build your brand and engage the military community, advertise your products and services, run recruitment marketing campaigns, post jobs, and search veteran talent.” They once again identified an opportunity to apply Amazon Personalize to help more people transition to civilian life, and sought to improve their click-to-customer conversion numbers, leading to better outcomes for RallyPoint’s direct customers.

From a technical perspective, like many traditional recommender system problems, data sparsity and a long tail were challenges to overcome. The sample set of de-identified, already publicly shared data included thousands of anonymized user profiles, with more than fifty user metadata points, but many profiles had inconsistent or missing metadata. To tackle this, the team leveraged the Amazon Personalize cold start recommendation functionality for relevant users.

Solution overview

To solve the problem, MLSL collaborated with RallyPoint to construct a custom Amazon Personalize pipeline for RallyPoint. Some of the services used include Amazon Simple Storage Service (Amazon S3), Amazon SageMaker Notebook Instances, and Amazon Personalize. The following diagram illustrates the solution architecture.

The anonymized raw data used for the solution consisted of a history of interactions with job postings along with metadata on user profiles and job positions. This was stored in S3. The MLSL team used Amazon SageMaker Notebook Instances to prepare data as input to Amazon Personalize. This step included data preprocessing, feature engineering, and creating dataset groups and schemas required for Amazon Personalize. For more information refer to Creating a Custom dataset group.

The next step was to create a solution in Amazon Personalize. A solution refers to the combination of an Amazon Personalize recipe, customized parameters, and one or more solution versions. For more information refer to Creating a solution. The team used the User-Personalization recipe to generate user-specific job recommendations for users in a validation set. The Amazon Personalize outputs, including the job recommendations and performance metrics, are stored in an Amazon S3 bucket for further analysis.
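In SDK terms, that step looks roughly like the following sketch: create a solution from the User-Personalization recipe, train a solution version, and read back its offline metrics. The names and ARNs are placeholders, and any hyperparameter tuning used for RallyPoint is omitted.

import time
import boto3

personalize = boto3.client("personalize")

solution = personalize.create_solution(
    name="rallypoint-user-personalization",   # hypothetical name
    datasetGroupArn="arn:aws:personalize:us-east-1:111122223333:dataset-group/rallypoint-jobs",
    recipeArn="arn:aws:personalize:::recipe/aws-user-personalization",
)

# Training happens asynchronously; wait until the solution version is ACTIVE
version = personalize.create_solution_version(solutionArn=solution["solutionArn"])
version_arn = version["solutionVersionArn"]
while personalize.describe_solution_version(solutionVersionArn=version_arn)["solutionVersion"]["status"] != "ACTIVE":
    time.sleep(60)

# Offline metrics computed during training, e.g. mean_reciprocal_rank_at_25,
# precision_at_10, normalized_discounted_cumulative_gain_at_10, coverage
metrics = personalize.get_solution_metrics(solutionVersionArn=version_arn)
print(metrics["metrics"])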

In the final step, the team used a notebook instance to prepare the output recommendations for external evaluation by human annotators, as described in the Using Domain Experts section.

Evaluation of Amazon Personalize results

The performance of an Amazon Personalize solution version can be evaluated using offline metrics, online metrics, and A/B testing. Offline metrics allow you to view the effects of modifying hyperparameters and algorithms used to train your models, calculated against historical data. Online metrics are the empirical results observed in your users’ interactions with real-time recommendations provided in a live environment (such as clickthrough rate). A/B testing is an online method of comparing the performance of multiple solution versions to a default solution. Users are randomly assigned to either the control (default) group or one of the treatment (test) groups. The control group users receive recommendations from the default solution (baseline), whereas each of the treatment groups interacts with a different solution version. Statistical significance tests are used to compare the performance metrics (such as clickthrough rate or latency) and business metrics (such as revenue) to those of the default solution.

Amazon Personalize measures offline metrics when it trains a solution version. The team used offline metrics such as Mean Reciprocal Rank (MRR), normalized discounted cumulative gain (NDCG@k), Precision@k, and Coverage. For the definitions of all available offline metrics, refer to Metric definitions.
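To make these metrics concrete, here is a small, self-contained Python sketch (not Amazon Personalize code) that computes MRR and precision@k for a batch of recommendation lists against held-out relevant items; it simply illustrates the definitions referenced above.

def mrr(recommendations, relevant):
    """Mean reciprocal rank: average of 1/rank of the first relevant item per user."""
    total = 0.0
    for recs, rel in zip(recommendations, relevant):
        rr = 0.0
        for rank, item in enumerate(recs, start=1):
            if item in rel:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(recommendations)

def precision_at_k(recommendations, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant, averaged over users."""
    total = 0.0
    for recs, rel in zip(recommendations, relevant):
        hits = sum(1 for item in recs[:k] if item in rel)
        total += hits / k
    return total / len(recommendations)

# Toy example: two users, their top recommendations, and their held-out clicks
recs = [["job_7", "job_3", "job_9"], ["job_1", "job_4", "job_2"]]
clicked = [{"job_3"}, {"job_5"}]
print(mrr(recs, clicked))                # (1/2 + 0) / 2 = 0.25
print(precision_at_k(recs, clicked, 3))  # (1/3 + 0) / 2 ≈ 0.167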

Although Amazon Personalize provides an extensive list of offline metrics that the team can use to objectively measure the performance of solutions during training, online metrics and A/B testing are recommended to track and validate model performance. One caveat to these tests is that they require users to interact with Amazon Personalize recommendations in real time. Because the RallyPoint Amazon Personalize model wasn’t deployed prior to this publication, the team didn’t have results to report for these tests.

Using Domain Experts

A/B testing is the preferred method of analyzing the quality of a recommendation system; however, using domain experts to annotate recommendations is a viable precursor. Because online testing was not an option, the team asked domain experts at RallyPoint to annotate the recommendations generated by the models and counted the number of job positions the experts agreed should be recommended (given a user’s information and indicated preferences) as the number of “correct” recommendations. This metric was used to compare solution versions. A popularity solution (the current rule-based criteria), which recommends the five most popular job positions to every user, served as one baseline. A solution trained with default settings served as a second baseline, referred to as the Amazon Personalize baseline solution.

Results

Using the best-performing model resulted in a 35% improvement in the number of “correct” recommendations over the Amazon Personalize baseline solution and a 54% improvement over the popularity solution. Compared to the popularity solution, the team also achieved a 66x improvement in coverage, a 30x improvement in MRR, and a 2x improvement in precision@10. Compared to the Amazon Personalize baseline solution, the team observed up to a 2x increase in MRR and precision@10.

Summary

RallyPoint recognized an opportunity to better serve their customers with more personalized career recommendations. They began their user personalization journey with customer obsession in mind, partnering with the Machine Learning Solutions Lab. Through this solution, RallyPoint can now give their users more valuable career recommendations. Incorporating the improved recommendation system into their website means RallyPoint users will see more relevant jobs in their career feed, easing the path to more fulfilling careers and an improved quality of life for their members.

Use Amazon Personalize to provide an individualized experience for your users today! If you’d like to collaborate with experts to bring ML solutions to your organization, contact the Amazon ML Solutions Lab.

About the Authors

Dave Gowel is an Army veteran and the CEO of RallyPoint. Dave is a graduate of West Point and the US Army Ranger School, served in Iraq as a tank platoon leader, and taught as an assistant professor at the Massachusetts Institute of Technology ROTC program. RallyPoint is the third technology company for which Dave has been CEO.

Matthew Rhodes is a Data Scientist working in the Amazon ML Solutions Lab. He specializes in building machine learning pipelines that involve concepts such as natural language processing and computer vision.

Amin Tajgardoon is an Applied Scientist at the Amazon ML Solutions Lab. He has an extensive background in computer science and machine learning. In particular, Amin’s focus has been on deep learning and forecasting, prediction explanation methods, model drift detection, probabilistic generative models, and applications of AI in the healthcare domain.

Yash Shah is a Science Manager in the Amazon ML Solutions Lab. He and his team of applied scientists and machine learning engineers work on a range of machine learning use cases from healthcare, sports, automotive and manufacturing.

Vamshi Krishna Enabothala is a Sr. Applied AI Specialist Architect at AWS. He works with customers from different sectors to accelerate high-impact data, analytics, and machine learning initiatives. He is passionate about recommendation systems, NLP, and computer vision areas in AI and ML. Outside of work, Vamshi is an RC enthusiast, building RC equipment (planes, cars, and drones), and also enjoys gardening.

Greg Tolmie is an Account Manager on the AWS Public Sector ISV partners team. Greg supports a portfolio of AWS public sector ISV partners to help them grow and mature their adoption of AWS services while maximizing benefits of the AWS partner network.


Generate actionable insights for predictive maintenance management with Amazon Monitron and Amazon Kinesis


Reliability managers and technicians in industrial environments such as manufacturing production lines, warehouses, and industrial plants are keen to improve equipment health and uptime to maximize product output and quality. Machine and process failures are often addressed reactively after incidents happen or through costly preventive maintenance, which runs the risk of over-maintaining the equipment or missing issues that arise between periodic maintenance cycles. Predictive, condition-based maintenance is a proactive strategy that improves on both: it combines continuous monitoring, predictive analytics, and just-in-time action, enabling maintenance and reliability teams to service equipment only when necessary, based on the actual equipment condition.

There have been common challenges in using condition-based monitoring to generate actionable insights for large industrial asset fleets. These challenges include, but are not limited to: building and maintaining a complex infrastructure of sensors collecting data from the field, obtaining a reliable high-level summary of industrial asset fleets, efficiently managing failure alerts, identifying possible root causes of anomalies, and effectively visualizing the state of industrial assets at scale.

Amazon Monitron is an end-to-end condition monitoring solution that enables you to start monitoring equipment health with the aid of machine learning (ML) in minutes, so you can implement predictive maintenance and reduce unplanned downtime. It includes sensor devices to capture vibration and temperature data, a gateway device to securely transfer data to the AWS Cloud, the Amazon Monitron service that analyzes the data for anomalies with ML, and a companion mobile app to track potential failures in your machinery. Your field engineers and operators can directly use the app to diagnose and plan maintenance for industrial assets.

From the operational technology (OT) team’s standpoint, the Amazon Monitron data also opens up new, AI-assisted ways to operate large industrial asset fleets. OT teams can reinforce their organization’s predictive maintenance practice by building a consolidated view across multiple hierarchies (assets, sites, and plants). They can combine actual measurements and ML inference results with unacknowledged alarms, sensor or gateway connectivity status, or asset state transitions to build a high-level summary for the scope (asset, site, or project) they are focused on.

With the recently launched Amazon Monitron Kinesis data export v2 feature, your OT team can stream incoming measurement data and inference results from Amazon Monitron via Amazon Kinesis to Amazon Simple Storage Service (Amazon S3) to build an Internet of Things (IoT) data lake. By using the latest data export schema, you can obtain sensor connectivity status, gateway connectivity status, measurement classification results, closure reason codes, and details of asset state transition events.

Use cases overview

The enriched data stream that Amazon Monitron now exposes enables you to implement several key use cases, such as automated work order creation, enriching an operational single pane of glass, or automating failure reporting. Let’s dive into these use cases.

You can use the Amazon Monitron Kinesis data export v2 to create work orders in Enterprise Asset Management (EAM) systems such as Infor EAM, SAP Asset Management, or IBM Maximo. For example, in the video avoiding mechanical issues with predictive maintenance & Amazon Monitron, you can discover how our Amazon Fulfillment Centers are avoiding mechanical issues on conveyor belts with Amazon Monitron sensors integrated with third-party software, such as the EAM system used at Amazon, as well as with the chat rooms technicians use. This shows how you can naturally integrate Amazon Monitron insights into your existing workflows. Stay tuned in the coming months for the next installment of this series, which will walk through an actual implementation of this integration.

You can also use the data stream to ingest Amazon Monitron insights back into a shop floor system such as a Supervisory Control and Data Acquisition (SCADA) or a Historian. Shop floor operators are more efficient when all the insights about their assets and processes are provided in a single pane of glass. In this concept, Amazon Monitron doesn’t become yet another tool technicians have to monitor, but another data source with insights provided in the single view they are already used to. Later this year, we will also describe an architecture you can use to perform this task and send Amazon Monitron feedback to major third-party SCADA systems and Historians.

Last but not least, the new data stream from Amazon Monitron includes the asset state transitions and closure codes provided by users when acknowledging alarms (which triggers the transition to a new state). With this data, you can automatically build visualizations that provide real-time reporting of the failures encountered and the actions taken while operating your assets.

Your team can then build a broader data analytics dashboard to support your industrial fleet management practice by combining this asset state data with Amazon Monitron measurement data and other IoT data across large industrial asset fleets by using key AWS services, which we describe in this post. We explain how to build an IoT data lake, the workflow to produce and consume the data, as well as a summary dashboard to visualize Amazon Monitron sensors data and inference results. We use an Amazon Monitron dataset coming from about 780 sensors installed in an industrial warehouse, which has been running for more than 1 year. For the detailed Amazon Monitron installation guide, refer to Getting started with Amazon Monitron.

Solution overview

Amazon Monitron provides ML inference of asset health status after 21 days of the ML model training period for each asset. In this solution, the measurement data and ML inference from these sensors are exported to Amazon S3 via Amazon Kinesis Data Streams by using the latest Amazon Monitron data export feature. As soon as Amazon Monitron IoT data is available in Amazon S3, a database and table are created in Amazon Athena by using an AWS Glue crawler. You can query Amazon Monitron data via AWS Glue tables with Athena, and visualize the measurement data and ML inference with Amazon Managed Grafana. With Amazon Managed Grafana, you can create, explore, and share observability dashboards with your team, and spend less time managing your Grafana infrastructure. In this post, you connect Amazon Managed Grafana to Athena, and learn how to build a data analytics dashboard with Amazon Monitron data to help you plan industrial asset operations at scale.

The following screenshot is an example of what you can achieve at the end of this post. This dashboard is divided into three sections:

  • Plant View – Analytical information from all sensors across plants; for example, the overall counts of various states of sensors (Healthy, Warning, or Alarm), number of unacknowledged and acknowledged alarms, gateway connectivity, and average time for maintenance
  • Site View – Site-level statistics, such as asset status statistics at each site, total number of days that an alarm remains unacknowledged, top/bottom performing assets at each site, and more
  • Asset View – Summary information for the Amazon Monitron project at the asset level, such as the alarm type for an unacknowledged alarm (ISO or ML), the timeline for an alarm, and more

These panels are examples that can help strategic operational planning, but they are not exclusive. You can use a similar workflow to customize the dashboard according to your targeted KPI.



Architecture overview

The solution you will build in this post combines Amazon Monitron, Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon S3, AWS Glue, Athena, and Amazon Managed Grafana.

The following diagram illustrates the solution architecture. Amazon Monitron sensors measure and detect anomalies from equipment. Both measurement data and ML inference outputs are exported at a frequency of once per hour to a Kinesis data stream, and they are delivered to Amazon S3 via Kinesis Data Firehose with a 1-minute buffer. The exported Amazon Monitron data is in JSON format. An AWS Glue crawler analyzes the Amazon Monitron data in Amazon S3 at a chosen frequency of once per hour, builds a metadata schema, and creates tables in Athena. Finally, Amazon Managed Grafana uses Athena to query the Amazon S3 data, allowing dashboards to be built to visualize both measurement data and device health status.

To build this solution, you complete the following high-level steps:

  1. Enable a Kinesis Data Stream export from Amazon Monitron and create a data stream.
  2. Configure Kinesis Data Firehose to deliver data from the data stream to an S3 bucket.
  3. Build the AWS Glue crawler to create a table of Amazon S3 data in Athena.
  4. Create a dashboard of Amazon Monitron devices with Amazon Managed Grafana.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Additionally, make sure that all the resources you deploy are in the same Region.

Enable a Kinesis data stream export from Amazon Monitron and create a data stream

To configure your data stream export, complete the following steps:

  1. On the Amazon Monitron console, from your project’s main page, choose Start live data export.
  2. Under Select Amazon Kinesis data stream, choose Create a new data stream.
  3. Under Data stream configuration, enter your data stream name.
  4. For Data stream capacity, choose On-demand.
  5. Choose Create data stream.
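If you prefer to create the data stream with the AWS SDK and then select it in the Amazon Monitron console instead of creating it from the wizard, a minimal boto3 sketch looks like the following; the stream name is a placeholder.

import boto3

kinesis = boto3.client("kinesis")

# On-demand capacity mode, matching step 4 above (stream name is a placeholder)
kinesis.create_stream(
    StreamName="monitron-live-export",
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)

# Wait until the stream is ACTIVE before selecting it in the Amazon Monitron console
waiter = kinesis.get_waiter("stream_exists")
waiter.wait(StreamName="monitron-live-export")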

Note that any live data export enabled after April 4th, 2023 will stream data following the Kinesis Data Streams v2 schema. If you have an existing data export that was enabled before this date, the schema will follow the v1 format.

You can now see live data export information on the Amazon Monitron console with your specified Kinesis data stream.

Configure Kinesis Data Firehose to deliver data to an S3 bucket

To configure your Firehose delivery stream, complete the following steps:

  1. On the Kinesis console, choose Delivery streams in the navigation pane.
  2. Choose Create delivery stream.
  3. For Source, select Amazon Kinesis Data Streams.
  4. For Destination, select Amazon S3.
  5. Under Source settings, for Kinesis data stream, enter the ARN of your Kinesis data stream.
  6. Under Delivery stream name, enter a name for your delivery stream.
  7. Under Destination settings, choose an S3 bucket or enter a bucket URI. You can either use an existing S3 bucket to store Amazon Monitron data, or you can create a new S3 bucket.
  8. Enable dynamic partitioning using inline parsing for JSON:
    • Choose Enabled for Dynamic partitioning.
    • Choose Enabled for Inline parsing for JSON.
    • Under Dynamic partitioning keys, add the following partition keys:
Key name      JQ expression
project       .projectName | "project=\(.)"
site          .eventPayload.siteName | "site=\(.)"
asset         .eventPayload.assetName | "asset=\(.)"
position      .eventPayload.positionName | "position=\(.)"
time          .timestamp | sub(" [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$"; "") | "time=\(.)"
  9. Choose Apply dynamic partitioning keys and confirm the generated S3 bucket prefix is:
!{partitionKeyFromQuery:project}/!{partitionKeyFromQuery:site}/!{partitionKeyFromQuery:asset}/!{partitionKeyFromQuery:position}/!{partitionKeyFromQuery:time}/.
  10. Enter a prefix for S3 bucket error output prefix. Any JSON payload that doesn’t contain the keys described earlier will be delivered to this prefix. For instance, the gatewayConnected and gatewayDisconnected events are not linked to a given asset or position, so they won’t contain the assetName and positionName fields. Specifying this optional prefix here allows you to monitor this location and process these events accordingly.
  11. Choose Create delivery stream.
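The console steps above can also be expressed with the AWS SDK. The following boto3 sketch is a rough equivalent under the same settings (Kinesis as the source, dynamic partitioning with the JQ keys from the table, and a 1-minute buffer); all ARNs, names, and the bucket are hypothetical placeholders, and note that dynamic partitioning requires a buffer size of at least 64 MB.

import boto3

firehose = boto3.client("firehose")

# jq query that builds the partition keys used in the S3 prefix below
metadata_query = (
    '{project: .projectName| "project=\\(.)",'
    ' site: .eventPayload.siteName| "site=\\(.)",'
    ' asset: .eventPayload.assetName| "asset=\\(.)",'
    ' position: .eventPayload.positionName| "position=\\(.)",'
    ' time: .timestamp| sub(" [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$"; "")| "time=\\(.)"}'
)

firehose.create_delivery_stream(
    DeliveryStreamName="monitron-export-to-s3",                       # placeholder name
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/monitron-live-export",
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-source-role",
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-s3-role",
        "BucketARN": "arn:aws:s3:::your-monitron-bucket",
        "Prefix": (
            "!{partitionKeyFromQuery:project}/!{partitionKeyFromQuery:site}/"
            "!{partitionKeyFromQuery:asset}/!{partitionKeyFromQuery:position}/"
            "!{partitionKeyFromQuery:time}/"
        ),
        "ErrorOutputPrefix": "errors/",                               # placeholder error prefix
        # Dynamic partitioning requires SizeInMBs >= 64; 60 seconds matches the 1-minute buffer
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        {"ParameterName": "MetadataExtractionQuery", "ParameterValue": metadata_query},
                        {"ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6"},
                    ],
                }
            ],
        },
    },
)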

You can inspect the Amazon Monitron data in the S3 bucket. Note that Amazon Monitron exports live data at a frequency of once per hour, so wait for 1 hour before inspecting the data.

This Kinesis Data Firehose setup enables dynamic partitioning, and the S3 objects delivered will use the following key format:

/project={projectName}/site={siteDisplayName}/asset={assetDisplayName}/position={sensorPositionDisplayName}/time={yyyy-mm-dd 00:00:00}/{filename}.

Build the AWS Glue crawler to create a table of Amazon S3 data in Athena

After the live data has been exported to Amazon S3, we use an AWS Glue crawler to generate the metadata tables. In this post, we use AWS Glue crawlers to automatically infer database and table schema from Amazon Monitron data exported in Amazon S3, and store the associated metadata in the AWS Glue Data Catalog. Athena then uses the table metadata from the Data Catalog to find, read, and process the data in Amazon S3. Complete the following steps to create your database and table schema:

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Choose Create crawler.
  3. Enter a name for the crawler (for example, XXX_xxxx_monitron).
  4. Choose Next.
  5. For Is your data already mapped to Glue tables, choose Not yet.
  6. For Data Source, choose S3.
  7. For Location of S3 data, choose In this Account, and enter the path of the S3 bucket directory that you set up in the previous section (s3://YourBucketName).
  8. For Repeat crawls of S3 data stores, select Crawl all sub-folders.
  9. Finally, choose Next.
  10. Select Create new IAM role and enter a name for the role.
  11. Choose Next.
  12. Select Add Database, and enter a name for the database. This creates the Athena database where your metadata tables are located after the crawler is complete.
  13. For Crawler Schedule, select a preferred time-based scheduler (for example, hourly) to refresh the Amazon Monitron data in the database, and choose Next.
  14. Review the crawler details and choose Create.
  15. On the Crawlers page of the AWS Glue console, select the crawler you created and choose Run crawler.
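If you prefer to script this step, a boto3 sketch roughly equivalent to the console steps above might look like the following; the crawler name, IAM role, database, and bucket are placeholders.

import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="monitron-data-crawler",                                     # placeholder name
    Role="arn:aws:iam::111122223333:role/GlueMonitronCrawlerRole",    # placeholder role
    DatabaseName="monitron_db",                                       # Athena database for the metadata tables
    Targets={"S3Targets": [{"Path": "s3://YourBucketName/"}]},
    # Hourly schedule, matching the crawler schedule chosen in the console
    Schedule="cron(0 * * * ? *)",
)

# Run the crawler once immediately instead of waiting for the first scheduled run
glue.start_crawler(Name="monitron-data-crawler")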

You may need to wait a few minutes, depending on the size of the data. When it’s complete, the crawler’s status shows as Ready. To see the metadata tables, navigate to your database on the Databases page and choose Tables in the navigation pane.

You can also view data by choosing Table data on the console.

You’re redirected to the Athena console to view the top 10 records of the Amazon Monitron data in Amazon S3.

Create a dashboard of Amazon Monitron devices with Amazon Managed Grafana

In this section, we build a customized dashboard with Amazon Managed Grafana to visualize Amazon Monitron data in Amazon S3, so that the OT team can get streamlined access to assets in alarm across their whole Amazon Monitron sensor fleet. This enables the OT team to plan next actions based on the possible root cause of the anomalies.

To create a Grafana workspace, complete the following steps:

  1. Ensure that your user role is admin or editor.
  2. On the Amazon Managed Grafana console, choose Create workspace.
  3. For Workspace name, enter a name for the workspace.
  4. Choose Next.
  5. For Authentication access, select AWS IAM Identity Center (successor to AWS Single Sign-On). You can use the same AWS IAM Identity Center user that you used to set up your Amazon Monitron project.
  6. Choose Next.
  7. For this first workspace, confirm that Service managed is selected for Permission type. This selection enables Amazon Managed Grafana to automatically provision the permissions you need for the AWS data sources that you use for this workspace.
  8. Choose Current account.
  9. Choose Next.
  10. Confirm the workspace details, and choose Create workspace. The workspace details page appears. Initially, the status is CREATING.
  11. Wait until the status is ACTIVE to proceed to the next step.
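The workspace creation above can also be scripted against the Amazon Managed Grafana API. The following boto3 sketch mirrors the same choices (current account, IAM Identity Center authentication, service-managed permissions); the workspace name is a placeholder, and the parameter values reflect our reading of the CreateWorkspace API.

import boto3

grafana = boto3.client("grafana")

workspace = grafana.create_workspace(
    workspaceName="monitron-dashboard",        # placeholder name
    accountAccessType="CURRENT_ACCOUNT",       # step 8: current account only
    authenticationProviders=["AWS_SSO"],       # step 5: IAM Identity Center
    permissionType="SERVICE_MANAGED",          # step 7: service-managed permissions
    workspaceDataSources=["ATHENA"],           # lets the service provision the Athena policy
)
print(workspace["workspace"]["status"])        # CREATING, then ACTIVE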

To configure your Athena data source, complete the following steps:

  1. On the Amazon Managed Grafana console, choose the workspace you want to work on.
  2. On the Data sources tab, select Amazon Athena, and choose Actions, Enable service-managed policy.
  3. Choose Configure in Grafana in the Amazon Athena row.
  4. Sign in to the Grafana workspace console using IAM Identity Center if necessary. The user should have the Athena access policy attached to the user or role to have access to the Athena data source. See AWS managed policy: AmazonGrafanaAthenaAccess for more info.
  5. On the Grafana workspace console, in the navigation pane, choose the lower AWS icon (there are two) and then choose Athena on the Data sources menu.
  6. Select the default Region that you want the Athena data source to query from, select the accounts that you want, then choose Add data source.
  7. Follow the steps to configure Athena details.

If your workgroup in Athena doesn’t have an output location configured already, you need to specify an S3 bucket and folder to use for query results. After setting up the data source, you can view or edit it in the Configuration pane.

In the following subsections, we demonstrate several panels in the Amazon Monitron dashboard built in Amazon Managed Grafana to gain operational insights. The Athena data source provides a standard SQL query editor that we’ll use to analyze the Amazon Monitron data to generate desired analytics.

First, if there are many sensors in the Amazon Monitron project and they are in different states (healthy, warning, alarm, and needs maintenance), the OT team wants to see at a glance how many sensor positions are in each state. You can obtain this information as a pie chart widget in Grafana via the following Athena query:

SELECT * FROM (
    SELECT latest_status,
           COUNT(assetdisplayname) OVER (PARTITION BY latest_status) AS asset_health_count
    FROM (
        SELECT timestamp, sitedisplayname, assetdisplayname,
               assetState.newState AS latest_status,
               RANK() OVER (PARTITION BY assetdisplayname ORDER BY timestamp DESC) AS rnk
        FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name"
    ) tt
    WHERE tt.rnk = 1
)
GROUP BY latest_status, asset_health_count;

The following screenshot shows a panel with the latest distribution of Amazon Monitron sensor status.

To format your SQL query for Amazon Monitron data, refer to Understanding the data export schema.
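If you want to sanity-check any of these queries outside Grafana, you can run them ad hoc with the Athena API. The following boto3 sketch is one way to do that; the query, database name, and query results location are placeholders.

import time
import boto3

athena = boto3.client("athena")

query = 'SELECT COUNT(*) FROM "AwsDataCatalog"."monitron_db"."your_monitron_table"'  # placeholder names

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "monitron_db"},
    ResultConfiguration={"OutputLocation": "s3://your-athena-results-bucket/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows)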

Next, your OT team may want to plan predictive maintenance based on assets that are in alarm status, and therefore wants to quickly know the total number of acknowledged vs. unacknowledged alarms. You can show this alarm state summary as simple stat panels in Grafana:

SELECT COUNT(*) FROM (
    SELECT timestamp, sitedisplayname, assetdisplayname,
           assetState.newState AS latest_status,
           RANK() OVER (PARTITION BY assetdisplayname ORDER BY timestamp DESC) AS rnk
    FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name"
) tt
WHERE tt.rnk = 1 AND tt.latest_status = 'Alarm';

The following panel shows acknowledged and unacknowledged alarms.

The OT team can also query the amount of time the sensors remain in alarm status, so that they can decide their maintenance priority:

SELECT c.assetdisplayname, b.sensorpositiondisplayname, b.number_of_days_in_alarm_state
FROM (
    SELECT a.assetdisplayname, a.sensorpositiondisplayname,
           COUNT(*)/24 + 1 AS number_of_days_in_alarm_state
    FROM (
        SELECT *
        FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name"
        WHERE (assetState.newState = 'ALARM' AND assetState.newState = assetState.previousState)
        ORDER BY timestamp DESC
    ) a
    GROUP BY a.assetdisplayname, a.sensorpositiondisplayname
) b
INNER JOIN (
    SELECT * FROM (
        SELECT timestamp, sitedisplayname, assetdisplayname,
               assetState.newState AS latest_status,
               RANK() OVER (PARTITION BY assetdisplayname ORDER BY timestamp DESC) AS rnk
        FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name"
    ) tt
    WHERE tt.rnk = 1 AND tt.latest_status = 'ALARM'
) c ON b.assetdisplayname = c.assetdisplayname;

The output of this analysis can be visualized as a bar chart in Grafana, so the assets still in alarm state can be easily identified, as shown in the following screenshot.

To analyze top/bottom asset performance based on the total amount of time the assets are in an alarm or need maintenance state, use the following query:

SELECT s.sitedisplayname, s.assetdisplayname, COUNT(s.timestamp)/24 AS trouble_time
FROM (
    SELECT timestamp, sitedisplayname, assetdisplayname, sensorpositiondisplayname, assetState.newState
    FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name"
    WHERE assetState.newState = 'ALARM' OR assetState.newState = 'NEEDS_MAINTENANCE'
) AS s
GROUP BY s.assetdisplayname, s.sitedisplayname
ORDER BY trouble_time, s.assetdisplayname ASC
LIMIT 5;

The following bar gauge visualizes the preceding query output, with the top-performing assets showing 0 days in alarm states and the bottom-performing assets showing alarm states accumulated over the past year.

To help the OT team understand the possible root cause of an anomaly, the alarm types can be displayed for these assets still in alarm state with the following query:

SELECT a.assetdisplayname, a.sensorpositiondisplayname, a.latest_status,
       CASE WHEN a.temperatureML != 'HEALTHY' THEN 'TEMP'
            WHEN a.vibrationISO != 'HEALTHY' THEN 'VIBRATION_ISO'
            ELSE 'VIBRATION_ML' END AS alarm_type
FROM (
    SELECT sitedisplayname, assetdisplayname, sensorpositiondisplayname,
           models.temperatureML.persistentClassificationOutput AS temperatureML,
           models.vibrationISO.persistentClassificationOutput AS vibrationISO,
           models.vibrationML.persistentClassificationOutput AS vibrationML,
           assetState.newState AS latest_status
    FROM (
        SELECT *, RANK() OVER (PARTITION BY assetdisplayname, sensorpositiondisplayname ORDER BY timestamp DESC) AS rnk
        FROM "AwsDataCatalog"."Replace with your Athena database name"."Replace with your Athena table name"
    ) tt
    WHERE tt.rnk = 1 AND assetState.newState = 'ALARM'
) a
WHERE (a.temperatureML != 'HEALTHY' OR a.vibrationISO != 'HEALTHY' OR a.vibrationML != 'HEALTHY');

You can visualize this analysis as a table in Grafana. In this Amazon Monitron project, two alarms were triggered by ML models for vibration measurement.

The Amazon Managed Grafana dashboard is shown here for illustration purposes. You can adapt the dashboard design according to your own business needs.

Failure Reports

When a user acknowledges an alarm in the Amazon Monitron app, the associated assets transition to a new state. The user also has the opportunity to provide some details about this alarm:

  • Failure cause – This can be one of the following: ADMINISTRATION, DESIGN, FABRICATION, MAINTENANCE, OPERATION, OTHER, QUALITY, WEAR, or UNDETERMINED
  • Failure mode – This can be one of the following: NO_ISSUE, BLOCKAGE, CAVITATION, CORROSION, DEPOSIT, IMBALANCE, LUBRICATION, MISALIGNMENT, OTHER, RESONANCE, ROTATING_LOOSENESS, STRUCTURAL_LOOSENESS, TRANSMITTED_FAULT, or UNDETERMINED
  • Action taken – This can be ADJUST, CLEAN, LUBRICATE, MODIFY, OVERHAUL, REPLACE, NO_ACTION, or OTHER

The event payload associated with the asset state transition contains all of this information, along with the previous and new states of the asset. Stay tuned for an update of this post with more details on how you can use this information in an additional Grafana panel to build Pareto charts of the most common failures and actions taken across your assets.
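As a rough illustration (not the forthcoming implementation), once the asset state transition events have landed in the data lake, a Pareto-style count of failure modes and actions can be derived with a few lines of Python; the file and field names below are assumptions based on the closure fields listed above.

import json
import collections

# Hypothetical input: one JSON object per line, each an asset state transition
# event already filtered from the exported Amazon Monitron data.
failure_modes = collections.Counter()
actions_taken = collections.Counter()

with open("asset_state_transitions.jsonl") as f:
    for line in f:
        event = json.loads(line)
        closure = event.get("closure", {})                   # assumed nesting of closure details
        failure_modes[closure.get("failureMode", "UNKNOWN")] += 1
        actions_taken[closure.get("actionTaken", "UNKNOWN")] += 1

# Pareto view: most common failure modes and actions first
for mode, count in failure_modes.most_common():
    print(f"failure mode {mode}: {count}")
for action, count in actions_taken.most_common():
    print(f"action taken {action}: {count}")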

Conclusion

Enterprise customers of Amazon Monitron are looking for a solution to build an IoT data lake with Amazon Monitron’s live data, so they can manage multiple Amazon Monitron projects and assets and generate analytics reports across them. This post provided a detailed walkthrough of a solution that builds this IoT data lake with the latest Amazon Monitron Kinesis data export v2 feature. It also showed how to use other AWS services, such as AWS Glue and Athena, to query the data and generate analytics outputs, and how to visualize those outputs with Amazon Managed Grafana with frequent refresh.

As a next step, you can expand this solution by sending ML inference results to other EAM systems that you might use for work order management. This allows your operations team to integrate Amazon Monitron with other enterprise applications and improve its operational efficiency. You can also start building more in-depth insights into your failure modes and the actions taken by processing the asset state transitions and the closure codes that are now part of the Kinesis data stream payload.


About the authors

Julia Hu is a Sr. AI/ML Solutions Architect at Amazon Web Services. She has extensive experience in IoT architecture and Applied Data Science, and is part of both the Machine Learning and IoT Technical Field Community. She works with customers, ranging from start-ups to enterprises, to develop AWSome IoT machine learning (ML) solutions, at the Edge and in the Cloud. She enjoys leveraging latest IoT and big data technology to scale up her ML solution, reduce latency, and accelerate industry adoption.

Bishr Tabbaa is a solutions architect at Amazon Web Services. Bishr specializes in helping customers with machine learning, security, and observability applications. Outside of work, he enjoys playing tennis, cooking, and spending time with family.

Shalika Pargal is a Product Manager at Amazon Web Services. Shalika focuses on building AI products and services for Industrial customers. She brings significant experience at the intersection of Product, Industrial and Business Development. She recently shared Monitron’s success story at Reinvent 2022.

Garry Galinsky is a Principal Solutions Architect supporting Amazon on AWS. He has been involved with Monitron since its debut and has helped integrate and deploy the solution into Amazon’s worldwide fulfillment network. He recently shared Amazon’s Monitron success story at re:Invent 2022.

Michaël Hoarau is an AI/ML Specialist Solutions Architect at AWS who alternates between data scientist and machine learning architect, depending on the moment. He is passionate about bringing the AI/ML power to the shop floors of his industrial customers and has worked on a wide range of ML use cases, ranging from anomaly detection to predictive product quality or manufacturing optimization. He published a book on time series analysis in 2022 and regularly writes about this topic on LinkedIn and Medium. When not helping customers develop the next best machine learning experiences, he enjoys observing the stars, traveling, or playing the piano.
