Reduce energy consumption of your machine learning workloads by up to 90% with AWS purpose-built accelerators

Machine learning (ML) engineers have traditionally focused on striking a balance between model training and deployment cost vs. performance. Increasingly, sustainability (energy efficiency) is becoming an additional objective for customers. This is important because training ML models and then using the trained models to make predictions (inference) can be highly energy-intensive tasks. In addition, more and more applications around us have become infused with ML, and new ML-powered applications are conceived every day. A popular example is OpenAI’s ChatGPT, which is powered by a state-of-the-art large language model (LMM). For reference, GPT-3, an earlier generation LLM has 175 billion parameters and requires months of non-stop training on a cluster of thousands of accelerated processors. The Carbontracker study estimates that training GPT-3 from scratch may emit up to 85 metric tons of CO2 equivalent, using clusters of specialized hardware accelerators.

There are several ways AWS is enabling ML practitioners to lower the environmental impact of their workloads. One way is through providing prescriptive guidance around architecting your AI/ML workloads for sustainability. Another way is by offering managed ML training and orchestration services such as Amazon SageMaker Studio, which automatically tears down and scales up ML resources when not in use, and provides a host of out-of-the-box tooling that saves cost and resources. Another major enabler is the development of energy efficient, high-performance, purpose-built accelerators for training and deploying ML models.

The focus of this post is on hardware as a lever for sustainable ML. We present the results of recent performance and power draw experiments conducted by AWS that quantify the energy efficiency benefits you can expect when migrating your deep learning workloads from other inference- and training-optimized accelerated Amazon Elastic Compute Cloud (Amazon EC2) instances to AWS Inferentia and AWS Trainium. Inferentia and Trainium are AWS’s recent addition to its portfolio of purpose-built accelerators specifically designed by Amazon’s Annapurna Labs for ML inference and training workloads.

AWS Inferentia and AWS Trainium for sustainable ML

To provide you with realistic numbers of the energy savings potential of AWS Inferentia and AWS Trainium in a real-world application, we have conducted several power draw benchmark experiments. We have designed these benchmarks with the following key criteria in mind:

First, we wanted to make sure that we captured direct energy consumption attributable to the test workload, including not just the ML accelerator but also the compute, memory, and network. Therefore, in our test setup, we measured power draw at that level.
Second, when running the training and inference workloads, we ensured that all instances were operating at their respective physical hardware limits and took measurements only after that limit was reached to ensure comparability.
Finally, we wanted to be certain that the energy savings reported in this post could be achieved in a practical real-world application. Therefore, we used common customer-inspired ML use cases for benchmarking and testing.

The results are reported in the following sections.

Inference experiment: Real-time document understanding with LayoutLM

Inference, as opposed to training, is a continuous, unbounded workload that doesn’t have a defined completion point. It therefore makes up a large portion of the lifetime resource consumption of an ML workload. Getting inference right is key to achieving high performance, low cost, and sustainability (better energy efficiency) along the full ML lifecycle. With inference tasks, customers are usually interested in achieving a certain inference rate to keep up with the ingest demand.

The experiment presented in this post is inspired by a real-time document understanding use case, which is a common application in industries like banking or insurance (for example, for claims or application form processing). Specifically, we select LayoutLM, a pre-trained transformer model used for document image processing and information extraction. We set a target SLA of 1,000,000 inferences per hour, a value often considered as real time, and then specify two hardware configurations capable of meeting this requirement: one using Amazon EC2 Inf1 instances, featuring AWS Inferentia, and one using comparable accelerated EC2 instances optimized for inference tasks. Throughout the experiment, we track several indicators to measure inference performance, cost, and energy efficiency of both hardware configurations. The results are presented in the following figure.

Performance, Cost and Energy Efficiency Results of Inference Benchmarks

AWS Inferentia delivers 6.3 times higher inference throughput. As a result, with Inferentia, you can run the same real-time LayoutLM-based document understanding workload on fewer instances (6 AWS Inferentia instances vs. 33 other inference-optimized accelerated EC2 instances, equivalent to an 82% reduction), use less than a tenth (-92%) of the energy in the process, all while achieving significantly lower cost per inference (USD 2 vs. USD 25 per million inferences, equivalent to a 91% cost reduction).

Training experiment: Training BERT Large from scratch

Training, as opposed to inference, is a finite process that is repeated much less frequently. ML engineers are typically interested in high cluster performance to reduce training time while keeping cost under control. Energy efficiency is a secondary (yet growing) concern. With AWS Trainium, there is no trade-off decision: ML engineers can benefit from high training performance while also optimizing for cost and reducing environmental impact.

To illustrate this, we select BERT Large, a popular language model used for natural language understanding use cases such as chatbot-based question answering and conversational response prediction. Training a well-performing BERT Large model from scratch typically requires 450 million sequences to be processed. We compare two cluster configurations, each with a fixed size of 16 instances and capable of training BERT Large from scratch (450 million sequences processed) in less than a day. The first uses traditional accelerated EC2 instances. The second setup uses Amazon EC2 Trn1 instances featuring AWS Trainium. Again, we benchmark both configurations in terms of training performance, cost, and environmental impact (energy efficiency). The results are shown in the following figure.

Performance, Cost and Energy Efficiency Results of Training Benchmarks

In the experiments, AWS Trainium-based instances outperformed the comparable training-optimized accelerated EC2 instances by a factor of 1.7 in terms of sequences processed per hour, cutting the total training time by 43% (2.3h versus 4h on comparable accelerated EC2 instances). As a result, when using a Trainium-based instance cluster, the total energy consumption for training BERT Large from scratch is approximately 29% lower compared to a same-sized cluster of comparable accelerated EC2 instances. Again, these performance and energy efficiency benefits also come with significant cost improvements: cost to train for the BERT ML workload is approximately 62% lower on Trainium instances (USD 787 versus USD 2091 per full training run).

Getting started with AWS purpose-built accelerators for ML

Although the experiments conducted here all use standard models from the natural language processing (NLP) domain, AWS Inferentia and AWS Trainium excel with many other complex model architectures including LLMs and the most challenging generative AI architectures that users are building (such as GPT-3). These accelerators do particularly well with models with over 10 billion parameters, or computer vision models like stable diffusion (see Model Architecture Fit Guidelines for more details). Indeed, many of our customers are already using Inferentia and Trainium for a wide variety of ML use cases.

To run your end-to-end deep learning workloads on AWS Inferentia- and AWS Trainium-based instances, you can use AWS Neuron. Neuron is an end-to-end software development kit (SDK) that includes a deep learning compiler, runtime, and tools that are natively integrated into the most popular ML frameworks like TensorFlow and PyTorch. You can use the Neuron SDK to easily port your existing TensorFlow or PyTorch deep learning ML workloads to Inferentia and Trainium and start building new models using the same well-known ML frameworks. For easier setup, use one of our Amazon Machine Images (AMIs) for deep learning, which come with many of the required packages and dependencies. Even simpler: you can use Amazon SageMaker Studio, which natively supports TensorFlow and PyTorch on Inferentia and Trainium (see the aws-samples GitHub repo for an example).

One final note: while Inferentia and Trainium are purpose built for deep learning workloads, many less complex ML algorithms can perform well on CPU-based instances (for example, XGBoost and LightGBM and even some CNNs). In these cases, a migration to AWS Graviton3 may significantly reduce the environmental impact of your ML workloads. AWS Graviton-based instances use up to 60% less energy for the same performance than comparable accelerated EC2 instances.

Conclusion

There is a common misconception that running ML workloads in a sustainable and energy-efficient fashion means sacrificing on performance or cost. With AWS purpose-built accelerators for machine learning, ML engineers don’t have to make that trade-off. Instead, they can run their deep learning workloads on highly specialized purpose-built deep learning hardware, such as AWS Inferentia and AWS Trainium, that significantly outperforms comparable accelerated EC2 instance types, delivering lower cost, higher performance, and better energy efficiency—up to 90%—all at the same time. To start running your ML workloads on Inferentia and Trainium, check out the AWS Neuron documentation or spin up one of the sample notebooks. You can also watch the AWS re:Invent 2022 talk on Sustainability and AWS silicon (SUS206), which covers many of the topics discussed in this post.

About the Authors

Karsten Schroer is a Solutions Architect at AWS. He supports customers in leveraging data and technology to drive sustainability of their IT infrastructure and build data-driven solutions that enable sustainable operations in their respective verticals. Karsten joined AWS following his PhD studies in applied machine learning & operations management. He is truly passionate about technology-enabled solutions to societal challenges and loves to dive deep into the methods and application architectures that underlie these solutions.

Kamran Khan is a Sr. Technical Product Manager at AWS Annapurna Labs. He works closely with AI/ML customers to shape the roadmap for AWS purpose-built silicon innovations coming out of Amazon’s Annapurna Labs. His specific focus is on accelerated deep-learning chips including AWS Trainium and AWS Inferentia. Kamran has 18 years of experience in the semiconductor industry. Kamran has over a decade of experience helping developers achieve their ML goals.

NVIDIA CEO: Creators Will Be “Supercharged” by Generative AI

Generative AI will “supercharge” creators across industries and content types, NVIDIA founder and CEO Jensen Huang said today at the Cannes Lions Festival, on the French Riviera.

“For the very first time, the creative process can be amplified in content generation, and the content generation could be in any modality — it could be be text, images, 3D, videos,” Huang said in a conversation with Mark Read, CEO of WPP — the world’s largest marketing and communications services company.

Huang and Read backstage at Cannes Lions

At the event attended by thousands of creators, marketers and brand execs from around the world, Huang outlined the impact of AI on the $700 billion digital advertising industry. He also touched on the ways AI can enhance creators’ abilities, as well as the importance of responsible AI development.

“You can do content generation at scale, but infinite content doesn’t imply infinite creativity,” he said. “Through our thoughts, we have to direct this AI to generate content that has to be aligned to your values and your brand tone.”

The discussion followed Huang’s recent keynote at COMPUTEX, where NVIDIA and WPP announced a collaboration to develop a content engine powered by generative AI and the NVIDIA Omniverse platform for building and operating metaverse applications.

Driving Forces of the Generative AI Era

NVIDIA has been pushing the boundaries of graphics technology for 30 years and been at the forefront of the AI revolution for a decade. This combination of expertise in graphics and AI uniquely positions the company to enable the new era of generative AI applications.

Huang said that “the biggest moment of modern AI” can be traced back to an academic contest in 2012, when a team of University of Toronto researchers led by Alex Krizhevsky showed that NVIDIA GPUs could train an AI model that recognized objects better than any computer vision algorithm that came before it.

Since then, developers have taught neural networks to recognize images, videos, speech, protein structures, physics and more.

“You could learn the language of almost anything,” Huang said. “Once you learn the language, you can apply the language — and the application of language is generation.”

Generative AI models can create text, pixels, 3D objects and realistic motion, giving professionals superpowers to more quickly bring their ideas to life. Like a creative director working with a team of artists, users can direct AI models with prompts, and fine-tune the output to align with their vision.

“You have to give the machine feedback like the best creative director,” Read said.

These tools aren’t a replacement for human creativity, Huang emphasized. They augment the skills of artists and marketing professionals to help them feed demand from clients by producing content more quickly and in multiple forms tailored to different audiences.

“We will democratize content generation,” Huang said.

Reimagining How We Live, Work and Create With AI

Generative AI’s key benefit for the creative industry is its ability to scale up content generation, rapidly generating options for text and visuals that can be used in advertising, marketing and film.

“In the old days, you’d create hundreds of different ad options that are retrieved based on the medium,” Huang said. “In the future, you won’t retrieve — you’ll generate billions of different ads. But every single one of them has to be tone appropriate, has to be brand perfect.”

For use by professional creators, these AI tools must also produce high-quality visuals that meet or exceed the standard of content captured through traditional methods.

It all starts with a digital twin, a true-to-reality simulation of a real-world physical asset. The NVIDIA Omniverse platform enables the creation of stunning, photorealistic visuals that accurately represent physics and materials — whether for images, videos, 3D objects or immersive virtual worlds.

“Omniverse is a virtual world,” Huang said. “We created a virtual world where AI could learn how to create an AI that’s physically based and grounded by physics.”

“This virtual world has the ability to ingest assets and content that’s created by any tool, because we have this interface called USD,” he said, referring to the Universal Scene Description framework for collaborating in 3D. With it, artists and designers can combine assets developed using popular tools from companies like Adobe and Autodesk with virtual worlds developed using generative AI.

NVIDIA Picasso, a foundry for custom generative AI models for visual design unveiled earlier this year, also supports best-in-class image, video and 3D generative AI capabilities developed in collaboration with partners including Adobe, Getty Images and Shutterstock.

“We created a platform that makes it possible for our partners to train from data that was licensed properly from, for example, Getty, Shutterstock, Adobe,” Huang said. “They’re respectful of the content owners. The training data comes from that source, and whatever economic benefits come from that could accrete back to the creators.”

Like any groundbreaking technology, it’s critical that AI is developed and deployed thoughtfully, Read and Huang said. Technology to watermark AI-generated assets and to detect whether a digital asset was modified or counterfeited will support these goals.

“We have to put as much energy into the capabilities of AI as we do the safety of AI,” Huang said. “In the world of advertising, safety is brand alignment, brand integrity, appropriate tone and truth.”

Collaborating on Content Engine for Digital Advertising

As a leader in digital advertising, WPP is embracing AI as a tool to boost creativity and personalization, helping creators across the industry craft compelling messages that reach the right consumer.

“From the creative process to the customer, there’s going to have to be ad agencies in the middle that understand the technology,” Huang said. “That entire process in the middle requires humans in the loop. You have to understand the voice of the brand you’re trying to represent.”

Using Omniverse Cloud, WPP’s creative professionals can build physically accurate digital twins of products using a brand’s specific product-design data. This real-world data can be combined with AI-generated objects and digital environments — licensed through partners such as Adobe and Getty Images — to create virtual sets for marketing content.

“WPP is going to unquestionably become an AI company,” Huang said. “You’ll create an AI factory where the input is creativity, thoughts and prompts, and what comes out of it is content.”

Enhanced by responsibly trained, NVIDIA-accelerated generative AI, this content engine will boost creative teams’ speed and efficiency, helping them quickly render brand-accurate advertising content at scale.

“The type of content you’ll be able to help your clients generate will be practically infinite,” Huang said. “From the days of hundreds of examples of content that you create for a particular brand or for a particular campaign, it’s going to eventually become billions of generated content for every individual.”

Learn more about NVIDIA’s collaboration with WPP.

On-device fetal ultrasound assessment with TensorFlow Lite

Posted by Angelica Willis and Akib Uddin, Health AI Team, Google Research

How researchers at Google are working to expand global access to maternal healthcare with the help of AI

TensorFlow Lite* is an open-source framework to run machine learning models on mobile and edge devices. It’s popular for use cases ranging from image classification, object detection, speech recognition, natural language tasks, and more. From helping parents of deaf children learn sign language, to predicting air quality, projects using TensorFlow Lite are demonstrating how on-device ML could directly and positively impact lives by making these socially beneficial applications of AI more accessible, globally. In this post, we describe how TensorFlow Lite is being used to help develop ultrasound tools in under-resourced settings.

Motivation

According to the WHO, complications from pregnancy and childbirth contribute to roughly 287,000 maternal deaths and 2.4 million neonatal deaths worldwide each year. As many as 95% of these deaths occur in under-resourced settings and many are preventable if detected early. Obstetric diagnostics, such as determining gestational age and fetal presentation, are important indicators in planning prenatal care, monitoring the health of the birthing parent and fetus, and determining when intervention is required. Many of these factors are traditionally determined by ultrasound.

Advancements in sensor technology have made ultrasound devices more affordable and portable, integrating directly with smartphones. However, ultrasound requires years of training and experience, and, in many rural or underserved regions, there is a shortage of trained ultrasonography experts, making it difficult for people to access care. Due to this global lack of availability, it has been estimated that as many as two-thirds of pregnant people in these settings do not receive ultrasound screening during pregnancy.

Expanding access by enabling non-experts

Google Research is building AI models to help expand access to ultrasound, including models to predict gestational age and fetal presentation, to allow health workers with no background in ultrasonography to collect clinically useful ultrasound scans. These models make predictions from ultrasound video obtained using an easy-to-teach operating procedure, a blind sweep protocol, in which a user blindly sweeps the ultrasound probe over the patient’s abdomen. In our recent paper, “A mobile-optimized artificial intelligence system for gestational age and fetal malpresentation assessment”, published in Nature Communications Medicine, we demonstrated that, when utilizing blind sweeps, these models enable these non-experts to match standard of care performance in predicting these diagnostics.

Blind Sweep Operating Procedure

Moving images depicting blind-sweep ultrasound technique from a transverse view on the left and aerial view on the right

This blind-sweep ultrasound acquisition procedure can be performed by non-experts with only a few hours of ultrasound training.

Figure A on the left is a chart sshowing gestatinal age model performance comparing the blind sweep based gestational age with the clinical standard of care method indicating the percentile absolute error in days. Figure B on the right is a graphical representation of the fetal presentation model performance showing the Receiver Operating Characteristic curves for blind sweep based fetal malpresentation for cases collected by novices or expert sonographers.

Figure A compares our blind sweep-based gestational age regression model performance with that of the clinical standard of care method for fetal age estimation from fetal biometry measured by expert sonographers. Boxes indicate 25th, 50th, and 75th percentile absolute error in days, and whiskers indicate 5th and 95th percentile absolute error (n = 407 study participants). Figure B shows the Receiver Operating Characteristic (ROC) curves for our blind sweep-based fetal malpresentation classification model, as well as specific performance curves for cases in which blind sweeps were collected by expert sonographers or novices (n = 623 study participants). See our recent paper for further details and additional analysis.

Model development

Understanding that our target deployment environment is one in which users might not have reliable access to power and internet, we designed these models to be mobile-optimized. Our grouped convolutional LSTM architecture utilizes MobileNetV2 for feature extraction on each video frame as it is received. The final feature layer produces a sequence of image embeddings which are processed by the convolutional LSTM cell state. Since the recurrent connections only operate on the less memory-intensive embeddings, this model can run efficiently in a mobile environment.

For each subsequence of video frames that make up a sweep, we generate a clip-level diagnostic result, and in the case of gestational age, also produce a model confidence estimate represented as the predicted variance in the detected age. Clip-level gestational age predictions are aggregated via inverse variance weighting to produce a final case-level prediction.

Flow chart depicting Gestational Age LSTM Video Model

Optimization through TensorFlow Lite

On-device ML has many advantages, including providing enhanced privacy and security by ensuring that sensitive input data never needs to leave the device. Another important advantage of on-device ML, particularly for our use case, is the ability to leverage ML offline in regions with low internet connectivity, including where smartphones serve as a stand-in for more expensive traditional devices. Our prioritization of on-device ML made TensorFlow Lite a natural choice for optimizing and evaluating the memory use and execution speed of our existing models, without significant changes to model structure or prediction performance.

After converting our models to TensorFlow Lite using the converter API, we explored various optimization strategies, including post-training quantization and alternative delegate configurations. Leveraging a TensorFlow Lite GPU delegate, optimized for sustained inference speed, provided the most significant boost to execution speed. There was a roughly 2x speed improvement with no loss in model accuracy, which equated to real-time inference of more than 30 frames/second with both the gestational age and fetal presentation models running in parallel on Pixel devices. We benchmarked model initialization time, inference time and memory usage for various delegate configurations using TensorFlow Lite performance measurement tools, finding the optimal configuration across multiple mobile device manufacturers.

These critical speed improvements allow us to leverage the model confidence estimate to provide sweep-quality feedback to the user immediately after the sweep was captured. When low-quality sweeps are detected, users can be provided with tips on how their sweep can be improved (for example, applying more pressure or ultrasound gel), then prompted to re-do the sweep.

Screen capture of sweep exam being conducted

We developed a mobile application that demonstrates what a potential user experience could look like and allows us to evaluate our TensorFlow Lite models in realistic environments. This app enables ultrasound video frames to be received directly from portable ultrasound devices that support this use case.

Looking ahead

Our vision is to enable safer pregnancy journeys using AI-driven ultrasound that could broaden access globally. We want to be thoughtful and responsible in how we develop our AI to maximize positive benefits and address challenges, guided by our AI Principles. TensorFlow Lite has helped enable our research team to explore, prototype, and de-risk impactful care-delivery strategies designed with the needs of lower-resource communities in mind.

This research is in its early stages and we look forward to opportunities to expand our work. To achieve our goals and scale this technology for wider reach globally, partnerships are critical. We are excited about our partnerships with Northwestern Medicine in the US and Jacaranda Health in Kenya to further develop and evaluate these models. With more automated and accurate evaluations of maternal and fetal health risks, we hope to lower barriers and help people get timely care.

Acknowledgements

This work was developed by an interdisciplinary team within Google Research: Ryan G. Gomes, Chace Lee, Angelica Willis, Marcin Sieniek, Christina Chen, James A. Taylor, Scott Mayer McKinney, George E. Dahl, Justin Gilmer, Charles Lau, Terry Spitz, T. Saensuksopa, Kris Liu, Tiya Tiyasirichokchai, Jonny Wong, Rory Pilgrim, Akib Uddin, Greg Corrado, Lily Peng, Katherine Chou, Daniel Tse, & Shravya Shetty.

This work was developed in collaboration with:

Department of Obstetrics and Gynaecology, University of Zambia School of Medicine, Lusaka, Zambia

Department of Obstetrics and Gynecology, University of North Carolina School of Medicine, Chapel Hill, NC, USA

UNC Global Projects—Zambia, LLC, Lusaka, Zambia

Special thanks to: Yun Liu, Cameron Chen, Sami Lachgar, Lauren Winer, Annisah Um’rani, and Sachin Kotwani

*TensorFlow Lite has not been certified or validated for clinical, medical, or diagnostic purposes. TensorFlow Lite users are solely responsible for their use of the framework and independently validating any outputs generated by their project.

RoboCat: A self-improving robotic agent

Robots are quickly becoming part of our everyday lives, but they’re often only programmed to perform specific tasks well. While harnessing recent advances in AI could lead to robots that could help in many more ways, progress in building general-purpose robots is slower in part because of the time needed to collect real-world training data. Our latest paper introduces a self-improving AI agent for robotics, RoboCat, that learns to perform a variety of tasks across different arms, and then self-generates new training data to improve its technique.Read More

RoboCat: A self-improving robotic agent

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

We present Spatial LibriSpeech, a spatial audio dataset with over 570 hours of 19-channel audio, first-order ambisonics, and optional distractor noise. Spatial LibriSpeech is designed for machine learning model training, and it includes labels for source position, speaking direction, room acoustics and geometry. Spatial LibriSpeech is generated by augmenting LibriSpeech samples with >220k simulated acoustic conditions across >8k synthetic rooms. To demonstrate the utility of our dataset, we train models on four fundamental spatial audio tasks, resulting in a median absolute error of 6.60° on…Apple Machine Learning Research

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

m*= Equal Contributors
Training stability is of great importance to Transformers. In this work, we investigate the training dynamics of Transformers by examining the evolution of the attention layers. In particular, we track the attention entropy for each attention head during the course of training, which is a proxy for model sharpness. We identify a common pattern across different architectures and tasks, where low attention entropy is accompanied by high training instability, which can take the form of oscillating loss or divergence. We denote the pathologically low attention entropy…Apple Machine Learning Research

Private Online Prediction from Experts: Separations and Faster Rates

*= Equal Contributors
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of for the stochastic setting and for oblivious adversaries (where is the number of experts). For pure DP, our algorithms are the first to obtain sub-linear regret for oblivious adversaries in the…Apple Machine Learning Research

The Monge Gap: A Regularizer to Learn All Transport Maps

Optimal transport (OT) theory has been been used in machine learning to study and characterize maps that can push-forward efficiently a probability measure onto another.
Recent works have drawn inspiration from Brenier’s theorem, which states that when the ground cost is the squared-Euclidean distance, the “best” map to morph a continuous measure in into another must be the gradient of a convex function.
To exploit that result, , Makkuva et al. (2020); Korotin et al. (2020) consider maps , where is an input convex neural network (ICNN), as defined by Amos et al. 2017, and fit with SGD using…Apple Machine Learning Research

Onboard users to Amazon SageMaker Studio with Active Directory group-specific IAM roles

Amazon SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. For provisioning Studio in your AWS account and Region, you first need to create an Amazon SageMaker domain—a construct that encapsulates your ML environment. More concretely, a SageMaker domain consists of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.

When creating your SageMaker domain, you can choose to use either AWS IAM Identity Center (successor to AWS Single Sign-On) or AWS Identity and Access Management (IAM) for user authentication methods. Both authentication methods have their own set of use cases; in this post, we focus on SageMaker domains with IAM Identity Center, or single sign-on (SSO) mode, as the authentication method.

With SSO mode, you set up an SSO user and group in IAM Identity Center and then grant access to either the SSO group or user from the Studio console. Currently, all SSO users in a domain inherit the domain’s execution role. This may not work for all organizations. For instance, administrators may want to set up IAM permissions for a Studio SSO user based on their Active Directory (AD) group membership. Furthermore, because administrators are required to manually grant SSO users access to Studio, the process may not scale when onboarding hundreds of users.

In this post, we provide prescriptive guidance for the solution to provision SSO users to Studio with least privilege permissions based on AD group membership. This guidance enables you to quickly scale for onboarding hundreds of users to Studio and achieve your security and compliance posture.

Solution overview

The following diagram illustrates the solution architecture.

The workflow to provision AD users in Studio includes the following steps:

Set up a Studio domain in SSO mode.
For each AD group:
1. Set up your Studio execution role with appropriate fine-grained IAM policies
2. Record an entry in the AD group-role mapping Amazon DynamoDB table.
Alternatively, you can adopt a naming standard for IAM role ARNs based on the AD group name and derive the IAM role ARN without needing to store the mapping in an external database.
Sync your AD users and groups and memberships to AWS Identity Center:
1. If you’re using an identity provider (IdP) that supports SCIM, use the SCIM API integration with IAM Identity Center.
2. If you are using self-managed AD, you may use AD Connector.
When the AD group is created in your corporate AD, complete the following steps:
1. Create a corresponding SSO group in IAM Identity Center.
2. Associate the SSO group to the Studio domain using the SageMaker console.
When an AD user is created in your corporate AD, a corresponding SSO user is created in IAM Identity Center.
When the AD user is assigned to an AD group, an IAM Identity Center API (CreateGroupMembership) is invoked, and SSO group membership is created.
The preceding event is logged in AWS CloudTrail with the name AddMemberToGroup.
An Amazon EventBridge rule listens to CloudTrail events and matches the AddMemberToGroup rule pattern.
The EventBridge rule triggers the target AWS Lambda function.
This Lambda function will call back IAM Identity Center APIs, get the SSO user and group information, and perform the following steps to create the Studio user profile (CreateUserProfile) for the SSO user:
1. Look up the DynamoDB table to fetch the IAM role corresponding to the AD group.
2. Create a user profile with the SSO user and the IAM role obtained from the lookup table.
3. The SSO user is granted access to Studio.
The SSO user is redirected to the Studio IDE via the Studio domain URL.

Note that, as of writing, Step 4b (associate the SSO group to the Studio domain) needs to be performed manually by an admin using the SageMaker console at the SageMaker domain level.

Set up a Lambda function to create the user profiles

The solution uses a Lambda function to create the Studio user profiles. We provide the following sample Lambda function that you can copy and modify to meet your needs for automating the creation of the Studio user profile. This function performs the following actions:

Receive the CloudTrail AddMemberToGroup event from EventBridge.
Retrieve the Studio DOMAIN_ID from the environment variable (you can alternatively hard-code the domain ID or use a DynamoDB table as well if you have multiple domains).
Read from a dummy markup table to match AD users to execution roles. You can change this to fetch from the DynamoDB table if you’re using a table-driven approach. If you use DynamoDB, your Lambda function’s execution role needs permissions to read from the table as well.
Retrieve the SSO user and AD group membership information from IAM Identity Center, based on the CloudTrail event data.
Create a Studio user profile for the SSO user, with the SSO details and the matching execution role.

import os
import json
import boto3
DOMAIN_ID = os.environ.get('DOMAIN_ID', 'd-xxxx')


def lambda_handler(event, context):
    
    print({"Event": event})

    client = boto3.client('identitystore')
    sm_client = boto3.client('sagemaker')
    
    event_detail = event['detail']
    group_response = client.describe_group(
        IdentityStoreId=event_detail['requestParameters']['identityStoreId'],
        GroupId=event_detail['requestParameters']['groupId'],
    )
    group_name = group_response['DisplayName']
    
    user_response = client.describe_user(
        IdentityStoreId=event_detail['requestParameters']['identityStoreId'],
        UserId=event_detail['requestParameters']['member']['memberId']
    )
    user_name = user_response['UserName']
    print(f"Event details: {user_name} has been added to {group_name}")
    
    mapping_dict = {
        "ad-group-1": "<execution-role-arn>",
        "ad-group-2": "<execution-role-arn>”
    }
    
    user_role = mapping_dict.get(group_name)
    
    if user_role:
        response = sm_client.create_user_profile(
            DomainId=DOMAIN_ID,
            SingleSignOnUserIdentifier="UserName",
            SingleSignOnUserValue=user_name,
            # if the SSO user_name value is an email, 
	  #  add logic to handle it since Studio user profiles don’t accept @ character
            UserProfileName=user_name, 
            UserSettings={
                "ExecutionRole": user_role
            }
        )
        print(response)
    else:
        response = "Group is not authorized to use SageMaker. Doing nothing."
        print(response)
    return {
        'statusCode': 200,
        'body': json.dumps(response)
    }

Note that by default, the Lambda execution role doesn’t have access to create user profiles or list SSO users. After you create the Lambda function, access the function’s execution role on IAM and attach the following policy as an inline policy after scoping down as needed based on your organization requirements.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "identitystore:DescribeGroup",
                "identitystore:DescribeUser"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "sagemaker:CreateUserProfile",
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "iam:PassRole",
            "Effect": "Allow",
            "Resource": [
                "<list-of-studio-execution-roles>"
            ]
        }
    ]
}

Set up the EventBridge rule for the CloudTrail event

EventBridge is a serverless event bus service that you can use to connect your applications with data from a variety of sources. In this solution, we create a rule-based trigger: EventBridge listens to events and matches against the provided pattern and triggers a Lambda function if the pattern match is successful. As explained in the solution overview, we listen to the AddMemberToGroup event. To set it up, complete the following steps:

On the EventBridge console, choose Rules in the navigation pane.
Choose Create rule.
Provide a rule name, for example, AddUserToADGroup.
Optionally, enter a description.
Select default for the event bus.
Under Rule type, choose Rule with an event pattern, then choose Next.
On the Build event pattern page, choose Event source as AWS events or EventBridge partner events.

Under Event pattern, choose the Custom patterns (JSON editor) tab and enter the following pattern:

{
  "source": ["aws.sso-directory"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["sso-directory.amazonaws.com"],
    "eventName": ["AddMemberToGroup"]
  }
}

Choose Next.
On the Select target(s) page, choose the AWS service for the target type, the Lambda function as the target, and the function you created earlier, then choose Next.
Choose Next on the Configure tags page, then choose Create rule on the Review and create page.

After you’ve set the Lambda function and the EventBridge rule, you can test out this solution. To do so, open your IdP and add a user to one of the AD groups with the Studio execution role mapped. Once you add the user, you can verify the Lambda function logs to inspect the event and also see the Studio user provisioned automatically. Additionally, you can use the DescribeUserProfile API call to verify that the user is created with appropriate permissions.

Supporting multiple Studio accounts

To support multiple Studio accounts with the preceding architecture, we recommend the following changes:

Set up an AD group mapped to each Studio account level.
Set up a group-level IAM role in each Studio account.
Set up or derive the group to IAM role mapping.
Set up a Lambda function to perform cross-account role assumption, based on the IAM role mapping ARN and created user profile.

Deprovisioning users

When a user is removed from their AD group, you should remove their access from the Studio domain as well. With SSO, when a user is removed, the user is disabled in IAM Identity Center automatically if the AD to IAM Identity Center sync is in place, and their Studio application access is immediately revoked.

However, the user profile on Studio still persists. You can add a similar workflow with CloudTrail and a Lambda function to remove the user profile from Studio. The EventBridge trigger should now listen for the DeleteGroupMembership event. In the Lambda function, complete the following steps:

Obtain the user profile name from the user and group ID.
List all running apps for the user profile using the ListApps API call, filtering by the UserProfileNameEquals parameter. Make sure to check for the paginated response, to list all apps for the user.
Delete all running apps for the user and wait until all apps are deleted. You can use the DescribeApp API to view the app’s status.
When all apps are in a Deleted state (or Failed), delete the user profile.

With this solution in place, ML platform administrators can maintain group memberships in one central location and automate the Studio user profile management through EventBridge and Lambda functions.

The following code shows a sample CloudTrail event:

"AddMemberToGroup": 
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "Unknown",
        "accountId": "<account-id>",
        "accessKeyId": "30997fec-b566-4b8b-810b-60934abddaa2"
    },
    "eventTime": "2022-09-26T22:24:18Z",
    "eventSource": "sso-directory.amazonaws.com",
    "eventName": "AddMemberToGroup",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "54.189.184.116",
    "userAgent": "Okta SCIM Client 1.0.0",
    "requestParameters": {
        "identityStoreId": "d-906716eb24",
        "groupId": "14f83478-a061-708f-8de4-a3a2b99e9d89",
        "member": {
            "memberId": "04c8e458-a021-702e-f9d1-7f430ff2c752"
        }
    },
    "responseElements": null,
    "requestID": "b24a123b-afb3-4fb6-8650-b0dc1f35ea3a",
    "eventID": "c2c0873b-5c49-404c-add7-f10d4a6bd40c",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "<account-id>",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "up.sso.us-east-1.amazonaws.com"
    }
}

The following code shows a sample Studio user profile API request:

create-user-profile \
--domain-id d-xxxxxx \
--user-profile-name ssouserid
--single-sign-on-user-identifier 'userName' \
--single-sign-on-user-value 'ssouserid‘ \
--user-settings ExecutionRole=arn:aws:iam::<account id>:role/name

Conclusion

In this post, we discussed how administrators can scale Studio onboarding for hundreds of users based on their AD group membership. We demonstrated an end-to-end solution architecture that organizations can adopt to automate and scale their onboarding process to meet their agility, security, and compliance needs. If you’re looking for a scalable solution to automate your user onboarding, try this solution, and leave you feedback below! For more information about onboarding to Studio, see Onboard to Amazon SageMaker Domain.

About the authors

Ram Vittal is an ML Specialist Solutions Architect at AWS. He has over 20 years of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 2-year-old sheep-a-doodle!

Durga Sury is an ML Solutions Architect in the Amazon SageMaker Service SA team. She is passionate about making machine learning accessible to everyone. In her 4 years at AWS, she has helped set up AI/ML platforms for enterprise customers. When she isn’t working, she loves motorcycle rides, mystery novels, and hiking with her 5-year-old husky.

Vedere AI

Monthly Archives: June 2023

Reduce energy consumption of your machine learning workloads by up to 90% with AWS purpose-built accelerators

AWS Inferentia and AWS Trainium for sustainable ML

Inference experiment: Real-time document understanding with LayoutLM

Training experiment: Training BERT Large from scratch

Getting started with AWS purpose-built accelerators for ML

Conclusion

About the Authors

NVIDIA CEO: Creators Will Be “Supercharged” by Generative AI

Driving Forces of the Generative AI Era

Reimagining How We Live, Work and Create With AI

Collaborating on Content Engine for Digital Advertising

On-device fetal ultrasound assessment with TensorFlow Lite

How researchers at Google are working to expand global access to maternal healthcare with the help of AI

Motivation

Expanding access by enabling non-experts

Blind Sweep Operating Procedure

Model development

Optimization through TensorFlow Lite

Looking ahead

Acknowledgements

RoboCat: A self-improving robotic agent

RoboCat: A self-improving robotic agent

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Private Online Prediction from Experts: Separations and Faster Rates

The Monge Gap: A Regularizer to Learn All Transport Maps

Onboard users to Amazon SageMaker Studio with Active Directory group-specific IAM roles

Solution overview

Set up a Lambda function to create the user profiles

Set up the EventBridge rule for the CloudTrail event

Supporting multiple Studio accounts

Deprovisioning users

Conclusion

About the authors

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.