Implement MLOps using AWS pre-trained AI Services with AWS Organizations
The AWS Machine Learning Operations (MLOps) framework is an iterative and repetitive process for evolving AI models over time. Like DevOps, practitioners gain efficiencies promoting their artifacts through various environments (such as quality assurance, integration, and production) for quality control. In parallel, customers rapidly adopt multi-account strategies through AWS Organizations and AWS Control Tower to create secure, isolated environments. This combination can introduce challenges for implementing MLOps with AWS pre-trained AI Services, such as Amazon Rekognition Custom Labels. This post discusses design patterns for reducing that complexity while still maintaining security best practices.
Overview
Customers across every industry vertical recognize the value of operationalizing machine learning (ML) efficiently and reducing the time to deliver business value. Most AWS pre-trained AI Services address this situation through out-of-the-box capabilities for computer vision, translation, and fraud detection, among other common use cases. Many use cases require domain-specific predictions that go beyond generic answers. The AI Services can fine-tune the predictive model results using customer-labeled data for those scenarios.
Over time, the domain-specific vocabulary changes and evolves. For example, suppose a tool manufacturer creates a computer vision model to detect its products in images (such as hammers and screwdrivers). In a future release, the business adds support for wrenches and saws. These new labels necessitate code changes on the manufacturer’s websites and in its custom applications. This creates a dependency: the updated model and the application code must be released together.
The AWS MLOps framework addresses these release challenges through iterative and repetitive processes. Before reaching production end-users, the model artifacts must traverse various quality gates like application code. You typically implement those quality gates using multiple AWS accounts within an AWS organization. This approach gives the flexibility to centrally manage these application domains and enforce guardrails and business requirements. It’s becoming increasingly common to have tens or even hundreds of accounts within your organization. However, you must balance your workload isolation needs against the team size and complexity.
MLOps practitioners have standard procedures for promoting artifacts between accounts (such as QA to production). These patterns are straightforward to implement, relying on copying code and binary resources between Amazon Simple Storage Service (Amazon S3) buckets. However, AWS pre-trained AI Services don’t currently support copying the trained custom model across AWS accounts. Until such a mechanism exists, you need to retrain the models in each AWS account using the same dataset. This approach involves time and cost for retraining the model in a new account. This mechanism can be a viable option for some customers. However in this post, we demonstrate the means to define and evolve these custom models centrally while securely sharing them across an AWS organization’s accounts.
Solution overview
This post discusses design patterns for securely sharing AWS pre-trained AI Service domain-specific models. These services include Amazon Fraud Detector, Amazon Transcribe, and Amazon Rekognition, to name a few. Although these strategies are broadly applicable, we focus on Rekognition Custom Labels as a concrete example. We intentionally avoid diving too deep into Rekognition Custom Labels-specific nuances.
The architecture begins with a configured AWS Control Tower in the management account. AWS Control Tower provides the easiest way to set up and govern a secure, multi-account AWS environment. As shown in the following diagram, we use Account Factory in AWS Control Tower to create five AWS accounts:
- CI/CD account for deployment orchestration (for example, with AWS CodeStar)
- Production account for external end-users (for example, a public website)
- Quality assurance account for internal development teams (such as preproduction)
- ML account for custom models and supporting systems
- AWS Lake House account holding proprietary customer data
This configuration might be too granular or coarse, depending on your regulatory requirements, industry, and size. Refer to Managing the multi-account environment using AWS Organizations and AWS Control Tower for more guidance.
Create the Rekognition Custom Labels model
The first step to creating a Rekognition Custom Labels model is choosing the AWS account to host it. You might begin your ML journey using a single ML account. This approach consolidates any tooling and procedures into one place. However, this centralization can cause bloat in the individual account and lead to monolithic environments. More mature enterprises segment this ML account by team or workload. Regardless of the granularity, the objective is the same: define models centrally and train them once.
This post demonstrates using a Rekognition Custom Labels model with a single ML account and a separate data lake account (see the following diagram). When the data resides in a different account, you must configure a resource policy to provide cross-account access to the S3 bucket objects. This procedure securely shares the bucket’s contents with the ML account. See the quick start samples for more information on creating an Amazon Rekognition domain-specific model.
The following bucket policy grants Amazon Rekognition read access to the training data and write access for its output (replace bucket-name with your training data bucket):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSRekognitionS3AclBucketRead20191011",
            "Effect": "Allow",
            "Principal": {
                "Service": "rekognition.amazonaws.com"
            },
            "Action": [
                "s3:GetBucketAcl",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::bucket-name"
        },
        {
            "Sid": "AWSRekognitionS3GetBucket20191011",
            "Effect": "Allow",
            "Principal": {
                "Service": "rekognition.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:GetObjectVersion",
                "s3:GetObjectTagging"
            ],
            "Resource": "arn:aws:s3:::bucket-name/*"
        },
        {
            "Sid": "AWSRekognitionS3ACLBucketWrite20191011",
            "Effect": "Allow",
            "Principal": {
                "Service": "rekognition.amazonaws.com"
            },
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::bucket-name"
        },
        {
            "Sid": "AWSRekognitionS3PutObject20191011",
            "Effect": "Allow",
            "Principal": {
                "Service": "rekognition.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::bucket-name/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}
```
Enable cross-account access
After you build and deploy the model, the endpoint is only available within the ML account. Rather than sharing static credentials, delegate access to the production (or QA) account using AWS Identity and Access Management (IAM) roles. To create a cross-account role in the ML account, complete the following steps:
- On the Rekognition Custom Labels console, choose Projects and choose your project name.
- Choose Models and your model name.
- On the Use model tab, scroll down to the Use your model section.
- Copy the model Amazon Resource Name (ARN). It should be formatted as follows: arn:aws:rekognition:region-name:account-id:project/model-name/version/version-id/timestamp.
- Create a role with rekognition:DetectCustomLabels permissions on the model ARN and a trust policy allowing sts:AssumeRole from the production (or QA) account (for example, arn:aws:iam::PROD_ACCOUNT_ID_HERE:root). A boto3 sketch of this role follows these steps.
- Optionally, attach additional policies for any workload-specific actions (such as accessing S3 buckets).
- Optionally, configure the condition element to enforce additional delegation requirements.
- Record the new role’s ARN to use in the next section.
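If you prefer to script these steps, a minimal boto3 sketch might look like the following; the role name, account ID, and model ARN are placeholders to substitute with your own values:

```python
import json
import boto3

iam = boto3.client("iam")

PROD_ACCOUNT_ID = "111122223333"  # placeholder: production (or QA) account ID
MODEL_ARN = "arn:aws:rekognition:us-east-1:444455556666:project/tools/version/tools.v1/1234567890123"  # placeholder

# Trust policy that lets principals in the production (or QA) account assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{PROD_ACCOUNT_ID}:root"},
        "Action": "sts:AssumeRole",
    }],
}

role = iam.create_role(
    RoleName="RekognitionCrossAccountRole",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Inline policy scoped to DetectCustomLabels on the specific model version.
iam.put_role_policy(
    RoleName="RekognitionCrossAccountRole",
    PolicyName="DetectCustomLabelsOnly",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "rekognition:DetectCustomLabels",
            "Resource": MODEL_ARN,
        }],
    }),
)

print(role["Role"]["Arn"])  # record this ARN for the next section
```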
Invoke the endpoint
With the security policies in place, it’s time to test the configuration. A simple approach involves creating an Amazon Elastic Compute Cloud (Amazon EC2) instance and using the AWS Command Line Interface (AWS CLI). Invoke the endpoint with the following steps:
- In the production (or QA) account, create a role for Amazon EC2.
- Attach a policy allowing sts:AssumeRole on the ML account’s cross-account role ARN.
- Launch an Amazon Linux 2 instance with the role from the previous step.
- Wait for it to provision, then connect to the Linux instance using SSH.
- Run the aws sts assume-role command to switch to the cross-account role from the previous section.
- Start the model endpoint, if not already running, using the Rekognition console or the start-project-version AWS CLI command.
- Run the aws rekognition detect-custom-labels command to test the operation.
You can also perform this test using the AWS SDK and another compute resource (for example, AWS Lambda).
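For example, a minimal boto3 sketch of the same test might look like the following; the role ARN, model ARN, bucket, and image names are placeholders:

```python
import boto3

# Assume the cross-account role in the ML account (placeholder ARN).
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::ML_ACCOUNT_ID:role/RekognitionCrossAccountRole",
    RoleSessionName="cross-account-inference",
)["Credentials"]

# Create a Rekognition client with the temporary credentials from the assumed role.
rekognition = boto3.client(
    "rekognition",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# The model endpoint must already be running (start-project-version).
response = rekognition.detect_custom_labels(
    ProjectVersionArn="arn:aws:rekognition:us-east-1:ML_ACCOUNT_ID:project/tools/version/tools.v1/1234567890123",
    Image={"S3Object": {"Bucket": "my-test-images", "Name": "hammer.jpg"}},
    MinConfidence=80,
)
print(response["CustomLabels"])
```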
Avoiding the public internet
In the previous section, the detect-custom-labels request uses the virtual private cloud’s (VPC) internet gateway and traverses the public internet. TLS/SSL encryption sufficiently secures the communication channel for many workloads. You can use AWS PrivateLink to enable connections between the VPC and supporting services without requiring an internet gateway, NAT device, VPN connection, transit gateway, or AWS Direct Connect connection. Then, the detect-custom-labels request never leaves the AWS network exposed to the public internet. AWS PrivateLink supports all services used within this post. You can also use IAM conditions in the cross-account role policy to require that the pre-trained AI Services are reached over private connectivity. This control adds another level of protection that prevents misconfigured clients from using the pre-trained AI Service’s internet-facing endpoint. For additional information, see Using Amazon Rekognition with Amazon VPC endpoints, AWS PrivateLink for Amazon S3, and Using AWS STS interface VPC endpoints.
The following diagram illustrates the VPC endpoint configuration between the production account, ML account, and QA account.
Build a CI/CD pipeline for promoting models
To improve your models, AWS recommends continuously adding more training and test data to your Rekognition Custom Labels project dataset. After you add more data to a project, training a new model version can improve accuracy or introduce new labels.
In MLOps, model artifacts must be consistent. To accomplish this with pre-trained AI Services, AWS recommends promoting the model endpoint by updating the code’s reference to the new model version’s ARN. This approach avoids retraining the domain-specific model in each environment (such as QA and production accounts). Your applications can use the new model’s ARN as a runtime variable using AWS Systems Manager within a multi-account or multi-stage environment.
Three granularity levels limit access to the cross-account model: the account, project, and model version level. Model versions are immutable and have a unique ARN that maps to a specific point-in-time training run: arn:aws:rekognition:region:account-id:project/project_name/version/name/timestamp.
The following diagram illustrates the model rotation from QA to production.
In the preceding architecture, the production and QA applications make API calls to use the v2 or v3 model endpoints through their respective VPC endpoints. Each application receives the model ARN from its configuration store (for example, AWS Systems Manager Parameter Store or AWS AppConfig). This process works with any number of environments, but we demonstrate it with only two accounts for simplicity. Optionally, removing the superseded model versions prevents additional consumption of those resources.
The ML account has an IAM role for each environment-specific account (such as the production account) that requires access. As part of the deployment, the CI/CD pipeline alters the role’s inline policy to allow access to the appropriate model version.
Consider the scenario of promoting Model-v2 from the QA account to the production account. This process requires the following steps:
- On the Rekognition Custom Labels console, transition the Model-v2 endpoint into a running state.
- Grant the IAM cross-account role in the ML account access to the new version of Model-v2 (a boto3 sketch of this update follows these steps).
Note that the resource element supports wildcards in the ARN.
- Send a test invocation to Model-v2 from the production application using the delegation role.
- Optionally, remove the cross-account role’s access to Model-v1.
- Optionally, repeat steps 2–3 for each additional AWS account.
- Optionally, stop the Model-v1 endpoint to avoid incurring costs.
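As an illustration, a boto3 sketch of the promotion update that the pipeline might run is shown below; the role, policy, parameter, and ARN values are placeholders:

```python
import json
import boto3

MODEL_V2_ARN = "arn:aws:rekognition:us-east-1:ML_ACCOUNT_ID:project/tools/version/Model-v2/1234567890123"  # placeholder

# In the ML account: point the environment-specific role's inline policy at Model-v2.
iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="RekognitionCrossAccountRole-Production",
    PolicyName="DetectCustomLabelsOnly",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "rekognition:DetectCustomLabels",
            "Resource": MODEL_V2_ARN,
        }],
    }),
)

# In the production account: publish the new ARN so the application reads it at runtime.
ssm = boto3.client("ssm")
ssm.put_parameter(
    Name="/myapp/rekognition/model-arn",
    Value=MODEL_V2_ARN,
    Type="String",
    Overwrite=True,
)
```

With this pattern, the production application rereads the parameter at startup (or on a refresh interval) and begins calling Model-v2 without a code change.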
Global policy propagation from the IAM control plane to the IAM data plane in every Region is an eventually consistent operation. This design can create slight delays for multi-Regional configurations.
Create guardrails through service control policies
Using cross-account roles creates a secure mechanism for sharing pre-trained managed AI resources. But what happens when that role’s policy is too permissive? You can mitigate these risks by using service control policies (SCPs) to set permission guardrails across accounts. Guardrails specify the maximum permissions available for an IAM identity. These capabilities can prevent a model consumer account from, for example, stopping the shared Amazon Rekognition endpoint. After defining appropriate guardrail requirements, organizational units within Organizations allow centrally managing those policies across multiple accounts.
The following example SCP denies consumer accounts the ability to create, delete, start, or stop Rekognition projects:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyModifyingRekognitionProjects",
            "Effect": "Deny",
            "Action": [
                "rekognition:CreateProject*",
                "rekognition:DeleteProject*",
                "rekognition:StartProject*",
                "rekognition:StopProject*"
            ],
            "Resource": [
                "arn:aws:rekognition:*:*:project/*"
            ]
        }
    ]
}
```
You can also configure detective controls to monitor their configuration and make sure it doesn’t drift out of compliance. AWS IAM Access Analyzer supports assessing policies across the organization and reporting unused permissions. Additionally, AWS Config enables assessing, auditing, and evaluating configurations of AWS resources. This capability supports standard security and compliance requirements, such as verifying and remediating the S3 bucket’s encryption settings.
Conclusion
You need out-of-the-box solutions to add ML capabilities like computer vision, translation, and fraud detection. You also need security boundaries that isolate your different environments for quality control, compliance, and regulatory purposes. AWS pre-trained AI services and AWS Control Tower deliver that functionality in a manner that is easily accessible and secure.
AWS pre-trained AI services don’t currently support copying the trained custom model across AWS accounts. Until such a mechanism exists, you need to retrain the models in each AWS account using the same dataset. This post demonstrates an alternative design approach using IAM cross-account policies to share model endpoints while maintaining robust security control. Furthermore, you can stop paying for redundant training jobs! For more information on cross-account policies, see IAM tutorial: Delegate access across AWS accounts using IAM roles.
About the Authors
Nate Bachmeier is an AWS Senior Solutions Architect that nomadically explores New York, one cloud integration at a time. He specializes in migrating and modernizing customers’ workloads. Besides this, Nate is a full-time student and has two kids.
Mario Bourgoin is a Senior Partner Solutions Architect for AWS, an AI/ML specialist, and the global tech lead for MLOps. He works with enterprise customers and partners deploying AI solutions in the cloud. He has more than 30 years experience doing machine learning and AI at startups and in enterprises, starting with creating one of the first commercial machine learning systems for big data. Mario spends the balance of his time playing with his three Belgian Tervurens, cooking dinners for his family, and learning about mathematics and cosmology.
Tim Murphy is a Senior Solutions Architect for AWS, working with enterprise customers in various industries to build business based solutions in the cloud. He has spent the last decade working with startups, non-profits, commercial enterprise, and government agencies, deploying infrastructure at scale. In his spare time when he isn’t tinkering with technology, you’ll most likely find him in far flung areas of the earth hiking mountains, surfing waves, or biking through a new city.
Amazon expands SURE program to boost diversity in STEM education
New programs with Georgia Tech and the University of Southern California are established; existing Columbia University program expands.
Improve high-value research with Hugging Face and Amazon SageMaker asynchronous inference endpoints
Many of our AWS customers provide research, analytics, and business intelligence as a service. This type of research and business intelligence enables their end customers to stay ahead of markets and competitors, identify growth opportunities, and address issues proactively. For example, some of our financial services sector customers do research for equities, hedge funds, and investment management companies to help them understand trends and identify portfolio strategies. In the health industry, an increasingly large portion of health research is now information-based. A great deal of research entails the analysis of data that was initially collected for diagnostic, treatment, or other research purposes and is now being used for new research. These forms of health research have led to effective primary prevention to avoid new cases, secondary prevention for early detection, and better disease management. The research outcomes not only improve the quality of life but also help reduce healthcare expenses.
Customers tend to digest information from public and private sources. They then apply established or custom natural language processing (NLP) models to summarize the content, identify trends, and generate insights. The NLP models used for these research tasks are large, and the articles to be summarized are often long given the size of the corpus, so dedicated real-time endpoints aren’t cost-optimized for this pattern. These applications also receive bursts of incoming traffic at different times of the day.
We believe customers would greatly benefit from the ability to scale down to zero and ramp up their inference capability on an as-needed basis. This optimizes research costs without compromising the quality of inferences. This post discusses how Hugging Face along with Amazon SageMaker asynchronous inference can help achieve this.
You can build text summarization models with multiple deep-learning frameworks like TensorFlow, PyTorch, and Apache MXNet. These models typically have a large input payload of multiple text documents of varying size. Advanced deep learning models require compute-intensive preprocessing before model inference. Processing times can be as long as a few minutes, which removes the option to run real-time inference by passing payloads over an HTTP API. Instead, you need to process input payloads asynchronously from an object store like Amazon Simple Storage Service (Amazon S3) with automatic queuing and a predefined concurrency threshold. The system should be able to receive status notifications and reduce unnecessary costs by cleaning up resources when the tasks are complete.
SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML. SageMaker provides the most advanced open-source model-serving containers for XGBoost (container, SDK), Scikit-Learn (container, SDK), PyTorch (container, SDK), TensorFlow (container, SDK), and Apache MXNet (container, SDK). SageMaker offers several options to deploy trained ML models for inference:
- Real-time inference endpoints are suitable for workloads that need to be processed with low latency requirements in the order of ms to seconds.
- Batch transform is ideal for offline predictions on large batches of data.
- Amazon SageMaker Serverless Inference (in preview mode and not recommended for production workloads as of this writing) is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless Inference is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts.
- Asynchronous Inference endpoints queue incoming requests. They’re ideal for workloads where the request sizes are large (up to 1 GB) and inference processing times are in the order of minutes (up to 15 minutes). Asynchronous inference enables you to save on costs by auto scaling the instance count to zero when there are no requests to process.
Solution overview
In this post, we deploy a PEGASUS model from Hugging Face that was pre-trained for text summarization to SageMaker hosting services. We use the model as is from Hugging Face for simplicity. However, you can fine-tune the model based on a custom dataset. You can also try out other models available in the Hugging Face Model Hub. We also provision an asynchronous inference endpoint that hosts this model, from which you can get predictions.
The asynchronous inference endpoint’s inference handler expects an article as the input payload and returns the summarized text of the article as the output. The output is stored in a database for analyzing trends or is fed downstream for further analytics. This downstream analysis derives data insights that support the research.
We demonstrate how asynchronous inference endpoints enable you to have user-defined concurrency and completion notifications. We configure auto scaling of instances behind the endpoint to scale down to zero when traffic subsides and scale back up as the request queue fills up.
We also use Amazon CloudWatch metrics to monitor the queue size, total processing time, and invocations processed.
In the following diagram, we show the steps involved while performing inference using an asynchronous inference endpoint.
- Our pre-trained PEGASUS ML model is first hosted on the scaling endpoint.
- The user uploads the article to be summarized to an input S3 bucket.
- The asynchronous inference endpoint is invoked using an API.
- After the inference is complete, the result is saved to the output S3 bucket.
- An Amazon Simple Notification Service (Amazon SNS) notification is sent to the user notifying them of the completed success or failure.
Create an asynchronous inference endpoint
We create the asynchronous inference endpoint similar to a real-time hosted endpoint. The steps include creating a SageMaker model, followed by configuring the endpoint and deploying the endpoint. The difference between the two types of endpoints is that the asynchronous inference endpoint configuration contains an AsyncInferenceConfig section. Here we specify the S3 output path for the results from the endpoint invocation and optionally include SNS topics for notifications on success and failure. We also specify the maximum number of concurrent invocations per instance as determined by the customer. See the following code:
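A minimal boto3 sketch of such a configuration might look like the following, assuming the SageMaker model has already been created from the Hugging Face container; the names, instance type, bucket, and topic ARNs are placeholders:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Endpoint configuration with the async-specific AsyncInferenceConfig section.
sagemaker.create_endpoint_config(
    EndpointConfigName="pegasus-async-config",
    ProductionVariants=[{
        "VariantName": "variant1",
        "ModelName": "pegasus-summarizer",      # placeholder: an existing SageMaker model
        "InstanceType": "ml.g4dn.xlarge",
        "InitialInstanceCount": 1,
    }],
    AsyncInferenceConfig={
        "OutputConfig": {
            "S3OutputPath": "s3://my-output-bucket/async-results/",
            "NotificationConfig": {
                "SuccessTopic": "arn:aws:sns:us-east-1:123456789012:async-success",
                "ErrorTopic": "arn:aws:sns:us-east-1:123456789012:async-error",
            },
        },
        # Maximum concurrent requests sent to the model container per instance.
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
    },
)

sagemaker.create_endpoint(
    EndpointName="pegasus-async-endpoint",
    EndpointConfigName="pegasus-async-config",
)
```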
For details on the API to create an endpoint configuration for asynchronous inference, see Create an Asynchronous Inference Endpoint.
Invoke the asynchronous inference endpoint
The following screenshot shows a brief article that we use as our input payload:
The following code uploads the article as an input.json file to Amazon S3:
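A boto3 sketch of the upload, with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Upload the article payload that the endpoint will summarize (placeholder bucket/key).
s3.upload_file("input.json", "my-input-bucket", "async-input/input.json")
input_s3_uri = "s3://my-input-bucket/async-input/input.json"
```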
We use the Amazon S3 URI to the input payload file to invoke the endpoint. The response object contains the output location in Amazon S3 to retrieve the results after completion:
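A sketch of the asynchronous invocation follows; the endpoint name and input location are placeholders:

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint_async(
    EndpointName="pegasus-async-endpoint",
    InputLocation="s3://my-input-bucket/async-input/input.json",
    ContentType="application/json",
)

# The summarization result is written asynchronously to this S3 location.
print(response["OutputLocation"])
```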
The following screenshot shows the sample output post summarization:
For details on the API to invoke an asynchronous inference endpoint, see Invoke an Asynchronous Inference Endpoint.
Queue the invocation requests with user-defined concurrency
The asynchronous inference endpoint automatically queues the invocation requests. This is a fully managed queue with various monitoring metrics and doesn’t require any further configuration. It uses the MaxConcurrentInvocationsPerInstance parameter in the preceding endpoint configuration to process new requests from the queue after previous requests are complete. MaxConcurrentInvocationsPerInstance is the maximum number of concurrent requests sent by the SageMaker client to the model container. If no value is provided, SageMaker chooses an optimal value for you.
Auto scaling instances within the asynchronous inference endpoint
We set the auto scaling policy with a minimum capacity of zero and a maximum capacity of five instances. Unlike real-time hosted endpoints, asynchronous inference endpoints support scaling down instances to zero by setting the minimum capacity to zero. We use the ApproximateBacklogSizePerInstance metric for the scaling policy configuration with a target queue backlog of five per instance to scale out further. We set the cooldown period for ScaleInCooldown to 120 seconds and the ScaleOutCooldown to 120 seconds. The value for ApproximateBacklogSizePerInstance is chosen based on the traffic and your sensitivity to scaling speed. The faster you scale in, the less cost you incur, but the more likely you’ll have to scale up again when new requests come in. The slower you scale in, the more cost you incur, but you’re less likely to have a request come in when you’re under-scaled.
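A boto3 sketch of such a scaling policy, using placeholder endpoint and variant names, might look like the following:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

endpoint_name = "pegasus-async-endpoint"                      # placeholder
resource_id = f"endpoint/{endpoint_name}/variant/variant1"    # placeholder variant

# Allow the endpoint variant to scale between zero and five instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

# Target-tracking policy on the per-instance backlog, with 120-second cooldowns.
autoscaling.put_scaling_policy(
    PolicyName="async-backlog-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 120,
        "ScaleOutCooldown": 120,
    },
)
```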
For details on the API to auto scale an asynchronous inference endpoint, see Autoscale an Asynchronous Inference Endpoint.
Configure notifications from the asynchronous inference endpoint
We create two separate SNS topics for success and error notifications for each endpoint invocation result:
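For example, a boto3 sketch with placeholder topic names:

```python
import boto3

sns = boto3.client("sns")

# The returned topic ARNs go into the NotificationConfig of AsyncInferenceConfig.
success_topic_arn = sns.create_topic(Name="async-success")["TopicArn"]
error_topic_arn = sns.create_topic(Name="async-error")["TopicArn"]
```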
Other options for notifications include periodically checking the output of the S3 bucket, or using S3 bucket notifications to initialize an AWS Lambda function on file upload. SNS notifications are included in the endpoint configuration section as previously described.
For details on how to set up notifications from an asynchronous inference endpoint, see Check Prediction Results.
Monitor the asynchronous inference endpoint
We monitor the asynchronous inference endpoint with additional built-in CloudWatch metrics specific to asynchronous inference. For example, we monitor the queue length in each instance with ApproximateBacklogSizePerInstance and the total queue length with ApproximateBacklogSize.
For a complete list of metrics, refer to Monitoring Asynchronous Inference Endpoints.
We can optimize the endpoint configuration to get the most cost-effective instance with high performance. For example, we can use an instance with Amazon Elastic Inference or AWS Inferentia. We can also gradually increase the concurrency level up to the throughput peak while adjusting other model server and container parameters.
CloudWatch graphs
We simulated traffic of 10,000 inference requests flowing in over a period of time to the asynchronous inference endpoint enabled with the auto scaling policy described in the previous section.
The following screenshot shows instance metrics before requests started flowing in. We start with a live endpoint with zero instances running:
The following graph showcases how the BacklogSize and BacklogSizePerInstance metrics change as auto scaling kicks in and the load on the endpoint is shared by the multiple instances that were provisioned as part of the auto scaling process.
As shown in the following screenshot, the number of instances increased as the inference count scaled up:
The following screenshot shows how the scaling in brings back the endpoint to the initial state of zero running instances:
Clean up
After all the requests are complete, we can delete the endpoint similar to deleting real-time hosted endpoints. Note that if we set the minimum capacity of asynchronous inference endpoints to zero, there are no instance charges incurred after it scales down to zero.
If you enabled auto scaling for your endpoint, make sure you deregister the endpoint as a scalable target before deleting the endpoint. To do this, run the following code:
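A boto3 sketch of the cleanup, with placeholder endpoint and variant names:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
sagemaker = boto3.client("sagemaker")

endpoint_name = "pegasus-async-endpoint"                      # placeholder
resource_id = f"endpoint/{endpoint_name}/variant/variant1"    # placeholder variant

# Remove the scalable target before deleting the endpoint.
autoscaling.deregister_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)

sagemaker.delete_endpoint(EndpointName=endpoint_name)
```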
Remember to delete your endpoint after use as you will be charged for the instances used in this demo.
You also need to delete the S3 objects and SNS topics. If you created any other AWS resources to consume and action on the SNS notifications, you may also want to delete them.
Conclusion
In this post, we demonstrated how to use the new asynchronous inference capability from SageMaker to process a typical large input payload that is part of a summarization task. For inference, we used a model from Hugging Face and deployed it on an asynchronous inference endpoint. We explained the common challenges of burst traffic, high model processing times, and large payloads involved with research analytics. The asynchronous inference endpoint’s inherent ability to manage an internal queue, enforce predefined concurrency limits, send response notifications, and automatically scale down to zero helped us address these challenges. The complete code for this example is available on GitHub.
To get started with SageMaker asynchronous inference, check out Asynchronous Inference.
About the Authors
Dinesh Kumar Subramani is a Senior Solutions Architect with the UKIR SMB team, based in Edinburgh, Scotland. He specializes in artificial intelligence and machine learning. Dinesh enjoys working with customers across industries to help them solve their problems with AWS services. Outside of work, he loves spending time with his family, playing chess and enjoying music across genres.
Raghu Ramesha is an ML Solutions Architect with the Amazon SageMaker Service team. He focuses on helping customers build, deploy and migrate ML production workloads to SageMaker at scale. He specializes in machine learning, AI, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.
How Margarita Chli is using drones to go where people can’t
When it comes to search-and-rescue missions, dogs are second to none, but an Amazon Research Award recipient says they might have competition from drones.
CAIT announces new fellowships, faculty research awards
The Columbia Center of AI Technology announced its inaugural recipients last year.
Josh Miele: Amazon’s resident MacArthur Fellow
Miele has merged a lifelong passion for science with a mission to make the world more accessible for people with disabilities.
Announcing the launch of the model copy feature for Amazon Comprehend custom models
Technology trends and advancements in digital media in the past decade or so have resulted in the proliferation of text-based data. The potential benefits of mining this text to derive insights, both tactical and strategic, is enormous. This is called natural language processing (NLP). You can use NLP, for example, to analyze your product reviews for customer sentiments, train a custom entity recognizer model to identify product types of interest based on customer comments, or train a custom text classification model to determine the most popular product categories.
Amazon Comprehend is an NLP service with ready-made intelligence to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Amazon Comprehend Custom uses automated machine learning (AutoML) to build NLP models on your behalf using your own data. This enables you to detect entities unique to your business or classify text or documents as per your requirements. Additionally, you can automate your entire NLP workflow with easy-to-use APIs.
Today we’re happy to announce the launch of the Amazon Comprehend custom model copy feature, which allows you to automatically copy your Amazon Comprehend custom models from a source account to designated target accounts in the same Region without requiring access to the datasets that the model was trained and evaluated on. Starting today, you can use the AWS Management Console, AWS Command Line Interface (AWS CLI), or the boto3 APIs (Python SDK for AWS) to copy trained custom models from a source account to a designated target account. This new feature is available for both Amazon Comprehend custom classification and custom entity recognition models.
Benefits of the model copy feature
This new feature has the following benefits:
- Multi-account MLOps strategy – Train a model one time and ensure predictable deployment in multiple environments in different accounts.
- Faster deployment – You can quickly copy a trained model between accounts, avoiding the time taken to retrain in every account.
- Protect sensitive datasets – Now you no longer need to share the datasets between different accounts or users. The training data needs to be available only on the account where the training is done. This is very important for certain industries like financial services, where data isolation and sandboxing are essential to meet regulatory requirements.
- Easy collaboration – Partners or vendors can now easily train in Amazon Comprehend Custom and share the models with their customers.
How model copy works
With the new model copy feature, you can copy custom models between AWS accounts in the same Region in a two-stage process. First, a user in one AWS account (account A), shares a custom model that’s in their account. Then, a user in another AWS account (account B) imports the model into their account.
Share a model
To share a custom model in account A, the user attaches an AWS Identity and Access Management (IAM) resource-based policy to a model version. This policy authorizes an entity in account B, such as an IAM user or role, to import the model version into Amazon Comprehend in their AWS account. You can configure a resource-based policy either through the console or with the Amazon Comprehend PutResourcePolicy API.
Import a model
To import the model to account B, the user of this account provides Amazon Comprehend with the necessary details, such as the Amazon Resource Name (ARN) of the model. When they import the model, this user creates a new custom model in their AWS account that replicates the model that they imported. This model is fully trained and ready for inference jobs, such as document classification or named entity recognition. If the model is encrypted with an AWS Key Management Service (AWS KMS) key in the source, then the service role specified while importing the model needs to have access to the KMS key in order to decrypt the model during import. The target account can also specify a KMS key to encrypt the model during import. The importing of the shared model is also available both on the console and as an API.
Solution overview
To demonstrate the functionality of the model copy feature, we show you how to train, share, and import an Amazon Comprehend custom entity recognition model using both the Amazon Comprehend console and the AWS CLI. For this demonstration, we use two different accounts. The steps are applicable to Amazon Comprehend custom classification as well. The required steps are as follows:
- Train an Amazon Comprehend custom entity recognition model in the source account.
- Define the IAM resource policy for the trained model to allow cross-account access.
- Copy the trained model from the source account to the target account.
- Test the copied model through a batch job.
Train an Amazon Comprehend custom entity recognition model in the source account
The first step is to train an Amazon Comprehend custom entity recognition model in the source account. As an input dataset for the training, we use a CSV entity list and training documents for recognizing AWS service offerings in a given document. Make sure that the entity list and training documents are in an Amazon Simple Storage Service (Amazon S3) bucket in the source account. For instructions, see Adding Documents to Amazon S3.
Create an IAM role for Amazon Comprehend and provide required access to the S3 bucket with the training data. Note the role ARN and S3 bucket paths to use in later steps.
Train a model with the AWS CLI
Create an entity recognizer using the create-entity-recognizer AWS CLI command or the equivalent SDK call. Substitute your own parameters for the S3 paths, IAM role, and Region. The response returns the EntityRecognizerArn.
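For example, a boto3 sketch of the call; the recognizer name, S3 paths, and role ARN are placeholders:

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# Train a custom entity recognizer from an entity list plus training documents.
response = comprehend.create_entity_recognizer(
    RecognizerName="aws-offering-recognizer",
    LanguageCode="en",
    DataAccessRoleArn="arn:aws:iam::SOURCE_ACCOUNT_ID:role/ComprehendDataAccessRole",
    InputDataConfig={
        "EntityTypes": [{"Type": "AWS_OFFERING"}],
        "Documents": {"S3Uri": "s3://my-training-bucket/docs/"},
        "EntityList": {"S3Uri": "s3://my-training-bucket/entity-list.csv"},
    },
)
print(response["EntityRecognizerArn"])
```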
You can monitor the status of the training job by calling describe-entity-recognizer and checking the Status field in the response.
Train a model via the console
To train a model via the console, complete the following steps:
- On the Amazon Comprehend console, under Customization, create a new custom entity recognizer model.
- Provide a model name and version.
- For Language, choose English.
- For Custom entity type, add AWS_OFFERING.
To train a custom entity recognition model, you can choose one of two ways to provide data to Amazon Comprehend: annotations or entity lists. For simplicity, use the entity list method.
- For Data format, select CSV file.
- For Training type, select Using entity list and training docs.
- Provide the S3 location paths for the entity list CSV and training data.
- To grant permissions to Amazon Comprehend to access your S3 bucket, create an IAM service-linked role.
In the Resource-based policy section, you can authorize access for the model version. The accounts you grant access to can import this model into their account. We skip this step for now and add the policy after the model is trained and we’re satisfied with the model performance.
- Choose Create.
This submits your custom entity recognizer training job, which trains a number of candidate models, tunes the hyperparameters, and performs cross-validation to make sure that your model is robust. These are the same activities that data scientists typically perform.
Define the IAM resource policy for the trained model to allow cross-account access
When we’re satisfied with the training performance, we can go ahead and share the specific model version by adding a resource policy.
Add a resource-based policy from the AWS CLI
Authorize importing the model from the target account by adding a resource policy on the model, as shown in the following code. The policy can be tightly scoped to a particular model version and target principal. Substitute your trained entity recognizer ARN and target account to provide access to.
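A boto3 sketch of this step might look like the following; the ARNs and account IDs are placeholders, and the policy is scoped to a single model version and principal:

```python
import json
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# Placeholder model version ARN from the training step in the source account.
model_version_arn = "arn:aws:comprehend:us-east-1:SOURCE_ACCOUNT_ID:entity-recognizer/aws-offering-recognizer/version/v1"

# Allow the target account to import this specific model version.
resource_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::TARGET_ACCOUNT_ID:root"},
        "Action": "comprehend:ImportModel",
        "Resource": model_version_arn,
    }],
}

comprehend.put_resource_policy(
    ResourceArn=model_version_arn,
    ResourcePolicy=json.dumps(resource_policy),
)
```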
Add a resource-based policy via the console
When the training is complete, a custom entity recognition model version is generated. We can choose the trained model and version to view the training details, including performance of the trained model.
To update the policy, complete the following steps:
- On the Tags, VPC & Policy tab, edit the resource-based policy.
- Provide the policy name, Amazon Comprehend service principal (
comprehend.amazonaws.com
), target account ID, and IAM users in the target account authorized to import the model version.
We specify root
as the IAM entity to authorize all users in the target account.
Copy the trained model from the source account to the target account
Now the model is trained and shared from the source account. The authorized target account user can import the model and create a copy of the model in their own account.
To import a model, you need to specify the source model ARN and service role for Amazon Comprehend to perform the copy action on your account. You can specify an optional AWS KMS ID to encrypt the model in your target account.
Import the model through AWS CLI
To import your model programmatically, use the import-model AWS CLI command or the equivalent SDK call:
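For example, a boto3 sketch run in the target account; the ARNs, names, and service role are placeholders:

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# Create a copy of the shared model version in the target account.
response = comprehend.import_model(
    SourceModelArn="arn:aws:comprehend:us-east-1:SOURCE_ACCOUNT_ID:entity-recognizer/aws-offering-recognizer/version/v1",
    ModelName="aws-offering-recognizer-copy",
    VersionName="v1",
    DataAccessRoleArn="arn:aws:iam::TARGET_ACCOUNT_ID:role/ComprehendImportRole",
)
print(response["ModelArn"])
```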
Import the model via the console
To import the model via the console, complete the following steps:
- On the Amazon Comprehend console, under Custom entity recognition, choose Import version.
- For Model version ARN, enter the ARN for the model trained in the source account.
- Enter a model name and version for the target.
- Provide a service account role and choose Confirm to start the model import process.
After the model status changes to Imported, we can view the model details, including the performance details of the trained model.
Test the copied model through a batch job
We test the copied model in the target account by detecting custom entities with a batch job. To test the model, download the test file and place it in an S3 bucket in your target account. Create an IAM role for Amazon Comprehend and provide the required access to the S3 bucket with the test data. You use the role ARN and S3 bucket paths that you noted earlier.
When the job is complete, you can verify the inference data in the specified output S3 bucket.
Test the model with the AWS CLI
To test the model programmatically, use the start-entities-detection-job AWS CLI command or the equivalent SDK call:
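For example, a boto3 sketch run in the target account; the ARNs, bucket paths, and role are placeholders:

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

# Run a batch entity detection job against the imported model.
response = comprehend.start_entities_detection_job(
    JobName="aws-offering-batch-test",
    EntityRecognizerArn="arn:aws:comprehend:us-east-1:TARGET_ACCOUNT_ID:entity-recognizer/aws-offering-recognizer-copy/version/v1",
    LanguageCode="en",
    DataAccessRoleArn="arn:aws:iam::TARGET_ACCOUNT_ID:role/ComprehendBatchRole",
    InputDataConfig={"S3Uri": "s3://my-test-bucket/test-docs/", "InputFormat": "ONE_DOC_PER_LINE"},
    OutputDataConfig={"S3Uri": "s3://my-test-bucket/output/"},
)
print(response["JobId"], response["JobStatus"])
```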
Test the model via the console
To test the model via the console, complete the following steps:
- On the Amazon Comprehend console, choose Analysis jobs and choose Create job.
- For Name, enter a name for the job.
- For Analysis type, choose Custom entity recognition.
- Choose the model name and version of the imported model.
- Provide the S3 paths for the test file for the job and the output location where Amazon Comprehend stores the result.
- Choose or create an IAM role with permission to access the S3 buckets.
- Choose Create job.
When your analysis job is complete, you have JSON files in your output S3 bucket path, which you can download to verify the results of the entity recognition from the imported model.
Conclusion
In this post, we demonstrated the Amazon Comprehend custom model copy feature. This feature gives you the ability to train an Amazon Comprehend custom entity recognition or classification model in one account and then share the model with another account in the same Region. This simplifies a multi-account strategy where the model can be trained one time and shared between accounts within the same Region without having to retrain or share the training datasets. This allows for predictable deployment in every account as part of your MLOps workflow. For more information, see our documentation on Comprehend custom copy, or try out the walkthrough in this post either via the console or using a cloud shell with the AWS CLI.
As of this writing, the model copy feature in Amazon Comprehend is available in the following Regions:
- US East (Ohio)
- US East (N. Virginia)
- US West (Oregon)
- Asia Pacific (Mumbai)
- Asia Pacific (Seoul)
- Asia Pacific (Singapore)
- Asia Pacific (Sydney)
- Asia Pacific (Tokyo)
- EU (Frankfurt)
- EU (Ireland)
- EU (London)
- AWS GovCloud (US-West)
Give the feature a try, and please send us feedback either via the AWS forum for Amazon Comprehend or through your usual AWS support contacts.
About the Authors
Premkumar Rangarajan is an AI/ML specialist solutions architect at Amazon Web Services and has previously authored the book Natural Language Processing with AWS AI services. He has 26 years of experience in the IT industry in a variety of roles, including delivery lead, integration specialist, and enterprise architect. He helps enterprises of all sizes adopt AI and ML to solve for their real-world challenges.
Chethan Krishna is a Senior Partner Solutions Architect in India. He works with Strategic AWS Partners for establishing a robust cloud competency, adopting AWS best practices and solving customer challenges. He is a builder and enjoys experimenting with AI/ML, IoT, and analytics.
Sriharsha M S is an AI/ML specialist solution architect in the Strategic Specialist team at Amazon Web Services. He works with strategic AWS customers who are taking advantage of AI/ML to solve complex business problems. He provides technical guidance and design advice to implement AI/ML applications at scale. His expertise spans application architecture, bigdata, analytics and machine learning.
Balance your data for machine learning with Amazon SageMaker Data Wrangler
Amazon SageMaker Data Wrangler is a new capability of Amazon SageMaker that makes it faster for data scientists and engineers to prepare data for machine learning (ML) applications by using a visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code.
Today, we’re excited to announce new transformations that allow you to balance your datasets easily and effectively for ML model training. We demonstrate how these transformations work in this post.
New balancing operators
The newly announced balancing operators are grouped under the Balance data transform type in the ADD TRANSFORM pane.
Currently, the transform operators support only binary classification problems. In binary classification problems, the classifier is tasked with classifying each sample to one of two classes. When the number of samples in the majority class (bigger) is considerably larger than the number of samples in the minority (smaller) class, the dataset is considered imbalanced. This skew is challenging for ML algorithms and classifiers because the training process tends to be biased towards the majority class.
Balancing schemes, which augment the data to be more balanced before training the classifier, were proposed to address this challenge. The simplest balancing methods are either oversampling the minority class by duplicating minority samples or undersampling the majority class by removing majority samples. The idea of adding synthetic minority samples to tabular data was first proposed in the Synthetic Minority Oversampling Technique (SMOTE), where synthetic minority samples are created by interpolating pairs of the original minority points. SMOTE and other balancing schemes were extensively studied empirically and shown to improve prediction performance in various scenarios, as per the publication To SMOTE, or not to SMOTE.
Data Wrangler now supports the following balancing operators as part of the Balance data transform:
- Random oversampler – Randomly duplicate minority samples
- Random undersampler – Randomly remove majority samples
- SMOTE – Generate synthetic minority samples by interpolating real minority samples
Let’s now discuss the different balancing operators in detail.
Random oversample
Random oversampling includes selecting random examples from the minority class with replacement and supplementing the training data with multiple copies of these instances. Therefore, it’s possible that a single instance may be selected multiple times. With the Random oversample transform type, Data Wrangler automatically oversamples the minority class for you by duplicating the minority samples in your dataset.
Random undersample
Random undersampling is the opposite of random oversampling. This method seeks to randomly select and remove samples from the majority class, consequently reducing the number of examples in the majority class in the transformed data. The Random undersample transform type lets Data Wrangler automatically undersample the majority class for you by removing majority samples in your dataset.
SMOTE
In SMOTE, synthetic minority samples are added to the data to achieve the desired ratio between majority and minority samples. The synthetic samples are generated by interpolation of pairs of the original minority points. The SMOTE transform supports balancing datasets including numeric and non-numeric features. Numeric features are interpolated by weighted average. However, you can’t apply weighted average interpolation to non-numeric features—it’s impossible to average “dog” and “cat”, for example. Instead, non-numeric features are copied from either original minority sample according to the averaging weight.
For example, consider two samples, A and B:
Assume the samples are interpolated with weights 0.3 for sample A and 0.7 for sample B. Therefore, the numeric fields are averaged with these weights to yield 0.3 and 0.6, respectively. The next field is filled with “dog” with probability 0.3 and “cow” with probability 0.7. Similarly, the next one equals “carnivore” with probability 0.3 and “herbivore” with probability 0.7. The random copying is done independently for each feature, so sample C below is a possible result:
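As an illustration, the following sketch interpolates two assumed samples A and B; the feature values are our own placeholders, chosen to reproduce the 0.3 and 0.6 averages described above, and the printed output is one possible sample C:

```python
import random

# Assumed sample values (placeholders): A = (1, 2, "dog", "carnivore"),
# B = (0, 0, "cow", "herbivore"), interpolated with weights 0.3 for A and 0.7 for B.
sample_a = [1.0, 2.0, "dog", "carnivore"]
sample_b = [0.0, 0.0, "cow", "herbivore"]
w_a, w_b = 0.3, 0.7

synthetic = []
for feat_a, feat_b in zip(sample_a, sample_b):
    if isinstance(feat_a, (int, float)):
        # Numeric features: weighted average of the pair.
        synthetic.append(w_a * feat_a + w_b * feat_b)
    else:
        # Non-numeric features: copy from A with probability w_a, else from B.
        synthetic.append(feat_a if random.random() < w_a else feat_b)

print(synthetic)  # e.g., [0.3, 0.6, 'dog', 'herbivore'] -- one possible sample C
```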
This example demonstrates how the interpolation process could result in unrealistic synthetic samples, such as an herbivore dog. This is more common with categorical features but can occur in numeric features as well. Even though some synthetic samples may be unrealistic, SMOTE could still improve classification performance.
To heuristically generate more realistic samples, SMOTE interpolates only pairs that are close in features space. Technically, each sample is interpolated only with its k-nearest neighbors, where a common value for k is 5. In our implementation of SMOTE, only the numeric features are used to calculate the distances between points (the distances are used to determine the neighborhood of each sample). It’s common to normalize the numeric features before calculating distances. Note that the numeric features are normalized only for the purpose of calculating the distance; the resulting interpolated features aren’t normalized.
Let’s now balance the Adult Dataset (also known as the Census Income dataset) using the built-in SMOTE transform provided by Data Wrangler. This multivariate dataset includes six numeric features and eight string features. The goal of the dataset is a binary classification task to predict whether the income of an individual exceeds $50,000 per year or not based on census data.
You can also see the distribution of the classes visually by creating a histogram using the histogram analysis type in Data Wrangler. The target distribution is imbalanced, and the ratio of records with >50K to <=50K is about 1:4.
We can balance this data using the SMOTE operator found under the Balance Data transform in Data Wrangler with the following steps:
- Choose income as the target column.
We want the distribution of this column to be more balanced.
- Set the desired ratio to 0.66.
Therefore, the ratio between the number of minority and majority samples is 2:3 (instead of the raw ratio of 1:4).
- Choose SMOTE as the transform to use.
- Leave the default values for Number of neighbors to average and whether or not to normalize.
- Choose Preview to get a preview of the applied transformation and choose Add to add the transform to your data flow.
Now we can create a new histogram similar to what we did before to see the realigned distribution of the classes. The following figure shows the histogram of the income column after balancing the dataset. The distribution of samples is now 3:2, as was intended.
We can now export this new balanced data and train a classifier on it, which could yield superior prediction quality.
Conclusion
In this post, we demonstrated how to balance imbalanced binary classification data using Data Wrangler. Data Wrangler offers three balancing operators: random undersampling, random oversampling, and SMOTE to rebalance data in your unbalanced datasets. All three methods offered by Data Wrangler support multi-modal data including numeric and non-numeric features.
As next steps, we recommend you replicate the example in this post in your Data Wrangler data flow to see what we discussed in action. If you’re new to Data Wrangler or SageMaker Studio, refer to Get Started with Data Wrangler. If you have any questions related to this post, please add it in the comment section.
About the Authors
Yotam Elor is a Senior Applied Scientist at Amazon SageMaker. His research interests are in machine learning, particularly for tabular data.
Arunprasath Shankar is an Artificial Intelligence and Machine Learning (AI/ML) Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.
Launch processing jobs with a few clicks using Amazon SageMaker Data Wrangler
Amazon SageMaker Data Wrangler makes it faster for data scientists and engineers to prepare data for machine learning (ML) applications by using a visual interface. Previously, when you created a Data Wrangler data flow, you could choose different export options to easily integrate that data flow into your data processing pipeline. Data Wrangler offers export options to Amazon Simple Storage Service (Amazon S3), SageMaker Pipelines, and SageMaker Feature Store, or as Python code. The export options create a Jupyter notebook and require you to run the code to start a processing job facilitated by SageMaker Processing.
We’re excited to announce the general release of destination nodes and the Create Job feature in Data Wrangler. This feature gives you the ability to export all the transformations that you made to a dataset to a destination node with just a few clicks. This allows you to create data processing jobs and export to Amazon S3 purely via the visual interface without having to generate, run, or manage Jupyter notebooks, thereby enhancing the low-code experience. To demonstrate this new feature, we use the Titanic dataset and show how to export your transformations to a destination node.
Prerequisites
Before we learn how to use destination nodes with Data Wrangler, you should already understand how to access and get started with Data Wrangler. You also need to know what a data flow means with context to Data Wrangler and how to create one by importing your data from the different data sources Data Wrangler supports.
Solution overview
Consider the following data flow named example-titanic.flow:
- It imports the Titanic dataset three times. You can see these different imports as separate branches in the data flow.
- For each branch, it applies a set of transformations and visualizations.
- It joins the branches into a single node with all the transformations and visualizations.
With this flow, you might want to process and save parts of your data to a specific branch or location.
In the following steps, we demonstrate how to create destination nodes, export them to Amazon S3, and create and launch a processing job.
Create a destination node
You can use the following procedure to create destination nodes and export them to an S3 bucket:
- Determine what parts of the flow file (transformations) you want to save.
- Choose the plus sign next to the nodes that represent the transformations that you want to export. (If it’s a collapsed node, you must select the options icon (three dots) for the node).
- Hover over Add destination.
- Choose Amazon S3.
- Specify the fields as shown in the following screenshot.
- For the second join node, follow the same steps to add Amazon S3 as a destination and specify the fields.
You can repeat these steps as many times as you need for as many nodes you want in your data flow. Later on, you pick which destination nodes to include in your processing job.
Launch a processing job
Use the following procedure to create a processing job and choose the destination node where you want to export to:
- On the Data Flow tab, choose Create job.
- For Job name, enter the name of the export job.
- Select the destination nodes you want to export.
- Optionally, specify the AWS Key Management Service (AWS KMS) key ARN.
The KMS key is a cryptographic key that you can use to protect your data. For more information about KMS keys, see the AWS Key Management Service Developer Guide.
- Choose Next, 2. Configure job.
- Optionally, you can configure the job as per your needs by changing the instance type or count, or adding any tags to associate with the job.
- Choose Run to run the job.
A success message appears when the job is successfully created.
View the final data
Finally, you can use the following steps to view the exported data:
A new tab opens showing the processing job on the SageMaker console.
- When the job is complete, review the exported data on the Amazon S3 console.
You should see a new folder with the job name you chose.
FAQ
In this section, we address a few frequently asked questions about this new feature:
- What happened to the Export tab? With this new feature, we removed the Export tab from Data Wrangler. You can still use the export functionality through the Jupyter notebooks that Data Wrangler generates from any node you created in the data flow.
- How many destination nodes can I include in a job? There is a maximum of 10 destinations per processing job.
- How many destination nodes can I have in a flow file? You can have as many destination nodes as you want.
- Can I add transformations after my destination nodes? No, the idea is destination nodes are terminal nodes that have no further steps afterwards.
- What are the supported destinations I can use with destination nodes? As of this writing, we only support Amazon S3 as a destination. Support for more destination types will be added in the future. Please reach out if there is a specific one you would like to see.
Summary
In this post, we demonstrated how to use the newly launched destination nodes to create processing jobs and save your transformed datasets directly to Amazon S3 via the Data Wrangler visual interface. With this additional feature, we have enhanced the tool-driven low-code experience of Data Wrangler.
As next steps, we recommend you try the example demonstrated in this post. If you have any questions or want to learn more, see Export or leave a question in the comment section.
About the Authors
Alfonso Austin-Rivera is a Front End Engineer at Amazon SageMaker Data Wrangler. He is passionate about building intuitive user experiences that spark joy. In his spare time, you can find him fighting gravity at a rock-climbing gym or outside flying his drone.
Parsa Shahbodaghi is a Technical Writer in AWS specializing in machine learning and artificial intelligence. He writes the technical documentation for Amazon SageMaker Data Wrangler and Amazon SageMaker Feature Store. In his free time, he enjoys meditating, listening to audiobooks, weightlifting, and watching stand-up comedy. He will never be a stand-up comedian, but at least his mom thinks he’s funny.
Balaji Tummala is a Software Development Engineer at Amazon SageMaker. He helps support Amazon SageMaker Data Wrangler and is passionate about building performant and scalable software. Outside of work, he enjoys reading fiction and playing volleyball.
Arunprasath Shankar is an Artificial Intelligence and Machine Learning (AI/ML) Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.