Secure multi-account model deployment with Amazon SageMaker: Part 1

Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models.

Although Studio provides all the tools you need to take your models from experimentation to production, you need a robust and secure model deployment process. This process must fulfill your organization’s operational and security requirements.

Amazon SageMaker and Studio provide a wide range of specialized functionality for building highly secure, scalable, and flexible MLOps platforms to cover your model deployment use cases and requirements. Three SageMaker services, SageMaker Pipelines, SageMaker Projects, and SageMaker Model Registry, form the foundation for implementing an enterprise-grade, secure, multi-account model deployment workflow.

In combination with other AWS services, such as Amazon Virtual Private Cloud (Amazon VPC), AWS CloudFormation, and AWS Identity and Access Management (IAM), SageMaker MLOps can deliver solutions for the most demanding security and governance requirements.

Using a multi-account data science environment to meet security, reliability, and operational needs is a good DevOps practice. A multi-account strategy is paramount to achieve strong workload and data isolation, support multiple unrelated teams and projects, ensure fine-grained security and compliance control, facilitate billing, and create cost transparency.

In this two-part post, we offer guidance for using AWS services and SageMaker functionalities, and recommend practices for implementing a production-grade ML platform and secure, automated, multi-account model deployment workflows.

Such ML platforms and workflows can fulfill stringent security requirements, even for regulated industries such as financial services. For example, customers in regulated industries often don’t allow any internet access in ML environments. They often use only VPC endpoints for AWS services. They implement end-to-end data encryption in transit and at rest, and enforce workload isolation for individual teams in a line of business in multi-account organizational structures.

Part 1 of this series focuses on providing a solution architecture overview, in which we explain the security controls employed and how they are implemented. We also look at MLOps automation workflows with SageMaker projects and Pipelines.

In Part 2, we walk through deploying the solution with hands-on SageMaker notebooks.

This is Part 1 in a two-part series on secure multi-account deployment on Amazon SageMaker.

Solution overview

The post Multi-account model deployment with Amazon SageMaker Pipelines shows a conceptual setup of a multi-account MLOps environment based on Pipelines and SageMaker projects.

The solution presented in this post is built for an actual use case for an AWS customer in the financial services industry. It focuses on the security, automation, and governance aspects of multi-account ML environments. It provides fully automated provisioning of Studio into your private VPC, subnets, and security groups using CloudFormation templates and stack sets. Compared to the previous post, this solution implements network traffic and access controls with VPC endpoints, security groups, and fine-grained permissions with designated IAM roles. To reflect real-life ML environment requirements, the solution enforces end-to-end data encryption at rest and in transit.

The following diagram shows the overview of the solution architecture and the deployed components.

Let’s look at each group of components in more detail.

Component 1: AWS Service Catalog

The end-to-end deployment of the data science environment is delivered as an AWS Service Catalog self-provisioned product. One of the main advantages of using AWS Service Catalog for self-provisioning is that authorized users can configure and deploy available products and AWS resources on their own, without needing full privileges or access to AWS services. The deployment of all AWS Service Catalog products happens under a specified service role with the defined set of permissions, which are unrelated to the user’s permissions.

Component 2: Studio domain

The Data Science Environment product in the AWS Service Catalog creates a Studio domain. A Studio domain consists of a list of authorized users, configuration settings, and an Amazon Elastic File System (Amazon EFS) volume. The Amazon EFS volume contains data for the users, including notebooks, resources, and artifacts.

Components 3 and 4: SageMaker MLOps project templates

The solution delivers the customized versions of SageMaker MLOps project templates. Each MLOps template provides an automated model building and deployment pipeline using continuous integration and continuous delivery (CI/CD). The delivered templates are configured for the secure multi-account model deployment and are fully integrated in the provisioned data science environment. The project templates are provisioned in Studio via AWS Service Catalog. The templates include the seed code repository with Studio notebooks, which implements a secure setup of SageMaker workloads such as processing, training jobs, and pipelines.

Components 5 and 6: CI/CD workflows

The MLOps projects implement CI/CD using Pipelines and AWS CodePipeline, AWS CodeCommit, and AWS CodeBuild. SageMaker project templates also support a CI/CD workflow using Jenkins and GitHub as the source repository.

Pipelines is responsible for orchestrating workflows across each step of the ML process and task automation, including data loading, data transformation, training, tuning and validation, and deployment. Each model is tracked via SageMaker Model Registry, which stores the model metadata, such as training and validation metrics and data lineage, and retains model versions and the approval status of the model.

CodePipeline deploys the model to the designated target accounts with staging and production environments. The necessary resources are pre-created by CloudFormation templates during infrastructure creation.

This solution supports secure multi-account model deployment using AWS Organizations or via simple target account lists.

Component 7: Secure infrastructure

The Studio domain is deployed in a dedicated VPC. Each elastic network interface used by a SageMaker domain or workload is created within a private dedicated subnet and attached to the specified security groups. The data science environment VPC can be configured with internet access via an optional NAT gateway. You can also run this VPC in internet-free mode without any inbound or outbound internet access.

All access to the AWS public services is routed via AWS PrivateLink. Traffic between your VPC and the AWS services doesn’t leave the Amazon network and isn’t exposed to the public internet.

Component 8: Data security

All data in the data science environment, whether stored in Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic Block Store (Amazon EBS) volumes, or Amazon EFS volumes, is encrypted at rest using customer managed AWS KMS keys. All data transfer between platform components, API calls, and inter-container communication is protected using the Transport Layer Security (TLS) 1.2 protocol.

Data access from the Studio notebooks or any SageMaker workload to the environment’s S3 buckets is governed by the combination of the S3 bucket and user policies and S3 VPC endpoint policy.

Multi-account structure

With the goal of illustrating best practices, this solution implements the following three account groups:

  • Development – This account is used by data scientists and ML engineers to perform experimentation and development. Data science tools such as Studio are used in the development account. S3 buckets with data and models, code repositories, and CI/CD pipelines are hosted in this account. Models are built, trained, validated, and registered in the model repository in this account.
  • Testing/staging/UAT – Validated and approved models are first deployed to the staging account, where the automated unit and integration tests are run. Data scientists and ML engineers have read-only access to this account.
  • Production – Fully tested and approved models from the staging accounts are deployed to the production account for both online and batch inference.

Depending on your specific security and governance requirements and your development organization, for the production setup, we recommend using two additional account groups:

  • Shared services – This account hosts common resources like team code repositories, CI/CD pipelines for MLOps workflows, Docker image repositories, service catalog portfolios, model registries, and library package repositories.
  • Data management – A dedicated AWS account to store and manage all data for the ML process. We recommend implementing strong data security and governance practices using a data lake on AWS and AWS Lake Formation.

Each of these account groups can have multiple AWS accounts and environments for developing and testing services and storing different types of data.

Environment layers

In the following sections, we look at the whole data science environment in terms of layers:

  • Network and security infrastructure
  • IAM roles and cross-account permission setup
  • Application stack consisting of Studio and SageMaker MLOps projects

In Part 2 of this post, you deploy the solution into your AWS account for further experimentation.

Secure infrastructure

We use AWS foundational services such as VPC, security groups, subnets, and NAT gateways to create the secure infrastructure for the data science environment. The following diagram shows the deployment architecture for the solution.

VPC, subnets, routes, and internet access

Our Studio domain is deployed into a dedicated data science VPC using VPC Only mode (Step 1 in the preceding architecture). In this mode, you control how internet traffic flows, for example through a NAT gateway or AWS Network Firewall. You can also create an internet-free VPC for your highly secure workloads. Any SageMaker workload launched in the VPC creates an elastic network interface in the specified subnet. You can apply all available layers of security controls (security groups, network ACLs, VPC endpoints, AWS PrivateLink, or Network Firewall endpoints) to the internal network and internet traffic to exercise fine-grained control of network access in Studio. For a detailed description of network configurations and security controls, refer to Securing Amazon SageMaker Studio connectivity using a private VPC. If you must control ingress and egress network traffic or apply any filtering rules, you can use Network Firewall as described in Securing Amazon SageMaker Studio internet traffic using AWS Network Firewall.

All SageMaker workloads, like Studio notebooks, processing or training jobs, and inference endpoints, are placed in the private subnets within the dedicated security group (2). This security group doesn’t allow any ingress from any network interface outside the group except for intra-group communications.

VPC endpoints

All access to Amazon S3 is routed via the gateway-type S3 VPC endpoint (3). You control access to the resources behind a VPC endpoint with a VPC endpoint policy. The combination of the VPC endpoint policy and the S3 bucket policy ensures that only specified buckets can be accessed, and these buckets can be accessed only via the designated VPC endpoints. The solution provisions two buckets: Data and Models. You can extend the CloudFormation templates to accommodate your data storage requirements, create additional S3 buckets, or tighten the data access permissions.

Studio and Studio notebooks communicate with various AWS services, such as the SageMaker backend and APIs, Amazon SageMaker Runtime, AWS Security Token Service (AWS STS), Amazon CloudWatch, AWS Key Management Service (AWS KMS), and others.

The solution uses a private connection over interface-type VPC endpoints (4) to access these AWS services. All VPC endpoints are placed in the dedicated security group to control the inbound and outbound network access. You can find a list with the recommended VPC endpoints to be set up for Studio in the following AWS technical guide.
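As a rough illustration of the interface endpoints described above, the following Python sketch builds the request parameters that boto3's `ec2.create_vpc_endpoint` call accepts, one set per service. The helper name `interface_endpoint_requests`, the short service list, and the placeholder IDs are assumptions made for this example; the solution itself provisions its endpoints through CloudFormation.

```python
# Hypothetical helper: builds keyword arguments for ec2.create_vpc_endpoint.
# The suffixes below are an illustrative subset, not the solution's full list.
SERVICE_SUFFIXES = [
    "sagemaker.api",      # SageMaker API
    "sagemaker.runtime",  # SageMaker Runtime (endpoint invocation)
    "sts",                # AWS STS
    "logs",               # Amazon CloudWatch Logs
    "monitoring",         # Amazon CloudWatch metrics
    "kms",                # AWS KMS
]


def interface_endpoint_requests(region, vpc_id, subnet_ids, security_group_id):
    """Return one create_vpc_endpoint parameter set per service."""
    return [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{suffix}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": [security_group_id],
            "PrivateDnsEnabled": True,
        }
        for suffix in SERVICE_SUFFIXES
    ]


# Placeholder identifiers; replace with the IDs of your own VPC resources.
endpoint_requests = interface_endpoint_requests(
    "us-east-1", "vpc-EXAMPLE", ["subnet-EXAMPLE1", "subnet-EXAMPLE2"], "sg-EXAMPLE"
)
for request in endpoint_requests:
    print(request["ServiceName"])
```

With AWS credentials configured, each parameter set could be passed on as `boto3.client("ec2").create_vpc_endpoint(**request)`.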

IAM roles and preventive security controls

The solution uses IAM to set up personas and service execution roles (5). You can assign fine-grained permissions policies, following the principle of least privilege, to the various SageMaker execution roles used to run different workloads, such as processing or training jobs, pipelines, or inference. You can implement preventive security controls using SageMaker-specific IAM condition keys. For example, the following policy statement, attached to the SageMaker execution role, enforces VPC isolation by denying the creation of SageMaker notebook instances, processing, training, and tuning jobs, and models unless private subnets and security groups are specified:

{
    "Action": [
        "sagemaker:CreateNotebookInstance",
        "sagemaker:CreateHyperParameterTuningJob",
        "sagemaker:CreateProcessingJob",
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel"
    ],
    "Resource": "*",
    "Effect": "Deny",
    "Condition": {
        "Null": {
            "sagemaker:VpcSubnets": "true",
            "sagemaker:VpcSecurityGroupIds": "true"
        }
    }
}
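To make the effect of the `Null` condition concrete, here is a small Python sketch that mimics how the statement above is matched against a request. The function `vpc_deny_applies` is a hypothetical stand-in for IAM's evaluation, not an AWS API. It relies on the IAM rule that multiple keys under one condition operator are combined with a logical AND, so the Deny matches when both keys are absent. The `vpc_config` dictionary shows the shape of the `VpcConfig` parameter of `CreateTrainingJob`, with placeholder IDs.

```python
def vpc_deny_applies(request_context):
    """Mimic the "Null" condition block of the Deny statement.

    IAM ANDs multiple keys under one operator, so the statement matches
    only when BOTH keys are missing from the request context.
    """
    keys = ("sagemaker:VpcSubnets", "sagemaker:VpcSecurityGroupIds")
    return all(request_context.get(key) is None for key in keys)


# Shape of the VpcConfig parameter for a training job (placeholder IDs).
vpc_config = {
    "Subnets": ["subnet-EXAMPLE1", "subnet-EXAMPLE2"],
    "SecurityGroupIds": ["sg-EXAMPLE"],
}

# Request context when the job is created with VpcConfig ...
with_vpc = {
    "sagemaker:VpcSubnets": vpc_config["Subnets"],
    "sagemaker:VpcSecurityGroupIds": vpc_config["SecurityGroupIds"],
}
# ... and when VpcConfig is omitted entirely.
without_vpc = {}

print(vpc_deny_applies(with_vpc))     # the Deny statement does not match
print(vpc_deny_applies(without_vpc))  # the Deny statement matches
```

If you prefer each key to trigger the Deny on its own, one option is to split the condition into two separate Deny statements, one per key.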

For a detailed discussion of the security controls and best practices, refer to Building secure machine learning environments with Amazon SageMaker.

Cross-account permission and infrastructure setup

When using a multi-account setup for your data science platform, you must focus on setting up and configuring IAM roles, resource policies, and cross-account trust and permissions policies with special attention to the following topics:

  • How do you set up access to the resources in one account from authorized and authenticated roles and users in other accounts?
  • What roles in one (target) account must be assumed by a role in another (source) account to perform a specific action in the target account?
  • Does the assumed role in the target account have a trust policy for the role in the source account, and does the role in the source account have sts:AssumeRole permission in its permissions policy for the role in the target account? For more information, see How to use trust policies with IAM roles.
  • Do your AWS CloudFormation deployment roles have iam:PassRole permission for the execution roles they assign to the created resources?
  • How do you configure access control and resource isolation for teams or groups within Studio? For an overview and recipes for the implementation, see Configuring Amazon SageMaker Studio for teams and groups with complete resource isolation.
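The two halves of a cross-account role assumption can be sketched as a pair of policy documents. In the following Python example, the function names, account IDs, and role names are placeholders invented for illustration. Note that the IAM action for assuming a role is `sts:AssumeRole`: the trust policy lives on the role in the target account, and the permissions policy lives on the role in the source account.

```python
import json


def trust_policy(source_role_arn):
    """Trust policy attached to the role in the TARGET account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": source_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def assume_role_permission(target_role_arn):
    """Permissions policy attached to the role in the SOURCE account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": target_role_arn,
            }
        ],
    }


# Placeholder account IDs and hypothetical role names.
source_role = "arn:aws:iam::111111111111:role/ExampleDevPipelineRole"
target_role = "arn:aws:iam::222222222222:role/ExampleStackSetExecutionRole"

print(json.dumps(trust_policy(source_role), indent=2))
print(json.dumps(assume_role_permission(target_role), indent=2))
```

Both documents must be in place for the assumption to succeed across accounts; either one alone is not sufficient.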

The solution implements the following IAM roles in its multi-account setup, as shown in the diagram.

User persona IAM roles and various execution roles are created in the development account as we run Studio and perform development work there. We must create the following IAM roles in the staging and production accounts:

  • Stack set execution roles – Used to deploy various resources into target accounts during the initial environment provision and for multi-account CI/CD MLOps workflows
  • Model execution roles – Assumed by SageMaker to access model artifacts and the Docker image for deployment on ML compute instances (SageMaker inference)

These roles are assumed by the roles in the development account.

Configure permissions for multi-account model deployment

In this section, we look closer at the permission setup for multi-account model deployment.

First, we must understand how the multi-account CI/CD model pipeline deploys the model to SageMaker endpoints in the target accounts. The following diagram shows the model deployment process.

After model training and validation, the model is registered in the model registry. The model registry stores the model metadata, and all model artifacts are stored in an S3 bucket (Step 1 in the preceding diagram). The CI/CD pipeline uses CloudFormation stack sets (2) to deploy the model in the target accounts. The CloudFormation service assumes the role StackSetExecutionRole (3) in the target account to perform the deployment. SageMaker also assumes the role ModelExecutionRole (4) to access the model metadata and download the model artifacts from the S3 bucket. The StackSetExecutionRole role must have iam:PassRole permission (5) for ModelExecutionRole to be able to pass the role successfully at stack provisioning time. Finally, the model is deployed to a SageMaker endpoint (6).
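The iam:PassRole requirement in step 5 could look roughly like the policy statement built below. This is a sketch under assumptions: the function name, the `Sid`, and the account ID are placeholders, and the `iam:PassedToService` condition is an optional tightening added for illustration rather than something the solution is known to use.

```python
import json


def pass_role_statement(model_execution_role_arn):
    """Statement for the StackSetExecutionRole permissions policy."""
    return {
        "Sid": "AllowPassModelExecutionRole",  # hypothetical statement ID
        "Effect": "Allow",
        "Action": "iam:PassRole",
        "Resource": model_execution_role_arn,
        # Assumption: restrict passing the role to the SageMaker service only.
        "Condition": {
            "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
        },
    }


# Placeholder account ID for a staging account.
statement = pass_role_statement(
    "arn:aws:iam::222222222222:role/SageMakerModelExecutionRole"
)
print(json.dumps(statement, indent=2))
```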

For a successful deployment, ModelExecutionRole needs access to the model, which is saved in an S3 bucket, and to the corresponding AWS KMS encryption keys in the development account, because the data in the S3 bucket is encrypted.

Both the S3 bucket and AWS KMS key resource policies contain an explicit deny statement that applies to any access request that doesn’t arrive via a designated VPC endpoint (the following is an example statement from the AWS KMS key policy):

        - Sid: DenyNoVPC
          Effect: Deny
          Principal: '*'
          Action:
            - kms:Encrypt
            - kms:Decrypt
            - kms:ReEncrypt*
            - kms:GenerateDataKey*
            - kms:DescribeKey
          Resource: '*'
          Condition:
            StringNotEquals:
              'aws:sourceVpce': !Ref VPCEndpointKMSId

To access the S3 bucket and AWS KMS key with ModelExecutionRole, the following conditions must be met:

  • ModelExecutionRole must have permissions to access the S3 bucket and AWS KMS key in the development account
  • Both S3 bucket and AWS KMS key policies must allow cross-account access from ModelExecutionRole in the corresponding target account
  • The S3 bucket and AWS KMS key must be accessed only via a designated VPC endpoint in the target account
  • The VPC endpoint ID must be explicitly allowed in both S3 bucket and AWS KMS key policies in the Condition statement

The following diagram shows the infrastructure and IAM configuration for a development, staging, and production account that fulfills these requirements.

All access to the model artifacts is made via the S3 VPC endpoint (Step 1 in the preceding architecture). This VPC endpoint allows access to the model and data in your S3 buckets. The bucket policy (2) for the bucket where the models are stored grants access to the ModelExecutionRole principals (5) in each of the target accounts:

"Sid": "AllowCrossAccount",
"Effect": "Allow",
"Principal": {
    "AWS": [
        "arn:aws:iam::<staging-account>:role/SageMakerModelExecutionRole",
        "arn:aws:iam::<prod-account>:role/SageMakerModelExecutionRole",
        "arn:aws:iam::<dev-account>:root"
    ]
}

We apply the same setup for the data encryption key (3), whose policy (4) grants access to the principals in the target accounts.

SageMaker model-hosting endpoints are placed in the VPC (6) in each of the target accounts. Any access to S3 buckets and AWS KMS keys is made via the corresponding VPC endpoints. The IDs of these VPC endpoints are added to the Condition statement of the bucket and the AWS KMS key’s resource policies:

"Sid": "DenyNoVPC",
"Effect": "Deny",
"Principal": "*",
"Action": [
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket",
    "s3:GetBucketAcl",
    "s3:GetObjectAcl",
    "s3:PutBucketAcl",
    "s3:PutObjectAcl"
],
"Resource": [
    "arn:aws:s3:::sm-mlops-dev-us-east-1-models/*",
    "arn:aws:s3:::sm-mlops-dev-us-east-1-models"
],
"Condition": {
    "StringNotEquals": {
        "aws:sourceVpce": [
            "vpce-0b82e29a828790da2",
            "vpce-07ef65869ca950e14",
            "vpce-03d9ed0a1ba396ff5"
        ]
    }
}
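The behavior of this Deny statement can be illustrated with a few lines of Python. The function `denied_by_vpce_condition` is a hypothetical model of the `StringNotEquals` check, not an AWS API. It reflects two IAM rules: a value list is matched if the request value equals any entry, and a negated operator also matches when the key is missing, so requests that bypass VPC endpoints entirely are denied as well.

```python
def denied_by_vpce_condition(allowed_vpce_ids, source_vpce=None):
    """Mimic StringNotEquals on aws:sourceVpce inside a Deny statement.

    source_vpce is None when the request does not arrive through any
    VPC endpoint; negated operators match in that case too.
    """
    return source_vpce not in allowed_vpce_ids


# Endpoint IDs taken from the policy fragment above.
allowed = [
    "vpce-0b82e29a828790da2",
    "vpce-07ef65869ca950e14",
    "vpce-03d9ed0a1ba396ff5",
]

# Request through a listed endpoint: the Deny does not match.
print(denied_by_vpce_condition(allowed, "vpce-07ef65869ca950e14"))
# Request through an unlisted (placeholder) endpoint: the Deny matches.
print(denied_by_vpce_condition(allowed, "vpce-EXAMPLE"))
# Request that does not use a VPC endpoint at all: the Deny matches.
print(denied_by_vpce_condition(allowed, None))
```

Because an explicit Deny overrides any Allow, adding a new target account means adding its VPC endpoint ID to this list in addition to granting the role access.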

SageMaker MLOps projects: Automation pipelines

This solution delivers two MLOps projects as SageMaker project templates:

  • Model build, train, and validate pipeline
  • Multi-account model deploy pipeline

These projects are fully functional examples that are integrated with the solution infrastructure and multi-layer security controls such as VPC, subnets, security groups, AWS account boundaries, and the dedicated IAM execution roles.

You can find a detailed description of the SageMaker MLOps projects in Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines.

MLOps project template to build, train, and validate a model

This project is based on the SageMaker project template but has been adapted for this particular solution infrastructure and security controls. The following diagram shows the functional setup of the CI/CD pipeline.

The project creates the following resources comprising the MLOps pipeline:

  1. An MLOps template, made available through SageMaker projects and provided via an AWS Service Catalog portfolio.
  2. A CodePipeline pipeline with two stages: Source to get the source code of the ML pipeline, and Build to build and run the pipeline.
  3. A SageMaker pipeline to implement a repeatable directed acyclic graph (DAG) workflow with individual steps for processing, training, validation, and model registration.
  4. A seed code repository in CodeCommit.

The seed code repository contains code to create a multi-step model building pipeline that includes data processing, model training, model evaluation, and conditional model registration (depending on model accuracy) steps. The pipeline implementation in the pipeline.py file trains a regression model using the XGBoost algorithm on the well-known UCI Abalone dataset. This repository also includes a build specification file, used by CodePipeline and CodeBuild to run the pipeline automatically.
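The shape of this workflow can be pictured with a toy model in plain Python. The sketch below does not use the SageMaker SDK: the step names, the `should_register` helper, and the threshold are assumptions chosen for illustration. It only shows the two ideas named above, a DAG that fixes the step order and a quality gate in front of registration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it depends on.
PIPELINE_DAG = {
    "process": set(),
    "train": {"process"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}


def run_order(dag):
    """Return the steps in an order that respects all dependencies."""
    return list(TopologicalSorter(dag).static_order())


def should_register(error_metric, threshold):
    """Quality gate: register only if the error is at or below the threshold."""
    return error_metric <= threshold


print(run_order(PIPELINE_DAG))
# Hypothetical evaluation result and threshold.
print(should_register(error_metric=5.2, threshold=6.0))
```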

MLOps project template for multi-account model deployment

This project is based on the SageMaker MLOps template for model deployment, but implements secure multi-account deployment from SageMaker Model Registry to SageMaker hosted endpoints for real-time inference in the staging and production accounts.

The following diagram shows the functional components of the project.

The components are as follows:

  1. The MLOps project template, which is deployable as a SageMaker project in Studio.
  2. A CodeCommit repository with seed code.
  3. The model deployment multi-stage CI/CD CodePipeline pipeline.
  4. A staging AWS account or accounts where the model is deployed and tested.
  5. A production AWS account or accounts where the model is deployed for production serving.
  6. SageMaker endpoints with the approved model hosted in your private VPC.

You can use the delivered seed code to implement your own customized model deployment pipelines with additional tests or approval steps.

Multi-account ML development best practices

In addition to the already discussed MLOps approaches, security controls, and infrastructure setup, the following resources provide a detailed description and overview of the ML development and deployment best practices:

Conclusion

In this post, we presented the main building blocks and patterns for implementing a multi-account, secure, and governed ML environment. In Part 2 of this series, you deploy the solution from the source code GitHub repository into your account and experiment with the hands-on SageMaker notebooks.


About the Author

Yevgeniy Ilyin is a Solutions Architect at AWS. He has over 20 years of experience working at all levels of software development and solutions architecture and has used programming languages from COBOL and Assembler to .NET, Java, and Python. He develops and codes cloud native solutions with a focus on big data, analytics, and data engineering.

Optimize personalized recommendations for a business metric of your choice with Amazon Personalize

Amazon Personalize now enables you to optimize personalized recommendations for a business metric of your choice, in addition to improving relevance of recommendations for your users. You can define a business metric such as revenue, profit margin, video watch time, or any other numerical attribute of your item catalog to optimize your recommendations. Amazon Personalize automatically learns what is relevant to your users, considers the business metric you’ve defined, and recommends the products or content to your users that benefit your overall business goals. Configuring an additional objective is easy. You select any numerical column in your catalog when creating a new solution in Amazon Personalize via the AWS Management Console or the API, and you’re ready to go.

Amazon Personalize enables you to easily add real-time personalized recommendations to your applications without requiring any ML expertise. With Amazon Personalize, you pay for what you use, with no minimum fees or upfront commitments. You can get started with a simple three-step process, which takes only a few clicks on the console or a few simple API calls. First, point Amazon Personalize to your user data, catalog data, and activity stream of views, clicks, purchases, and so on, in Amazon Simple Storage Service (Amazon S3) or upload using an API call. Second, either via the console or an API call, train a custom, private recommendation model for your data (CreateSolution). Third, retrieve personalized recommendations for any user by creating a campaign and using the GetRecommendations API.

The rest of this post walks you through the suggested best practices for generating recommendations for your business in greater detail.

Streaming movie service use case

In this post, we propose a fictitious streaming movie service, and as part of the service we provide movie recommendations using movie reviews from the MovieLens database. We assume the streaming service’s agreement with content providers requires royalties every time a movie is viewed. For our use case, we assume that royalties range from $0.00 to $0.10 per title. All things being equal, the streaming service wants to provide recommendations for titles that the subscriber will enjoy, but minimize costs by recommending titles with lower royalty fees.

It’s important to understand that a trade-off is made when including a business objective in recommendations. Placing too much weight on the objective can lead to a loss of opportunities with customers as the recommendations presented become less relevant to user interests. If the objective weight doesn’t impart enough impact on recommendations, the recommendations will still be relevant but may not drive the business outcomes you aim to achieve. By testing the models in real-world environments, you can collect data on the impact the objective has on your results and balance the relevance of the recommendations with your business objective.

Movie dataset

The items dataset from MovieLens has a structure as follows.

ITEM_ID  TITLE              ROYALTY  GENRE
1        Toy Story (1995)   0.01     ANIMATION|CHILDRENS|COMEDY
2        GoldenEye (1995)   0.02     ACTION|ADVENTURE|THRILLER
3        Four Rooms (1995)  0.03     THRILLER
4        Get Shorty (1995)  0.04     ACTION|COMEDY|DRAMA
5        Copycat (1995)     0.05     CRIME|DRAMA|THRILLER

Amazon Personalize objective optimization requires a numerical field to be defined in the item metadata, which is used when considering your business objective. Because Amazon Personalize optimizes for the largest value in the business metric column, simply passing in the royalty amount results in the recommendations driving customers to those movies with the highest royalties. To minimize royalties, we multiply the royalty field by -1, and capture how much the streaming service will spend in royalties to stream the movie.

ITEM_ID  TITLE              ROYALTY  GENRE
1        Toy Story (1995)   -0.01    ANIMATION|CHILDRENS|COMEDY
2        GoldenEye (1995)   -0.02    ACTION|ADVENTURE|THRILLER
3        Four Rooms (1995)  -0.03    THRILLER
4        Get Shorty (1995)  -0.04    ACTION|COMEDY|DRAMA
5        Copycat (1995)     -0.05    CRIME|DRAMA|THRILLER

In this example, the royalty value ranges from -0.12 to 0. The objective’s value can be an integer or a floating-point number. When creating a solution, the service internally adjusts the lowest value to 0, regardless of whether that value is positive or negative, and the highest value to 1. Other values are interpolated between 0 and 1, preserving the relative difference between all data points.
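The two transformations described above, negating the royalty and rescaling to the 0 to 1 range, can be sketched in a few lines of Python. The rescaling here is ordinary linear min-max scaling, which is an assumption based on the description in this post; the service's internal implementation may differ. The function names are made up for this example, and the royalties are the five sample rows shown earlier.

```python
def negate(values):
    """Flip the sign so that the lowest royalty becomes the largest value."""
    return [-value for value in values]


def min_max_scale(values):
    """Linearly map the lowest value to 0 and the highest value to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are identical, so there is nothing to rank by.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


royalties = [0.01, 0.02, 0.03, 0.04, 0.05]  # sample rows from the items table
objective = negate(royalties)
scaled = min_max_scale(objective)

for royalty, score in zip(royalties, scaled):
    print(f"royalty={royalty:.2f} scaled_objective={score:.2f}")
```

After scaling, the title with the lowest royalty receives the highest objective value and the title with the highest royalty receives 0, which is the ordering the streaming service wants to favor.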

For movie recommendations, we use the following schema for the items dataset:

{
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "ROYALTY",
            "type": "float"
        },
        {
            "name": "GENRE",
            "type": [
                "null",
                "string"
            ],
            "categorical": true
        }
    ],
    "version": "1.0"
}

The items dataset schema includes the mandatory ITEM_ID field, the ROYALTY field used as the business metric, and the GENRE list.

Comparing three solutions

The following diagram illustrates the architecture we use to test the benefits of objective optimization. In this scenario, we use two buckets: Items contains movie data and Interactions contains positive movie reviews. The data from the buckets is loaded into the Amazon Personalize dataset group. Once loaded, three solutions are derived from the two datasets: one with objective sensitivity off, a second with objective sensitivity set to low, and a third with objective sensitivity set to high. Each of these solutions drives a corresponding campaign.

After the datasets are loaded in an Amazon Personalize dataset group, we create three solutions to demonstrate the impact of the varied objective optimizations on recommendations. The optimization objective is selected when creating an Amazon Personalize solution and can have its sensitivity level set to one of four values: OFF, LOW, MEDIUM, or HIGH. This setting controls how much weight is given to the business objective, and in this post we show the impact that these settings can have on recommendation performance. While developing your own models, you should experiment with the sensitivity setting to evaluate what drives the best results for your recommendations. Because the objective optimization maximizes the business metric, we must select ROYALTY as the objective optimization column.

The following example Python code creates an Amazon Personalize solution:

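import boto3

# Amazon Personalize client; uses your configured AWS credentials and Region.
personalize = boto3.client("personalize")

# dataset_group_arn and recipe_arn are assumed to be defined earlier,
# when the dataset group was created and a recipe was chosen.
create_solution_response = personalize.create_solution(
    name="solution-name",  # placeholder; resource names cannot contain spaces
    datasetGroupArn=dataset_group_arn,
    recipeArn=recipe_arn,
    solutionConfig={
        # Optimize for the ROYALTY column in addition to relevance.
        "optimizationObjective": {
            "itemAttribute": "ROYALTY",
            "objectiveSensitivity": "HIGH",
        }
    },
)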

After the solution versions have been trained, you can compare their offline metrics by calling the GetSolutionMetrics API for each solution version or by viewing each one on the Amazon Personalize console.

Metric                no-optimization  low-optimization  high-optimization
Average rewards-at-k  0.1491           0.1412            0.1686
coverage              0.1884           0.1711            0.1295
MRR-25                0.0769           0.1116            0.0805
NDCG-10               0.0937           0.1               0.0999
NDCG-25               0.14             0.1599            0.1547
NDCG-5                0.0774           0.0722            0.0698
Precision-10          0.027            0.0292            0.0281
Precision-25          0.0229           0.0256            0.0238
Precision-5           0.0337           0.0315            0.027
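One way to pull offline metrics like those in the table programmatically is the GetSolutionMetrics API (`get_solution_metrics` in boto3), which returns a `metrics` dictionary for a solution version. The following sketch wraps that call and picks the best solution version for a chosen metric. The helper names, the ARNs, and the metric key names in the stub are assumptions for illustration; the stub client lets the example run without AWS access and reuses two rows of the table.

```python
def collect_metrics(personalize_client, solution_version_arns):
    """Return {label: metrics} by calling GetSolutionMetrics per version."""
    return {
        label: personalize_client.get_solution_metrics(
            solutionVersionArn=arn
        )["metrics"]
        for label, arn in solution_version_arns.items()
    }


def best_by(metrics_by_label, metric_name):
    """Label of the solution version with the largest value for a metric."""
    return max(
        metrics_by_label,
        key=lambda label: metrics_by_label[label][metric_name],
    )


class StubPersonalizeClient:
    """Offline stand-in for boto3.client("personalize"); ARNs are made up."""

    _metrics = {
        "arn:example:off": {"coverage": 0.1884, "average_rewards_at_k": 0.1491},
        "arn:example:low": {"coverage": 0.1711, "average_rewards_at_k": 0.1412},
        "arn:example:high": {"coverage": 0.1295, "average_rewards_at_k": 0.1686},
    }

    def get_solution_metrics(self, solutionVersionArn):
        return {
            "solutionVersionArn": solutionVersionArn,
            "metrics": self._metrics[solutionVersionArn],
        }


arns = {
    "no-optimization": "arn:example:off",
    "low-optimization": "arn:example:low",
    "high-optimization": "arn:example:high",
}
metrics = collect_metrics(StubPersonalizeClient(), arns)
print(best_by(metrics, "coverage"))
print(best_by(metrics, "average_rewards_at_k"))
```

To run this against real solution versions, pass `boto3.client("personalize")` and your own solution version ARNs in place of the stub.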

In the preceding table, larger numbers are better. Coverage is the ratio of items that appear in recommendations to the total number of items in the dataset (that is, how much of your catalog the generated recommendations cover). To make sure Amazon Personalize recommends a larger portion of your movie catalog, use a model with a higher coverage score.

The average rewards-at-k metric indicates how the solution version performs in achieving your objective. Amazon Personalize calculates this metric by dividing the total rewards generated by interactions (for example, total revenue from clicks) by the total possible rewards from recommendations. The higher the score, the more gains on average per user you can expect from recommendations.

The mean reciprocal rank (MRR) metric measures the relevance of the highest ranked item in the list, and is important for situations where the user is very likely to select the first item recommended. Normalized discounted cumulative gain at k (NDCG-k) measures the relevance of the highest k items, giving the highest weight to the first item in the list and progressively less weight to items further down. NDCG is useful for measuring effectiveness when multiple recommendations are presented to users, but highest-rated recommendations are more important than lower-rated recommendations. The Precision-k metric measures the fraction of the top k recommendations that are relevant.
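To make these definitions concrete, here is a minimal sketch that computes them for a single user with binary relevance (an item is either relevant or not). It uses the common textbook formulas, which is an assumption; the exact computation inside Amazon Personalize may differ in its details. MRR is the mean of the reciprocal rank over all users.

```python
import math


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(item in relevant for item in recommended[:k]) / k


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the DCG of an ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Toy example: five recommendations, two of which the user actually interacted with
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "d"}
print(reciprocal_rank(recommended, relevant))         # 0.5 (first hit at rank 2)
print(precision_at_k(recommended, relevant, 5))       # 0.4 (2 of 5 are relevant)
print(round(ndcg_at_k(recommended, relevant, 5), 4))  # 0.6509
```

Moving a relevant item higher in the list raises the reciprocal rank and NDCG but leaves precision unchanged, which is why the three metrics can disagree in the preceding table.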

As the solution weighs the objective higher, metrics tend to show lower relevance for users because the model is selecting recommendations based on user behavior data and the business objective. Amazon Personalize provides the ability to control how much influence the objective imparts on recommendations. If the objective provides too much influence, you can expect it to create a poor customer experience because the recommendations stop being relevant to the user. By running an A/B test, you can collect the data needed to deliver the results that best balance relevance and your business objective.

We can retrieve recommendations from the solution versions by creating an Amazon Personalize campaign for each one. A campaign is a deployed solution version (trained model) with provisioned dedicated capacity for creating real-time recommendations for your users. Because the three campaigns share the same item and interaction data, the only variable in the model is the objective optimization settings. When you compare the recommendations for a randomly selected user, you can see how recommendations can change with varied objective sensitivities.
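The comparison for a single user can be sketched as follows. The helper names and the labels are assumptions made for illustration. The runtime client is passed in and is assumed to be boto3.client("personalize-runtime"), and the campaign ARNs are the ones returned when you create the campaigns.

```python
def recommendations_by_campaign(personalize_runtime, campaign_arns, user_id, num_results=25):
    """Return {label: [item IDs in ranked order]} for one user across several campaigns."""
    results = {}
    for label, arn in campaign_arns.items():
        response = personalize_runtime.get_recommendations(
            campaignArn=arn, userId=str(user_id), numResults=num_results
        )
        results[label] = [item["itemId"] for item in response["itemList"]]
    return results


def rank_changes(baseline, other):
    """Map each item present in both lists to (rank in baseline, rank in other), 1-based."""
    other_rank = {item: rank for rank, item in enumerate(other, start=1)}
    return {
        item: (rank, other_rank[item])
        for rank, item in enumerate(baseline, start=1)
        if item in other_rank
    }


# Example usage (ARNs and the user ID are placeholders):
# import boto3
# recs = recommendations_by_campaign(
#     boto3.client("personalize-runtime"),
#     {"OFF": off_campaign_arn, "HIGH": high_campaign_arn},
#     user_id="123",
# )
# print(rank_changes(recs["OFF"], recs["HIGH"]))
```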

The following chart shows the results of the three campaigns. The rank indicates the order of relevance that Amazon Personalize has generated for each title for the sample user. The title, year, and royalty amount are listed in each cell. Notice how “Kazaam (1996)”, which carries no royalty, moves from fourth position with objective optimization turned off to the top of the list when objective optimization is set to low or high. Meanwhile, “The Machine (1994)” drops from first position to second position when objective optimization is set to low, and to fourth position when objective optimization is set to high.

Rank OFF LOW HIGH
1 Machine, The (1994)(0.01) Kazaam (1996)(0.00) Kazaam (1996)(0.00)
2 Last Summer in the Hamptons (1995)(0.01) Machine, The (1994)(0.01) Last Summer in the Hamptons (1995)(0.01)
3 Wedding Bell Blues (1996)(0.02) Last Summer in the Hamptons (1995)(0.01) Big One, The (1997)(0.01)
4 Kazaam (1996)(0.00) Wedding Bell Blues (1996)(0.02) Machine, The (1994)(0.01)
5 Heaven & Earth (1993)(0.01) Gordy (1995)(0.00) Gordy (1995)(0.00)
6 Pushing Hands (1992)(0.03) Venice/Venice (1992)(0.01) Vermont Is For Lovers (1992)(0.00)
7 Big One, The (1997)(0.01) Vermont Is For Lovers (1992)(0.00) Robocop 3 (1993)(0.01)
8 King of New York (1990)(0.01) Robocop 3 (1993)(0.01) Venice/Venice (1992)(0.01)
9 Chairman of the Board (1998)(0.05) Big One, The (1997)(0.01) Etz Hadomim Tafus (Under the Domin Tree) (1994…
10 Bushwhacked (1995)(0.05) Phat Beach (1996)(0.01) Phat Beach (1996)(0.01)
11 Big Squeeze, The (1996)(0.05) Etz Hadomim Tafus (Under the Domin Tree) (1994… Wedding Bell Blues (1996)(0.02)
12 Big Bully (1996)(0.03) Heaven & Earth (1993)(0.01) Truth or Consequences, N.M. (1997)(0.01)
13 Gordy (1995)(0.00) Pushing Hands (1992)(0.03) Surviving the Game (1994)(0.01)
14 Truth or Consequences, N.M. (1997)(0.01) Truth or Consequences, N.M. (1997)(0.01) Niagara, Niagara (1997)(0.00)
15 Venice/Venice (1992)(0.01) King of New York (1990)(0.01) Trial by Jury (1994)(0.01)
16 Invitation, The (Zaproszenie) (1986)(0.10) Big Bully (1996)(0.03) King of New York (1990)(0.01)
17 August (1996)(0.03) Niagara, Niagara (1997)(0.00) Country Life (1994)(0.01)
18 All Things Fair (1996)(0.01) All Things Fair (1996)(0.01) Commandments (1997)(0.00)
19 Etz Hadomim Tafus (Under the Domin Tree) (1994… Surviving the Game (1994)(0.01) Target (1995)(0.01)
20 Target (1995)(0.01) Chairman of the Board (1998)(0.05) Heaven & Earth (1993)(0.01)
21 Careful (1992)(0.10) Bushwhacked (1995)(0.05) Beyond Bedlam (1993)(0.00)
22 Vermont Is For Lovers (1992)(0.00) August (1996)(0.03) Mirage (1995)(0.01)
23 Phat Beach (1996)(0.01) Big Squeeze, The (1996)(0.05) Pushing Hands (1992)(0.03)
24 Johnny 100 Pesos (1993)(0.03) Bloody Child, The (1996)(0.02) You So Crazy (1994)(0.01)
25 Surviving the Game (1994)(0.01) Country Life (1994)(0.01) All Things Fair (1996)(0.01)
TOTAL ROYALTIES 0.59 0.40 0.20

The trend is toward lower royalties as the objective optimization setting is increased from low to high, as you would expect. The sum of all the royalties for the 25 recommended titles also decreased from $0.59 with no objective optimization to $0.20 with objective optimization set to high.

Conclusion

You can use Amazon Personalize to combine user interaction data with a business objective, thereby improving the business outcomes that recommendations deliver for your business. As we’ve shown, objective optimization influenced the recommendations to lower the costs for the movies in our fictitious movie recommendation service. The trade-off between recommendation relevance and the objective is an important consideration, because optimizing for revenue can make your recommendations less relevant for your users. Other examples include steering users to premium content, promoted content, or items with the highest reviews. This additional objective can improve the quality of the recommendations as well as take into account factors you know are important to your business.

The source code for this post is available on GitHub.

To learn more about Amazon Personalize, visit the product page.


About the Authors

Mike Gillespie is a solutions architect at Amazon Web Services. He works with the AWS customers to provide guidance and technical assistance helping them improve the value of their solutions when using AWS. Mike specializes in helping customers with serverless, containerized, and machine learning applications. Outside of work, Mike enjoys being outdoors running and paddling, listening to podcasts, and photography.

Matt Chwastek is a Senior Product Manager for Amazon Personalize. He focuses on delivering products that make it easier to build and use machine learning solutions. In his spare time, he enjoys reading and photography.

Ge Liu is an Applied Scientist at AWS AI Labs working on developing next generation recommender system for Amazon Personalize. Her research interests include Recommender System, Deep Learning, and Reinforcement Learning.

Abhishek Mangal is a Software Engineer for Amazon Personalize and works on architecting software systems to serve customers at scale. In his spare time, he likes to watch anime and believes ‘One Piece’ is the greatest piece of story-telling in recent history.


Create Amazon SageMaker projects using third-party source control and Jenkins

Launched at AWS re:Invent 2020, Amazon SageMaker Pipelines is the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning (ML). With Pipelines, you can create, automate, and manage end-to-end ML workflows at scale.

You can integrate Pipelines with existing CI/CD tooling. This includes integration with existing source control systems such as GitHub, GitHub Enterprise, and Bitbucket. This new capability also allows you to utilize existing installations of Jenkins for orchestrating your ML pipelines. Before this new feature, Amazon SageMaker projects and pipelines were optimized for use with AWS Developer Tools including AWS CodePipeline, AWS CodeCommit, and AWS CodeBuild. This new capability allows you to take advantage of Pipelines while still using existing skill sets and tooling when building your ML CI/CD pipelines.

With the newly added MLOps project templates, you can choose between the following options:

  • Model building, training, and deployment using a third-party Git repository and Jenkins
  • Model building, training, and deployment using a third-party Git repository and CodePipeline

The new template options are now available via the SDK or within the Amazon SageMaker Studio IDE, as shown in the following screenshot.

In this post, we walk through an example using GitHub and Jenkins to demonstrate these new capabilities. You can perform equivalent steps using GitHub Enterprise or Bitbucket as your source code repository. The MLOps project template specifically creates a CI/CD pipeline using Jenkins to build a model using a SageMaker pipeline. The resulting trained ML model is deployed from the model registry to staging and production environments.

Prerequisites

The following are prerequisites to completing the steps in this post:

  • Jenkins (we use Jenkins v2.3) installed with administrative privileges.
  • A GitHub user account.
  • Two GitHub repositories initialized with a README. You must create these repositories as a prerequisite because you supply the two repositories as input when creating your SageMaker project. The project templates automatically seed the code that is pushed to these repositories:
    • abalone-model-build – Seeded with your model build code, which includes the code needed for data preparation, model training, model evaluation, and your SageMaker pipeline code.
    • abalone-model-deploy – Seeded with your model deploy code, which includes the code needed to deploy your SageMaker endpoints using AWS CloudFormation.
  • An AWS account and access to services used in this post.

We also assume some familiarity with Jenkins. For general information on Jenkins, we recommend reading the Jenkins Handbook.

Solution overview

In the following sections, we cover the one-time setup tasks and the steps required when building new pipelines using the new SageMaker MLOps project templates to build out the following high-level architecture.

The model build pipeline is triggered by changes to the model build GitHub repository, which Jenkins detects by polling the source repository every minute. The model deploy pipeline can be triggered by changes to the model deploy code in GitHub or when a new model version is approved in the SageMaker Model Registry.

The one-time setup tasks include:

  1. Establish the AWS CodeStar connection from your AWS account to your GitHub user or organization.
  2. Install dependencies on your Jenkins server.
  3. Set up permissions for communication between Jenkins and AWS.
  4. Create an Amazon EventBridge rule and AWS Lambda function that is triggered to run the Jenkins model deploy pipeline when approved models are registered in the model registry.

We then use the new MLOps project template for third-party GitHub and Jenkins to provision and configure the following resources, which are also discussed in more detail later in this post:

  • SageMaker code repositories – Based on the existing GitHub code repository information you provide on input when creating your SageMaker project, a SageMaker code repository association with that same repository is created when you launch the project. This essentially creates an association with a GitHub repository that SageMaker is aware of using the CodeRepository AWS CloudFormation resource type.
  • Model build and deploy seed code triggers – AWS CloudFormation custom resources used by SageMaker projects to seed code in your model build and model deploy code repositories. This seed code includes an example use case, abalone, which is similar to the existing project template, and also the generated code required for building your Jenkins pipeline. When you indicate that you want the repositories seeded, this triggers a Lambda function that seeds your code into the GitHub repository you supply as input.
  • Lambda function – A new Lambda function called sagemaker-p-<hash>-git-seedcodecheckin. This function is triggered by the custom resource in the CloudFormation template. It’s called along with the seed code information (what code needs to be populated), the Git repository information (where it needs to be populated), and the Git AWS CodeStar connection information. This function then triggers the CodeBuild run, which performs the population of the seed code.
  • CodeBuild project – A CodeBuild project using a buildspec.yml file from an Amazon Simple Storage Service (Amazon S3) bucket owned and maintained by SageMaker. This CodeBuild project is responsible for checking in the initial seed code into the repository supplied as input when creating the project.
  • MLOps S3 bucket – An S3 bucket for the MLOps pipeline that is used for inputs and artifacts of your project and pipeline.

All of the provisioning and configuration required to set up the end-to-end CI/CD pipeline using these resources is automatically performed by SageMaker projects.

Now that we’ve covered how the new feature works, let’s walk through the one-time setup tasks followed by using the new templates.

One-time setup tasks

The tasks in this section are required as part of the one-time setup activities that must be performed for each AWS Region where you use the new SageMaker MLOps project templates. The steps to create a GitHub connection and an AWS Identity and Access Management (IAM) user for Jenkins could be incorporated into a CloudFormation template for repeatability. For this post, we explicitly define the steps.

Set up the GitHub connection

In this step, you connect to your GitHub repositories using AWS Developer Tools and, more specifically, AWS CodeStar connections. The SageMaker project uses this connection to connect to your source code repositories.

  1. On the CodePipeline console, under Settings in the navigation pane, choose Connections.
  2. Choose Create connection.
  3. For Select a provider, select GitHub.
  4. For Connection name, enter a name.
  5. Choose Connect to GitHub.
  6. If the AWS Connector GitHub app isn’t previously installed, choose Install new app.

A list of all the GitHub personal accounts and organizations you have access to is displayed.

  7. Choose the account where you want to establish connectivity for use with SageMaker projects and GitHub repositories.
  8. Choose Configure.
  9. You can optionally select specific repositories, but for this post we create a repository in later steps, so we choose All repositories.
  10. Choose Save.

When the app is installed, you’re redirected to the Connect to GitHub page and the installation ID is automatically populated.

  11. Choose Connect.
  12. Add a tag with the key sagemaker and value true to this AWS CodeStar connection.
  13. Copy the connection ARN to save for later.

You use the ARN as a parameter in the project creation step.
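If you prefer to check or apply the tag from a script rather than the console, a minimal sketch follows. The helper name is an assumption. The client is passed in and is assumed to be boto3.client("codestar-connections"), and the connection itself is assumed to already exist and to have completed the GitHub handshake described above.

```python
# Tag expected on the connection, as described in the steps above
SAGEMAKER_TAG = {"Key": "sagemaker", "Value": "true"}


def ensure_sagemaker_tag(connections, connection_arn):
    """Add the sagemaker=true tag to the connection if it is missing.

    Returns True if the tag was added, False if it was already present.
    """
    existing = connections.list_tags_for_resource(ResourceArn=connection_arn).get("Tags", [])
    if SAGEMAKER_TAG in existing:
        return False
    connections.tag_resource(ResourceArn=connection_arn, Tags=[SAGEMAKER_TAG])
    return True


# Example usage (the ARN is the one you copied in the last step):
# import boto3
# ensure_sagemaker_tag(boto3.client("codestar-connections"), connection_arn)
```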

Install Jenkins software dependencies

In this step, you ensure that several software dependencies are in place on the Jenkins server. If you don’t have an existing Jenkins server or need to create one for testing, you can install Jenkins.

  1. Make sure pip3 is installed.

On Amazon Linux, RHEL, or CentOS (distributions that use yum), enter the following code:

sudo yum install python3-pip

On Ubuntu, enter the following code:

sudo apt install python3-pip

  2. Install Git on the Jenkins server if it’s not already installed.
  3. Install the following plugins on your Jenkins server:
    1. Job DSL
    2. Git
    3. Pipeline
    4. Pipeline: AWS Steps
    5. CloudBees AWS Credentials for the Jenkins plugin

Create a Jenkins user on IAM

In this step, you create an IAM user and permissions policy that allows for programmatic access to Amazon S3, SageMaker, and AWS CloudFormation. Your Jenkins server uses this IAM user to access the AWS resources needed to integrate with SageMaker projects. After you create the user, you configure its credentials on the Jenkins server.

  1. On the IAM console, choose Policies in the navigation pane.
  2. Choose Create policy.
  3. On the JSON tab, enter the following policy, replacing ${AWS::Region} and ${AWS::AccountId} with your Region and AWS account ID:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::sagemaker-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "arn:aws:iam::*:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreatePipeline",
                "sagemaker:DescribePipeline",
                "sagemaker:DescribePipelineExecution",
                "sagemaker:ListPipelineExecutionSteps",
                "sagemaker:StartPipelineExecution",
                "sagemaker:UpdatePipeline",
                "sagemaker:ListModelPackages",
                "sagemaker:ListTags",
                "sagemaker:AddTags",
                "sagemaker:DeleteTags",
                "sagemaker:CreateModel",
                "sagemaker:CreateEndpointConfig",
                "sagemaker:CreateEndpoint",
                "sagemaker:DeleteModel",
                "sagemaker:DeleteEndpointConfig",
                "sagemaker:DeleteEndpoint",
                "sagemaker:DescribeEndpoint",
                "sagemaker:DescribeModel",
                "sagemaker:DescribeEndpointConfig",
                "sagemaker:UpdateEndpoint"
            ],
            "Resource": "arn:aws:sagemaker:${AWS::Region}:${AWS::AccountId}:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudformation:CreateStack",
                "cloudformation:DescribeStacks",
                "cloudformation:UpdateStack",
                "cloudformation:DeleteStack"
            ],
            "Resource": "arn:aws:cloudformation:*:*:stack/sagemaker-*"
        }
    ]
}
  4. Choose Next: Tags.
  5. Choose Next: Review.
  6. Under Review policy, name your policy JenkinsExecutionPolicy.
  7. Choose Create policy.
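The SageMaker statement in the policy refers to ${AWS::Region} and ${AWS::AccountId}. If you keep the policy in a file and create it from a script instead of the console, one way to fill in those two values is sketched here. The function name and the file name in the usage comment are assumptions made for illustration.

```python
import json


def render_policy(policy_text, region, account_id):
    """Substitute the Region and account ID placeholders and parse the result.

    Parsing with json.loads also catches syntax errors before the policy is
    sent to IAM.
    """
    rendered = (
        policy_text
        .replace("${AWS::Region}", region)
        .replace("${AWS::AccountId}", account_id)
    )
    return json.loads(rendered)


# Example usage (assumes the policy above is saved as jenkins-policy.json):
# import boto3
# with open("jenkins-policy.json") as f:
#     policy = render_policy(f.read(), "eu-north-1", "111122223333")
# boto3.client("iam").create_policy(
#     PolicyName="JenkinsExecutionPolicy",
#     PolicyDocument=json.dumps(policy),
# )
```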

We now need to create a user that the policy is attached to.

  1. In the navigation pane, choose Users.
  2. Choose Add user.
  3. For User name, enter jenkins.
  4. For Access type, select Programmatic access.
  5. Choose Next: Permissions.
  6. Under Set Permissions, select Attach existing policies directly, then search for the policy you created.
  7. Select the policy JenkinsExecutionPolicy.
  8. Choose Next: Tags.
  9. Choose Next: Review.
  10. Choose Create user.

You need the access key ID and secret key for Jenkins to be able to create and run the CI/CD pipeline. The secret key is only displayed one time, so make sure to save both values in a secure place.

Configure the Jenkins IAM user on the Jenkins server

In this step, you configure the AWS credentials for the Jenkins IAM user on your Jenkins server. To do this, you need to sign in to your Jenkins server with administrative credentials. The credentials are stored in the Jenkins Credential Store.

  1. On the Jenkins dashboard, choose Manage Jenkins.
  2. Choose Manage Credentials.
  3. Choose the store Jenkins.
  4. Choose Global credentials.
  5. Choose Add Credentials.
  6. For Kind, select AWS Credentials.
  7. For Scope, select Global.
  8. For Description, enter Jenkins AWS Credentials.
  9. For Access Key ID, enter the access key for the IAM user you created.
  10. For Secret Access Key, enter the secret access key for the IAM user you created.
  11. Choose OK.

Your new credentials are now listed under Global credentials.

Create a model deployment Jenkins pipeline trigger

In this step, you configure the trigger to run your Jenkins model deployment pipeline whenever a new model version gets registered into a model package group in the SageMaker Model Registry. To do this, you create an API token for communication with your Jenkins server. Then you run a CloudFormation template from your AWS account that sets up a new rule in EventBridge to monitor the approval status of a model package registered in the SageMaker Model Registry. We use the model registry to catalog models and metadata about those models, as well as manage the approval status and model deployment pipelines. The CloudFormation template also creates a Lambda function that is the event target when a new model gets registered. This function gets the Jenkins API user token credentials from AWS Secrets Manager and uses them to trigger the pipeline remotely, as shown in the following diagram.
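The following is a minimal sketch of how such a trigger could work, not the code that the provided template deploys. The event pattern reflects the shape of SageMaker model package state change events. The secret name, the keys inside the secret, and the use of the Jenkins /build endpoint are assumptions made for illustration (a parameterized Jenkins job would typically use /buildWithParameters instead).

```python
import base64
import json
import urllib.request

# EventBridge pattern that matches approved model packages (assumed rule shape)
EVENT_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelApprovalStatus": ["Approved"]},
}


def is_approved_model_event(event):
    """Return True for a SageMaker event whose model package is approved."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.sagemaker"
        and detail.get("ModelApprovalStatus") == "Approved"
    )


def build_jenkins_request(jenkins_url, job_name, user, api_token):
    """Build (but do not send) a POST request that triggers a Jenkins job."""
    url = f"{jenkins_url.rstrip('/')}/job/{job_name}/build"
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return urllib.request.Request(
        url, data=b"", method="POST", headers={"Authorization": f"Basic {credentials}"}
    )


def lambda_handler(event, context):
    """Trigger the Jenkins deploy job when an approved model package event arrives."""
    if not is_approved_model_event(event):
        return {"triggered": False}
    import boto3  # imported lazily; available in the Lambda Python runtime

    # "jenkins-api-credentials" and its keys are hypothetical names
    secret = json.loads(
        boto3.client("secretsmanager").get_secret_value(
            SecretId="jenkins-api-credentials"
        )["SecretString"]
    )
    request = build_jenkins_request(
        secret["url"], secret["job"], secret["user"], secret["token"]
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"triggered": True, "status": response.status}
```

Keeping the token in Secrets Manager rather than in the function's environment variables means that it can be rotated without redeploying the function.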

Create the Jenkins API token

First, you need to create an API token for the Jenkins user.

  1. Choose your user name on the Jenkins console.
  2. Choose Configure.
  3. Under API Token, choose Add new Token.
  4. Choose Generate.
  5. Copy the generated token value and save it somewhere to use in the next step.

Create the trigger and Lambda function

Next, you create the trigger and Lambda function. To do this, you need the provided CloudFormation template, model_trigger.yml. The template takes three parameters as input:

  • JenkinsUser – Your Jenkins user with administrative privileges (for example, Jenkins-admin)
  • JenkinsAPIToken – The Jenkins API token you created (for example, 11cnnnnnnnnnnnnnn)
  • JenkinsURL – The URL of your Jenkins server (for example, http://ec2-nn-nn-nnn-n.eu-north-1.compute.amazonaws.com)

You can download and launch the CloudFormation template via the AWS CloudFormation console, the AWS Command Line Interface (AWS CLI), or the SDK.

This completes the one-time setup required to use the new MLOps SageMaker project templates for each Region. Depending on your organizational structure and roles across the ML development lifecycle, these one-time setup steps may need to be performed by your DevOps, MLOps, or system administrators.

We now move on to the steps for creating SageMaker projects using the new MLOps project template from SageMaker Studio.

Use the new MLOps project template with GitHub and Jenkins

In this section, we cover how to use one of the two new MLOps project templates released that allow you to utilize Jenkins as your orchestrator. First, we create a new SageMaker project using one of the new templates. Then we use the generated Jenkins pipeline code to create the Jenkins pipeline.

Create a new SageMaker project

To create your SageMaker project, complete the following steps:

  1. On the Studio console, choose SageMaker resources.
  2. On the drop-down menu, choose Projects.
  3. Choose Create project.
  4. For SageMaker project templates, choose MLOps template for model building, training, and deployment with third-party Git repositories using Jenkins.
  5. Choose Select project template.

You need to provide several parameters to configure the source code repositories for your model build and model deploy code.

  6. Under ModelBuild CodeRepository Info, provide the following parameters:
    1. For URL, enter the URL of your existing Git repository for the model build code in https:// format.
    2. For Branch, enter the branch to use from your existing Git repository for pipeline activities as well as for seeding code (if that option is enabled).
    3. For Full Repository Name, enter the Git repository name in the format of <username>/<repository name> or <organization>/<repository name>.
    4. For Codestar Connection ARN, enter the ARN of the AWS CodeStar connection created as part of the one-time setup steps.
    5. For Sample Code, choose whether the seed code should be populated in the repository identified.

The seed code includes model build code for the abalone use case that is common to SageMaker projects; however, when this is enabled, a new /jenkins folder with Jenkins pipeline code is also seeded.

It’s recommended to allow SageMaker projects to seed your repositories with the code to ensure proper structure and for automatic generation of the Jenkins DSL pipeline code. If you don’t choose this option, you need to create your own Jenkins DSL pipeline code. You can then modify the seed code specific to your model based on your use case.

  7. Under ModelDeploy CodeRepository Info, provide the following parameters:
    1. For URL, enter the URL of your existing Git repository for the model deploy code in https:// format.
    2. For Branch, enter the branch to use from your existing Git repository for pipeline activities as well as for seeding code (if that option is enabled).
    3. For Full Repository Name, enter the Git repository name in the format of <username>/<repository name> or <organization>/<repository name>.
    4. For Codestar Connection ARN, enter the ARN of the AWS CodeStar connection created as part of the one-time setup steps.
    5. For Sample Code, choose whether the seed code should be populated in the repository identified.

As we mentioned earlier, the seed code includes the model deploy code for the abalone use case that is common to SageMaker projects; however, when this is enabled, a /jenkins folder with Jenkins pipeline code is also seeded.

  8. Choose Create project.

A message appears indicating that SageMaker is provisioning and configuring the resources.

When the project is complete, you receive a success message, and your project is now listed on the Projects list.

You now have seed code in your abalone-model-build and abalone-model-deploy GitHub repositories. You also have the /jenkins folders containing the Jenkins DSL to create your Jenkins pipeline.
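The URL and Full Repository Name parameters entered during project creation must agree with each other. As a small sketch of that relationship, the following hypothetical helper derives the full repository name from an https:// URL and rejects other URL formats; the example user name is a placeholder.

```python
import re
from urllib.parse import urlparse


def full_repository_name(repo_url):
    """Derive '<owner>/<repository name>' from an https:// Git repository URL."""
    parsed = urlparse(repo_url)
    if parsed.scheme != "https":
        raise ValueError("repository URL must use the https:// format")
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    if not re.fullmatch(r"[^/]+/[^/]+", path):
        raise ValueError(
            "expected a URL of the form https://<host>/<owner>/<repository name>"
        )
    return path


print(full_repository_name("https://github.com/example-user/abalone-model-build.git"))
# example-user/abalone-model-build
```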

Automatically generated Jenkins pipeline syntax

After you create the SageMaker project with seed code enabled, the code needed to create a Jenkins pipeline is automatically generated. Let’s review the code generated and pushed to the abalone-model-build and abalone-model-deploy GitHub repositories.

The model build pipeline contains the following:

  • seed_job.groovy – A Jenkins groovy script to create a model build Jenkins pipeline using the pipeline definition from the Jenkinsfile.
  • Jenkinsfile – The Jenkins pipeline definition for model build activities, including the following steps:
    • Checkout SCM – Source code checkout (abalone-model-build).
    • Build and install – Ensure latest version of the AWS CLI is installed.
    • Update and run the SageMaker pipeline – Run the SageMaker pipeline that corresponds to the SageMaker project ID. This pipeline is visible on the Studio console but is being triggered by Jenkins in this case.
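The last step of the model build pipeline can be pictured with the following sketch, which starts a SageMaker pipeline execution and polls until it reaches a terminal state. This is an illustration under stated assumptions, not the generated Jenkinsfile logic. The function name is hypothetical, and the client is passed in and assumed to be boto3.client("sagemaker"). Both API calls are covered by the Jenkins IAM policy created earlier.

```python
import time

# States in which a pipeline execution no longer changes
TERMINAL_STATES = {"Succeeded", "Failed", "Stopped"}


def run_pipeline(sagemaker, pipeline_name, poll_seconds=30, sleep=time.sleep):
    """Start a pipeline execution and wait for it to finish.

    Returns (execution ARN, final status). `sleep` is injectable so the loop
    can be exercised without waiting.
    """
    execution_arn = sagemaker.start_pipeline_execution(
        PipelineName=pipeline_name
    )["PipelineExecutionArn"]
    while True:
        status = sagemaker.describe_pipeline_execution(
            PipelineExecutionArn=execution_arn
        )["PipelineExecutionStatus"]
        if status in TERMINAL_STATES:
            return execution_arn, status
        sleep(poll_seconds)


# Example usage (the pipeline name is a placeholder):
# import boto3
# arn, status = run_pipeline(boto3.client("sagemaker"), "my-project-pipeline")
# if status != "Succeeded":
#     raise SystemExit(f"Pipeline execution {arn} ended with status {status}")
```

Exiting with a non-zero status on failure is what lets a CI server such as Jenkins mark the stage as failed.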

The model deploy pipeline contains the following:

  • seed_job.groovy – A Jenkins groovy script to create a model deploy Jenkins pipeline using the pipeline definition from the Jenkinsfile.
  • Jenkinsfile – The Jenkins pipeline definition for model deploy activities, including the following steps:
    • Checkout SCM – Source code checkout (abalone-model-deploy).
    • Install – Ensure the latest version of the AWS CLI is installed.
    • Build – Run a script called build.py from your seeded source code, which fetches the approved model package from the SageMaker Model Registry and generates the CloudFormation templates for creating staging and production SageMaker endpoints.
    • Staging deploy – Launch the CloudFormation template to create a staging SageMaker endpoint.
    • Test staging – Run a script called test.py from your seeded source code. The generated code includes a test to describe the endpoint to ensure it’s showing InService and also includes code blocks to add your own custom testing code:
      def invoke_endpoint(endpoint_name):
          """
          Add custom logic here to invoke the endpoint and validate response
          """
          return {"endpoint_name": endpoint_name, "success": True}

    • Manual approval for production – A Jenkins step to enable continuous delivery requiring manual approval before deploying to a production environment.
    • Prod deploy – Launch the CloudFormation template to create a production SageMaker endpoint.
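As one possible way to fill in the custom testing hook in test.py, the following sketch sends a single record to the staging endpoint and checks that the response parses as a number. The extra parameters, the text/csv content type, and the assumption that the model returns a numeric prediction are all illustrative; adapt them to your own model. The runtime client is passed in and assumed to be boto3.client("sagemaker-runtime").

```python
def invoke_endpoint(endpoint_name, runtime, payload, content_type="text/csv"):
    """Invoke the endpoint once and report whether a numeric prediction came back."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=payload
    )
    body = response["Body"].read().decode("utf-8")
    try:
        # Take the first value in case the model returns a comma-separated list
        prediction = float(body.strip().split(",")[0])
        success = True
    except ValueError:
        prediction, success = None, False
    return {"endpoint_name": endpoint_name, "success": success, "prediction": prediction}


# Example usage (endpoint name and sample record are placeholders):
# import boto3
# result = invoke_endpoint(
#     "my-staging-endpoint", boto3.client("sagemaker-runtime"), sample_csv_row
# )
# assert result["success"], result
```

Note that calling the endpoint requires the sagemaker:InvokeEndpoint permission, which is not part of the Jenkins policy shown earlier in this post.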

Create a Jenkins model build pipeline

In this step, we create the Jenkins pipeline using the DSL generated in the seed code created through the SageMaker project in the previous step.

  1. On your Jenkins server, choose New Item on the dashboard menu.
  2. For Enter an item name, enter CreateJenkinsPipeline.
  3. Choose Freestyle project.
  4. Choose OK.
  5. On the General tab, select This project is parameterized.
  6. On the Add Parameter drop-down menu, choose Credentials Parameter.

You must provide the following information for the AWS credentials that are used by your Jenkins pipeline to integrate with AWS.

  7. For Name, enter AWS_CREDENTIAL.
  8. For Credential type, choose AWS Credentials.
  9. For Default Value, choose the Jenkins AWS credentials that you created during the one-time setup tasks.
  10. On the Source Code Management tab, select Git.
  11. For Repository URL, enter the URL for the GitHub repository containing the model build code (for this post, abalone-model-build).
  12. For Branches to build, make sure to indicate the correct branch.
  13. On the Build Triggers tab, in the Build section, choose Process Job DSLs on the drop-down menu.
  14. For Process Job DSLs, select Look on Filesystem.
  15. For DSL Scripts, enter the value of jenkins/seed_job.groovy.

seed_job.groovy was automatically generated by your SageMaker project and pushed to your GitHub repository when seeding was indicated.

  16. Choose Save.

Next, we want to run our Jenkins job to create the Jenkins pipeline.

  17. Choose Build with Parameters.
  18. Choose Build.

The first run of the pipeline fails with an error that the script is not approved. Jenkins implements security controls to ensure only approved user-provided groovy scripts can be run (for more information, see In-process Script Approval). As a result, we need to approve the script before running the build again.

  19. On the Jenkins dashboard, choose Manage Jenkins.
  20. Choose In-process Script Approval.

You should see a message that a script is pending approval.

  21. Choose Approve.
  22. Repeat the steps to build the pipeline again.

This time, the job should run successfully and create a new modelbuild pipeline.

  23. Choose your new pipeline (sagemaker-jenkings-btd-1-p-<hash>-modelbuild) to view its details.

This is the pipeline generated by the Jenkins DSL code that was seeded in your GitHub repository. This is the actual model building pipeline.

  24. On the Studio UI, return to your project.
  25. Choose the Pipelines tab.

You still have visibility to your model build pipeline, but the orchestration for the CI/CD pipeline steps is performed by Jenkins.

If a data scientist wants to update any of the model build code, they can clone the repository to their Studio environment by choosing clone repo. When new code is committed and pushed to the GitHub repository, the Jenkins model build pipeline is automatically triggered.

Create a Jenkins model deploy pipeline

In this step, we perform the same steps as we did with the model build pipeline to create a model deploy pipeline, using the model deploy GitHub repo.

You can now see a new pipeline called sagemaker-jenkings-btd-1-p-<hash>-modeldeploy. This is the pipeline generated by the Jenkins DSL code that was seeded in your model deploy GitHub repository (abalone-model-deploy).

The first time this pipeline builds, it fails. Similar to the previous steps, you need to approve the script and rebuild the pipeline.

After the two pipelines are created, two additional pipelines appear in Jenkins that are associated with the SageMaker project.

The model deploy pipeline fails because the first time it runs, there are no approved models in the model registry.

When you navigate to the model registry, you can see a model that has been trained and registered by the model build pipeline. You can approve the model by updating its status, which triggers the deploy pipeline.
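The same approval can be given from a script. The following sketch looks up the most recently registered model version in a model package group and marks it as approved. The function name and the group name in the usage comment are assumptions, and the client is assumed to be boto3.client("sagemaker"). The Jenkins policy shown earlier does not grant sagemaker:UpdateModelPackage, so run this with an identity that has that permission.

```python
def approve_latest_model(sagemaker, model_package_group_name):
    """Approve the newest model version in the group and return its ARN.

    Returns None if the group has no model versions yet.
    """
    packages = sagemaker.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )["ModelPackageSummaryList"]
    if not packages:
        return None
    arn = packages[0]["ModelPackageArn"]
    if packages[0].get("ModelApprovalStatus") != "Approved":
        sagemaker.update_model_package(
            ModelPackageArn=arn, ModelApprovalStatus="Approved"
        )
    return arn


# Example usage (the group name is a placeholder):
# import boto3
# approve_latest_model(boto3.client("sagemaker"), "my-project-model-group")
```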

You can see the deploy pipeline running and the model is deployed to a staging environment.

After the model is deployed to staging, a manual approval option is available to deploy the model into a production environment.

On the SageMaker console, the endpoint deployed by Jenkins is also visible.

After you approve the Jenkins pipeline, a model is deployed to a production environment and is visible on the SageMaker console.

Summary

In this post, we walked through one of the new SageMaker MLOps project templates that you can use to build and configure a CI/CD pipeline that takes advantage of SageMaker features for model building, training, and deployment while still using your existing tooling and skillsets. For our use case, we focused on using GitHub and Jenkins, but you can also use GitHub Enterprise or Bitbucket depending on your needs. You can also utilize the other new template to combine your choice of source code repository (GitHub, GitHub Enterprise, or Bitbucket) with CodePipeline. Try it out and let us know if you have any questions in the comments section!


About the Authors

Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She holds 6 AWS certifications and has been in technology for 23 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background to deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee co-founded the Denver chapter of Women in Big Data.

 

Saumitra Vikram is a Software Developer on the Amazon SageMaker team and is based in Chennai, India. Outside of work, he loves spending time running, trekking and motor bike riding through the Himalayas.

 

 

Venkatesh Krishnan is a Principal Product Manager – Technical for Amazon SageMaker in AWS. He is the product owner for a portfolio of services in the MLOps space including SageMaker Pipelines, Model Registry, Projects, and Experiments. Earlier he was the Head of Product, Integrations and the lead product manager for Amazon AppFlow, a new AWS service that he helped build from the ground up. Before joining Amazon in 2018, Venkatesh served in various research, engineering, and product roles at Qualcomm, Inc. He holds a PhD in Electrical and Computer Engineering from Georgia Tech and an MBA from UCLA’s Anderson School of Management.

 

Kirit Thadaka is an ML Solutions Architect working in the SageMaker Service SA team. Prior to joining AWS, Kirit spent time working in early stage AI startups followed by some time in consulting in various roles in AI research, MLOps, and technical leadership.


Use Block Kit when integrating Amazon Lex bots with Slack

If you’re integrating your Amazon Lex chatbots with Slack, chances are you’ll come across Block Kit. Block Kit is a UI framework for Slack apps. Like response cards, Block Kit can help simplify interactions with your users. It offers flexibility to format your bot messages with blocks, buttons, check boxes, date pickers, time pickers, select menus, and more.

Amazon Lex provides channel integration with messaging platforms such as Slack, Facebook, and Twilio. For instructions on integrating with Slack, see Integrating an Amazon Lex Bot with Slack. You can also update the interactivity and shortcuts feature with the request URL that Amazon Lex generated. If you want to use Block Kit and other Slack native components, you need a custom endpoint for the request URL.

This post describes a solution architecture with a custom endpoint and shows how to use Block Kit with your Amazon Lex bot. It also provides an AWS Serverless Application Model (AWS SAM) template implementing the architecture.

Solution overview

In the proposed architecture, we use Amazon API Gateway for the custom endpoint and an AWS Lambda function to process the events. We also introduce an Amazon Simple Queue Service (Amazon SQS) queue to invoke the Lambda function asynchronously. The rest of the architecture includes an Amazon Lex bot and another Lambda function used for initialization, validation, and fulfillment. We use Python for the provided code examples.

The following diagram illustrates the solution architecture.

Use Slack Block Kit with an Amazon Lex bot to post messages

You can use Block Kit to format messages you configured at build time within the Lambda function associated with an intent. The following example uses blocks to display available flowers to users.

Each time you want to display a message with blocks, the following steps are required:

  1. Build the block. Block Kit Builder helps you visually format your messages.
  2. Check whether the request originated from Slack before you post the block. This allows you to deploy your bots on multiple platforms without major changes.
  3. Use the chat_postMessage operation from the Slack WebClient to post them in Slack. You can use the following operation to post both text and blocks to Slack:
def postInSlack(user_id, message, messageType='Plaintext', bot_token=slacksecret['SLACK_BOT_TOKEN']):
    try:
        # Call the chat.postMessage method using the WebClient
        if messageType == 'blocks':
            # Post Block Kit blocks
            result = slackClient.chat_postMessage(
                channel=user_id, token=bot_token, blocks=message
            )
        else:
            # Post plain text
            result = slackClient.chat_postMessage(
                channel=user_id, token=bot_token, text=message
            )
    except SlackApiError as e:
        logger.error(f"Error posting message: {e}")

To illustrate those steps with the OrderFlowers bot, we show you how to use a date picker from Block Kit to re-prompt users for the pick-up date.

  1. First, you build the block in the format Slack expects:
    def get_pickup_date_block():
        responseBlock = [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": "Pick a date to pick up your flower"
                },
                "accessory": {
                    "type": "datepicker",
                    "action_id": "datepicker123",
                    "initial_date": f'{datetime.date.today()}',
                    "placeholder": {
                        "type": "plain_text",
                        "text": "Select a date"
                    }
                }
            }
        ]
        # Return the block so the caller can post it to Slack
        return responseBlock

  2. Then, you modify the validation code hook as follows. This checks if the request originated from Slack using the channel-type request attribute.
    if source == 'DialogCodeHook':
        slots = helper.get_slots(intent_request)
        validation_result = validate_order_flowers(flower_type, date, pickup_time)
        if not validation_result['isValid']:
            slots[validation_result['violatedSlot']] = None

            # Check whether the request came from Slack
            if intent_request['requestAttributes'] and 'x-amz-lex:channel-type' in intent_request['requestAttributes'] and intent_request['requestAttributes']['x-amz-lex:channel-type'] == 'Slack':
                blocks = []
                channel_id = intent_request['userId'].split(':')[2]
    

  3. If the violated slot is PickupDate, you post the block you defined earlier to Slack. Then, you ask Amazon Lex to elicit the slot with the returned validation message:
    if validation_result['violatedSlot'] == 'PickupDate':
        blocks = get_pickup_date_block()

    helper.postInSlack(channel_id, blocks, 'blocks')
    return helper.elicit_slot(intent_request['sessionAttributes'], intent_request['currentIntent']['name'], slots, validation_result['violatedSlot'], validation_result['message'])
    

Outside of Slack, the user only receives the validation result message.

In Slack, the user receives both the pick-up date block and the validation result message.

You can use this approach to complement messages that you had configured at build time with Block Kit.
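The Slack check from the validation hook can be factored into small helpers. The following is a sketch only, assuming the Lex event shape used in the snippets above; the helper names are hypothetical and not part of the sample repository.

```python
def is_slack_request(intent_request):
    """Return True when the Lex event carries the Slack channel-type attribute."""
    # requestAttributes can be missing or None when the request
    # does not come through a channel integration
    attributes = intent_request.get("requestAttributes") or {}
    return attributes.get("x-amz-lex:channel-type") == "Slack"


def slack_channel_id(intent_request):
    """Extract the Slack ID from userId.

    Assumes userId has three colon-separated parts and the Slack ID
    is the third one, as in the validation hook above.
    """
    return intent_request["userId"].split(":")[2]
```

Keeping the check in one place makes it easier to reuse the same bot code on other channels, where the helper returns False and only the validation message is sent.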

User interactions

Now that you know how to use blocks to post your bot messages, let’s go over how you handle users’ interactions with the blocks.

When a user interacts with an action block element, the following steps take place:

  1. Slack sends an HTTP request to API Gateway.
  2. API Gateway forwards the request to Amazon SQS.
  3. Amazon SQS receives the transformed request as a message, and invokes the Lambda function that processes the request.

The following diagram illustrates the interaction flow.

Let’s take a closer look at what happens at each step.

Slack sends an HTTP request to API Gateway

When a user chooses an action block element, Slack sends an HTTP POST request with the event details to the endpoint configured as the request URL. The endpoint should reply to Slack with an HTTP 2xx response within 3 seconds. If not, Slack resends the same event. We decouple the ingestion and processing of events by using an Amazon SQS queue between API Gateway and the processing Lambda function. The queue allows you to reply to events with HTTP 200, queue them, and asynchronously process them. This prevents unnecessary retry events from flooding the custom endpoint.

API Gateway forwards the request to Amazon SQS

When API Gateway receives an event from Slack, it uses an integration request-mapping template to transform the request to the format Amazon SQS is expecting. Then it forwards the request to Amazon SQS.

Amazon SQS receives and processes the transformed request

When Amazon SQS receives the message, it invokes the processing Lambda function and returns an HTTP 200 response to API Gateway, which in turn returns the response to Slack.
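On the Lambda side, the queued messages arrive as SQS records. The following is a minimal sketch of unpacking them, assuming that each message body is the original form-encoded Slack request body (Slack sends interaction events as a form field named payload that contains JSON). The mapping template in the sample repository may shape the body differently, and the function name is hypothetical.

```python
import json
from urllib.parse import parse_qs


def extract_slack_payloads(sqs_event):
    """Return the decoded Slack interaction payloads found in an SQS event."""
    payloads = []
    for record in sqs_event.get("Records", []):
        # Assumption: the body is the raw form-encoded request sent by Slack
        form = parse_qs(record["body"])
        if "payload" in form:
            payloads.append(json.loads(form["payload"][0]))
    return payloads
```

Records without a payload field are skipped rather than raising an error, so one malformed message does not block the rest of the batch.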

Process requests

The Lambda function completes the following steps:

  1. Verify that the received request is from Slack.
  2. Forward the text value associated with the event to Amazon Lex.
  3. Post the Amazon Lex response to Slack.

In this section, we discuss each step in more detail.

Verify that the received request is from Slack

Use the signature module from slack_sdk to verify the requests. You can save and retrieve your signing secret from AWS Secrets Manager. For Slack’s recommendation on request verification, see Verifying requests from Slack.
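To show what this verification involves, the following sketch reimplements Slack's documented v0 signing scheme with the Python standard library only. It is an illustration, not the code from the sample repository, which relies on slack_sdk. The function name and the five-minute replay window are assumptions; the timestamp and signature come from the X-Slack-Request-Timestamp and X-Slack-Signature request headers.

```python
import hashlib
import hmac
import time


def is_valid_slack_signature(signing_secret, body, timestamp, signature,
                             now=None, max_age_seconds=300):
    """Check a Slack request signature (v0 scheme) using HMAC-SHA256."""
    now = time.time() if now is None else now
    try:
        # Reject old requests to limit replay attacks
        if abs(now - int(timestamp)) > max_age_seconds:
            return False
    except (TypeError, ValueError):
        return False

    # Slack signs the string "v0:<timestamp>:<raw request body>"
    base_string = f"v0:{timestamp}:{body}".encode("utf-8")
    digest = hmac.new(signing_secret.encode("utf-8"), base_string,
                      hashlib.sha256).hexdigest()
    expected = "v0=" + digest

    # Constant-time comparison avoids leaking information through timing
    return hmac.compare_digest(expected, signature)
```

The signature is computed over the raw request body, so verification has to happen before the body is parsed or re-encoded.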

Forward the text value associated with the event to Amazon Lex

If the request is from Slack, the Lambda function extracts the text value associated with the action type. Then it forwards the user input to Amazon Lex. See the following code:

actions = payload["actions"]
team_id = payload["team"]["id"]
user_id = payload["user"]["id"]
action_type = actions[0]["type"]
if action_type == "button":
    forwardToLex = actions[0]["value"]
elif action_type == "datepicker":
    forwardToLex = actions[0]["selected_date"]
else:
    forwardToLex = "None"
forward_to_Lex(team_id, user_id, forwardToLex)

We use the Amazon Lex client post_text operation to forward the text to Amazon Lex. You can also store and retrieve the bot’s name, bot’s alias, and the channel ID from Secrets Manager. See the following code:

# Post the event received from Slack to Amazon Lex and post the Lex reply to Slack
def forward_to_Lex(team_id, user_id, forwardToLex):
    response = lexClient.post_text(
        botName=slacksecret['BOT_NAME'],
        botAlias=slacksecret['BOT_ALIAS'],
        userId=slacksecret['LEX_SLACK_CHANNEL_ID'] + ":" + team_id + ":" + user_id,
        inputText=forwardToLex
    )

Post the Amazon Lex response to Slack

Finally, we post the message from Amazon Lex to Slack:

postInSlack(user_id, response['message'])

The following screenshot shows the response on Slack.

From the user’s perspective, the experience is as follows:

  1. The bot re-prompts the user for the pick-up date with a date picker.
  2. The user selects a date.
  3. The bot prompts the user for the pick-up time.

The messages that use Block Kit are seamlessly integrated into the original conversation flow with the Amazon Lex bot.

Walkthrough

In this part of the post, we walk through the deployment and configuration of the components you need to use Block Kit. We go over the following steps:

  1. Launch the prerequisite resources.
  2. Update the Slack request URL with the deployed API Gateway endpoint.
  3. Gather information for Secrets Manager.
  4. Populate the secret value.
  5. Update the Lambda function for Amazon Lex fulfillment initialization and validation.
  6. Update the listener Lambda function.
  7. Test the integration.

Prerequisites

For this walkthrough, you need the following:

Integrate Amazon Lex and Slack with a custom request URL

To create the resources, complete the following steps:

  1. Clone the repository https://github.com/aws-samples/amazon-lex-slack-block-kit:
git clone https://github.com/aws-samples/amazon-lex-slack-block-kit.git
  2. Build the application and run the guided deploy command:
cd amazon-lex-slack-block-kit
sam build
sam deploy --guided

These steps deploy an AWS CloudFormation stack that launches the following resources:

  • An API Gateway endpoint integrated with an SQS queue
  • A Lambda function to listen to requests from Slack
  • A Lambda function for Amazon Lex fulfillment, initialization, and validation hooks
  • AWS Identity and Access Management (IAM) roles associated with the API and the Lambda functions
  • A Lambda layer with slack_sdk, urllib3, and common operations used by the two Lambda functions
  • A secret in Secrets Manager with the secret keys our code uses

Update the Slack request URL

To update the Slack request URL, complete the following steps:

  1. On the AWS CloudFormation console, navigate to the stack Outputs tab and copy the ListenSlackApi endpoint URL.
  2. Sign in to the Slack API console.
  3. Choose the app you integrated with Amazon Lex.
  4. Update the Interactivity & Shortcuts feature by replacing the value for Request URL with the ListenSlackApi endpoint URL.
  5. Choose Save Changes.

Gather information for Secrets Manager

To gather information for Secrets Manager, complete the following steps:

  1. On the Slack API console, under Settings, choose Basic Information.
  2. Note down the value for Signing Secret.
  3. Under Features, choose OAuth & Permissions.
  4. Note down the value for Bot User OAuth Token.
  5. On the Amazon Lex console, note the following:
    • Your bot’s name
    • Your bot’s alias
    • The last part of the two callback URLs that Amazon Lex generated when you created your Slack channel (for example, https://channels.lex.us-east-1.amazonaws.com/slack/webhook/value-to-record).

Populate the secret value

To populate the secret value, complete the following steps:

  1. On the Secrets Manager console, from the list of secrets, choose SLACK_LEX_BLOCK_KIT.
  2. Choose Retrieve secret value.
  3. Choose Edit.
  4. Replace the secret values as follows:
    1. SLACK_SIGNING_SECRET – The signing secret from Slack.
    2. SLACK_BOT_TOKEN – The bot user OAuth token from Slack.
    3. BOT_NAME – Your Amazon Lex bot’s name.
    4. BOT_ALIAS – Your Amazon Lex bot’s alias name.
    5. LEX_SLACK_CHANNEL_ID – The value you recorded from the callback URLs.
  5. Choose Save.
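Once the secret is populated, the Lambda functions can read it at runtime. The following is a minimal sketch, assuming the secret is stored as JSON key-value pairs in the secret string and that a Boto3 Secrets Manager client is passed in; the function name and the completeness check are hypothetical additions.

```python
import json

REQUIRED_KEYS = (
    "SLACK_SIGNING_SECRET",
    "SLACK_BOT_TOKEN",
    "BOT_NAME",
    "BOT_ALIAS",
    "LEX_SLACK_CHANNEL_ID",
)


def load_slack_secret(secrets_client, secret_id="SLACK_LEX_BLOCK_KIT"):
    """Fetch the secret and check that every expected key has a value.

    secrets_client is expected to be boto3.client("secretsmanager").
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    missing = [key for key in REQUIRED_KEYS if not secret.get(key)]
    if missing:
        raise KeyError(f"Secret {secret_id} is missing values for: {missing}")
    return secret
```

Failing early on a missing key surfaces configuration mistakes at startup instead of in the middle of a conversation.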

Update the Lambda fulfillment function and Lambda initialization and validation for your Amazon Lex bot

If you’re using the OrderFlowers bot, follow the instructions in Step 4: Add the Lambda Function as Code Hook (Console) to add the Lambda function amazon-lex-slack-block-kit-OrderFlowerFunction as code hooks for fulfillment, initialization, and validation.

If you’re not using the OrderFlowers bot and your runtime is Python 3.6 or later, use the Lambda layer slack-lex-block that the stack created. The layer includes an operation, postInSlack, to post your blocks:

helper.postInSlack(channel_id, blocks, 'blocks')

You can use Slack Block Kit Builder to build your blocks.

Update the listener Lambda function

If you’re using the OrderFlowers bot, move to the next step to test the integration.

If you’re not using the OrderFlowers bot, update the Lambda function starting with amazon-lex-slack-block-kit-ListenFunction to process the actions your blocks used.
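As an illustration of such an update, the following sketch extends the action handling shown earlier to two more Block Kit elements, a time picker and a static select menu. The function name is hypothetical, and the set of element types you need depends on the blocks your bot posts.

```python
def extract_action_value(action):
    """Return the text to forward to Amazon Lex for a Block Kit action.

    Returns None for element types that are not handled here.
    """
    action_type = action.get("type")
    if action_type == "button":
        return action.get("value")
    if action_type == "datepicker":
        return action.get("selected_date")
    if action_type == "timepicker":
        return action.get("selected_time")
    if action_type == "static_select":
        # The chosen option is nested under selected_option
        return (action.get("selected_option") or {}).get("value")
    return None
```

Returning None for unknown types lets the caller decide whether to ignore the event or send a fallback message.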

Test the integration

To test the integration, complete the following steps:

  1. Go back to the Slack team where you installed your application.
  2. In the navigation pane, in the Direct Messages section, choose your bot.

If you don’t see your bot, choose the plus icon (+) next to Direct Messages to search for it.

  3. Engage in a conversation with your Slack application.

Your bot now prompts you with the blocks you configured, as shown in the following example conversation.

Clean up

To avoid incurring future charges, delete the CloudFormation stack via the AWS CloudFormation console or the AWS Command Line Interface (AWS CLI):

aws cloudformation delete-stack --stack-name amazon-lex-slack-block-kit

You also need to delete the Amazon Lex bot resources that you created, the Amazon CloudWatch logs, and the Lambda layer that was created by the stack.

Conclusion

In this post, we showed how to use Block Kit to format Amazon Lex messages within Slack. We provided code examples to post blocks to Slack, listen to events from users’ interactions with the blocks’ elements, and process those events. We also walked you through deploying and configuring the necessary components to use Block Kit. Try the code examples and adapt them for your use case as you see fit.


About the Author

Anne Martine Augustin is an Application Consultant for AWS Professional Services based in Houston, TX. She is passionate about helping customers architect and build modern applications that accelerate their business outcomes. In her spare time, Martine enjoys spending time with friends and family, listening to audio books, and trying new foods.


Patterns for multi-account, hub-and-spoke Amazon SageMaker model registry

Data science workflows pass through multiple stages as they progress from experimentation to production. A common approach involves separate accounts dedicated to different phases of the AI/ML workflow (experimentation, development, and production).

In addition, issues related to data access control may also mandate that workflows for different AI/ML applications be hosted on separate, isolated AWS accounts. Managing these stages and multiple accounts is complex and challenging.

When it comes to model deployment, however, it often makes sense to have a central repository of approved models to keep track of what is being used for production-grade inference. The Amazon SageMaker Model Registry is the natural choice for this kind of inference-oriented metadata store. In this post, we showcase how to set up such a centralized repository.

Overview

The workflow we address here is the one common to many data science projects. A data scientist in a dedicated data science account experiments on models, creates model artifacts on Amazon Simple Storage Service (Amazon S3), keeps track of the association between model artifacts and Amazon Elastic Container Registry (Amazon ECR) images using SageMaker model packages, and groups model versions into model package groups. The following diagram gives an overview of the structure of the SageMaker Model Registry.

A typical scenario has the following components:

  • One or more spoke environments are used for experimenting and for training ML models
  • Segregation between the spoke environments and a centralized environment is needed
  • We want to promote a machine learning (ML) model from the spokes to the centralized environment by creating a model package (version) in the centralized environment, and optionally moving the generated artifact model.tar.gz to an S3 bucket to serve as a centralized model store
  • Tracking and versioning of promoted ML models is done in the centralized environment from which, for example, deployment can be performed

This post illustrates how to build federated, hub-and-spoke model registries, where multiple spoke accounts use the SageMaker Model Registry from a hub account to register their model package groups and versions.

The following diagram illustrates two possible patterns: a push-based approach and a pull-based approach.

In the push-based approach, a user or role from a spoke account assumes a role in the central account. They then register the model packages or versions directly into the central registry. This is the simplest approach, both to set up and operate. However, you must give the spoke accounts write access (through the assumed role) to the central hub, which in some setups may not be possible or desirable.

In the pull-based approach, the spoke account registers model package groups or versions in the local SageMaker Model Registry. Amazon EventBridge notifies the hub account of the modification, which triggers a process that pulls the modification and replicates it to the hub’s registry. In this setup, spoke accounts don’t have any access to the central registry. Instead, the central account has read access to the spoke registries.
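For the pull-based approach, the notification can be driven by an EventBridge rule in the spoke account that matches model package state changes. The following sketch builds such an event pattern; the function name and the filter on approval status are assumptions, and forwarding to the hub account's event bus still has to be configured as the rule target.

```python
import json


def model_package_event_pattern(model_package_group_name=None,
                                approval_status="Approved"):
    """Build an EventBridge event pattern (as a JSON string) for
    SageMaker model package state changes."""
    detail = {}
    if approval_status:
        detail["ModelApprovalStatus"] = [approval_status]
    if model_package_group_name:
        detail["ModelPackageGroupName"] = [model_package_group_name]

    pattern = {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
    }
    if detail:
        pattern["detail"] = detail
    return json.dumps(pattern)
```

The returned string can be passed as the EventPattern argument of the EventBridge put_rule operation.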

In the following sections, we illustrate example configurations for simple, two-account setups:

  • A data science (DS) account used for performing isolated experimentation using AWS services, such as SageMaker, the SageMaker Model Registry, Amazon S3, and Amazon ECR
  • A hub account used for storing the central model registry, and optionally also ML model binaries and Amazon ECR model images.

In real-life scenarios, multiple DS accounts would be associated to a single hub account.

Closely related to the operation of a model registry is the topic of model lineage, which is the ability to trace a deployed model all the way back to the exact experiment and training job or data that generated it. Amazon SageMaker ML Lineage Tracking creates and stores information about the steps of an ML workflow (from data preparation to model deployment) in the accounts where the different steps are originally run. As of this writing, exporting this information to different accounts is possible using dedicated model metadata. Model metadata can be exchanged through different mechanisms (for example, by emitting and forwarding a custom EventBridge event, or by writing to an Amazon DynamoDB table). A detailed description of these processes is beyond the scope of this post.

Access to model artifacts, Amazon ECR, and basic model registry permissions

Full cross-account operation of the model registry requires three main components:

  • Access from the hub account to model artifacts on Amazon S3 and to Amazon ECR images (either in the DS accounts or in a centralized Amazon S3 and Amazon ECR location)
  • Same-account operations on the model registry
  • Cross-account operations on the model registry

We can achieve the first component using resource policies. We provide examples of cross-account read-only policies for Amazon S3 and Amazon ECR in this section. In addition to these settings, the principals in the following policies must act using a role where the corresponding actions are allowed. For example, it’s not enough to have a resource policy that allows the DS account to read a bucket. The account must also do so from a role where Amazon S3 reads are allowed. This basic Amazon S3 and Amazon ECR configuration is not detailed here; links to the relevant documentation are provided at the end of this post.

Careful consideration must also be given to the location where model artifacts and Amazon ECR images are stored. If a central location is desired, it seems like a natural choice to let the hub account also serve as an artifact and image store. In this case, as part of the promotion process, model artifacts and Amazon ECR images must be copied from the DS accounts to the hub account. This is a normal copy operation, and can be done using both push-to-hub and pull-from-DS patterns, which aren’t detailed in this post. However, the attached code for the push-based pattern shows a complete example, including the code to handle the Amazon S3 copy of the artifacts. The example assumes that such a central store exists, that it coincides with the hub account, and that the necessary copy operations are in place.

In this context, versioning (of model images and of model artifacts) is also an important building block. It is required to improve the security profile of the setup and make sure that no accidental overwriting or deletion occurs. In real-life scenarios, the operation of the setups described here is fully automated, and steered by CI/CD pipelines that use unique build-ids to generate unique identifiers for all archived resources (unique object keys for Amazon S3, unique image tags for Amazon ECR). An additional level of robustness can be added by activating versioning on the relevant S3 buckets, as detailed in the resources provided at the end of this post.
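The two safeguards described above can be sketched as follows. The naming scheme for keys and tags is purely illustrative (your CI/CD pipeline defines the real one), and s3_client is assumed to be a Boto3 S3 client.

```python
def unique_artifact_names(project, build_id):
    """Derive a unique S3 object key and ECR image tag from a build ID.

    The layout "<project>/<build_id>/model.tar.gz" is a hypothetical
    convention, chosen so that each build writes to its own key.
    """
    return {
        "s3_key": f"{project}/{build_id}/model.tar.gz",
        "image_tag": f"{project}-{build_id}",
    }


def enable_bucket_versioning(s3_client, bucket_name):
    """Turn on versioning so overwritten or deleted objects stay recoverable."""
    return s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )
```

Unique names prevent accidental overwrites in the first place, and bucket versioning acts as a second line of defense if one happens anyway.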

Amazon S3 bucket policy

The following resource policy allows the DS account to get objects inside a defined S3 bucket in the hub account. As already mentioned, in this scenario, the hub account also serves as a model store, keeping a copy of the model artifacts. If the model store is separate from the hub account, the configuration is similar: the relevant bucket must allow read operations from the hub and DS accounts.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"S3CrossAccountRead",
         "Effect":"Allow",
         "Action":"s3:GetObject",
         "Resource": [
            "arn:aws:s3:::{HUB_BUCKET_NAME}/*model.tar.gz"
         ],
         "Principal":{
            "AWS":[
               "arn:aws:iam::{DS_ACCOUNT_ID}:role/{DS_ACCOUNT_ROLE}"
            ]
         }
      }
   ]
}
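A bucket policy like this one can be attached with the SDK. The following is a minimal sketch, assuming a Boto3 S3 client running under a role in the hub account that is allowed to set bucket policies; the function names are hypothetical.

```python
import json


def build_cross_account_read_policy(hub_bucket_name, ds_account_id, ds_role_name):
    """Build a bucket policy that lets one DS account role read model artifacts."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3CrossAccountRead",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [f"arn:aws:s3:::{hub_bucket_name}/*model.tar.gz"],
                "Principal": {
                    "AWS": [f"arn:aws:iam::{ds_account_id}:role/{ds_role_name}"]
                },
            }
        ],
    }


def apply_bucket_policy(s3_client, bucket_name, policy):
    """Attach the policy; the S3 API expects it as a JSON string."""
    return s3_client.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
```

Note that put_bucket_policy replaces any policy already attached to the bucket, so merge statements first if the bucket has an existing policy.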

Amazon ECR repository policy

The following resource policy allows the DS account to get images from a defined Amazon ECR repository in the hub account, because in this example the hub account also serves as the central Amazon ECR registry. If you want a separate central registry, the configuration is similar: the hub or DS account needs to be given read access to the central registry. Optionally, you can also restrict the access to specific resources, for example by enforcing a specific pattern for tagging cross-account images.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"ECRCrossAccountRead",
         "Effect":"Allow",
         "Action": [
            "ecr:BatchGetImage",
            "ecr:GetDownloadUrlForLayer"
         ],
         "Principal":{
            "AWS":[
               "arn:aws:iam::{DS_ACCOUNT_ID}:role/{DS_ACCOUNT_ROLE}"
            ]
         }
      }
   ]
}

IAM policy for SageMaker Model Registry

Operations on the model registry within an account are regulated by normal AWS Identity and Access Management (IAM) policies. The following example allows basic actions on the model registry:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sagemaker:CreateModelPackage*",
                "sagemaker:DescribeModelPackage",
                "sagemaker:DescribeModelPackageGroup",
                "sagemaker:ListModelPackages",
                "sagemaker:ListModelPackageGroups"
            ],
            "Resource": [
                "*"
            ],
            "Effect": "Allow"
        }
    ]
}

We now detail how to configure cross-account operations on the model registry.

SageMaker Model Registry configuration: Push-based approach

The following diagram shows the architecture of the push-based approach.

In this approach, users in the DS account can read from the hub account, thanks to resource-based policies. However, to gain write access to the central registry, the DS account must assume a role in the hub account with the appropriate permissions.

The minimal setup of this architecture requires the following:

  • Read access to the model artifacts on Amazon S3 and to the Amazon ECR images, using resource-based policies, as outlined in the previous section.
  • IAM policies in the hub account allowing it to write the objects into the chosen S3 bucket and create model packages into the SageMaker model package groups.
  • An IAM role in the hub account with the previous policies attached with a cross-account AssumeRole rule. The DS account assumes this role to write the model.tar.gz in the S3 bucket and create a model package. For example, this operation could be carried out by an AWS Lambda function.
  • A second IAM role, in the DS account, that can read the model.tar.gz artifact from the S3 bucket, and assume the role in the hub account mentioned above. This role is used for reads from the registry. For example, this could be used as the run role of a Lambda function.
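The cross-account AssumeRole rule on the hub role is expressed as a trust policy. The following sketch builds one, assuming a single DS account role is allowed to assume the hub role; the function name is hypothetical.

```python
import json


def hub_role_trust_policy(ds_account_id, ds_role_name):
    """Build a trust policy that lets one DS account role assume the hub role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{ds_account_id}:role/{ds_role_name}"
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


def trust_policy_document(ds_account_id, ds_role_name):
    """Serialize the trust policy, because IAM expects a JSON string."""
    return json.dumps(hub_role_trust_policy(ds_account_id, ds_role_name))
```

The serialized document can be passed as AssumeRolePolicyDocument when creating the role with the IAM create_role operation. Naming a specific role rather than the whole DS account keeps the set of principals that can write to the hub as small as possible.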

Create a resource policy for model package groups

The following is an example policy to be attached to model package groups in the hub account. It allows read operations on a package group and on all package versions it contains.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPermModelPackageGroup",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::{DS_ACCOUNT_ID}:role/service-role/{LAMBDA_ROLE}"
                ]
            },
            "Action": [
                "sagemaker:DescribeModelPackageGroup"
            ],
            "Resource": "arn:aws:sagemaker:{REGION}:{HUB_ACCOUNT_ID}:model-package-group/{NAME}"
        },
        {
            "Sid": "AddPermModelPackageVersion",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{DS_ACCOUNT_ID}:role/service-role/{LAMBDA_ROLE}"
            },
            "Action": [
                "sagemaker:DescribeModelPackage",
                "sagemaker:ListModelPackages"
            ],
            "Resource": "arn:aws:sagemaker:{REGION}:{HUB_ACCOUNT_ID}:model-package/{NAME}/*"
        }
    ]
}

You can’t associate this policy with the package group via the AWS Management Console. You need SDK or AWS Command Line Interface (AWS CLI) access. For example, the following code uses Python and Boto3:

import boto3

sm_client = boto3.client('sagemaker')

# ResourcePolicy must be a JSON-formatted string
sm_client.put_model_package_group_policy(
    ModelPackageGroupName=model_package_group_name,
    ResourcePolicy=model_package_group_policy)

Cross-account policy for the DS account to the hub account

This policy allows the users and services in the DS account to assume the relevant role in the hub account. For example, the following policy allows a Lambda execution role in the DS account to assume the role in the hub account:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": [
                "arn:aws:iam::{HUB_ACCOUNT_ID}:role/SagemakerModelRegistryRole"
            ],
            "Effect": "Allow"
        }
    ]
}

Example workflow

Now that all permissions are configured, we can illustrate the workflow using a Lambda function that assumes the hub account role previously defined, copies the artifact model.tar.gz created into the hub account S3 bucket, and creates the model package linked to the previously copied artifact.

In the following code snippets, we illustrate how to create a model package in the target account after assuming the relevant role. The complete code needed for operation (including manipulation of Amazon S3 and Amazon ECR assets) is attached to this post.
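Assuming the hub role can be sketched as follows. This is an illustration under stated assumptions, not the code attached to the post: the function name and session name are hypothetical, the default role name is taken from the policy example above, and sts_client is assumed to be a Boto3 STS client created in the DS account.

```python
def hub_client_kwargs(sts_client, hub_account_id,
                      role_name="SagemakerModelRegistryRole",
                      session_name="model-registry-push"):
    """Assume the hub account role and return temporary credentials
    as keyword arguments for boto3.client()."""
    role_arn = f"arn:aws:iam::{hub_account_id}:role/{role_name}"
    credentials = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }
```

You can then create clients that act in the hub account, for example boto3.client("sagemaker", **hub_client_kwargs(boto3.client("sts"), hub_account_id)). The credentials are temporary, so long-running processes need to assume the role again when they expire.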

Copy the artifact

To maintain a centralized approach in the hub account, the first operation copies the artifact into the centralized S3 bucket.

The method takes as input the DS source bucket name, the hub target bucket name, and the path to the model.tar.gz. After copying the artifact into the target bucket, it returns the new Amazon S3 path, which the model package uses. As discussed earlier, you need to run this code from a role that has read access to the source Amazon S3 location and write access to the destination. You can set this up, for example, in the execution role of a Lambda function; the details are beyond the scope of this post. See the following code:

import logging
import traceback

import boto3

LOGGER = logging.getLogger()


def copy_artifact(ds_bucket_name, hub_bucket_name, model_path):
    try:
        # Read the artifact from the DS account bucket with the current role
        s3_client = boto3.client("s3")

        source_response = s3_client.get_object(
            Bucket=ds_bucket_name,
            Key=model_path
        )

        # Assume the hub account role to get an S3 client that can write to
        # the target bucket (assume_dev_role_s3 is not shown in this excerpt)
        s3_client = assume_dev_role_s3()

        s3_client.upload_fileobj(
            source_response["Body"],
            hub_bucket_name,
            model_path
        )

        new_model_path = "s3://{}/{}".format(hub_bucket_name, model_path)

        return new_model_path
    except Exception as e:
        # Log the full stack trace before propagating the error
        stacktrace = traceback.format_exc()
        LOGGER.error("{}".format(stacktrace))

        raise e

Create a model package

This method registers the model version in a model package group that you already created in the hub account. The method takes as input a Boto3 SageMaker client instantiated after assuming the role in the hub account, the Amazon ECR image URI to use in the model package, the model URL returned after copying the artifact to the target S3 bucket, the name of the model package group in which to create the new model package version, and the approval status to assign to the new version:

def create_model_package(sm_client, 
                         image_uri,
                         model_path, 
                         model_package_group_name, 
                         approval_status):
    try:
        modelpackage_inference_specification = {
            "InferenceSpecification": {
                "Containers": [
                    {
                        "Image": image_uri,
                        "ModelDataUrl": model_path
                    }
                ],
                # use correct types here
                "SupportedContentTypes": ["text/csv"],
                "SupportedResponseMIMETypes": ["text/csv"], 
            }
        }

        create_model_package_input_dict = {
            "ModelPackageGroupName": model_package_group_name,
            "ModelPackageDescription": f"Model for {model_package_group_name}",
            "ModelApprovalStatus": approval_status
        }

        create_model_package_input_dict.update(modelpackage_inference_specification)
        create_model_package_response = sm_client.create_model_package(
            **create_model_package_input_dict)
        model_package_arn = create_model_package_response["ModelPackageArn"]

        return model_package_arn
    except Exception as e:
        stacktrace = traceback.format_exc()
        LOGGER.error("{}".format(stacktrace))

        raise e

A Lambda handler orchestrates all the actions needed to operate the central registry. The mandatory parameters in this example are as follows:

  • image_uri – The Amazon ECR image URI used in the model package
  • model_path – The source path of the artifact in the S3 bucket
  • model_package_group_name – The model package group name used for creating the new model package version
  • ds_bucket_name – The name of the source S3 bucket
  • hub_bucket_name – The name of the target S3 bucket
  • approval_status – The status to assign to the model package version

See the following code:

def lambda_handler(event, context):
    
    image_uri = event.get("image_uri", None)
    model_path = event.get("model_path", None)
    model_package_group_name = event.get("model_package_group_name", None)
    ds_bucket_name = event.get("ds_bucket_name", None)
    hub_bucket_name = event.get("hub_bucket_name", None)
    approval_status = event.get("approval_status", None)
    
    # copy the S3 assets from DS to Hub
    model_path = copy_artifact(ds_bucket_name, hub_bucket_name, model_path)
    
    # assume a role in the Hub account, retrieve the sagemaker client
    sm_client = assume_hub_role_sagemaker()
    
    # create the model package in the Hub account
    model_package_arn = create_model_package(sm_client, 
                                            image_uri, 
                                            model_path, 
                                            model_package_group_name, 
                                            approval_status)

    response = {
        "statusCode": "200",
        "model_arn": model_package_arn
    }

    return response
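
The handler above reads every parameter with a default of None, so a missing mandatory value only surfaces later as an error from Amazon S3 or SageMaker. A small validation step at the top of the handler can fail earlier with a clearer message. The following is a sketch under that assumption; the function and constant names are illustrative, and the approval statuses listed are the values accepted by the CreateModelPackage API:

```python
REQUIRED_KEYS = (
    "image_uri",
    "model_path",
    "model_package_group_name",
    "ds_bucket_name",
    "hub_bucket_name",
    "approval_status",
)

# values accepted for ModelApprovalStatus by CreateModelPackage
VALID_APPROVAL_STATUSES = ("Approved", "Rejected", "PendingManualApproval")


def validate_event(event):
    """Return the mandatory parameters of the event, or raise ValueError."""
    missing = [key for key in REQUIRED_KEYS if not event.get(key)]
    if missing:
        raise ValueError("Missing mandatory parameters: " + ", ".join(missing))
    if event["approval_status"] not in VALID_APPROVAL_STATUSES:
        raise ValueError("Invalid approval_status: " + str(event["approval_status"]))
    return {key: event[key] for key in REQUIRED_KEYS}
```

Calling validate_event(event) as the first statement of lambda_handler would stop the invocation before any artifact is copied.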

SageMaker Model Registry configuration: Pull-based approach

The following diagram illustrates the architecture for the pull-based approach.

This approach is better suited for cases where write access to the account hosting the central registry is restricted. The preceding diagram shows a minimal setup, with a hub and just one spoke.

A typical workflow is as follows:

  1. A data scientist is working on a dedicated account. The local model registry is used to keep track of model packages and deployment.
  2. Each time a model package is created, an event “SageMaker Model Package State Change” is emitted.
  3. The EventBridge rule in the DS account forwards the event to the hub account, where it triggers actions. In this example, a Lambda function with cross-account read access to the DS model registry can retrieve the needed information and copy it to the central registry.

The minimal setup of this architecture requires the following:

  • Model package groups in the DS account need to have a resource policy, allowing read access from the Lambda execution role in the hub account.
  • The EventBridge rule in the DS account must be configured to forward relevant events to the hub account.
  • The hub account must allow the DS EventBridge rule to send events over.
  • Access to the S3 bucket storing the model artifacts, as well as to Amazon ECR for model images, must be granted to a role in the hub account. These configurations follow the lines of what we outlined in the first section, and are not further elaborated on here.

If the hub account is also in charge of deployment in addition to simple bookkeeping, read access to the model artifacts on Amazon S3 and to the model images on Amazon ECR must also be set up. This can be done by either archiving resources to the hub account or with read-only cross-account access, as already outlined earlier in this post.

Create a resource policy for model package groups

The following is an example policy to attach to model package groups in the DS account. It allows read operations on a package group and on all package versions it contains:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPermModelPackageGroup",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{HUB_ACCOUNT_ID}:role/service-role/{LAMBDA_ROLE}"
            },
            "Action": ["sagemaker:DescribeModelPackageGroup"],
            "Resource": "arn:aws:sagemaker:{REGION}:{DS_ACCOUNT_ID}:model-package-group/{NAME}"
        },
        {
            "Sid": "AddPermModelPackageVersion",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{HUB_ACCOUNT_ID}:role/service-role/{LAMBDA_ROLE}"
            },
            "Action": [
                "sagemaker:DescribeModelPackage",
                "sagemaker:ListModelPackages"
            ],
            "Resource": "arn:aws:sagemaker:{REGION}:{DS_ACCOUNT_ID}:model-package/{NAME}/*"
        }
    ]
}

You can’t associate this policy with the package group via the console; the SDK or AWS CLI is required. For example, the following code uses Python and Boto3:

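import json

import boto3

sm_client = boto3.client('sagemaker')

# ResourcePolicy must be a JSON string; this assumes the policy is held as a dict
sm_client.put_model_package_group_policy(
    ModelPackageGroupName=model_package_group_name,
    ResourcePolicy=json.dumps(model_package_group_policy))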
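
Because the policy contains several placeholders, it can be convenient to generate it from parameters rather than edit it by hand. The following sketch builds the same two statements and returns the JSON string expected by the ResourcePolicy argument. The function name and its parameters are assumptions made for illustration:

```python
import json


def build_model_package_group_policy(hub_account_id, lambda_role, region,
                                     ds_account_id, group_name):
    """Return the read-only resource policy for a model package group as a JSON string."""
    principal = {"AWS": f"arn:aws:iam::{hub_account_id}:role/service-role/{lambda_role}"}
    arn_prefix = f"arn:aws:sagemaker:{region}:{ds_account_id}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # read access to the package group itself
                "Sid": "AddPermModelPackageGroup",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["sagemaker:DescribeModelPackageGroup"],
                "Resource": f"{arn_prefix}:model-package-group/{group_name}",
            },
            {
                # read access to every version in the group
                "Sid": "AddPermModelPackageVersion",
                "Effect": "Allow",
                "Principal": principal,
                "Action": [
                    "sagemaker:DescribeModelPackage",
                    "sagemaker:ListModelPackages",
                ],
                "Resource": f"{arn_prefix}:model-package/{group_name}/*",
            },
        ],
    }
    return json.dumps(policy)
```

The returned string can be passed directly as ResourcePolicy to put_model_package_group_policy.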

Configure an EventBridge rule in the DS account

In the DS account, you must configure a rule for EventBridge:

  1. On the EventBridge console, choose Rules.
  2. Choose the event bus you want to add the rule to (for example, the default bus).
  3. Choose Create rule.
  4. Select Event Pattern, and navigate through the drop-down menus to choose Predefined pattern, AWS, SageMaker, and SageMaker Model Package State Change.

You can refine the event pattern as you like. For example, to forward only events related to approved models within a specific package group, use the following code:

{
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {
        "ModelPackageGroupName": ["ExportPackageGroup"],
        "ModelApprovalStatus": ["Approved"]
    }
}
  5. In the Target section, choose Event Bus in another AWS account.
  6. Enter the ARN of the event bus in the hub account that receives the events.
  7. Finish creating the rule.
  8. In the hub account, open the EventBridge console, choose the event bus that receives the events from the DS account, and edit the Permissions field so that it contains the following code:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "sid1",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::{DS_ACCOUNT_ID}:root"
    },
    "Action": "events:*",
    "Resource": "arn:aws:events:{REGION}:{HUB_ACCOUNT_ID}:event-bus/{BUS_NAME}"
  }]
}
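
The console steps for the DS account can also be scripted. The following is a sketch, not the setup used by the authors: it creates the rule on the default event bus with put_rule and adds the hub event bus as a target with put_targets. The function name, the target ID, and the role_arn parameter (an IAM role that EventBridge can assume to send events to the hub bus) are assumptions made for illustration:

```python
import json


def forward_approved_packages(events_client, rule_name, package_group_name,
                              hub_bus_arn, role_arn):
    """Create a rule in the DS account that forwards approved model packages to the hub bus."""
    pattern = {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [package_group_name],
            "ModelApprovalStatus": ["Approved"],
        },
    }
    # events_client is expected to be boto3.client("events") in the DS account
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "hub-event-bus", "Arn": hub_bus_arn, "RoleArn": role_arn}],
    )
    return pattern
```

The permissions policy on the hub event bus is still required; the script only covers the rule and target on the DS side.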

Configure an EventBridge rule in the hub account

Now events can flow from the DS account to the hub account. You must configure the hub account to properly handle the events:

  1. On the EventBridge console, choose Rules.
  2. Choose Create rule.
  3. Similarly to the previous section, create a rule for the relevant event type.
  4. Connect it to the appropriate target—in this case, a Lambda function.

In the following example code, we process the event, extract the model package ARN, and retrieve its details. The event from EventBridge already contains all the information from the model package in the DS account. In principle, the resource policy for the model package group isn’t even needed when the copy operation is triggered by EventBridge.

import boto3

sm_client = boto3.client('sagemaker')

# this is meant to be triggered by events in the bus

def lambda_handler(event, context):

    # users need to implement the function get_model_details
    # to extract info from the event received from EventBridge
    model_arn, model_spec, model_desc = get_model_details(event)

    target_group_name = 'targetGroupName'

    # copy the model package to the hub registry
    create_model_package_args = {
        'InferenceSpecification': model_spec,
        'ModelApprovalStatus': 'PendingManualApproval',
        'ModelPackageDescription': model_desc,
        'ModelPackageGroupName': target_group_name}

    return sm_client.create_model_package(**create_model_package_args)
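
The handler above leaves get_model_details to the reader. One possible implementation is sketched here. It assumes that the detail section of the event carries the ModelPackageArn, InferenceSpecification, and ModelPackageDescription fields; verify these names against an actual event from your account before relying on them:

```python
def get_model_details(event):
    """Extract the ARN, inference specification, and description from the event."""
    detail = event["detail"]
    model_arn = detail["ModelPackageArn"]
    model_spec = detail["InferenceSpecification"]
    # the description is optional on a model package, so fall back to a generated one
    model_desc = detail.get("ModelPackageDescription") or "Copy of {}".format(model_arn)
    return model_arn, model_spec, model_desc
```

Note that the inference specification copied this way still references the image and model artifact locations of the DS account, which is why the hub account needs the Amazon S3 and Amazon ECR access described earlier.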

Conclusion

SageMaker model registries are a native AWS tool to track model versions and lineage. The implementation overhead is minimal, in particular when compared with a fully custom metadata store, and they integrate with the rest of the tools within SageMaker. As we demonstrated in this post, even in complex multi-account setups with strict segregation between accounts, model registries are a viable solution to track operations of AI/ML workflows.

About the Authors

Andrea Di Simone is a Data Scientist in the Professional Services team based in Munich, Germany. He helps customers develop their AI/ML products and workflows, leveraging AWS tools. He enjoys reading, classical music, and hiking.

Bruno Pistone is a Machine Learning Engineer for AWS based in Milan. He works with enterprise customers, helping them productionize machine learning solutions and follow best practices using AWS AI/ML services. His fields of expertise are Machine Learning Industrialization and MLOps. He enjoys spending time with his friends and exploring new places around Milan, as well as traveling to new destinations.

Matteo Calabrese is a Data and ML engineer in the Professional Services team based in Milan, Italy. He works with large enterprises on AI/ML projects, helping them propose, deliver, scale, and optimize ML solutions. His goal is to shorten their time to value and accelerate business outcomes by providing AWS best practices. In his spare time, he enjoys hiking and traveling.

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to deeply understand their business and technical needs and design AI and machine learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

Deploy multiple serving containers on a single instance using Amazon SageMaker multi-container endpoints

Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models built on different frameworks. SageMaker real-time inference endpoints are fully managed and can serve predictions in real time with low latency.

This post introduces SageMaker support for direct multi-container endpoints. This enables you to run up to 15 different ML containers on a single endpoint and invoke them independently, thereby saving up to 90% in costs. These ML containers can be running completely different ML frameworks and algorithms for model serving. In this post, we show how to serve TensorFlow and PyTorch models from the same endpoint by invoking different containers for each request and restricting access to each container.

SageMaker already supports deploying thousands of ML models and serving them using a single container and endpoint with multi-model endpoints. SageMaker also supports deploying multiple models built on different framework containers on a single instance, in a serial implementation fashion using inference pipelines.

Organizations are increasingly taking advantage of ML to solve various business problems and running different ML frameworks and algorithms for each use case. This pattern requires you to manage the challenges around deployment and cost for different serving stacks in production. These challenges become more pronounced when models are accessed infrequently but still require low-latency inference. SageMaker multi-container endpoints enable you to deploy up to 15 containers on a single endpoint and invoke them independently. This option is ideal when you have multiple models running on different serving stacks with similar resource needs, and when individual models don’t have sufficient traffic to utilize the full capacity of the endpoint instances.

Overview of SageMaker multi-container endpoints

SageMaker multi-container endpoints enable several inference containers, built on different serving stacks (such as ML framework, model server, and algorithm), to be run on the same endpoint and invoked independently for cost savings. This can be ideal when you have several different ML models that have different traffic patterns and similar resource needs.

Examples of when to utilize multi-container endpoints include, but are not limited to, the following:

  • Hosting models across different frameworks (such as TensorFlow, PyTorch, and Sklearn) that don’t have sufficient traffic to saturate the full capacity of an instance
  • Hosting models from the same framework with different ML algorithms (such as recommendations, forecasting, or classification) and handler functions
  • Comparisons of similar architectures running on different framework versions (such as TensorFlow 1.x vs. TensorFlow 2.x) for scenarios like A/B testing

Requirements for deploying a multi-container endpoint

To launch a multi-container endpoint, you specify the list of containers along with the trained models that should be deployed on an endpoint. Direct inference mode informs SageMaker that the models are accessed independently. As of this writing, you’re limited to up to 15 containers on a multi-container endpoint and GPU inference is not supported due to resource contention. You can also run containers on multi-container endpoints sequentially as inference pipelines for each inference if you want to make preprocessing or postprocessing requests, or if you want to run a series of ML models in order. This capability is already supported as the default behavior of the multi-container endpoints and is selected by setting the inference mode to Serial.

After the models are trained, either through training on SageMaker or a bring-your-own strategy, you can deploy them on a multi-container endpoint using the SageMaker create_model, create_endpoint_config, and create_endpoint APIs. The create_endpoint_config and create_endpoint APIs work exactly the same way as they do for single-model or single-container endpoints. The only change you need to make is in the usage of the create_model API. The following changes are required:

  • Specify a list of container definitions for the Containers argument. This list contains the container definitions of all the containers to be hosted under the same endpoint. Each container definition must specify a ContainerHostname.
  • Set the Mode parameter of InferenceExecutionConfig to Direct, for direct invocation of each container, or Serial, for using containers in a sequential order (inference pipeline). The default Mode value is Serial.
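
The two requirements above can be checked before calling create_model. The following is a minimal sketch; the function name and the validation rules it applies (a non-empty list of at most 15 containers, each with a ContainerHostname, and a Mode of Direct or Serial) are taken from the description in this post and are not an official client-side validator:

```python
MAX_CONTAINERS = 15  # limit stated in this post at the time of writing


def build_multi_container_model_args(model_name, role_arn, containers, mode="Direct"):
    """Validate container definitions and return keyword arguments for create_model."""
    if mode not in ("Direct", "Serial"):
        raise ValueError("Unsupported inference mode: {}".format(mode))
    if not containers or len(containers) > MAX_CONTAINERS:
        raise ValueError("Expected between 1 and {} containers".format(MAX_CONTAINERS))
    for index, container in enumerate(containers):
        if not container.get("ContainerHostname"):
            raise ValueError("Container {} has no ContainerHostname".format(index))
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": list(containers),
        "InferenceExecutionConfig": {"Mode": mode},
    }
```

The returned dictionary can be unpacked into the call, for example sm_client.create_model(**args).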

Solution overview

In this post, we explain the usage of multi-container endpoints with the following steps:

  1. Train a TensorFlow and a PyTorch Model on the MNIST dataset.
  2. Prepare container definitions for TensorFlow and PyTorch serving.
  3. Create a multi-container endpoint.
  4. Invoke each container directly.
  5. Secure access to each container on a multi-container endpoint.
  6. View metrics for a multi-container endpoint.

The complete code related to this post is available on the GitHub repo.

Dataset

The MNIST dataset contains images of handwritten digits from 0–9 and is a popular ML problem. The MNIST dataset contains 60,000 training images and 10,000 test images. This solution uses the MNIST dataset to train a TensorFlow and PyTorch model, which can classify a given image content as representing a digit between 0–9. The models give a probability score for each digit category (0–9) and the highest probability score is taken as the output.

Train TensorFlow and PyTorch models on the MNIST dataset

SageMaker provides built-in support for training models using TensorFlow and PyTorch. To learn how to train models on SageMaker, we recommend referring to the SageMaker documentation for training a PyTorch model and training a TensorFlow model, respectively. In this post, we use TensorFlow 2.3.1 and PyTorch 1.8.1 versions to train and host the models.

Prepare container definitions for TensorFlow and PyTorch serving

SageMaker has built-in support for serving these framework models, but under the hood TensorFlow uses TensorFlow Serving and PyTorch uses TorchServe. This requires launching separate containers to serve the two framework models. To use SageMaker pre-built Deep Learning Containers, see Available Deep Learning Containers Images. Alternatively, you can retrieve pre-built URIs through the SageMaker SDK. The following code snippet shows how to build the container definitions for TensorFlow and PyTorch serving containers.

  1. Create a container definition for TensorFlow:
tf_ecr_image_uri = sagemaker.image_uris.retrieve(
    framework="tensorflow",
    region=region,
    version="2.3.1",
    py_version="py37",
    instance_type="ml.c5.4xlarge",
    image_scope="inference",
)

tensorflow_container = {
    "ContainerHostname": "tensorflow-mnist",
    "Image": tf_ecr_image_uri,
    "ModelDataUrl": tf_mnist_model_data,
}

Apart from ContainerHostname, specify the correct serving Image provided by SageMaker and also ModelDataUrl, which is the Amazon Simple Storage Service (Amazon S3) location where the model is stored.

  2. Create the container definition for PyTorch:
pt_ecr_image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=region,
    version="1.8.1",
    py_version="py36",
    instance_type="ml.c5.4xlarge",
    image_scope="inference",
)

pytorch_container = {
    "ContainerHostname": "pytorch-mnist",
    "Image": pt_ecr_image_uri,
    "ModelDataUrl": pt_updated_model_uri,
    "Environment": {
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": pt_updated_model_uri,
    },
}

For PyTorch container definition, an additional argument, Environment, is provided. It contains two keys:

  • SAGEMAKER_PROGRAM – The name of the script containing the inference code required by the PyTorch model server
  • SAGEMAKER_SUBMIT_DIRECTORY – The S3 URI of tar.gz containing the model file (model.pth) and the inference script

Create a multi-container endpoint

The next step is to create a multi-container endpoint.

  1. Create a model using the create_model API:
create_model_response = sm_client.create_model(
    ModelName="mnist-multi-container",
    Containers=[pytorch_container, tensorflow_container],
    InferenceExecutionConfig={"Mode": "Direct"},
    ExecutionRoleArn=role,
)

Both the container definitions are specified under the Containers argument. Additionally, the InferenceExecutionConfig mode has been set to Direct.

  2. Create an endpoint configuration using the create_endpoint_config API. It specifies the same ModelName created in the previous step:
endpoint_config = sm_client.create_endpoint_config(
    EndpointConfigName="mnist-multi-container-ep-config",
    ProductionVariants=[
        {
            "VariantName": "prod",
            "ModelName": "mnist-multi-container",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c5.4xlarge",
        },
    ],
)
  3. Create an endpoint using the create_endpoint API. It contains the same endpoint configuration created in the previous step:
endpoint = sm_client.create_endpoint(
    EndpointName="mnist-multi-container-ep", EndpointConfigName="mnist-multi-container-ep-config"
)
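
Endpoint creation is asynchronous, so the endpoint can't be invoked right after create_endpoint returns. A common pattern, sketched here with an illustrative function name, is to block on the Boto3 endpoint_in_service waiter and then read the status with describe_endpoint:

```python
def wait_until_in_service(sm_client, endpoint_name):
    """Block until the endpoint is in service, then return its status."""
    # sm_client is expected to be boto3.client("sagemaker")
    waiter = sm_client.get_waiter("endpoint_in_service")
    waiter.wait(EndpointName=endpoint_name)
    return sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]


# Hypothetical usage with the endpoint created above:
#   status = wait_until_in_service(sm_client, "mnist-multi-container-ep")
```

The waiter raises an exception if the endpoint creation fails or the wait times out, so the returned status is only read after a successful wait.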

Invoke each container directly

To invoke a multi-container endpoint with direct invocation mode, use invoke_endpoint from the SageMaker Runtime, passing a TargetContainerHostname argument that specifies the same ContainerHostname used while creating the container definition. The SageMaker Runtime InvokeEndpoint request supports X-Amzn-SageMaker-Target-Container-Hostname as a new header that takes the container hostname for invocation.

The following code snippet shows how to invoke the TensorFlow model on a small sample of MNIST data. Note the value of TargetContainerHostname:

tf_result = runtime_sm_client.invoke_endpoint(
    EndpointName="mnist-multi-container-ep",
    ContentType="application/json",
    Accept="application/json",
    TargetContainerHostname="tensorflow-mnist",
    Body=json.dumps({"instances": np.expand_dims(tf_samples, 3).tolist()}),
)

Similarly, to invoke the PyTorch container, change the TargetContainerHostname to pytorch-mnist:

pt_result = runtime_sm_client.invoke_endpoint(
    EndpointName="mnist-multi-container-ep",
    ContentType="application/json",
    Accept="application/json",
    TargetContainerHostname="pytorch-mnist",
    Body=json.dumps({"inputs": np.expand_dims(pt_samples, axis=1).tolist()}),
)

Apart from using different containers, each container invocation can also support a different MIME type.

For each invocation request to a multi-container endpoint set in direct invocation mode, only the container with TargetContainerHostname processes the request. Validation errors are raised if you specify a TargetContainerHostname that doesn’t exist inside the endpoint, or if you failed to specify a TargetContainerHostname parameter when invoking a multi-container endpoint.
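
The invocation results still need to be decoded into digit predictions. The sketch below handles the TensorFlow response only, and assumes that the serving container returns the TensorFlow Serving row format, a JSON object with a predictions list holding one list of scores per input image. The function name is illustrative, and the PyTorch response format depends on the inference script, so it is not covered here:

```python
import json


def predicted_digits(response_body):
    """Return the index of the highest score for each row of a TensorFlow Serving response."""
    payload = json.loads(response_body)
    return [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in payload["predictions"]
    ]


# Hypothetical usage with the TensorFlow invocation result:
#   digits = predicted_digits(tf_result["Body"].read())
```

Taking the index of the highest score matches the description in the Dataset section, where the digit category with the highest probability is used as the output.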

Secure multi-container endpoints

For multi-container endpoints using direct invocation mode, multiple containers are co-located in a single instance by sharing memory and storage volume. You can provide users with the right access to the target containers. SageMaker uses AWS Identity and Access Management (IAM) roles to provide IAM identity-based policies that allow or deny actions.

By default, an IAM principal with InvokeEndpoint permissions on a multi-container endpoint using direct invocation mode can invoke any container inside the endpoint with the EndpointName you specify. If you need to restrict InvokeEndpoint access to a limited set of containers inside the endpoint you invoke, you can restrict InvokeEndpoint calls to specific containers by using the sagemaker:TargetContainerHostname IAM condition key, similar to restricting access to models when using multi-model endpoints.

The following policy allows InvokeEndpoint requests only when the value of the TargetContainerHostname field matches one of the specified wildcard patterns:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sagemaker:InvokeEndpoint"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:sagemaker:region:account-id:endpoint/endpoint_name",
            "Condition": {
                "StringLike": {
                    "sagemaker:TargetContainerHostname": ["customIps*", "common*"]
                }
            }
        }
    ]
}

The following policy denies InvokeEndpoint requests when the value of the TargetContainerHostname field matches one of the patterns specified in the Deny statement:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sagemaker:InvokeEndpoint"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:sagemaker:region:account-id:endpoint/endpoint_name",
            "Condition": {
                "StringLike": {
                    "sagemaker:TargetContainerHostname": ["*"]
                }
            }
        },
        {
            "Action": [
                "sagemaker:InvokeEndpoint"
            ],
            "Effect": "Deny",
            "Resource": "arn:aws:sagemaker:region:account-id:endpoint/endpoint_name",
            "Condition": {
                "StringLike": {
                    "sagemaker:TargetContainerHostname": ["special"]
                }
            }
        }
    ]
}

For information about SageMaker condition keys, see Condition Keys for Amazon SageMaker.
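
To reason about which container hostnames a set of patterns lets through, it can help to simulate the matching locally. The sketch below is a simplified model and not the IAM policy evaluation engine: it treats * as any sequence of characters and ? as any single character, matches case-sensitively as StringLike does, and lets an explicit deny override an allow. The function names are assumptions made for illustration:

```python
import re


def string_like(pattern, value):
    """Case-sensitive wildcard match: * is any sequence, ? is any single character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def can_invoke(hostname, allow_patterns, deny_patterns=()):
    """Return True if the hostname matches an allow pattern and no deny pattern."""
    if any(string_like(pattern, hostname) for pattern in deny_patterns):
        return False
    return any(string_like(pattern, hostname) for pattern in allow_patterns)


# Illustrative checks against the patterns of the first policy:
#   can_invoke("commonModel", ["customIps*", "common*"])       -> True
#   can_invoke("tensorflow-mnist", ["customIps*", "common*"])  -> False
```

A check like this is only a planning aid; the IAM policy simulator or a test invocation remains the way to confirm the actual behavior.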

Monitor multi-container endpoints

For multi-container endpoints using direct invocation mode, SageMaker not only provides instance-level metrics as it does with other common endpoints, but also supports per-container metrics.

Per-container metrics for multi-container endpoints with direct invocation mode are located in Amazon CloudWatch metrics and are categorized into two namespaces: AWS/SageMaker and aws/sagemaker/Endpoints. The namespace of AWS/SageMaker includes invocation-related metrics, and the aws/sagemaker/Endpoints namespace includes per-container metrics of memory and CPU utilization.

The following screenshot of the AWS/SageMaker namespace shows per-container latency.

The following screenshot shows the aws/sagemaker/Endpoints namespace, which displays the CPU and memory utilization for each container.

For a full list of metrics, see Monitor Amazon SageMaker with Amazon CloudWatch.

Conclusion

SageMaker multi-container endpoints support deploying up to 15 containers on real-time endpoints and invoking them independently for low-latency inference and cost savings. The models can be completely heterogeneous, with their own independent serving stack. You can either invoke these containers sequentially or independently for each request. Securely hosting multiple models, from different frameworks, on a single instance could save you up to 90% in cost.

To learn more, see Deploy multi-container endpoints and try out the example used in this post on the SageMaker GitHub examples repo.


About the Authors

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers in the Nordics and wider EMEA region design and build ML solutions. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Sean Morgan is an AI/ML Solutions Architect at AWS. He previously worked in the semiconductor industry, using computer vision to improve product yield. He later transitioned to a DoD research lab where he specialized in adversarial ML defense and network security. In his free time, Sean is an active open-source contributor and maintainer, and is the special interest group lead for TensorFlow Addons.

Machine Learning at the Edge with AWS Outposts and Amazon SageMaker

As customers continue to come up with new use-cases for machine learning, data gravity is as important as ever. Where latency and network connectivity are not an issue, generating data in one location (such as a manufacturing facility) and sending it to the cloud for inference is acceptable for some use-cases. With other critical use-cases, such as fraud detection for financial transactions, product quality in manufacturing, or analyzing video surveillance in real time, customers face the challenges that come with having to move that data to the cloud first. Two challenges customers face when performing inference in the cloud are the lack of real-time inference and security requirements that prevent user data from being sent to or stored in the cloud.

Tens of thousands of customers use Amazon SageMaker to accelerate their Machine Learning (ML) journey by helping data scientists and developers to prepare, build, train, and deploy machine learning models quickly. Once you’ve built and trained your ML model with SageMaker, you’ll want to deploy it somewhere to start collecting inputs to run through your model (inference). These models can be deployed and run on AWS, but we know that there are use-cases that don’t lend themselves well for running inference in an AWS Region while the inputs come from outside the Region. Cases include a customer’s data center, manufacturing facility, or autonomous vehicles. Predictions must be made in real-time when new data is available. When you want to run inference locally or on an edge device, a gateway, an appliance or on-premises server, you can optimize your ML models for the specific underlying hardware with Amazon SageMaker Neo. It is the easiest way to optimize ML models for edge devices, enabling you to train ML models once in the cloud and run them on any device. To increase efficiency of your edge ML operations, you can use Amazon SageMaker Edge Manager to automate the manual steps to optimize, test, deploy, monitor and maintain your models on fleets of edge devices.

In this blog post, we will talk about the different use-cases for inference at the edge and the way to accomplish it using Amazon SageMaker features and AWS Outposts. Let’s review each, before we dive into ML with AWS Outposts.

Amazon SageMaker – Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.

Amazon SageMaker Edge Manager – Amazon SageMaker Edge Manager provides a software agent that runs on edge devices. The agent comes with an ML model optimized with SageMaker Neo automatically. You don’t need to have the Neo runtime installed on your devices to take advantage of the model optimizations, such as machine learning models performing at up to twice the speed with no loss in accuracy. Other benefits include reduction of hardware resource usage by up to 10x and the ability to run the same ML model on multiple hardware platforms. The agent also collects prediction data and sends a sample of the data to the AWS Region for monitoring, labeling, and retraining so you can keep models accurate over time.

AWS Outposts – AWS Outposts is a fully managed service that offers the same AWS infrastructure, AWS services, APIs, and tools to virtually any data center, co-location space, or on-premises facility for a truly consistent hybrid experience. AWS Outposts is ideal for workloads that require low latency access to on-premises systems, local data processing, data residency, and migration of applications with local system interdependencies.

AWS compute, storage, database, and other services run locally on Outposts. You can access the full range of AWS services available in the Region to build, manage, and scale your on-premises applications using familiar AWS services and tools.

Use cases

Due to low latency needs or large volumes of data, customers need ML inferencing at the edge. Two main use-cases require customers to implement these models for inference at the edge:

Low Latency – In many use-cases, the end user or application must provide inferencing in (near) real-time, requiring the model to be running at the edge. This is a common use case in industries such as Financial Services (risk analysis), Healthcare (medical imaging analysis), Autonomous Vehicles and Manufacturing (shop floor).

Large Volumes of Data – Large volumes of new data being generated at the edge means that inferencing needs to happen closer to where data is being generated. This is a common use case in IoT scenarios, such as in the oil and gas or utilities industries.

Scenario

For this scenario, let’s focus on the low latency use-case. A financial services customer wants to implement fraud detection on all customer transactions. They’ve decided on using Amazon SageMaker to build and train their model in an AWS Region. Given the distance between the data center in which they process transactions and an AWS Region, inference needs to be performed locally, in the same location as the transaction processing. They will use Amazon SageMaker Edge Manager to optimize the trained model to perform inference locally in their data center. The last piece is the compute. The customer is not interested in managing the hardware and their team is already trained in AWS development, operations and management. Given that, the customer chose AWS Outposts as their compute to run locally in their data center.

What does this look like technically? Let’s take a look at an architecture and then talk through the different pieces.

Let’s look at the flow. On the left, training of the model is done in the AWS Region with Amazon SageMaker for training and packaging. On the right, we have a data center, which can be the customer data center or a co-location facility, with AWS Outposts and SageMaker Edge Manager to do the inference.

AWS Region:

  1. Load dataset into Amazon S3, which acts as input for model training.
  2. Use Amazon SageMaker to do processing and training against the dataset.
  3. Store the model artifacts in Amazon S3.
  4. Compile the trained model using Amazon SageMaker Neo.
  5. Package and sign the model with Amazon SageMaker Edge Manager and store it in Amazon S3.
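
Steps 4 and 5 above can be sketched as the request payloads for the SageMaker `create_compilation_job` and `create_edge_packaging_job` API calls. This is a minimal sketch, not the authors’ implementation: the job names, role ARN, bucket, framework, and input shape are all hypothetical placeholders, and the correct `Framework` and `DataInputConfig` values depend on how your model was trained. The helpers only build dictionaries, so they run without AWS credentials.

```python
def compilation_job_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                            framework, data_input_config,
                            target_device="ml_m5", max_runtime_seconds=900):
    """Build the parameters for a SageMaker Neo compilation job (step 4)."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,                   # trained model artifact from step 3
            "DataInputConfig": data_input_config,    # input name and shape, as a JSON string
            "Framework": framework,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,           # instance family you will run on Outposts
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def edge_packaging_job_request(job_name, compilation_job_name, model_name,
                               model_version, role_arn, output_s3_uri):
    """Build the parameters for an Edge Manager packaging job (step 5)."""
    return {
        "EdgePackagingJobName": job_name,
        "CompilationJobName": compilation_job_name,  # links the package to the Neo output
        "ModelName": model_name,
        "ModelVersion": model_version,
        "RoleArn": role_arn,
        "OutputConfig": {"S3OutputLocation": output_s3_uri},
    }


# Hypothetical values for illustration only.
ROLE = "arn:aws:iam::111122223333:role/ExampleSageMakerRole"
BUCKET = "s3://example-bucket/fraud-detection"

compile_request = compilation_job_request(
    job_name="fraud-detection-neo",
    role_arn=ROLE,
    model_s3_uri=f"{BUCKET}/model/model.tar.gz",
    output_s3_uri=f"{BUCKET}/compiled/",
    framework="PYTORCH",                     # assumption: depends on your model
    data_input_config='{"input0": [1, 30]}', # assumption: depends on your model
)
package_request = edge_packaging_job_request(
    job_name="fraud-detection-package",
    compilation_job_name=compile_request["CompilationJobName"],
    model_name="fraud-detection",
    model_version="1.0",
    role_arn=ROLE,
    output_s3_uri=f"{BUCKET}/packaged/",
)

# With credentials configured, the payloads could be submitted like this:
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_compilation_job(**compile_request)
#   ...wait for the compilation job to complete...
#   sm.create_edge_packaging_job(**package_request)
```

The packaging job refers to the compilation job by name, so the compilation job has to finish before the packaging job is submitted.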

AWS Outposts:

  1. Launch an Amazon EC2 instance (use the instance family that you’ve optimized the model for) in a subnet that lives on the AWS Outposts.
  2. Install the Amazon SageMaker Edge Manager agent on the instance. Learn more about installing the agent here.
  3. Copy the compiled and signed model from Amazon S3 in the AWS Region to the Amazon EC2 instance on the AWS Outposts. Here’s an example using the AWS CLI to copy a model file (model-ml_m5.tar.gz) from Amazon S3 to the current directory (.):
    aws s3 cp s3://sagemaker-studio-66f50fg898c/fraud-detection-ml/profiler/model-ml_m5.tar.gz .

  4. Financial transactions come into the data center and are routed into the Outposts via the Local Gateway (LGW), to the front-end web server and then to the application server.
  5. The transaction gets stored in the database and at the same time, the application server generates a customer profile based on multiple variables, including transaction history.
  6. The customer profile is sent to Edge Manager agent to run inference against the compiled model using the customer profile as input.
  7. The fraud detection model will generate a score once inference is complete. Based on that score the application server will return one of the following back to the client:
    1. Approve the transaction.
    2. Ask for a second factor (two-factor authentication).
    3. Deny the transaction.
  8. Additionally, sample input/output data as well as model metrics are captured and sent back to the AWS Region for monitoring with Amazon SageMaker Edge Manager.
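
The decision in step 7 can be sketched as a simple mapping from the model score to one of the three outcomes. This is an illustration only: it assumes the score is a fraud probability between 0 and 1 where higher means more suspicious, and the threshold values and names (`decide`, `challenge_threshold`, `deny_threshold`) are hypothetical, not taken from the scenario.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    SECOND_FACTOR = "second_factor"
    DENY = "deny"


def decide(score, challenge_threshold=0.5, deny_threshold=0.9):
    """Map a fraud score in [0, 1] to the response the application server returns.

    Assumption: higher scores mean a transaction is more likely fraudulent.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if challenge_threshold > deny_threshold:
        raise ValueError("challenge_threshold must not exceed deny_threshold")
    if score >= deny_threshold:
        return Decision.DENY
    if score >= challenge_threshold:
        return Decision.SECOND_FACTOR
    return Decision.APPROVE
```

In practice, the thresholds would be tuned against the cost of a false decline versus a missed fraud, and the monitoring data from step 8 is one input to that tuning.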

AWS Region:

  1. Monitor your model with Amazon SageMaker Edge Manager and push metrics into Amazon CloudWatch, which can be used as a feedback loop to improve the model’s performance on an ongoing basis.

Considerations for Using Amazon SageMaker Edge Manager with AWS Outposts

Factors to consider when choosing between inference in an AWS Region vs AWS Outposts:

  • Security: Whether or not other factors are relevant to your use-case, the security of your data is a priority. If the data you must run inference on is not permitted to be stored in the cloud, running inference at the edge on AWS Outposts lets you perform inference without sacrificing data security.
  • Real-time processing: Is the data you need to perform inference on time bound? If the value of the data diminishes as time passes, then sending the data to an AWS Region for inference may not be worthwhile.
  • WAN Connectivity: Along with the speed and quality of your connection, the time it takes for data to travel from where it is generated to the cloud (latency) is also important. If you only need near real-time inference, cloud-based inference may be an option.
    • Do you have enough bandwidth to send the amount of data back to an AWS Region? If not, is the required bandwidth cost effective?
    • Is the quality of network link back to the AWS Region suitable to meet your requirements?
    • What are the consequences of a network outage?

If link quality is an issue, if bandwidth costs are not reasonable, or if a network outage would be detrimental to your business, then using AWS Outposts for inference at the edge can help ensure that you’re able to continually perform inference regardless of the state of your WAN connectivity.
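
The bandwidth and latency questions above come down to simple arithmetic. The following is a rough sketch with hypothetical function names and numbers. It deliberately ignores protocol overhead, traffic bursts, and bandwidth cost, so treat it as a first-pass check rather than a sizing tool.

```python
def required_mbps(records_per_second, bytes_per_record):
    """Sustained bandwidth needed to send every record to the Region, in Mbps."""
    return records_per_second * bytes_per_record * 8 / 1_000_000


def region_inference_is_viable(records_per_second, bytes_per_record,
                               link_mbps, round_trip_ms, latency_budget_ms,
                               can_tolerate_outage):
    """Return True only if bandwidth, latency, and outage tolerance all allow
    sending inference requests to an AWS Region instead of running them locally."""
    fits_bandwidth = required_mbps(records_per_second, bytes_per_record) <= link_mbps
    fits_latency = round_trip_ms <= latency_budget_ms
    return fits_bandwidth and fits_latency and can_tolerate_outage


# Hypothetical workload: 2,000 transactions per second, 4,000 bytes per profile.
needed = required_mbps(2000, 4000)  # 2000 * 4000 * 8 / 1,000,000 = 64.0 Mbps

# A 100 Mbps link has the capacity, but a 40 ms round trip misses a 10 ms budget.
viable = region_inference_is_viable(
    records_per_second=2000, bytes_per_record=4000,
    link_mbps=100, round_trip_ms=40, latency_budget_ms=10,
    can_tolerate_outage=False,
)
```

In this example the link has enough capacity, but the latency budget and outage tolerance both point toward inference on AWS Outposts.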

As of the writing of this blog post, Amazon SageMaker Edge Manager supports common CPU (ARM, x86) and GPU (ARM, Nvidia) based devices with Linux and Windows operating systems. Over time, SageMaker Edge Manager will expand to support more embedded processors and mobile platforms that are also supported by SageMaker Neo.

Additionally, you need to use Amazon SageMaker Neo to compile the model in order to use Amazon SageMaker Edge Manager. Amazon SageMaker Neo converts and compiles your models into an executable that you can then package and deploy onto your edge devices. Once the model package is deployed, the Amazon SageMaker Edge Manager agent unpacks the model package and runs the model on the device.

Conclusion

Whether it’s providing quality assurance for manufactured goods or real-time monitoring of cameras, wind farms, or medical devices (among countless other use-cases), Amazon SageMaker combined with AWS Outposts provides you with world-class machine learning capabilities and inference at the edge.

To learn more about Amazon SageMaker Edge Manager, you can visit the Edge Manager product page and check out this demo. To learn more about AWS Outposts, visit the Outposts product page and check out this introduction.


About the Authors

Josh Coen is a Senior Solutions Architect at AWS specializing in AWS Outposts. Prior to joining AWS, Josh was a Cloud Architect at Sirius, a national technology systems integrator, where he helped build and run their AWS practice. Josh has a BS in Information Technology and has been in the IT industry since 2003.

Mani Khanuja is an Artificial Intelligence and Machine Learning Specialist SA at Amazon Web Services (AWS). She helps customers use machine learning to solve their business challenges using AWS. She spends most of her time diving deep and teaching customers on AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. She is passionate about ML at edge and has created her own lab with a self-driving kit and prototype manufacturing production line, where she spends a lot of her free time.
