Patterns for multi-account, hub-and-spoke Amazon SageMaker model registry

Data science workflows pass through multiple stages as they progress from experimentation to production. A common approach involves separate accounts dedicated to different phases of the AI/ML workflow (experimentation, development, and production).

In addition, issues related to data access control may also mandate that workflows for different AI/ML applications be hosted on separate, isolated AWS accounts. Managing these stages and multiple accounts is complex and challenging.

When it comes to model deployment, however, it often makes sense to have a central repository of approved models to keep track of what is being used for production-grade inference. The Amazon SageMaker Model Registry is the natural choice for this kind of inference-oriented metadata store. In this post, we showcase how to set up such a centralized repository.

Overview

The workflow we address here is the one common to many data science projects. A data scientist in a dedicated data science account experiments on models, creates model artifacts on Amazon Simple Storage Service (Amazon S3), keeps track of the association between model artifacts and Amazon Elastic Container Registry (Amazon ECR) images using SageMaker model packages, and groups model versions into model package groups. The following diagram gives an overview of the structure of the SageMaker Model Registry.

A typical scenario has the following components:

  • One or more spoke environments are used for experimenting and for training ML models
  • Segregation between the spoke environments and a centralized environment is needed
  • We want to promote a machine learning (ML) model from the spokes to the centralized environment by creating a model package (version) in the centralized environment, and optionally moving the generated artifact model.tar.gz to an S3 bucket to serve as a centralized model store
  • Tracking and versioning of promoted ML models is done in the centralized environment from which, for example, deployment can be performed

This post illustrates how to build federated, hub-and-spoke model registries, where multiple spoke accounts use the SageMaker Model Registry from a hub account to register their model package groups and versions.

The following diagram illustrates two possible patterns: a push-based approach and a pull-based approach.

In the push-based approach, a user or role from a spoke account assumes a role in the central account. They then register the model packages or versions directly into the central registry. This is the simplest approach, both to set up and operate. However, you must give the spoke accounts write access (through the assumed role) to the central hub, which in some setups may not be possible or desirable.

In the pull-based approach, the spoke account registers model package groups or versions in the local SageMaker Model Registry. Amazon EventBridge notifies the hub account of the modification, which triggers a process that pulls the modification and replicates it to the hub’s registry. In this setup, spoke accounts don’t have any access to the central registry. Instead, the central account has read access to the spoke registries.

In the following sections, we illustrate example configurations for simple, two-account setups:

  • A data science (DS) account used for performing isolated experimentation using AWS services, such as SageMaker, the SageMaker Model Registry, Amazon S3, and Amazon ECR
  • A hub account used for storing the central model registry, and optionally also ML model binaries and Amazon ECR model images

In real-life scenarios, multiple DS accounts would be associated to a single hub account.

Strictly connected to the operation of a model registry is the topic of model lineage: the ability to trace a deployed model all the way back to the exact experiment, training job, and data that generated it. Amazon SageMaker ML Lineage Tracking creates and stores information about the steps of an ML workflow (from data preparation to model deployment) in the accounts where the different steps are originally run. As of this writing, exporting this information to different accounts is possible using dedicated model metadata, which can be exchanged through different mechanisms (for example, by emitting and forwarding a custom EventBridge event, or by writing to an Amazon DynamoDB table). A detailed description of these processes is beyond the scope of this post.

Access to model artifacts, Amazon ECR, and basic model registry permissions

Full cross-account operation of the model registry requires three main components:

  • Access from the hub account to model artifacts on Amazon S3 and to Amazon ECR images (either in the DS accounts or in a centralized Amazon S3 and Amazon ECR location)
  • Same-account operations on the model registry
  • Cross-account operations on the model registry

We can achieve the first component using resource policies. We provide examples of cross-account read-only policies for Amazon S3 and Amazon ECR in this section. In addition to these settings, the principals in the following policies must act using a role where the corresponding actions are allowed. For example, it’s not enough to have a resource policy that allows the DS account to read a bucket. The account must also do so from a role where Amazon S3 reads are allowed. This basic Amazon S3 and Amazon ECR configuration is not detailed here; links to the relevant documentation are provided at the end of this post.

Careful consideration must also be given to the location where model artifacts and Amazon ECR images are stored. If a central location is desired, it seems like a natural choice to let the hub account also serve as an artifact and image store. In this case, as part of the promotion process, model artifacts and Amazon ECR images must be copied from the DS accounts to the hub account. This is a normal copy operation, and can be done using both push-to-hub and pull-from-DS patterns, which aren’t detailed in this post. However, the attached code for the push-based pattern shows a complete example, including the code to handle the Amazon S3 copy of the artifacts. The example assumes that such a central store exists, that it coincides with the hub account, and that the necessary copy operations are in place.

In this context, versioning (of model images and of model artifacts) is also an important building block. It is required to improve the security profile of the setup and make sure that no accidental overwriting or deletion occurs. In real-life scenarios, the operation of the setups described here is fully automated, and steered by CI/CD pipelines that use unique build-ids to generate unique identifiers for all archived resources (unique object keys for Amazon S3, unique image tags for Amazon ECR). An additional level of robustness can be added by activating versioning on the relevant S3 buckets, as detailed in the resources provided at the end of this post.
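As an illustration, the unique identifiers could be derived from the CI/CD build ID along the following lines. This is only a sketch; the naming conventions are hypothetical, not prescribed by SageMaker:

```python
def artifact_key(model_name, build_id):
    """Unique S3 object key for the model artifact of a given build."""
    return "models/{}/{}/model.tar.gz".format(model_name, build_id)


def image_tag(model_name, build_id):
    """Unique Amazon ECR image tag for the same build."""
    return "{}-{}".format(model_name, build_id)
```

Because every build writes to a fresh key and tag, promoted models can never be silently overwritten by a later build.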

Amazon S3 bucket policy

The following resource policy allows the DS account to get objects inside a defined S3 bucket in the hub account. As already mentioned, in this scenario, the hub account also serves as a model store, keeping a copy of the model artifacts. The case where the model store is disjointed from the hub account would have a similar configuration: the relevant bucket must allow read operations from the hub and DS accounts.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"S3CrossAccountRead",
         "Effect":"Allow",
         "Action":"s3:GetObject",
         "Resource": [
            "arn::s3:::{HUB_BUCKET_NAME}/*model.tar.gz"
         ],
         "Principal":{
            "AWS":[
               "arn:aws:iam::{DS_ACCOUNT_ID}:role/{DS_ACCOUNT_ROLE}"
            ]
         }
      }
   ]
}

Amazon ECR repository policy

The following resource policy allows the DS account to get images from a defined Amazon ECR repository in the hub account, because in this example the hub account also serves as the central Amazon ECR registry. In case a separate central registry is desired, the configuration is similar: the hub or DS account needs to be given read access to the central registry. Optionally, you can also restrict the access to specific resources, such as enforcing a specific tagging pattern for cross-account images.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"S3CrossAccountRead",
         "Effect":"Allow",
         "Action": [
            "ecr:BatchGetImage",
            "ecr:GetDownloadUrlForLayer"
         ]
         "Principal":{
            "AWS":[
               "arn:aws:iam::{DS_ACCOUNT_ID}:role/{DS_ACCOUNT_ROLE}"
            ]
         }
      }
   ]
}
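You can attach this repository policy with the AWS CLI or the SDK. The following is a minimal Boto3 sketch, where the repository name and the policy variable are placeholders:

```python
import json


def repository_policy_text(policy_document):
    """Serialize the policy dict; set_repository_policy expects a JSON string."""
    return json.dumps(policy_document)


def attach_repository_policy(repository_name, policy_document):
    """Attach a resource policy to an Amazon ECR repository."""
    import boto3  # deferred import; only needed when actually calling AWS

    ecr_client = boto3.client("ecr")
    return ecr_client.set_repository_policy(
        repositoryName=repository_name,
        policyText=repository_policy_text(policy_document)
    )
```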

IAM policy for SageMaker Model Registry

Operations on the model registry within an account are regulated by normal AWS Identity and Access Management (IAM) policies. The following example allows basic actions on the model registry:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sagemaker:CreateModelPackage*",
                "sagemaker:DescribeModelPackage",
                "sagemaker:DescribeModelPackageGroup",
                "sagemaker:ListModelPackages",
                "sagemaker:ListModelPackageGroups"
            ],
            "Resource": [
                "*"
            ],
            "Effect": "Allow"
        }
    ]
}

We now detail how to configure cross-account operations on the model registry.

SageMaker Model Registry configuration: Push-based approach

The following diagram shows the architecture of the push-based approach.

In this approach, users in the DS account can read from the hub account, thanks to resource-based policies. However, to gain write access to the central registry, the DS account must assume a role in the hub account with the appropriate permissions.

The minimal setup of this architecture requires the following:

  • Read access to the model artifacts on Amazon S3 and to the Amazon ECR images, using resource-based policies, as outlined in the previous section.
  • IAM policies in the hub account that allow writing objects into the chosen S3 bucket and creating model packages in the SageMaker model package groups.
  • An IAM role in the hub account with the previous policies attached with a cross-account AssumeRole rule. The DS account assumes this role to write the model.tar.gz in the S3 bucket and create a model package. For example, this operation could be carried out by an AWS Lambda function.
  • A second IAM role, in the DS account, that can read the model.tar.gz artifact from the S3 bucket, and assume the role in the hub account mentioned above. This role is used for reads from the registry. For example, this could be used as the run role of a Lambda function.

Create a resource policy for model package groups

The following is an example policy to be attached to model package groups in the hub account. It allows read operations on a package group and on all package versions it contains.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPermModelPackageGroup",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{DS_ACCOUNT_ID}:role/service-role/{LAMBDA_ROLE}"
            },
            "Action": [
                "sagemaker:DescribeModelPackageGroup"
            ],
            "Resource": "arn:aws:sagemaker:{REGION}:{HUB_ACCOUNT_ID}:model-package-group/{NAME}"
        },
        {
            "Sid": "AddPermModelPackageVersion",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{DS_ACCOUNT_ID}:role/service-role/{LAMBDA_ROLE}"
            },
            "Action": [
                "sagemaker:DescribeModelPackage",
                "sagemaker:ListModelPackages"
            ],
            "Resource": "arn:aws:sagemaker:{REGION}:{HUB_ACCOUNT_ID}:model-package/{NAME}/*"
        }
    ]
}

You can’t associate this policy with the package group via the AWS Management Console. You need SDK or AWS Command Line Interface (AWS CLI) access. For example, the following code uses Python and Boto3:

import boto3

sm_client = boto3.client('sagemaker')

# model_package_group_policy is the policy document above, as a JSON string
sm_client.put_model_package_group_policy(
    ModelPackageGroupName = model_package_group_name,
    ResourcePolicy = model_package_group_policy)

Cross-account policy for the DS account to the Hub account

This policy allows the users and services in the DS account to assume the relevant role in the hub account. For example, the following policy allows a Lambda execution role in the DS account to assume the role in the hub account:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": [
                "arn:aws:iam::{HUB_ACCOUNT_ID}:role/SagemakerModelRegistryRole"
            ],
            "Effect": "Allow"
        }
    ]
}

Example workflow

Now that all permissions are configured, we can illustrate the workflow using a Lambda function that assumes the hub account role previously defined, copies the artifact model.tar.gz created into the hub account S3 bucket, and creates the model package linked to the previously copied artifact.

In the following code snippets, we illustrate how to create a model package in the target account after assuming the relevant role. The complete code needed for operation (including manipulation of Amazon S3 and Amazon ECR assets) is attached to this post.
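The snippets rely on helper functions (assume_dev_role_s3 and assume_hub_role_sagemaker) whose full implementation is part of the attached code. A minimal sketch of such a helper is shown below, using AWS STS; the role name matches the AssumeRole policy above, and the account ID is a placeholder:

```python
def hub_role_arn(hub_account_id, role_name="SagemakerModelRegistryRole"):
    """Build the ARN of the cross-account role in the hub account."""
    return "arn:aws:iam::{}:role/{}".format(hub_account_id, role_name)


def assume_hub_role_client(service_name, role_arn, session_name="model-promotion"):
    """Assume the hub role and return a Boto3 client for the given service.

    The helpers used in this post (assume_dev_role_s3 and
    assume_hub_role_sagemaker) can be thin wrappers around this function,
    called with "s3" and "sagemaker" respectively.
    """
    import boto3  # deferred import so the ARN helper stays dependency-free

    sts = boto3.client("sts")
    credentials = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name
    )["Credentials"]

    return boto3.client(
        service_name,
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"]
    )
```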

Copy the artifact

To maintain a centralized approach in the hub account, the first operation described is copying the artifact in the centralized S3 bucket.

The method requires as input the DS source bucket name, the hub target bucket name, and the path to model.tar.gz. After copying the artifact into the target bucket, it returns the new Amazon S3 path that is used by the model package. As discussed earlier, you need to run this code from a role that has read access to the source and write access to the destination Amazon S3 location. You can set this up, for example, in the execution role of a Lambda function, whose details are beyond the scope of this document. See the following code:

import logging
import traceback

import boto3

LOGGER = logging.getLogger(__name__)

def copy_artifact(ds_bucket_name, hub_bucket_name, model_path):
    try:
        # read the artifact with the local (DS) role
        s3_client = boto3.client("s3")

        source_response = s3_client.get_object(
            Bucket=ds_bucket_name,
            Key=model_path
        )

        # assume the cross-account role that can write to the hub (target) S3 bucket
        s3_client = assume_dev_role_s3()

        s3_client.upload_fileobj(
            source_response["Body"],
            hub_bucket_name,
            model_path
        )

        new_model_path = "s3://{}/{}".format(hub_bucket_name, model_path)

        return new_model_path
    except Exception as e:
        stacktrace = traceback.format_exc()
        LOGGER.error(stacktrace)

        raise e

Create a model package

This method registers the model version in a model package group that you already created in the hub account. The method requires as input a Boto3 SageMaker client instantiated after assuming the role in the hub account, the Amazon ECR image URI to use in the model package, the model URL created after copying the artifact in the target S3 bucket, the model package group name used for creating the new model package version, and the approval status to be assigned to the new version created:

def create_model_package(sm_client, 
                         image_uri,
                         model_path, 
                         model_package_group_name, 
                         approval_status):
    try:
        modelpackage_inference_specification = {
            "InferenceSpecification": {
                "Containers": [
                    {
                        "Image": image_uri,
                        "ModelDataUrl": model_path
                    }
                ],
                # use correct types here
                "SupportedContentTypes": ["text/csv"],
                "SupportedResponseMIMETypes": ["text/csv"], 
            }
        }

        create_model_package_input_dict = {
            "ModelPackageGroupName": model_package_group_name,
            "ModelPackageDescription": f"Model for {model_package_group_name}",
            "ModelApprovalStatus": approval_status
        }

        create_model_package_input_dict.update(modelpackage_inference_specification)
        create_model_package_response = sm_client.create_model_package(
            **create_model_package_input_dict)
        model_package_arn = create_model_package_response["ModelPackageArn"]

        return model_package_arn
    except Exception as e:
        stacktrace = traceback.format_exc()
        LOGGER.error("{}".format(stacktrace))

        raise e

A Lambda handler orchestrates all the actions needed to operate the central registry. The mandatory parameters in this example are as follows:

  • image_uri – The Amazon ECR image URI used in the model package
  • model_path – The source path of the artifact in the S3 bucket
  • model_package_group_name – The model package group name used for creating the new model package version
  • ds_bucket_name – The name of the source S3 bucket
  • hub_bucket_name – The name of the target S3 bucket
  • approval_status – The status to assign to the model package version

See the following code:

def lambda_handler(event, context):
    
    image_uri = event.get("image_uri", None)
    model_path = event.get("model_path", None)
    model_package_group_name = event.get("model_package_group_name", None)
    ds_bucket_name = event.get("ds_bucket_name", None)
    hub_bucket_name = event.get("hub_bucket_name", None)
    approval_status = event.get("approval_status", None)
    
    # copy the S3 assets from DS to Hub
    model_path = copy_artifact(ds_bucket_name, hub_bucket_name, model_path)
    
    # assume a role in the Hub account, retrieve the sagemaker client
    sm_client = assume_hub_role_sagemaker()
    
    # create the model package in the Hub account
    model_package_arn = create_model_package(sm_client, 
                                            image_uri, 
                                            model_path, 
                                            model_package_group_name, 
                                            approval_status)

    response = {
        "statusCode": "200",
        "model_arn": model_package_arn
     }
     
    return response
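For reference, a hypothetical invocation event for this handler could look as follows; every value is a placeholder. Because the handler falls back to None for missing keys, validating the input up front is a sensible hardening step:

```python
# Hypothetical invocation event; all names and paths are placeholders
example_event = {
    "image_uri": "{DS_ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/my-model:build-42",
    "model_path": "models/my-model/build-42/model.tar.gz",
    "model_package_group_name": "my-model-group",
    "ds_bucket_name": "ds-artifact-bucket",
    "hub_bucket_name": "hub-model-store",
    "approval_status": "Approved",
}

REQUIRED_KEYS = [
    "image_uri", "model_path", "model_package_group_name",
    "ds_bucket_name", "hub_bucket_name", "approval_status",
]


def missing_parameters(event):
    """Return the list of mandatory parameters absent from the event."""
    return [key for key in REQUIRED_KEYS if event.get(key) is None]
```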

SageMaker Model Registry configuration: Pull-based approach

The following diagram illustrates the architecture for the pull-based approach.

This approach is better suited for cases where write access to the account hosting the central registry is restricted. The preceding diagram shows a minimal setup, with a hub and just one spoke.

A typical workflow is as follows:

  1. A data scientist is working on a dedicated account. The local model registry is used to keep track of model packages and deployment.
  2. Each time a model package is created, an event “SageMaker Model Package State Change” is emitted.
  3. The EventBridge rule in the DS account forwards the event to the hub account, where it triggers actions. In this example, a Lambda function with cross-account read access to the DS model registry can retrieve the needed information and copy it to the central registry.

The minimal setup of this architecture requires the following:

  • Model package groups in the DS account need to have a resource policy, allowing read access from the Lambda execution role in the hub account.
  • The EventBridge rule in the DS account must be configured to forward relevant events to the hub account.
  • The hub account must allow the DS EventBridge rule to send events over.
  • Access to the S3 bucket storing the model artifacts, as well as to Amazon ECR for model images, must be granted to a role in the hub account. These configurations follow the lines of what we outlined in the first section, and are not further elaborated on here.

If the hub account is also in charge of deployment in addition to simple bookkeeping, read access to the model artifacts on Amazon S3 and to the model images on Amazon ECR must also be set up. This can be done by either archiving resources to the hub account or with read-only cross-account access, as already outlined earlier in this post.

Create a resource policy for model package groups

The following is an example policy to attach to model package groups in the DS account. It allows read operations on a package group and on all package versions it contains:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddPermModelPackageGroup",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{HUB_ACCOUNT_ID}:role/service-role/{LAMBDA_ROLE}"
            },
            "Action": [
                "sagemaker:DescribeModelPackageGroup"
            ],
            "Resource": "arn:aws:sagemaker:{REGION}:{DS_ACCOUNT_ID}:model-package-group/{NAME}"
        },
        {
            "Sid": "AddPermModelPackageVersion",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{HUB_ACCOUNT_ID}:role/service-role/{LAMBDA_ROLE}"
            },
            "Action": [
                "sagemaker:DescribeModelPackage",
                "sagemaker:ListModelPackages"
            ],
            "Resource": "arn:aws:sagemaker:{REGION}:{DS_ACCOUNT_ID}:model-package/{NAME}/*"
        }
    ]
}

You can’t associate this policy with the package group via the console. The SDK or AWS CLI is required. For example, the following code uses Python and Boto3:

import boto3

sm_client = boto3.client('sagemaker')

# model_package_group_policy is the policy document above, as a JSON string
sm_client.put_model_package_group_policy(
    ModelPackageGroupName = model_package_group_name,
    ResourcePolicy = model_package_group_policy)

Configure an EventBridge rule in the DS account

In the DS account, you must configure a rule for EventBridge:

  1. On the EventBridge console, choose Rules.
  2. Choose the event bus you want to add the rule to (for example, the default bus).
  3. Choose Create rule.
  4. Select Event Pattern, and navigate through the drop-down menus to choose Predefined pattern, AWS, SageMaker, and SageMaker Model Package State Change.

You can refine the event pattern as you like. For example, to forward only events related to approved models within a specific package group, use the following code:

{
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {
        "ModelPackageGroupName": ["ExportPackageGroup"],
        "ModelApprovalStatus": ["Approved"]
    }
}
  5. In the Target section, choose Event Bus in another AWS account.
  6. Enter the ARN of the event bus in the hub account that receives the events.
  7. Finish creating the rule.
  8. In the hub account, open the EventBridge console, choose the event bus that receives the events from the DS account, and edit the Permissions field so that it contains the following code:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "sid1",
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::{DS_ACCOUNT_ID}:root"
    },
    "Action": "events:*",
    "Resource": "arn:aws:events:{REGION}:{HUB_ACCOUNT_ID}:event-bus/{BUS_NAME}"
  }]
}
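Instead of editing the Permissions field in the console, you can grant the same access programmatically. The following Boto3 sketch uses the EventBridge PutPermission API; the bus name, account ID, and statement ID are placeholders. Note that events:PutEvents is the only action the spoke account actually needs, so this is narrower than the events:* statement shown above:

```python
def put_permission_args(bus_name, ds_account_id):
    """Arguments for EventBridge PutPermission, scoped to PutEvents only."""
    return {
        "EventBusName": bus_name,
        "Action": "events:PutEvents",
        "Principal": ds_account_id,
        "StatementId": "AllowSpokePutEvents",
    }


def allow_spoke_account(bus_name, ds_account_id):
    """Allow the DS (spoke) account to send events to the hub event bus."""
    import boto3  # deferred import; only needed when actually calling AWS

    events_client = boto3.client("events")
    return events_client.put_permission(**put_permission_args(bus_name, ds_account_id))
```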

Configure an EventBridge rule in the hub account

Now events can flow from the DS account to the hub account. You must configure the hub account to properly handle the events:

  1. On the EventBridge console, choose Rules.
  2. Choose Create rule.
  3. Similarly to the previous section, create a rule for the relevant event type.
  4. Connect it to the appropriate target—in this case, a Lambda function.

In the following example code, we process the event, extract the model package ARN, and retrieve its details. The event from EventBridge already contains all the information from the model package in the DS account. In principle, the resource policy for the model package group isn’t even needed when the copy operation is triggered by EventBridge.

import boto3

sm_client = boto3.client('sagemaker')

# this is meant to be triggered by events in the bus

def lambda_handler(event, context):

    # users need to implement the function get_model_details
    # to extract info from the event received from EventBridge
    model_arn, model_spec, model_desc = get_model_details(event)

    target_group_name = 'targetGroupName'

    # copy the model package to the hub registry
    create_model_package_args = {
        'InferenceSpecification': model_spec,
        'ModelApprovalStatus': 'PendingManualApproval',
        'ModelPackageDescription': model_desc,
        'ModelPackageGroupName': target_group_name}

    return sm_client.create_model_package(**create_model_package_args)
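A minimal sketch of get_model_details is shown below. It assumes the event detail mirrors the DescribeModelPackage output (consistent with the statement above that the event already contains all the model package information); the key names should be verified against a real event before use:

```python
def get_model_details(event):
    """Extract ARN, inference specification, and description from a
    'SageMaker Model Package State Change' event (field names assumed)."""
    detail = event["detail"]
    model_arn = detail["ModelPackageArn"]
    model_spec = detail["InferenceSpecification"]
    model_desc = detail.get("ModelPackageDescription", "")
    return model_arn, model_spec, model_desc
```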

Conclusion

The SageMaker Model Registry is a native AWS tool to track model versions and lineage. The implementation overhead is minimal, in particular when compared with a fully custom metadata store, and it integrates with the rest of the tools within SageMaker. As we demonstrated in this post, even in complex multi-account setups with strict segregation between accounts, model registries are a viable solution to track the operations of AI/ML workflows.

References

To learn more, refer to the following resources:


About the Authors

Andrea Di Simone is a Data Scientist in the Professional Services team based in Munich, Germany. He helps customers to develop their AI/ML products and workflows, leveraging AWS tools. He enjoys reading, classical music and hiking.

 

 

Bruno Pistone is a Machine Learning Engineer for AWS based in Milan. He works with enterprise customers, helping them to productionize Machine Learning solutions and to follow best practices using AWS AI/ML services. His fields of expertise are Machine Learning Industrialization and MLOps. He enjoys spending time with his friends and exploring new places around Milan, as well as traveling to new destinations.

 

Matteo Calabrese is a Data and ML engineer in the Professional Services team based in Milan, Italy. He works with large enterprises on AI/ML projects, helping them propose, deliver, scale, and optimize ML solutions. His goal is to shorten their time to value and accelerate business outcomes by providing AWS best practices. In his spare time, he enjoys hiking and traveling.

 

 

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to deeply understand their business and technical needs and design AI and Machine Learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, Computer Vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

 

 

Read More

Deploy multiple serving containers on a single instance using Amazon SageMaker multi-container endpoints

Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models built on different frameworks. SageMaker real-time inference endpoints are fully managed and can serve predictions in real time with low latency.

This post introduces SageMaker support for direct multi-container endpoints. This enables you to run up to 15 different ML containers on a single endpoint and invoke them independently, thereby saving up to 90% in costs. These ML containers can be running completely different ML frameworks and algorithms for model serving. In this post, we show how to serve TensorFlow and PyTorch models from the same endpoint by invoking different containers for each request and restricting access to each container.

SageMaker already supports deploying thousands of ML models and serving them using a single container and endpoint with multi-model endpoints. SageMaker also supports deploying multiple models built on different framework containers on a single instance, in a serial implementation fashion using inference pipelines.

Organizations are increasingly taking advantage of ML to solve various business problems and running different ML frameworks and algorithms for each use case. This pattern requires you to manage the challenges around deployment and cost for different serving stacks in production. These challenges become more pronounced when models are accessed infrequently but still require low-latency inference. SageMaker multi-container endpoints enable you to deploy up to 15 containers on a single endpoint and invoke them independently. This option is ideal when you have multiple models running on different serving stacks with similar resource needs, and when individual models don’t have sufficient traffic to utilize the full capacity of the endpoint instances.

Overview of SageMaker multi-container endpoints

SageMaker multi-container endpoints enable several inference containers, built on different serving stacks (such as ML framework, model server, and algorithm), to be run on the same endpoint and invoked independently for cost savings. This can be ideal when you have several different ML models that have different traffic patterns and similar resource needs.

Examples of when to utilize multi-container endpoints include, but are not limited to, the following:

  • Hosting models across different frameworks (such as TensorFlow, PyTorch, and Sklearn) that don’t have sufficient traffic to saturate the full capacity of an instance
  • Hosting models from the same framework with different ML algorithms (such as recommendations, forecasting, or classification) and handler functions
  • Comparisons of similar architectures running on different framework versions (such as TensorFlow 1.x vs. TensorFlow 2.x) for scenarios like A/B testing

Requirements for deploying a multi-container endpoint

To launch a multi-container endpoint, you specify the list of containers along with the trained models that should be deployed on an endpoint. Direct inference mode informs SageMaker that the models are accessed independently. As of this writing, you’re limited to up to 15 containers on a multi-container endpoint and GPU inference is not supported due to resource contention. You can also run containers on multi-container endpoints sequentially as inference pipelines for each inference if you want to make preprocessing or postprocessing requests, or if you want to run a series of ML models in order. This capability is already supported as the default behavior of the multi-container endpoints and is selected by setting the inference mode to Serial.

After the models are trained, either through training on SageMaker or a bring-your-own strategy, you can deploy them on a multi-container endpoint using the SageMaker create_model, create_endpoint_config, and create_endpoint APIs. The create_endpoint_config and create_endpoint APIs work exactly the same way as they work for single model or container endpoints. The only change you need to make is in the usage of the create_model API. The following changes are required:

  • Specify a dictionary of container definitions for the Containers argument. This dictionary contains the container definitions of all the containers required to be hosted under the same endpoint. Each container definition must specify a ContainerHostname.
  • Set the Mode parameter of InferenceExecutionConfig to Direct, for direct invocation of each container, or Serial, for using containers in a sequential order (inference pipeline). The default Mode value is Serial.
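Putting these two changes together, the create_model arguments could look like the following sketch; the model name, execution role, and container definitions are placeholders, not values from the original post:

```python
def multi_container_model_args(model_name, execution_role_arn, containers):
    """Arguments for create_model with directly invocable containers.

    Each entry in containers must include a ContainerHostname, plus the
    usual Image and ModelDataUrl fields.
    """
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": execution_role_arn,
        "Containers": containers,
        "InferenceExecutionConfig": {"Mode": "Direct"},
    }
```

A call such as `sm_client.create_model(**multi_container_model_args(...))` would then create the multi-container model.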

Solution overview

In this post, we explain the usage of multi-container endpoints with the following steps:

  1. Train a TensorFlow and a PyTorch model on the MNIST dataset.
  2. Prepare container definitions for TensorFlow and PyTorch serving.
  3. Create a multi-container endpoint.
  4. Invoke each container directly.
  5. Secure access to each container on a multi-container endpoint.
  6. View metrics for a multi-container endpoint.

The complete code related to this post is available on the GitHub repo.

Dataset

The MNIST dataset contains images of handwritten digits from 0–9 and is a popular ML problem. It contains 60,000 training images and 10,000 test images. This solution uses the MNIST dataset to train a TensorFlow and a PyTorch model, each of which can classify a given image as a digit between 0–9. The models give a probability score for each digit category (0–9), and the highest probability score is taken as the output.

Train TensorFlow and PyTorch models on the MNIST dataset

SageMaker provides built-in support for training models using TensorFlow and PyTorch. To learn how to train models on SageMaker, refer to the SageMaker documentation on training a PyTorch model and training a TensorFlow model. In this post, we use TensorFlow 2.3.1 and PyTorch 1.8.1 to train and host the models.

Prepare container definitions for TensorFlow and PyTorch serving

SageMaker has built-in support for serving these framework models, but under the hood TensorFlow uses TensorFlow Serving and PyTorch uses TorchServe. This requires launching separate containers to serve the two framework models. To use SageMaker pre-built Deep Learning Containers, see Available Deep Learning Containers Images. Alternatively, you can retrieve pre-built URIs through the SageMaker SDK. The following code snippet shows how to build the container definitions for TensorFlow and PyTorch serving containers.

  1. Create a container definition for TensorFlow:
tf_ecr_image_uri = sagemaker.image_uris.retrieve(
    framework="tensorflow",
    region=region,
    version="2.3.1",
    py_version="py37",
    instance_type="ml.c5.4xlarge",
    image_scope="inference",
)

tensorflow_container = {
    "ContainerHostname": "tensorflow-mnist",
    "Image": tf_ecr_image_uri,
    "ModelDataUrl": tf_mnist_model_data,
}

Apart from ContainerHostname, specify the correct serving Image provided by SageMaker and also ModelDataUrl, which is the Amazon Simple Storage Service (Amazon S3) location where the model artifact is stored.

  2. Create the container definition for PyTorch:
pt_ecr_image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=region,
    version="1.8.1",
    py_version="py36",
    instance_type="ml.c5.4xlarge",
    image_scope="inference",
)

pytorch_container = {
    "ContainerHostname": "pytorch-mnist",
    "Image": pt_ecr_image_uri,
    "ModelDataUrl": pt_updated_model_uri,
    "Environment": {
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": pt_updated_model_uri,
    },
}

For the PyTorch container definition, an additional argument, Environment, is provided. It contains two keys:

  • SAGEMAKER_PROGRAM – The name of the script containing the inference code required by the PyTorch model server
  • SAGEMAKER_SUBMIT_DIRECTORY – The S3 URI of the tar.gz containing the model file (model.pth) and the inference script
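The exact archive layout can vary by framework version, but as a rough sketch, bundling a trained model.pth together with inference.py into a tar.gz (file paths here are hypothetical stand-ins) might look like the following:

```python
import os
import tarfile
import tempfile

def package_pytorch_model(model_path, inference_script, output_tar):
    """Bundle the model weights and inference script into one tar.gz,
    suitable for uploading to S3 and referencing via
    SAGEMAKER_SUBMIT_DIRECTORY."""
    with tarfile.open(output_tar, "w:gz") as tar:
        tar.add(model_path, arcname="model.pth")
        tar.add(inference_script, arcname="inference.py")
    return output_tar

# Minimal demonstration with empty stand-in files:
workdir = tempfile.mkdtemp()
for name in ("model.pth", "inference.py"):
    open(os.path.join(workdir, name), "wb").close()
tar_path = package_pytorch_model(
    os.path.join(workdir, "model.pth"),
    os.path.join(workdir, "inference.py"),
    os.path.join(workdir, "model.tar.gz"),
)
with tarfile.open(tar_path) as tar:
    members = sorted(m.name for m in tar.getmembers())
# members == ["inference.py", "model.pth"]
```

The resulting archive would then be uploaded to S3 and its URI used as pt_updated_model_uri in the container definition above.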

Create a multi-container endpoint

The next step is to create a multi-container endpoint.

  1. Create a model using the create_model API:
create_model_response = sm_client.create_model(
    ModelName="mnist-multi-container",
    Containers=[pytorch_container, tensorflow_container],
    InferenceExecutionConfig={"Mode": "Direct"},
    ExecutionRoleArn=role,
)

Both the container definitions are specified under the Containers argument. Additionally, the InferenceExecutionConfig mode has been set to Direct.

  2. Create an endpoint configuration using the create_endpoint_config API. It specifies the same ModelName created in the previous step:
endpoint_config = sm_client.create_endpoint_config(
    EndpointConfigName="mnist-multi-container-ep-config",
    ProductionVariants=[
        {
            "VariantName": "prod",
            "ModelName": "mnist-multi-container",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c5.4xlarge",
        },
    ],
)
  3. Create an endpoint using the create_endpoint API. It references the endpoint configuration created in the previous step:
endpoint = sm_client.create_endpoint(
    EndpointName="mnist-multi-container-ep", EndpointConfigName="mnist-multi-container-ep-config"
)
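Endpoint creation is asynchronous and typically takes several minutes. A generic polling helper, sketched below, works with any describe function; the concrete describe call shown in the docstring is how you would wire it up to the endpoint created above:

```python
import time

def wait_for_status(describe_fn, target="InService", poll_seconds=30, max_polls=40):
    """Poll until the endpoint reaches the target status.
    describe_fn should return a dict with an 'EndpointStatus' key, e.g.
    lambda: sm_client.describe_endpoint(EndpointName="mnist-multi-container-ep")."""
    for _ in range(max_polls):
        status = describe_fn()["EndpointStatus"]
        if status == target:
            return status
        if status == "Failed":
            raise RuntimeError("Endpoint creation failed")
        time.sleep(poll_seconds)
    raise TimeoutError("Endpoint did not reach %s in time" % target)
```

Alternatively, boto3 ships a built-in waiter for exactly this: `sm_client.get_waiter("endpoint_in_service").wait(EndpointName="mnist-multi-container-ep")`.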

Invoke each container directly

To invoke a multi-container endpoint with direct invocation mode, use invoke_endpoint from the SageMaker Runtime, passing a TargetContainerHostname argument that specifies the same ContainerHostname used when creating the container definition. The SageMaker Runtime InvokeEndpoint request supports X-Amzn-SageMaker-Target-Container-Hostname as a new header that takes the container hostname for invocation.

The following code snippet shows how to invoke the TensorFlow model on a small sample of MNIST data. Note the value of TargetContainerHostname:

tf_result = runtime_sm_client.invoke_endpoint(
    EndpointName="mnist-multi-container-ep",
    ContentType="application/json",
    Accept="application/json",
    TargetContainerHostname="tensorflow-mnist",
    Body=json.dumps({"instances": np.expand_dims(tf_samples, 3).tolist()}),
)

Similarly, to invoke the PyTorch container, change the TargetContainerHostname to pytorch-mnist:

pt_result = runtime_sm_client.invoke_endpoint(
    EndpointName="mnist-multi-container-ep",
    ContentType="application/json",
    Accept="application/json",
    TargetContainerHostname="pytorch-mnist",
    Body=json.dumps({"inputs": np.expand_dims(pt_samples, axis=1).tolist()}),
)

Apart from using different containers, each container invocation can also support a different MIME type.

For each invocation request to a multi-container endpoint set in direct invocation mode, only the container matching TargetContainerHostname processes the request. Validation errors are raised if you specify a TargetContainerHostname that doesn't exist inside the endpoint, or if you fail to specify the TargetContainerHostname parameter when invoking a multi-container endpoint.
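Both invoke_endpoint calls above return a streaming Body. The helper below is a sketch for decoding it and picking the highest-probability digit; it assumes the TensorFlow Serving-style {"predictions": [...]} payload, and an "outputs" key for the PyTorch response is an assumption about this example's inference script, not a guaranteed format:

```python
import json

def predicted_digits(response_body_bytes):
    """Decode an invoke_endpoint Body payload of per-class probabilities
    and return the argmax digit for each sample."""
    payload = json.loads(response_body_bytes)
    # TensorFlow Serving wraps results in "predictions"; the PyTorch
    # response key depends on the inference script ("outputs" assumed here).
    scores = payload.get("predictions") or payload.get("outputs")
    # Pure-Python argmax, so no NumPy dependency is needed for decoding.
    return [max(range(len(p)), key=p.__getitem__) for p in scores]

# Hypothetical two-sample response with 10 class probabilities each:
body = json.dumps({"predictions": [
    [0.01] * 7 + [0.93] + [0.0] * 2,  # most likely digit: 7
    [0.9] + [0.01] * 9,               # most likely digit: 0
]}).encode("utf-8")
digits = predicted_digits(body)
# Real usage: predicted_digits(tf_result["Body"].read())
```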

Secure multi-container endpoints

For multi-container endpoints using direct invocation mode, multiple containers are co-located on a single instance and share memory and storage volumes. It's therefore important to give users the right access to the target containers. SageMaker uses AWS Identity and Access Management (IAM) identity-based policies to allow or deny actions.

By default, an IAM principal with InvokeEndpoint permissions on a multi-container endpoint using direct invocation mode can invoke any container inside the endpoint with the EndpointName you specify. If you need to restrict invocation to specific containers inside an endpoint, you can use the sagemaker:TargetContainerHostname IAM condition key, similar to restricting access to models when using multi-model endpoints.

The following policy allows InvokeEndpoint requests only when the value of the TargetContainerHostname field matches one of the specified regular expressions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sagemaker:InvokeEndpoint"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:sagemaker:region:account-id:endpoint/endpoint_name",
            "Condition": {
                "StringLike": {
                    "sagemaker:TargetContainerHostname": ["customIps*", "common*"]
                }
            }
        }
    ]
}

The following policy denies InvokeEndpoint requests when the value of the TargetContainerHostname field matches one of the regular expressions in the Deny statement:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "sagemaker:InvokeEndpoint"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:sagemaker:region:account-id:endpoint/endpoint_name",
            "Condition": {
                "StringLike": {
                    "sagemaker:TargetContainerHostname": [""]
                }
            }
        },
        {
            "Action": [
                "sagemaker:InvokeEndpoint"
            ],
            "Effect": "Deny",
            "Resource": "arn:aws:sagemaker:region:account-id:endpoint/endpoint_name",
            "Condition": {
                "StringLike": {
                    "sagemaker:TargetContainerHostname": ["special"]
                }
            }
        }
    ]
}

For information about SageMaker condition keys, see Condition Keys for Amazon SageMaker.

Monitor multi-container endpoints

For multi-container endpoints using direct invocation mode, SageMaker not only provides instance-level metrics as it does with other common endpoints, but also supports per-container metrics.

Per-container metrics for multi-container endpoints with direct invocation mode are available in Amazon CloudWatch and are grouped into two namespaces: AWS/SageMaker and aws/sagemaker/Endpoints. The AWS/SageMaker namespace includes invocation-related metrics, and the aws/sagemaker/Endpoints namespace includes per-container memory and CPU utilization metrics.

The following screenshot of the AWS/SageMaker namespace shows per-container latency.

The following screenshot shows the aws/sagemaker/Endpoints namespace, which displays the CPU and memory utilization for each container.

For a full list of metrics, see Monitor Amazon SageMaker with Amazon CloudWatch.
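Rather than browsing the console, you can also enumerate the metrics CloudWatch has recorded for an endpoint programmatically. The sketch below only builds the list_metrics parameters; the endpoint name is a placeholder, and the commented-out call requires boto3 and AWS credentials:

```python
def build_list_metrics_params(endpoint_name, namespace="aws/sagemaker/Endpoints"):
    """Parameters for cloudwatch.list_metrics to enumerate the
    per-container metrics published for one endpoint."""
    return {
        "Namespace": namespace,
        # Filter to metrics carrying this endpoint's name as a dimension.
        "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
    }

params = build_list_metrics_params("mnist-multi-container-ep")
# Real usage:
# cloudwatch = boto3.client("cloudwatch")
# for metric in cloudwatch.list_metrics(**params)["Metrics"]:
#     print(metric["MetricName"], metric["Dimensions"])
```

Running the same query against the AWS/SageMaker namespace surfaces the invocation-related metrics instead.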

Conclusion

SageMaker multi-container endpoints support deploying up to 15 containers on real-time endpoints and invoking them independently for low-latency inference and cost savings. The models can be completely heterogeneous, each with its own independent serving stack. You can invoke these containers either sequentially or independently for each request. Securely hosting multiple models, from different frameworks, on a single instance could save you up to 90% in cost.

To learn more, see Deploy multi-container endpoints and try out the example used in this post on the SageMaker GitHub examples repo.


About the Author

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers in the Nordics and wider EMEA region design and build ML solutions. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.


Sean Morgan is an AI/ML Solutions Architect at AWS. He previously worked in the semiconductor industry, using computer vision to improve product yield. He later transitioned to a DoD research lab where he specialized in adversarial ML defense and network security. In his free time, Sean is an active open-source contributor and maintainer, and is the special interest group lead for TensorFlow Addons.


Machine Learning at the Edge with AWS Outposts and Amazon SageMaker

As customers continue to come up with new use-cases for machine learning, data gravity is as important as ever. Where latency and network connectivity are not an issue, generating data in one location (such as a manufacturing facility) and sending it to the cloud for inference is acceptable for some use-cases. For other, critical use-cases, such as fraud detection for financial transactions, product quality in manufacturing, or analyzing video surveillance in real time, customers face the challenges that come with having to move that data to the cloud first. Among those challenges are the lack of real-time inference in the cloud and security requirements that prevent user data from being sent to or stored in the cloud.

Tens of thousands of customers use Amazon SageMaker to accelerate their machine learning (ML) journey by helping data scientists and developers prepare, build, train, and deploy models quickly. Once you've built and trained your ML model with SageMaker, you'll want to deploy it somewhere to start running inputs through it (inference). These models can be deployed and run on AWS, but there are use-cases that don't lend themselves well to running inference in an AWS Region while the inputs come from outside the Region, such as a customer's data center, a manufacturing facility, or autonomous vehicles. Predictions must be made in real time when new data is available. When you want to run inference locally or on an edge device, a gateway, an appliance, or an on-premises server, you can optimize your ML models for the specific underlying hardware with Amazon SageMaker Neo. It is the easiest way to optimize ML models for edge devices, enabling you to train ML models once in the cloud and run them on any device. To increase the efficiency of your edge ML operations, you can use Amazon SageMaker Edge Manager to automate the manual steps to optimize, test, deploy, monitor, and maintain your models on fleets of edge devices.

In this blog post, we will talk about the different use-cases for inference at the edge and the way to accomplish it using Amazon SageMaker features and AWS Outposts. Let’s review each, before we dive into ML with AWS Outposts.

Amazon SageMaker – Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.

Amazon SageMaker Edge Manager – Amazon SageMaker Edge Manager provides a software agent that runs on edge devices. The agent works with ML models optimized with SageMaker Neo, so you don't need the Neo runtime installed on your devices to take advantage of model optimizations such as up to twice the inference speed with no loss in accuracy, up to a 10x reduction in hardware resource usage, and the ability to run the same ML model on multiple hardware platforms. The agent also collects prediction data and sends a sample of the data to the AWS Region for monitoring, labeling, and retraining so you can keep models accurate over time.

AWS Outposts – AWS Outposts is a fully managed service that offers the same AWS infrastructure, AWS services, APIs, and tools to virtually any data center, co-location space, or on-premises facility for a truly consistent hybrid experience. AWS Outposts is ideal for workloads that require low latency access to on-premises systems, local data processing, data residency, and migration of applications with local system interdependencies.

AWS compute, storage, database, and other services run locally on Outposts. You can access the full range of AWS services available in the Region to build, manage, and scale your on-premises applications using familiar AWS services and tools.

Use cases

Due to low-latency needs or large volumes of data, customers need ML inference at the edge. Two main use-cases drive this requirement:

Low Latency – In many use-cases, the end user or application needs an inference result in (near) real time, requiring the model to run at the edge. This is common in industries such as Financial Services (risk analysis), Healthcare (medical imaging analysis), Autonomous Vehicles, and Manufacturing (shop floor).

Large Volumes of Data – When large volumes of new data are generated at the edge, inference needs to happen closer to where the data is generated. This is a common use case in IoT scenarios, such as in the oil and gas or utilities industries.

Scenario

For this scenario, let’s focus on the low latency use-case. A financial services customer wants to implement fraud detection on all customer transactions. They’ve decided to use Amazon SageMaker to build and train their model in an AWS Region. Given the distance between the data center in which they process transactions and an AWS Region, inference needs to be performed locally, in the same location as the transaction processing. They will use Amazon SageMaker Edge Manager to optimize the trained model to perform inference locally in their data center. The last piece is the compute. The customer is not interested in managing the hardware, and their team is already trained in AWS development, operations, and management. Given that, the customer chose AWS Outposts as their compute to run locally in their data center.

What does this look like technically? Let’s take a look at an architecture and then talk through the different pieces.

Let’s look at the flow. On the left, training of the model is done in the AWS Region with Amazon SageMaker for training and packaging. On the right, we have a data center, which can be the customer data center or a co-location facility, with AWS Outposts and SageMaker Edge Manager to do the inference.

AWS Region:

  1. Load dataset into Amazon S3, which acts as input for model training.
  2. Use Amazon SageMaker to do processing and training against the dataset.
  3. Store the model artifacts in Amazon S3.
  4. Compile the trained model using Amazon SageMaker Neo.
  5. Package and sign the model with Amazon SageMaker Edge Manager and store it in Amazon S3.

AWS Outposts

  1. Launch an Amazon EC2 instance (use the instance family that you’ve optimized the model for) in a subnet that lives on the AWS Outposts.
  2. Install Amazon SageMaker Edge Manager agent onto the instance. Learn more about installing the agent here.
  3. Copy the compiled and signed model from Amazon S3 in the AWS Region to the Amazon EC2 instance on the AWS Outposts. Here’s an example using the AWS CLI to copy a model file (model-ml_m5.tar.gz) from Amazon S3 to the current directory (.):
    aws s3 cp s3://sagemaker-studio-66f50fg898c/fraud-detection-ml/profiler/model-ml_m5.tar.gz .

  4. Financial transactions come into the data center and are routed into the Outposts via the Local Gateway (LGW), to the front-end web server and then to the application server.
  5. The transaction gets stored in the database and at the same time, the application server generates a customer profile based on multiple variables, including transaction history.
  6. The customer profile is sent to Edge Manager agent to run inference against the compiled model using the customer profile as input.
  7. The fraud detection model generates a score once inference is complete. Based on that score, the application server returns one of the following to the client:
    1. Approve the transaction.
    2. Ask for a second factor (two-factor authentication).
    3. Deny the transaction.
  8. Additionally, sample input/output data as well as model metrics are captured and sent back to the AWS Region for monitoring with Amazon SageMaker Edge Manager.

AWS Region

  1. Monitor your model with Amazon SageMaker Edge Manager and push metrics to CloudWatch, which can be used as a feedback loop to improve the model’s performance on an ongoing basis.

Considerations for Using Amazon SageMaker Edge Manager with AWS Outposts

Factors to consider when choosing between inference in an AWS Region and on AWS Outposts:

  • Security: Whether or not other factors are relevant to your use-case, the security of your data is a priority. If the data you must perform inference on can’t be stored in the cloud, using AWS Outposts for inference at the edge lets you perform inference without sacrificing data security.
  • Real-time processing: Is the data you need to perform inference on time bound? If the value of the data diminishes as more time passes, then sending the data to an AWS Region for inference may not have value.
  • WAN Connectivity: Along with the speed and quality of your connection, the time it takes for data generated at the edge to reach the cloud (latency) is also important. You may only need near real-time inference, in which case cloud-based inference is an option. Also consider the following:
    • Do you have enough bandwidth to send the amount of data back to an AWS Region? If not, is the required bandwidth cost effective?
    • Is the quality of network link back to the AWS Region suitable to meet your requirements?
    • What are the consequences of a network outage?

If link quality is an issue, bandwidth costs aren’t reasonable, or a network outage would be detrimental to your business, then using AWS Outposts for inference at the edge helps ensure that you can continue to perform inference regardless of the state of your WAN connectivity.

As of this writing, Amazon SageMaker Edge Manager supports common CPU (ARM, x86) and GPU (ARM, Nvidia) based devices running Linux and Windows operating systems. Over time, SageMaker Edge Manager will expand support to more embedded processors and mobile platforms that are also supported by SageMaker Neo.

Additionally, you must compile the model with Amazon SageMaker Neo before you can use Amazon SageMaker Edge Manager. Amazon SageMaker Neo converts and compiles your model into an executable that you can then package and deploy onto your edge devices. Once the model package is deployed, the Amazon SageMaker Edge Manager agent unpacks it and runs the model on the device.
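At the API level, this compilation step is a create_compilation_job call. The following is a rough sketch: the job name, role ARN, bucket paths, and the DataInputConfig input shape are placeholders that must match your actual model, and the target device "ml_m5" mirrors the model-ml_m5.tar.gz file copied in the scenario above:

```python
def build_compilation_job_request(job_name, role_arn, model_s3_uri,
                                  output_s3_uri, target_device):
    """Assemble a SageMaker create_compilation_job payload for Neo."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Input name and shape must match the trained model
            # (placeholder values shown here).
            "DataInputConfig": '{"input0": [1, 3, 224, 224]}',
            "Framework": "PYTORCH",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }

request = build_compilation_job_request(
    "fraud-detection-neo",
    "arn:aws:iam::<account-id>:role/<sagemaker-role>",
    "s3://<bucket>/fraud-detection-ml/model.tar.gz",
    "s3://<bucket>/fraud-detection-ml/compiled/",
    "ml_m5",
)
# Real usage: sm_client.create_compilation_job(**request)
```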

Conclusion

Whether it’s providing quality assurance to manufactured goods, real-time monitoring of cameras, wind farms, or medical devices (and countless other use-cases), Amazon SageMaker combined with AWS Outposts provides you with world class machine learning capabilities and inference at the edge.

To learn more about Amazon SageMaker Edge Manager, you can visit the Edge Manager product page and check out this demo. To learn more about AWS Outposts, visit the Outposts product page and check out this introduction.


About the Author

Josh Coen is a Senior Solutions Architect at AWS specializing in AWS Outposts. Prior to joining AWS, Josh was a Cloud Architect at Sirius, a national technology systems integrator, where he helped build and run their AWS practice. Josh has a BS in Information Technology and has been in the IT industry since 2003.


Mani Khanuja is an Artificial Intelligence and Machine Learning Specialist SA at Amazon Web Services (AWS). She helps customers use machine learning to solve their business challenges using AWS. She spends most of her time diving deep and teaching customers on AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. She is passionate about ML at edge and has created her own lab with a self-driving kit and prototype manufacturing production line, where she spends a lot of her free time.


Getting started with Amazon SageMaker Feature Store

In a machine learning (ML) journey, one crucial step before building any ML model is to transform your data and design features from it so that your data is machine-readable. This step is known as feature engineering. It can include one-hot encoding categorical variables, converting text values to a vectorized representation, aggregating log data to a daily summary, and more. The quality of your features directly influences your model’s predictive power, and feature engineering often takes a few iterations before a model reaches the desired accuracy. Data scientists and developers can easily spend 60% of their time designing and creating features, and the challenges go beyond writing and testing the feature engineering code: features built at different times and by different teams aren’t consistent, extensive and repetitive feature engineering work is often needed when productionizing new features, and tracking versions and accessing up-to-date features is difficult.

To address these challenges, Amazon SageMaker Feature Store provides a fully managed central repository for ML features, making it easy to securely store and retrieve features without the heavy lifting of managing the infrastructure. It lets you define groups of features, use batch ingestion and streaming ingestion, and retrieve the latest feature values with low latency.

For an introduction to Feature Store and a basic use case using a credit card transaction dataset for fraud detection, see New – Store, Discover, and Share Machine Learning Features with Amazon SageMaker Feature Store. For further exploration of its features, see Using streaming ingestion with Amazon SageMaker Feature Store to make ML-backed decisions in near-real time.

For this post, we focus on the integration of Feature Store with other Amazon SageMaker features to help you get started quickly. The associated sample notebook and the following video demonstrate how you can apply these concepts to the development of an ML model to predict the risk of heart failure.

The components of Feature Store

Feature Store is a centralized hub for features and associated metadata. Features are defined and stored in a collection called a feature group. You can visualize a feature group as a table in which each column is a feature, with a unique identifier for each row. In principle, a feature group is composed of features and values specific to each feature. A feature group’s definition is composed of a list of the following:

  • Feature definitions – These consist of a name and a data type.
  • A record identifier name – Each feature group is defined with a record identifier name. It should be a unique ID that identifies each instance of the data, for example a primary key, customer ID, or transaction ID.
  • Configurations for its online and offline store – You can create an online store, an offline store, or both. The online store is used for low-latency, real-time inference use cases, and the offline store is used for training and batch inference.
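At the boto3 level, these pieces map directly onto a CreateFeatureGroup request (every record also needs an event-time feature, covered later in this post). The sketch below is illustrative: the feature names echo the heart-failure example, and the bucket and role ARN are placeholders:

```python
def build_feature_group_request(name, role_arn, offline_s3_uri):
    """Assemble a CreateFeatureGroup payload with both stores enabled.
    Feature names and types here are illustrative."""
    return {
        "FeatureGroupName": name,
        # Unique ID per row, plus the event-time column every record needs.
        "RecordIdentifierFeatureName": "customer_id",
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": [
            {"FeatureName": "customer_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "Fractional"},
            {"FeatureName": "heart_rate", "FeatureType": "Integral"},
        ],
        "OnlineStoreConfig": {"EnableOnlineStore": True},
        "OfflineStoreConfig": {"S3StorageConfig": {"S3Uri": offline_s3_uri}},
        "RoleArn": role_arn,
    }

request = build_feature_group_request(
    "heart-failure-features",
    "arn:aws:iam::<account-id>:role/<sagemaker-role>",
    "s3://<bucket>/feature-store/",
)
# Real usage: sm_client.create_feature_group(**request)
```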

The following diagram shows how you can use Feature Store as part of your ML pipeline. First, you read in your raw data and transform it to features ready for exploration and modeling. Then you can create a feature store, configure it to an online or offline store, or both. Next you can ingest data via streaming to the online and offline store, or in batches directly to the offline store. After your feature store is set up, you can create a model using data from your offline store and access it for real time inference or batch inference.

For more hands-on experience, follow the notebook example for a step-by-step guide to build a feature store, train a model for fraud detection, and access the feature store for inference.

Export data from Data Wrangler to Feature Store

Because Feature Store can ingest data in batches, you can author features using Amazon SageMaker Data Wrangler, create feature groups in Feature Store, and ingest features in batches using a SageMaker Processing job with a notebook exported from Data Wrangler. This mode allows for batch ingestion into the offline store. It also supports ingestion into the online store if the feature group is configured for both online and offline use.

To start off, after you complete your data transformation steps and analysis, you can conveniently export your data preparation workflow into a notebook with one click. When you export your flow steps, you have the option of exporting your processing code to a notebook that pushes your processed features to Feature Store.

Choose Export step and Feature Store to automatically create your notebook. This notebook recreates your manual transformation steps, creates a feature group, and adds features to an offline or online feature store, allowing you to easily rerun your steps.

This notebook defines the schema, instead of auto-detecting data types for each column of the data, using the following format:

column_schema = [
    {
        "name": "Height",
        "type": "long"
    },
    {
        "name": "Sum",
        "type": "string"
    },
    {
        "name": "Time",
        "type": "string"
    }
]

For more information on how to load the schema, map it, and add it as a FeatureDefinition that you can use to create the FeatureGroup, see Export to the SageMaker Feature Store.
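As a rough sketch of that mapping step, the helper below converts column_schema entries into the FeatureDefinitions shape that create_feature_group expects. The type mapping covers only the types shown in this example and mirrors what the exported notebook does:

```python
# Map Data Wrangler schema types to Feature Store feature types.
# Only the types used in this example are covered.
DTYPE_TO_FEATURE_TYPE = {
    "long": "Integral",
    "float": "Fractional",
    "string": "String",
}

def to_feature_definitions(column_schema):
    """Convert column_schema entries into FeatureDefinitions for
    create_feature_group."""
    return [
        {"FeatureName": col["name"],
         "FeatureType": DTYPE_TO_FEATURE_TYPE[col["type"]]}
        for col in column_schema
    ]

column_schema = [
    {"name": "Height", "type": "long"},
    {"name": "Sum", "type": "string"},
    {"name": "Time", "type": "string"},
]
definitions = to_feature_definitions(column_schema)
# definitions[0] == {"FeatureName": "Height", "FeatureType": "Integral"}
```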

Additionally, you must specify a record identifier name and event time feature name in the following code:

  • The record_identifier_name is the name of the feature whose value uniquely identifies a record defined in the feature store.
  • An EventTime is a point in time when a new event occurs that corresponds to the creation or update of a record in a feature. All records in the feature group must have a corresponding EventTime.
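If your data doesn't already carry an event-time column, a common approach is to stamp every record at ingestion time. The sketch below uses plain dicts and a hypothetical column name; Feature Store accepts fractional Unix-epoch timestamps (or ISO-8601 strings) for EventTime:

```python
import time

def add_event_time(records, feature_name="EventTime"):
    """Attach a Unix-timestamp EventTime value to each record dict."""
    now = round(time.time(), 3)
    # Copy each record so the originals stay untouched.
    return [{**record, feature_name: now} for record in records]

records = add_event_time([{"customer_id": "194", "heart_rate": 72}])
# Every record now has an EventTime alongside its original features.
```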

The notebook creates both an offline and an online store by default, with the following configuration set to True:

online_store_config = {
    "EnableOnlineStore": True
}

You can also disable the online store by setting EnableOnlineStore to False in the online store configuration.

You can then run the notebook, which creates a feature group and a processing job to process data at scale. The offline store is located in an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account. Because Feature Store is integrated with Amazon SageMaker Studio, you can visualize the feature store by choosing Components and registries in the navigation pane, choosing Feature Store on the drop-down menu, and then finding your feature store on the list. You can check feature definitions, manage feature group tags, and generate queries for the offline store.

Build a training set from an offline store

Now that you have created a feature store from your processed data, you can build a training dataset from your offline store using services such as Amazon Athena, AWS Glue, or Amazon EMR. Because Feature Store automatically builds an AWS Glue Data Catalog when you create feature groups, you can easily create a training dataset with feature values from the feature group by querying that auto-built Data Catalog.

First, create an Athena query for your feature group with the following code. The table_name is the AWS Glue table that is automatically generated by Feature Store.

sample_query = your_feature_group.athena_query()
data_table = sample_query.table_name

You can then write a SQL query against your feature group and run it with the .run() command, specifying the S3 bucket location where the dataset should be saved. You can modify the query to include any operations needed for your data, such as joining, filtering, and ordering. You can further process the output DataFrame until it’s ready for modeling, then upload it to your S3 bucket so that your SageMaker trainer can read the input directly from the S3 bucket.

# define your Athena query
query_string = 'SELECT * FROM "'+data_table+'"'

# run Athena query. The output is loaded to a Pandas dataframe.
dataset = pd.DataFrame()
sample_query.run(query_string=query_string, output_location='s3://'+default_s3_bucket_name+'/query_results/')
sample_query.wait()
dataset = sample_query.as_dataframe()
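Once the query result is processed, writing it back to S3 as header-free CSV (with the label column first, the layout most built-in SageMaker algorithms expect) might look like the following sketch. The bucket, key, and S3 client are assumptions you would supply:

```python
import io

def upload_training_csv(rows, bucket, key, s3_client):
    """Serialize rows (label first, then features) as header-free CSV
    and upload to S3 for SageMaker training."""
    buffer = io.StringIO()
    for row in rows:
        buffer.write(",".join(str(value) for value in row) + "\n")
    s3_client.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())
    return f"s3://{bucket}/{key}"

# Real usage, with the DataFrame from the Athena query above:
# upload_training_csv(dataset.values.tolist(), default_s3_bucket_name,
#                     "train/train.csv", boto3.client("s3"))
```

The returned URI can then be passed as the training input channel.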

Access your Feature Store for inference

After you build a model from the training set, you can access your online store conveniently to fetch a record and make predictions using the deployed model. Feature Store can be especially useful in supplementing data for inference requests because of the low-latency GetRecord functionality. In this example, you can use the following code to query the online feature group to build an inference request:

selected_id = str(194)

# Helper to parse the feature value from the record.

def get_feature_value(record, feature_name):
    return str(list(filter(lambda r: r['FeatureName'] == feature_name, record))[0]['ValueAsString'])

fs_response = featurestore_runtime.get_record(
                                               FeatureGroupName=your_feature_group_name,
                                               RecordIdentifierValueAsString=selected_id)
selected_record = fs_response['Record']
inference_request = [
    get_feature_value(selected_record, 'feature1'),
    get_feature_value(selected_record, 'feature2'),
    ....
    get_feature_value(selected_record, 'feature 10')
]

You can then call the deployed model predictor to generate a prediction for the selected record:

results = predictor.predict(','.join(inference_request), 
                            initial_args = {"ContentType": "text/csv"})
prediction = json.loads(results)
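If your application scores several records per request, you can fetch them in a single call with the BatchGetRecord API instead of looping over GetRecord. The following sketch uses placeholder feature group and feature names:

```python
def fetch_records(feature_group_name, record_ids, region="us-east-1"):
    """Fetch several online-store records in one BatchGetRecord call.
    The feature group name and region are placeholders."""
    import boto3  # deferred; requires AWS credentials
    runtime = boto3.client("sagemaker-featurestore-runtime", region_name=region)
    response = runtime.batch_get_record(
        Identifiers=[{
            "FeatureGroupName": feature_group_name,
            "RecordIdentifiersValueAsString": [str(r) for r in record_ids],
        }]
    )
    return [rec["Record"] for rec in response["Records"]]

def record_to_csv(record, feature_names):
    """Flatten a Feature Store record into the CSV payload the endpoint expects."""
    values = {f["FeatureName"]: f["ValueAsString"] for f in record}
    return ",".join(values[name] for name in feature_names)
```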

Integrate Feature Store in a SageMaker pipeline

Feature Store also integrates with Amazon SageMaker Pipelines, making it easy to add feature search, discovery, and reuse to your automated ML workflows. The following code shows you how to configure the ProcessingOutput to write the output directly to your feature group instead of Amazon S3, so that you can maintain your model features in a feature store:

flow_step_outputs = []
flow_output = sagemaker.processing.ProcessingOutput(
    output_name=customers_output_name,
    feature_store_output=sagemaker.processing.FeatureStoreOutput(
        feature_group_name=your_feature_group_name), 
    app_managed=True)
flow_step_outputs.append(flow_output)

example_flow_step = ProcessingStep(
    name='SampleProcessingStep', 
    processor=flow_processor, # Your flow processor defined at the beginning of your pipeline
    inputs=flow_step_inputs, # Your processing and feature engineering steps, can be Data Wrangler flows
    outputs=flow_step_outputs)

Conclusion

In this post, we explored how Feature Store can be a powerful tool in your ML journey. You can easily export your data processing and feature engineering results to a feature group and build your feature store. After your feature store is all set up, you can explore and build training sets from your offline store, taking advantage of its integration with other AWS analytics services such as Athena, AWS Glue, and Amazon EMR. After you train and deploy a model, you can fetch records from your online store for real-time inference. Lastly, you can add a feature store as a part of a complete SageMaker pipeline in your ML workflow. Feature Store makes it easy to store and retrieve features as needed in ML development.

Give it a try, and let us know what you think!


About the Author

As a data scientist and consultant, Zoe Ma has helped bring the latest tools and technologies and data-driven insights to businesses and enterprises. In her free time, she loves painting and crafting and enjoys all water sports.

Courtney McKay is a consultant. She is passionate about helping customers drive measurable ROI with AI/ML tools and technologies. In her free time, she enjoys camping, hiking and gardening.


Run ML inference on AWS Snowball Edge with Amazon SageMaker Edge Manager and AWS IoT Greengrass

You can use AWS Snowball Edge devices in locations like cruise ships, oil rigs, and factory floors with limited to no network connectivity for a wide range of machine learning (ML) applications such as surveillance, facial recognition, and industrial inspection. However, given the remote and disconnected nature of these devices, deploying and managing ML models at the edge is often difficult. With AWS IoT Greengrass and Amazon SageMaker Edge Manager, you can perform ML inference on locally generated data on Snowball Edge devices using cloud-trained ML models. You not only benefit from the low latency and cost savings of running local inference, but also reduce the time and effort required to get ML models to production. You can do all this while continuously monitoring and improving model quality across your Snowball Edge device fleet.

In this post, we talk about how you can use AWS IoT Greengrass version 2.0 or higher and Edge Manager to optimize, secure, monitor, and maintain a simple TensorFlow classification model to classify shipping containers (connex) and people.

Getting started

To get started, order a Snowball Edge device (for more information, see Creating an AWS Snowball Edge Job). You can order a Snowball Edge device with an AWS IoT Greengrass validated AMI on it.

After you receive the device, you can use AWS OpsHub for Snow Family or the Snowball Edge client to unlock the device. You can start an Amazon Elastic Compute Cloud (Amazon EC2) instance with the latest AWS IoT Greengrass installed or use the commands on AWS OpsHub for Snow Family.

Launch an AMI that meets the following requirements, or provide an AMI reference on the Snowball console before ordering so that the device ships with all libraries and data preinstalled:

  • The ML framework of your choice, such as TensorFlow, PyTorch, or MXNet
  • Docker (if you intend to use it)
  • AWS IoT Greengrass
  • Any other libraries you may need

Prepare the AMI at the time of ordering the Snowball Edge device on the AWS Snow Family console. For instructions, see Using Amazon EC2 Compute Instances. You also have the option to update the AMI after the Snowball Edge is deployed to your edge location.

Install the latest AWS IoT Greengrass on Snowball Edge

To install AWS IoT Greengrass on your device, complete the following steps:

  1. Install the latest AWS IoT Greengrass on your Snowball Edge device. Make sure the developer tools are deployed (--deploy-dev-tools true) so that the Greengrass CLI (ggv2) is available. See the following code:
sudo -E java -Droot="/greengrass/v2" -Dlog.store=FILE \
  -jar ./MyGreengrassCore/lib/Greengrass.jar \
  --aws-region region \
  --thing-name MyGreengrassCore \
  --thing-group-name MyGreengrassCoreGroup \
  --tes-role-name GreengrassV2TokenExchangeRole \
  --tes-role-alias-name GreengrassCoreTokenExchangeRoleAlias \
  --component-default-user ggc_user:ggc_group \
  --provision true \
  --setup-system-service true \
  --deploy-dev-tools true

We reference the --thing-name you chose here when we set up Edge Manager.

  2. Run the following command to test your installation:
aws greengrassv2 help
  3. On the AWS IoT console, validate that the Snowball Edge device is successfully registered with your AWS IoT Greengrass account.
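You can also check the core device's registration programmatically with the ListCoreDevices API. A small sketch; the region is an assumption:

```python
def healthy_core_devices(response):
    """Pick out devices reporting HEALTHY status from a ListCoreDevices response."""
    return [d["coreDeviceThingName"] for d in response.get("coreDevices", [])
            if d.get("status") == "HEALTHY"]

def list_core_devices(region="us-east-1"):
    """Call the Greengrass v2 control plane; requires AWS credentials."""
    import boto3  # deferred so the filter above stays testable offline
    return boto3.client("greengrassv2", region_name=region).list_core_devices()
```

After the install command succeeds, your --thing-name (MyGreengrassCore in this walkthrough) should appear in the healthy list.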

Optimize ML models with Edge Manager

We use Edge Manager to deploy and manage the model on Snowball Edge.

  1. Install the Edge Manager agent on Snowball Edge using the latest AWS IoT Greengrass.
  2. Train and store your ML model.

You can train your ML model using any framework of your choice and save it to an Amazon Simple Storage Service (Amazon S3) bucket. In the following screenshot, we use TensorFlow to train a multi-label model to classify connex and people in an image. The model used here is saved to an S3 bucket by first creating a .tar file.

After the model is saved (TensorFlow Lite in this case), you can start an Amazon SageMaker Neo compilation job of the model and optimize the ML model for Snowball Edge Compute (SBE_C).

  3. On the SageMaker console, under Inference in the navigation pane, choose Compilation jobs.
  4. Choose Create compilation job.

  5. Give your job a name, and create a new role or use an existing one.

 If you’re creating a new AWS Identity and Access Management (IAM) role, ensure that SageMaker has access to the bucket in which the model is saved.

  6. In the Input configuration section, for Location of model artifacts, enter the path to model.tar.gz where you saved the file (in this case, s3://feidemo/tfconnexmodel/connexmodel.tar.gz).
  7. For Data input configuration, enter the ML model's input layer (its name and its shape). In this case, it's called keras_layer_input and its shape is [1,224,224,3], so we enter {"keras_layer_input":[1,224,224,3]}.

  8. For Machine learning framework, choose TFLite.

  9. For Target device, choose sbe_c.
  10. Leave Compiler options blank.
  11. For S3 Output location, enter the same location as where your model is saved, with the prefix (folder) output. For example, we enter s3://feidemo/tfconnexmodel/output.

  12. Choose Submit to start the compilation job.
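The console steps above map directly onto the CreateCompilationJob API, which is convenient if you want to script the compilation instead. The job name, role ARN, and S3 URIs below are placeholders:

```python
import json

def build_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri):
    """Assemble a CreateCompilationJob request mirroring the console steps.
    DataInputConfig matches the model's input layer from the walkthrough;
    all names and URIs are illustrative placeholders."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            "DataInputConfig": json.dumps({"keras_layer_input": [1, 224, 224, 3]}),
            "Framework": "TFLITE",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": "sbe_c",
        },
    }

def submit_compilation(request):
    import boto3  # deferred; requires AWS credentials
    return boto3.client("sagemaker").create_compilation_job(**request)
```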

Now you create a model deployment package to be used by Edge Manager.

  1. On the SageMaker console, under Edge Manager, choose Edge packaging jobs.
  2. Choose Create Edge packaging job.
  3. In the Job properties section, enter the job details.
  4. In the Model source section, for Compilation job name, enter the name you provided for the Neo compilation job.
  5. Choose Next.

  6. In the Output configuration section, for S3 bucket URI, enter where you want to store the package in Amazon S3.
  7. For Component name, enter a name for your AWS IoT Greengrass component.

This step creates an AWS IoT Greengrass model component where the model is downloaded from Amazon S3 and uncompressed to local storage on Snowball Edge.
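The packaging job can likewise be scripted through the CreateEdgePackagingJob API; setting PresetDeploymentType to GreengrassV2Component is what produces the model component. All names and ARNs below are placeholders:

```python
import json

def build_packaging_request(job_name, compilation_job_name, model_name,
                            model_version, role_arn, output_s3_uri, component_name):
    """Assemble a CreateEdgePackagingJob request that emits a Greengrass v2
    model component. All names here are illustrative placeholders."""
    return {
        "EdgePackagingJobName": job_name,
        "CompilationJobName": compilation_job_name,
        "ModelName": model_name,
        "ModelVersion": model_version,
        "RoleArn": role_arn,
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "PresetDeploymentType": "GreengrassV2Component",
            "PresetDeploymentConfig": json.dumps({"ComponentName": component_name}),
        },
    }

def submit_packaging(request):
    import boto3  # deferred; requires AWS credentials
    return boto3.client("sagemaker").create_edge_packaging_job(**request)
```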

  1. Create a device fleet to manage a group of devices, in this case, just one (SBE).
  2. For IAM role, enter the role generated by AWS IoT Greengrass earlier (--tes-role-name).

Make sure it has the required permissions by going to the IAM console, searching for the role, and adding the required policies to it.

  3. Register the Snowball Edge device to the fleet you created.

  4. In the Device source section, enter the device name. The IoT name needs to match the name you used earlier; in this case, --thing-name MyGreengrassCore.

You can register additional Snowball devices on the SageMaker console to add them to the device fleet, which allows you to group and manage these devices together.
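Fleet creation and device registration are also available through the SageMaker API (CreateDeviceFleet and RegisterDevices). The fleet and device names below are placeholders; the IoT thing name must match the --thing-name used when installing AWS IoT Greengrass:

```python
def build_device_registration(fleet_name, devices):
    """devices is a list of (device_name, iot_thing_name) pairs. The thing
    name must match the one used at Greengrass install time."""
    return {
        "DeviceFleetName": fleet_name,
        "Devices": [
            {"DeviceName": name, "IotThingName": thing}
            for name, thing in devices
        ],
    }

def create_fleet_and_register(fleet_name, role_arn, output_s3_uri, devices):
    """Create the fleet, then register the devices; requires AWS credentials."""
    import boto3  # deferred so the builder above stays testable offline
    sm = boto3.client("sagemaker")
    sm.create_device_fleet(
        DeviceFleetName=fleet_name,
        RoleArn=role_arn,
        OutputConfig={"S3OutputLocation": output_s3_uri},
    )
    sm.register_devices(**build_device_registration(fleet_name, devices))
```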

Deploy ML models to Snowball Edge using AWS IoT Greengrass

In the previous sections, you unlocked and configured your Snowball Edge device. The ML model is now compiled and optimized for performance on Snowball Edge. An Edge Manager package is created with the compiled model and the Snowball device is registered to a fleet. In this section, you look at the steps involved in deploying the ML model for inference to Snowball Edge with the latest AWS IoT Greengrass.

Components

AWS IoT Greengrass allows you to deploy to edge devices as a combination of components and associated artifacts. Components are defined by recipes, JSON documents that contain the component's metadata, its lifecycle (what to install and what to run), and platform-specific settings, including which artifacts to use on different operating systems.
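As a rough illustration, a recipe for an inference application component might look like the following, expressed here as a Python dict for readability. The component name, dependencies, and script path are hypothetical:

```python
import json

# Sketch of a Greengrass v2 component recipe; every name here is a placeholder.
recipe = {
    "RecipeFormatVersion": "2020-01-25",
    "ComponentName": "com.example.ConnexInference",
    "ComponentVersion": "1.0.0",
    "ComponentDescription": "Runs the TFLite connex/person classifier.",
    "Manifests": [
        {
            "Platform": {"os": "linux"},
            "Lifecycle": {
                # Install dependencies once, then run the inference script.
                # {artifacts:path} is the Greengrass recipe variable pointing
                # at the component's downloaded artifacts.
                "Install": "pip3 install grpcio numpy",
                "Run": "python3 {artifacts:path}/inference.py",
            },
        }
    ],
}

recipe_json = json.dumps(recipe, indent=2)
```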

Artifacts

Artifacts can be code files, models, or container images. For example, a component can be defined to install a pandas Python library and run a code file that will transform the data, or to install a TensorFlow library and run the model for inference. The following are example artifacts needed for an inference application deployment:

  • gRPC proto and Python stubs (this can be different based on your model and framework)
  • Python code to load the model and perform inference

These two items are uploaded to an S3 bucket.

Deploy the components

The deployment needs the following components:

  • Edge Manager agent (available as a public component at general availability)
  • Model
  • Application

Complete the following steps to deploy the components:

  1. On the AWS IoT console, under Greengrass, choose Components, and create the application component.
  2. Find the Edge Manager agent component in the public components list and deploy it.
  3. Deploy a model component created by Edge Manager, which is used as a dependency in the application component.
  4. Deploy the application component to the edge device by going to the list of AWS IoT Greengrass deployments and creating a new deployment.

If you have an existing deployment, you can revise it to add the application component.

Now you can test your component.

  1. In the inference code deployed with the application component, add logic to read files from a local folder on the Snowball Edge device (for example, an incoming folder) and move the predictions or processed files to a processed folder.
  2. Log in to the device to confirm that predictions have been made.
  3. Set up the code to run in a loop: check the incoming folder for new files, process them, and move them to the processed folder.
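The loop described in the steps above can be sketched as follows. Here, predict is a hypothetical callable wrapping your Edge Manager gRPC client, and the folder paths are placeholders:

```python
import shutil
import time
from pathlib import Path

def process_incoming(incoming: Path, processed: Path, predict) -> list:
    """Run `predict` on each image in `incoming`, then move the file
    to `processed`. Returns (filename, prediction) pairs."""
    processed.mkdir(parents=True, exist_ok=True)
    results = []
    for path in sorted(incoming.glob("*.jpg")):
        results.append((path.name, predict(path)))
        shutil.move(str(path), str(processed / path.name))
    return results

def watch_loop(incoming, processed, predict, poll_seconds=5):
    """Long-running loop, as run from the component's Run lifecycle step."""
    while True:
        process_incoming(incoming, processed, predict)
        time.sleep(poll_seconds)
```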

The following screenshot is an example setup of files before deployment inside the Snowball Edge.

After deployment, all the test images have classes of interest and therefore are moved to the processed folder.

Clean up

To clean up everything or reimplement this solution from scratch, stop all the EC2 instances by invoking the TerminateInstance API against EC2-compatible endpoints running on your Snowball Edge device. To return your Snowball Edge device, see Powering Off the Snowball Edge and Returning the Snowball Edge Device.

Conclusion

This post walked you through how to order a Snowball Edge device with an AMI of your choice, compile a model for the edge using SageMaker, package that model using Edge Manager, and create and run components with artifacts to perform ML inference on Snowball Edge using the latest AWS IoT Greengrass. With Edge Manager, you can deploy and update your ML models on a fleet of Snowball Edge devices, and monitor performance at the edge with saved input and prediction data on Amazon S3. You can also run these components as long-running AWS Lambda functions that can spin up a model and wait for data to do inference.

You can also combine several features of AWS IoT Greengrass to create an MQTT client and use a pub/sub model to invoke other services or microservices. The possibilities are endless.

By running ML inference on Snowball Edge with Edge Manager and AWS IoT Greengrass, you can optimize, secure, monitor, and maintain ML models on fleets of Snowball Edge devices. Thanks for reading and please do not hesitate to leave questions or comments in the comments section.

To learn more about AWS Snow Family, AWS IoT Greengrass, and Edge Manager, check out the following:


About the Authors

Raj Kadiyala is an AI/ML Tech Business Development Manager in AWS WWPS Partner Organization. Raj has over 12 years of experience in Machine Learning and likes to spend his free time exploring machine learning for practical every day solutions and staying active in the great outdoors of Colorado.

Nida Beig is a Sr. Product Manager – Tech at Amazon Web Services where she works on the AWS Snow Family team. She is passionate about understanding customer needs, and using technology as a conductor of transformative thinking to deliver consumer products. Besides work, she enjoys traveling, hiking, and running.
