Define customized permissions in minutes with Amazon SageMaker Role Manager via the AWS CDK

Machine learning (ML) administrators play a critical role in maintaining the security and integrity of ML workloads. Their primary focus is to ensure that users operate securely, adhering to the principle of least privilege. However, accommodating the diverse needs of different user personas and creating appropriate permission policies can sometimes impede agility. To address this challenge, AWS introduced Amazon SageMaker Role Manager in December 2022. SageMaker Role Manager is a powerful tool that you can use to quickly create persona-based roles, which you can then customize to meet specific requirements.

With SageMaker Role Manager, administrators can efficiently define persona-based roles tailored to distinct user groups. This approach ensures that individuals have access only to the resources and actions essential for their tasks, reducing the risk of unauthorized actions or breaches. SageMaker Role Manager also allows for fine-grained customization. ML administrators can tailor the roles to meet specific requirements by modifying the permissions associated with each persona. This flexibility ensures that the permissions align precisely with the tasks and responsibilities of individual users, providing a robust security framework while accommodating unique use cases.

SageMaker Role Manager is currently available on the Amazon SageMaker console in all commercial Regions. Today, we are launching the ability to define customized permissions in minutes with SageMaker Role Manager via the AWS Cloud Development Kit (AWS CDK). This removes a critical obstacle to wider adoption, because ML administrators can now automate these tasks programmatically. With the AWS CDK, ML administrators can streamline workflows, reduce manual effort, and ensure consistency in managing permissions for their ML infrastructure.

Solution overview

With the release of the SageMaker Role Manager CDK, we are launching two new infrastructure as code (IaC) capabilities:

You can create fine-grained AWS Identity and Access Management (IAM) roles for ML personas such as data scientist, ML engineer, or data engineer. SageMaker Role Manager combines predefined personas and ML activities to streamline permission generation, so your ML practitioners can perform their responsibilities with least-privilege permissions. For secure access to your ML resources, SageMaker Role Manager lets you specify networking and encryption permissions for Amazon Virtual Private Cloud (Amazon VPC) resources and AWS Key Management Service (AWS KMS) encryption keys. You can also customize permissions by attaching your own customer managed policies.

The SageMaker Role Manager CDK lets you define custom permissions for SageMaker users in minutes. It comes with a set of predefined policy templates for different personas and ML activities. Personas represent the different types of users that need permissions to perform ML activities in SageMaker, such as data scientists or MLOps engineers. ML activities are a set of permissions to accomplish a common ML task, such as running Amazon SageMaker Studio applications or managing experiments, models, or pipelines. After you have selected the persona type and the set of ML activities, the SageMaker Role Manager CDK automatically creates the required IAM role and policies that you can assign to SageMaker users. Similarly, you can also create IAM roles with fine-grained permissions for automated jobs such as running SageMaker Pipelines.
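In practice, a persona's permissions are the combined permissions of the ML activities you select for it. The following Python sketch shows that composition as a single IAM policy document. The activity keys and the IAM actions in `ACTIVITY_STATEMENTS` are hypothetical examples chosen for illustration. They are not the policies SageMaker Role Manager actually generates. The `"Resource": "*"` values only keep the sketch short, and a real least-privilege policy would scope resources.

```python
import json

# Hypothetical activity-to-statement mapping, for illustration only.
# SageMaker Role Manager ships its own, more detailed policy templates.
ACTIVITY_STATEMENTS = {
    "run_studio_apps": [
        {
            "Effect": "Allow",
            "Action": ["sagemaker:CreateApp", "sagemaker:DeleteApp", "sagemaker:DescribeApp"],
            "Resource": "*",  # kept broad for brevity; scope this in real policies
        }
    ],
    "manage_experiments": [
        {
            "Effect": "Allow",
            "Action": ["sagemaker:CreateExperiment", "sagemaker:CreateTrial", "sagemaker:DescribeExperiment"],
            "Resource": "*",
        }
    ],
}


def build_persona_policy(activities):
    """Combine the statements of the selected activities into one IAM policy document."""
    statements = []
    for activity in activities:
        if activity not in ACTIVITY_STATEMENTS:
            raise KeyError(f"Unknown activity: {activity}")
        statements.extend(ACTIVITY_STATEMENTS[activity])
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_persona_policy(["run_studio_apps", "manage_experiments"])
    print(json.dumps(policy, indent=2))
```

In this model, adding or removing an activity changes only which statements are included. That is why selecting a persona and its activities is enough to derive the role's policies.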

Prerequisites

To start using the SageMaker Role Manager CDK, you need to complete the following prerequisite steps:

  1. Set up a role for your ML administrator to create and manage personas, as well as the IAM permissions for those users. For a sample admin policy, refer to the prerequisites section in the Define customized permissions in minutes with Amazon SageMaker Role Manager blog post.
  2. Create a compute-only persona role (if you don’t already have one) to pass to jobs and endpoints. For instructions to set up that role, refer to Using the role manager.
  3. Set up your AWS CDK development environment. For instructions, refer to Getting started with the AWS CDK.

Install and run the SageMaker Role Manager CDK

Complete the following steps to set up the SageMaker Role Manager CDK:

  1. Create your AWS CDK app and give it a name; for example, RoleManager.
  2. Navigate to the RoleManager folder and run the following command to create a blank TypeScript AWS CDK project:
    cdk init app --language typescript

  3. Open package.json and add the @cdklabs/cdk-aws-sagemaker-role-manager package to the dependencies, as shown in the following code:
    "dependencies": {
        "aws-cdk-lib": "2.85.0",
        "@cdklabs/cdk-aws-sagemaker-role-manager": "0.0.15",
        "constructs": "^10.0.0",
        "source-map-support": "^0.5.21"
      }

  4. Run the following command to install the new cdk-aws-sagemaker-role-manager package:
    npm install

  5. Navigate to the lib folder and replace role_manager_stack.ts with the following code:
    import * as cdk from 'aws-cdk-lib';
    import { Construct } from 'constructs';
    import * as iam from 'aws-cdk-lib/aws-iam';
    import { Activity } from '@cdklabs/cdk-aws-sagemaker-role-manager';
    
    export class RoleManagerStack extends cdk.Stack {
      constructor(scope: Construct, id: string, props?: cdk.StackProps) {
        super(scope, id, props);
    
        // Define the "manage jobs" ML activity and the existing role(s) it may pass to jobs.
        const activity = Activity.manageJobs(this, 'id1', {
          rolesToPass: [iam.Role.fromRoleName(this, 'passRoleId', 'passRoleName')],
        });
    
        // Create an IAM role that carries this activity's permissions.
        activity.createRole(this, 'newRoleId', 'newRoleName', 'newRoleDescription');
      }
    }

  6. Replace passRoleId, passRoleName, newRoleId, newRoleName, and newRoleDescription based on your requirements for role creation.
  7. Navigate back to your AWS CDK app home folder and run the following command to verify the generated AWS CloudFormation template:
    cdk synth

  8. Finally, run the following command to deploy the CloudFormation stack to your AWS account:
    cdk deploy

You should see an AWS CDK deployment output similar to the one in the following screenshot.

More SageMaker Role Manager CDK examples are available in the following GitHub repo.
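After you run `cdk synth`, you may want to confirm which IAM roles the stack defines before you deploy it. By default, the AWS CDK writes synthesized templates to the `cdk.out` directory as `<StackName>.template.json`. The following Python sketch lists the `AWS::IAM::Role` resources in such a template. The default path `cdk.out/RoleManagerStack.template.json` is an assumption based on the stack class name used earlier, so adjust it to match your app.

```python
import json
from pathlib import Path

# Assumed template location; adjust to your stack name and output directory.
DEFAULT_TEMPLATE = Path("cdk.out") / "RoleManagerStack.template.json"


def list_iam_roles(template_path):
    """Return {logical ID: RoleName} for each AWS::IAM::Role in a CloudFormation template."""
    template = json.loads(Path(template_path).read_text())
    roles = {}
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") == "AWS::IAM::Role":
            # RoleName can be missing (CloudFormation then generates one) or an intrinsic function.
            roles[logical_id] = resource.get("Properties", {}).get("RoleName")
    return roles


if __name__ == "__main__":
    if DEFAULT_TEMPLATE.exists():
        for logical_id, role_name in list_iam_roles(DEFAULT_TEMPLATE).items():
            print(f"{logical_id}: {role_name}")
    else:
        print(f"{DEFAULT_TEMPLATE} not found; run `cdk synth` first.")
```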

ML persona and activity CDK reference

Administrators can define ML activities by using the static functions of the ML activity class, such as Activity.manageJobs in the preceding example. For the latest list of activities, refer to ML activity reference.

The ML persona class supports the following methods:

  • customizeVPC(subnets, securityGroups) – Customizes the VPC settings (subnets and security groups) of all the persona’s activities that support VPC customization.
  • customizeKMS(dataKeys, volumeKeys) – Customizes the KMS keys (data and volume keys) of all the persona’s activities that support KMS key customization.
  • createRole(scope, id, roleNameSuffix, roleDescription) – Creates a role in the given scope with the given ID that has the permissions of the persona’s activities, similar to a role created in the UI. The role is named SageMaker-${roleNameSuffix} and optionally has the passed role description.
  • grantPermissionsTo(identity) – Grants the persona’s activities’ permissions to the identity. The passed identity can be a role or an AWS resource associated with a role (for example, a Lambda function, in which case the permissions are added to the function’s execution role, which determines which resources the function can access).
  • grantPermissionsTo() – Updates the role of the passed identity to have the permissions specified in the ML activity.

The ML activity class supports the same set of functions as ML personas. The difference is that a role created through the ML activity interface is limited to the permissions of that single activity.
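Because createRole names the role SageMaker-${roleNameSuffix}, the suffix you choose must leave room for IAM’s role-name limits: at most 64 characters, using letters, digits, and `+=,.@-_`. The following Python sketch is a hypothetical pre-check you could run against your naming scheme. The helper and its name are illustrative and are not part of the SageMaker Role Manager CDK.

```python
import re

# IAM role names: 1-64 characters drawn from letters, digits, and + = , . @ - _
IAM_ROLE_NAME_PATTERN = re.compile(r"[\w+=,.@-]{1,64}", re.ASCII)
ROLE_PREFIX = "SageMaker-"  # prefix that createRole applies to the suffix, as described above


def sagemaker_role_name(suffix):
    """Build the full role name for a suffix and raise ValueError if IAM would reject it."""
    name = f"{ROLE_PREFIX}{suffix}"
    if not IAM_ROLE_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid IAM role name: {name!r}")
    return name
```

Because the prefix is 10 characters long, suffixes can be at most 54 characters.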

Conclusion

SageMaker Role Manager enables you to create customized roles based on personas, pre-built ML activities, and custom policies, significantly reducing the time required. Now, with this latest AWS CDK support, the ability to define roles is further expanded to support infrastructure as code. This empowers ML practitioners to work programmatically in SageMaker, enhancing efficiency and enabling seamless integration into their workflows.

We would like to hear from you on how this new feature is helping you. Try out the new AWS CDK support for SageMaker Role Manager and send us your feedback!

To learn more about how to use SageMaker Role Manager, refer to the SageMaker Role Manager Developer Guide.


About The Authors

Akash Bhatia is a Principal Solution Architect with experience spanning multiple industries, including Manufacturing, Automotive, Retail, and Space and Technology. Currently working in Amazon Web Services Enterprise Segments, Akash works closely with a diverse range of clients, including Fortune 100 companies and start-ups, to facilitate their cloud migration journey. In addition to his technical expertise, Akash has led product and program management, having successfully overseen numerous large-scale initiatives throughout his career.

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 20 years of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he enjoys riding motorcycles, playing tennis, and photography.

Ozan Eken is a Senior Product Manager at Amazon Web Services. He has over 15 years of experience in consulting and product management. He is passionate about building governance products and admin capabilities in machine learning for enterprise customers. Outside of work, he likes exploring different outdoor activities and watching soccer.

Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake

Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code.

SageMaker Data Wrangler supports Snowflake, a popular data source for users who want to perform ML. We launched the Snowflake direct connection in SageMaker Data Wrangler to improve the customer experience. Before the launch of this feature, administrators had to set up the initial storage integration to connect with Snowflake before users could create features for ML in Data Wrangler. This included provisioning Amazon Simple Storage Service (Amazon S3) buckets, AWS Identity and Access Management (IAM) access permissions, and Snowflake storage integration for individual users, plus an ongoing mechanism to manage or clean up data copies in Amazon S3. This process doesn’t scale for customers with strict data access control and a large number of users.

In this post, we show how Snowflake’s direct connection in SageMaker Data Wrangler simplifies the administrator’s experience and data scientist’s ML journey from data to business insights.

Solution overview

In this solution, we use SageMaker Data Wrangler to speed up data preparation for ML and Amazon SageMaker Autopilot to automatically build, train, and fine-tune the ML models based on your data. Both services are designed specifically to increase productivity and shorten time to value for ML practitioners. We also demonstrate the simplified data access from SageMaker Data Wrangler to Snowflake with direct connection to query and create features for ML.

Refer to the diagram below for an overview of the low-code ML process with Snowflake, SageMaker Data Wrangler, and SageMaker Autopilot.

The workflow includes the following steps:

  1. Navigate to SageMaker Data Wrangler for your data preparation and feature engineering tasks.
    • Set up the Snowflake connection with SageMaker Data Wrangler.
    • Explore your Snowflake tables in SageMaker Data Wrangler, create an ML dataset, and perform feature engineering.
  2. Train and test the models using SageMaker Data Wrangler and SageMaker Autopilot.
  3. Load the best model to a real-time inference endpoint for predictions.
  4. Use a Python notebook to invoke the launched real-time inference endpoint.

Prerequisites

For this post, the administrator needs the following prerequisites:

Data scientists should have the following prerequisites

Lastly, you should prepare your data for Snowflake:

  • We use credit card transaction data from Kaggle to build ML models for detecting fraudulent credit card transactions, so customers are not charged for items that they didn’t purchase. The dataset includes credit card transactions in September 2013 made by European cardholders.
  • Install the SnowSQL client on your local machine so you can use it to upload the dataset to a Snowflake table.

The following steps show how to prepare and load the dataset into the Snowflake database. This is a one-time setup.

Snowflake table and data preparation

Complete the following steps for this one-time setup:

  1. First, as the administrator, create a Snowflake virtual warehouse, user, and role, and grant access to other users such as the data scientists to create a database and stage data for their ML use cases:
    -- Use the role SECURITYADMIN to create Role and User
    USE ROLE SECURITYADMIN;
    
    -- Create a new role 'ML Role'
    CREATE OR REPLACE ROLE ML_ROLE COMMENT='ML Role';
    GRANT ROLE ML_ROLE TO ROLE SYSADMIN;
    
    -- Create a new user and password and grant the role to the user
    CREATE OR REPLACE USER ML_USER PASSWORD='<REPLACE_PASSWORD>'
    DEFAULT_ROLE=ML_ROLE
    DEFAULT_WAREHOUSE=ML_WH
    DEFAULT_NAMESPACE=ML_WORKSHOP.PUBLIC
    COMMENT='ML User';
    GRANT ROLE ML_ROLE TO USER ML_USER;
    
    -- Grant privileges to role
    USE ROLE ACCOUNTADMIN;
    GRANT CREATE DATABASE ON ACCOUNT TO ROLE ML_ROLE;
    
    --Create Warehouse for AI/ML work
    USE ROLE SYSADMIN;
    
    CREATE OR REPLACE WAREHOUSE ML_WH
    WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 120 AUTO_RESUME = true INITIALLY_SUSPENDED = TRUE;
    
    GRANT ALL ON WAREHOUSE ML_WH TO ROLE ML_ROLE;
    

  2. As the data scientist, let’s now create a database and import the credit card transactions into the Snowflake database to access the data from SageMaker Data Wrangler. For illustration purposes, we create a Snowflake database named SF_FIN_TRANSACTION:
    -- Select the role and the warehouse
    USE ROLE ML_ROLE;
    USE WAREHOUSE ML_WH;
    
    -- Create the DB to import the financial transactions
    CREATE DATABASE IF NOT EXISTS sf_fin_transaction;
    
    -- Create CSV File Format
    create or replace file format my_csv_format
    type = csv
    field_delimiter = ','
    skip_header = 1
    null_if = ('NULL', 'null')
    empty_field_as_null = true
    compression = gzip;
    

  3. Download the dataset CSV file to your local machine and create a stage to load the data into the database table. Update the file path to point to the downloaded dataset location before running the PUT command for importing the data to the created stage:
    -- Create a Snowflake named internal stage to store the transactions csv file
    CREATE OR REPLACE STAGE my_stage
    FILE_FORMAT = my_csv_format;
    
    -- Import the file into the stage
    -- This command needs to be run from the SnowSQL client, not the web UI
    PUT file:///Users/*******/Downloads/creditcard.csv @my_stage;
    
    -- Check whether the import was successful
    LIST @my_stage;
    

  4. Create a table named credit_card_transaction:
    -- Create table and define the columns mapped to the csv transactions file
    create or replace table credit_card_transaction (
    Time integer,
    V1 float, V2 float, V3 float,
    V4 float, V5 float, V6 float,
    V7 float, V8 float, V9 float,
    V10 float,V11 float,V12 float,
    V13 float,V14 float,V15 float,
    V16 float,V17 float,V18 float,
    V19 float,V20 float,V21 float,
    V22 float,V23 float,V24 float,
    V25 float,V26 float,V27 float,
    V28 float,Amount float,
    Class varchar(5)
    );
    

  5. Import the data into the created table from the stage:
    -- Import the transactions into the 'credit_card_transaction' table
    copy into credit_card_transaction from @my_stage ON_ERROR = CONTINUE;
    
    -- Check whether the data was successfully loaded
    select * from credit_card_transaction limit 100;
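Because the COPY command uses ON_ERROR = CONTINUE, rows that fail to load are skipped instead of stopping the load, so it is worth comparing row counts. The following Python sketch is an optional local check and was not part of the original walkthrough. It counts the data rows in the downloaded creditcard.csv file and flags rows that don’t have the 31 columns of the credit_card_transaction table. The expected header names are an assumption based on that table definition.

```python
import csv

# Column names assumed from the credit_card_transaction table definition:
# Time, V1..V28, Amount, Class (31 columns in total).
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def summarize_csv(path):
    """Return (data_row_count, malformed_row_numbers) for a local copy of the dataset."""
    data_rows = 0
    malformed = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        if [column.strip() for column in header] != EXPECTED_COLUMNS:
            raise ValueError(f"Unexpected header: {header}")
        for row_number, row in enumerate(reader, start=2):  # row 1 is the header
            data_rows += 1
            if len(row) != len(EXPECTED_COLUMNS):
                malformed.append(row_number)
    return data_rows, malformed
```

For example, you can compare the first value returned by `summarize_csv("creditcard.csv")` with the result of `SELECT COUNT(*) FROM credit_card_transaction;` in Snowflake.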

Set up the SageMaker Data Wrangler and Snowflake connection

After we prepare the dataset to use with SageMaker Data Wrangler, let us create a new Snowflake connection in SageMaker Data Wrangler to connect to the sf_fin_transaction database in Snowflake and query the credit_card_transaction table:

  1. Choose Snowflake on the SageMaker Data Wrangler Connection page.
  2. Provide a name to identify your connection.
  3. Select your authentication method to connect with the Snowflake database:
    • If using basic authentication, provide the user name and password shared by your Snowflake administrator. For this post, we use basic authentication to connect to Snowflake using the user credentials we created in the previous step.
    • If you are using OAuth, provide your identity provider credentials.

By default, SageMaker Data Wrangler queries your data directly from Snowflake without creating any data copies in S3 buckets. The direct connection uses Apache Spark to integrate with Snowflake, so you can prepare your data and seamlessly create a dataset for your ML journey.

So far, we have created the database on Snowflake, imported the CSV file into the Snowflake table, created Snowflake credentials, and created a connector on SageMaker Data Wrangler to connect to Snowflake. To validate the configured Snowflake connection, run the following query on the created Snowflake table:

select * from credit_card_transaction;

Note that the storage integration option that was required before is now optional in the advanced settings.

Explore Snowflake data

After you validate the query results, choose Import to save the query results as the dataset. We use this extracted dataset for exploratory data analysis and feature engineering.

You can choose to sample the data from Snowflake in the SageMaker Data Wrangler UI. Alternatively, you can use SageMaker Data Wrangler processing jobs to work with the complete dataset for your ML model training use cases.

Perform exploratory data analysis in SageMaker Data Wrangler

Before you can train a model, the data in Data Wrangler needs feature engineering. In this section, we demonstrate how to perform feature engineering on the data from Snowflake using SageMaker Data Wrangler’s built-in capabilities.

First, let’s use the Data Quality and Insights Report feature within SageMaker Data Wrangler to generate reports to automatically verify the data quality and detect abnormalities in the data from Snowflake.

You can use the report to help you clean and process your data. It gives you information such as the number of missing values and the number of outliers. If you have issues with your data, such as target leakage or imbalance, the insights report can bring those issues to your attention. To understand the report details, refer to Accelerate data preparation with data quality and insights in Amazon SageMaker Data Wrangler.
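To get a feel for two of the signals the report surfaces, missing values and class imbalance, the following Python sketch computes them for a list of row dictionaries. It is a simplified illustration, not how SageMaker Data Wrangler builds its report. Treating `None` and blank strings as missing is an assumption.

```python
from collections import Counter


def missing_counts(rows, columns):
    """Count missing values (None or blank string) per column."""
    counts = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value is None or (isinstance(value, str) and value.strip() == ""):
                counts[column] += 1
    return counts


def class_balance(rows, target):
    """Return the fraction of rows for each target value, for example to spot a rare fraud class."""
    counter = Counter(row[target] for row in rows)
    total = sum(counter.values())
    return {label: count / total for label, count in counter.items()}
```

On a fraud dataset, a very small fraction for the fraudulent class is the kind of imbalance the report brings to your attention.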

After you review the data type matching that SageMaker Data Wrangler applied, complete the following steps:

  1. Choose the plus sign next to Data types and choose Add analysis.
  2. For Analysis type, choose Data Quality and Insights Report.
  3. Choose Create.
  4. Refer to the Data Quality and Insights Report details to check out high-priority warnings.

You can choose to resolve the warnings reported before proceeding with your ML journey.

The target column Class, which we want to predict, is classified as a string. First, let’s apply a transformation to remove stray characters from its values.

  1. Choose Add step and choose Format string.
  2. In the list of transforms, choose Strip left and right.
  3. Enter the characters to remove and choose Add.

Next, we convert the target column Class from the string data type to Boolean because the transaction is either legitimate or fraudulent.

  1. Choose Add step.
  2. Choose Parse column as type.
  3. For Column, choose Class.
  4. For From, choose String.
  5. For To, choose Boolean.
  6. Choose Add.
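The two transforms above can be expressed in plain Python as follows. This only illustrates the logic and is not Data Wrangler’s implementation. It assumes the raw Class values are `0`/`1` strings that may be wrapped in quote characters or whitespace, and that `1` marks a fraudulent transaction.

```python
def strip_chars(value, chars=' "'):
    """Remove the given characters from both ends of a string (like Strip left and right)."""
    return value.strip(chars)


def parse_class(value):
    """Map a cleaned '0'/'1' label to a Boolean; assumes '1' means fraudulent."""
    cleaned = strip_chars(value)
    if cleaned == "1":
        return True
    if cleaned == "0":
        return False
    raise ValueError(f"Unexpected Class value: {value!r}")
```

Raising an error on unexpected values, rather than silently mapping them to False, makes data problems visible before training.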

After the target column transformation, we reduce the number of feature columns, because the original dataset has 30 features. We use Principal Component Analysis (PCA), which projects the features onto a smaller set of components that capture most of the variance in the data. To understand more about PCA and dimensionality reduction, refer to Principal Component Analysis (PCA) Algorithm.

  1. Choose Add step.
  2. Choose Dimensionality Reduction.
  3. For Transform, choose Principal component analysis.
  4. For Input columns, choose all the columns except the target column Class.

Next, run a Quick Model analysis to estimate how important each feature is for predicting Class:

  1. Choose the plus sign next to Data flow and choose Add analysis.
  2. For Analysis type, choose Quick Model.
  3. For Analysis name, enter a name.
  4. For Label, choose Class.
  5. Choose Run.

Based on the Quick Model results, you can decide which features to use for building the model. In the following screenshot, the graph shows the features (or dimensions) ordered from highest to lowest importance for predicting the target class, which in this dataset indicates whether the transaction is fraudulent or valid.

You can choose to reduce the number of features based on this analysis, but for this post, we leave the defaults as is.
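To make the PCA transform more concrete, the following pure-Python sketch finds the first principal component of a small numeric dataset. It runs power iteration on the covariance matrix and reports the fraction of total variance that component explains. It illustrates the general technique only. SageMaker Data Wrangler’s PCA transform may differ in details such as scaling and the number of components it keeps. The sketch also assumes that the starting vector is not orthogonal to the top component.

```python
import math


def covariance_matrix(data):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(data, iterations=200):
    """Return (unit direction, explained variance ratio) of the top principal component."""
    cov = covariance_matrix(data)
    d = len(cov)
    vector = [1.0] * d  # starting guess for power iteration
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # The Rayleigh quotient v^T C v gives the variance along the component (its eigenvalue).
    eigenvalue = sum(vector[i] * sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return vector, eigenvalue / total_variance
```

Note that PCA ranks components by how much variance they explain, which is a different notion from the per-feature importance that the Quick Model analysis reports.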

This concludes our feature engineering process, although you may choose to run the quick model and create a Data Quality and Insights Report again to understand the data before performing further optimizations.

Export data and train the model

In the next step, we use SageMaker Autopilot to automatically build, train, and tune the best ML models based on your data. With SageMaker Autopilot, you still maintain full control and visibility of your data and model.

Now that we have completed the exploration and feature engineering, let’s export the data and train an ML model on it using SageMaker Autopilot.

  1. On the Training tab, choose Export and train.

We can monitor the export progress while we wait for it to complete.

Let’s configure SageMaker Autopilot to run an automated training job by specifying the target we want to predict and the type of problem. In this case, because we’re training a model to predict whether the transaction is fraudulent or valid, we use binary classification.

  1. Enter a name for your experiment, provide the S3 location for the data, and choose Next: Target and features.
  2. For Target, choose Class as the column to predict.
  3. Choose Next: Training method.

Let’s allow SageMaker Autopilot to decide the training method based on the dataset.

  1. For Training method and algorithms, select Auto.

To understand more about the training modes supported by SageMaker Autopilot, refer to Training modes and algorithm support.

  1. Choose Next: Deployment and advanced settings.
  2. For Deployment option, choose Auto deploy the best model with transforms from Data Wrangler, which loads the best model for inference after the experimentation is complete.
  3. Enter a name for your endpoint.
  4. For Select the machine learning problem type, choose Binary classification.
  5. For Objective metric, choose F1.
  6. Choose Next: Review and create.
  7. Choose Create experiment.

This starts a SageMaker Autopilot job that creates a set of training jobs that use combinations of hyperparameters to optimize the objective metric.

Wait for SageMaker Autopilot to finish building and evaluating the models.
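The F1 objective metric chosen for this experiment is the harmonic mean of precision and recall. This makes it more informative than accuracy when the positive (fraudulent) class is rare. The following Python sketch computes F1 from lists of true and predicted Boolean labels. Autopilot computes the metric for you, so the sketch only shows what is being optimized.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall), with True as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a model that labels every transaction as legitimate can still reach high accuracy on imbalanced data, yet its F1 score is 0 because it never catches a fraudulent transaction.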

Launch a real-time inference endpoint to test the best model

SageMaker Autopilot runs experiments to determine the best model that can classify credit card transactions as legitimate or fraudulent.

When SageMaker Autopilot completes the experiment, we can view the training results with the evaluation metrics and explore the best model from the SageMaker Autopilot job description page.

  1. Select the best model and choose Deploy model.

We use a real-time inference endpoint to test the best model created through SageMaker Autopilot.

  1. Select Make real-time predictions.

When the endpoint is available, we can pass the payload and get inference results.

Let’s launch a Python notebook to use the inference endpoint.

  1. On the SageMaker Studio console, choose the folder icon in the navigation pane and choose Create notebook.
  2. Use the following Python code to invoke the deployed real-time inference endpoint:
    # Library imports
    import boto3
    
    #: Define the endpoint's name.
    ENDPOINT_NAME = 'SnowFlake-FraudDetection' # replace the endpoint name as per your config
    runtime = boto3.client('sagemaker-runtime')
    
    #: Define a test record with the feature columns only (the Class target is omitted).
    features = {
        "TIME": 152895,
        "V1": 2.021155535,
        "V2": 0.05372872624,
        "V3": -1.620399104,
        "V4": 0.3530165253,
        "V5": 0.3048483853,
        "V6": -0.6850955461,
        "V7": 0.02483335885,
        "V8": -0.05101346021,
        "V9": 0.3550896835,
        "V10": -0.1830053153,
        "V11": 1.148091498,
        "V12": 0.4283365505,
        "V13": -0.9347237892,
        "V14": -0.4615291327,
        "V15": -0.4124343184,
        "V16": 0.4993445934,
        "V17": 0.3411548305,
        "V18": 0.2343833846,
        "V19": 0.278223588,
        "V20": -0.2104513475,
        "V21": -0.3116427235,
        "V22": -0.8690778214,
        "V23": 0.3624146958,
        "V24": 0.6455923598,
        "V25": -0.3424913329,
        "V26": 0.1456884618,
        "V27": -0.07174890419,
        "V28": -0.040882382,
        "AMOUNT": 0.27
    }
    
    #: The request uses ContentType 'text/csv', so send the values as one headerless CSV row.
    #: This assumes the endpoint expects the features in the same column order as the training data.
    payload = ",".join(str(value) for value in features.values())
    
    #: Submit an API request and capture the response object.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='text/csv',
        Body=payload
    )
    
    #: Print the model endpoint's output.
    print(response['Body'].read().decode())
    

The output shows the result as false, which implies the sample feature data is not fraudulent.

Clean up

To make sure you don’t incur charges after completing this tutorial, shut down the SageMaker Data Wrangler application and shut down the notebook instance used to perform inference. You should also delete the inference endpoint you created using SageMaker Autopilot to prevent additional charges.

Conclusion

In this post, we demonstrated how to bring your data directly from Snowflake without creating any intermediate copies in the process. You can either sample or load your complete dataset into SageMaker Data Wrangler directly from Snowflake. You can then explore the data, clean the data, and perform feature engineering using SageMaker Data Wrangler’s visual interface.

We also highlighted how you can easily train and tune a model with SageMaker Autopilot directly from the SageMaker Data Wrangler user interface. With SageMaker Data Wrangler and SageMaker Autopilot integration, we can quickly build a model after completing feature engineering, without writing any code. Then we deployed SageMaker Autopilot’s best model to a real-time endpoint to run inference.

Try out the new Snowflake direct integration with SageMaker Data Wrangler today to easily build ML models with your data using SageMaker.


About the authors

Hariharan Suresh is a Senior Solutions Architect at AWS. He is passionate about databases, machine learning, and designing innovative solutions. Prior to joining AWS, Hariharan was a product architect, core banking implementation specialist, and developer, and worked with BFSI organizations for over 11 years. Outside of technology, he enjoys paragliding and cycling.

Aparajithan Vaidyanathan is a Principal Enterprise Solutions Architect at AWS. He helps enterprise customers migrate and modernize their workloads on the AWS Cloud. He is a Cloud Architect with 23+ years of experience designing and developing enterprise, large-scale, and distributed software systems. He specializes in Machine Learning & Data Analytics with a focus on the Data and Feature Engineering domain. He is an aspiring marathon runner, and his hobbies include hiking, bike riding, and spending time with his wife and two boys.

Tim Song is a Software Development Engineer at AWS SageMaker. With 10+ years of experience as a software developer, consultant, and tech leader, he has demonstrated the ability to deliver scalable and reliable products and solve complex problems. In his spare time, he enjoys nature, outdoor running, and hiking.

Bosco Albuquerque is a Sr. Partner Solutions Architect at AWS and has over 20 years of experience in working with database and analytics products from enterprise database vendors and cloud providers. He has helped large technology companies design data analytics solutions and has led engineering teams in designing and implementing data analytics platforms and data products.

Deploy a serverless ML inference endpoint of large language models using FastAPI, AWS Lambda, and AWS CDK

For data scientists, moving machine learning (ML) models from proof of concept to production often presents a significant challenge. One of the main challenges can be deploying a well-performing, locally trained model to the cloud for inference and use in other applications. It can be cumbersome to manage the process, but with the right tool, you can significantly reduce the required effort.

Amazon SageMaker inference makes it easy for you to deploy ML models into production to make predictions at scale, providing a broad selection of ML infrastructure and model deployment options to help meet all kinds of ML inference needs. You can use SageMaker Serverless Inference endpoints, which became generally available in April 2022, for workloads that have idle periods between traffic spurts and can tolerate cold starts. The endpoints scale out automatically based on traffic and take away the undifferentiated heavy lifting of selecting and managing servers. Additionally, you can use AWS Lambda directly to expose your models and deploy your ML applications using your preferred open-source framework, which can prove to be more flexible and cost-effective.

FastAPI is a modern, high-performance web framework for building APIs with Python. It stands out for developing serverless applications with RESTful microservices and for use cases that require ML inference at scale across multiple industries. Its ease of use and built-in functionality, such as automatic API documentation, make it a popular choice among ML engineers for deploying high-performance inference APIs. You can define and organize your routes with FastAPI’s out-of-the-box functionality to scale out and handle growing business logic as needed, test locally, host the application on Lambda, and then expose it through a single API gateway. This lets you bring an open-source web framework to Lambda without any heavy lifting or refactoring of your code.

This post shows you how to easily deploy and run serverless ML inference by exposing your ML model as an endpoint using FastAPI, Docker, Lambda, and Amazon API Gateway. We also show you how to automate the deployment using the AWS Cloud Development Kit (AWS CDK).
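In this architecture, API Gateway forwards each HTTP request to Lambda as an event and expects a response object with a status code, headers, and a body. The solution in this post uses FastAPI for routing. The following stdlib-only Python sketch does not use FastAPI. Instead, it shows the underlying request and response contract, assuming the API Gateway REST API proxy integration format (payload version 1.0) with a plain-text JSON body. The `/predict` route and the `predict` function are hypothetical stand-ins for a real model.

```python
import json


def predict(text):
    """Hypothetical stand-in for model inference; replace with your model's code."""
    return {"length": len(text)}


def _response(status_code, body):
    """Build the response shape that API Gateway expects from a Lambda proxy integration."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def handler(event, context):
    """Lambda entry point: route POST /predict requests to the model."""
    if event.get("httpMethod") != "POST" or event.get("path") != "/predict":
        return _response(404, {"detail": "Not Found"})
    try:
        payload = json.loads(event.get("body") or "{}")
        text = payload["text"]
    except (json.JSONDecodeError, KeyError, TypeError):
        text = None
    if not isinstance(text, str):
        return _response(400, {"detail": "Request body must be JSON with a string 'text' field"})
    return _response(200, predict(text))
```

A framework such as FastAPI lets you declare routes like `/predict` with decorators instead of branching on `path` by hand, which keeps this kind of routing manageable as the API grows.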

Solution overview

The following diagram shows the architecture of the solution we deploy in this post.

Scope of Solution

Prerequisites

You must have the following prerequisites:

  • Python3 installed, along with virtualenv for creating and managing virtual environments in Python
  • aws-cdk v2 installed on your system in order to be able to use the AWS CDK CLI
  • Docker installed and running on your local machine

Test if all the necessary software is installed:

  1. The AWS Command Line Interface (AWS CLI) is needed. Log in to your account and choose the Region where you want to deploy the solution.
  2. Use the following code to check your Python version:
    python3 --version

  3. Check if virtualenv is installed for creating and managing virtual environments in Python. Strictly speaking, this is not a hard requirement, but it makes your life easier and helps you follow along with this post. Use the following code:
    python3 -m virtualenv --version

  4. Check if cdk is installed. This will be used to deploy our solution.
    cdk --version

  5. Check if Docker is installed. Our solution will make your model accessible through a Docker image to Lambda. To build this image locally, we need Docker.
    docker --version

  6. Make sure Docker is up and running with the following code:
    docker ps
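If you prefer to run these checks in one go, the following Python sketch looks up each tool on your PATH and runs docker ps to confirm that the daemon responds. It assumes the executables are named aws, python3, cdk, and docker; the function names are hypothetical and not part of the sample repository.

```python
import shutil
import subprocess

# Command-line tools the walkthrough relies on (assumed executable names)
REQUIRED_TOOLS = ("aws", "python3", "cdk", "docker")


def check_tools(tools=REQUIRED_TOOLS):
    """Return a mapping of tool name -> True if the executable is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def docker_daemon_running():
    """Return True if `docker ps` exits successfully, meaning the daemon is reachable."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "ps"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

For example, print(check_tools()) lists which tools are missing, and docker_daemon_running() tells you whether you still need to start Docker before deploying.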

How to structure your FastAPI project using AWS CDK

We use the following directory structure for our project (ignoring some boilerplate AWS CDK code that is immaterial in the context of this post):

```

fastapi_model_serving
│
└───.venv
│
└───fastapi_model_serving
│   │   __init__.py
│   │   fastapi_model_serving_stack.py
│   │
│   └───model_endpoint
│       └───docker
│       │      Dockerfile
│       │      serving_api.tar.gz
│       │
│       └───runtime
│            └───serving_api
│                    requirements.txt
│                    serving_api.py
│                └───custom_lambda_utils
│                     └───model_artifacts
│                            ...
│                     └───scripts
│                            inference.py
│
└───templates
│   └───api
│   │     api.py
│   └───dummy
│         dummy.py
│
│   app.py
│   cdk.json
│   README.md
│   requirements.txt
│   init-lambda-code.sh

```

The directory follows the recommended structure of AWS CDK projects for Python.

The most important part of this repository is the fastapi_model_serving directory. It contains the code that will define the AWS CDK stack and the resources that are going to be used for model serving.

The fastapi_model_serving directory contains the model_endpoint subdirectory, which holds all the assets that make up our serverless endpoint: the Dockerfile to build the Docker image that Lambda will use, the Lambda function code that uses FastAPI to handle inference requests and route them to the correct endpoint, and the artifacts of the model that we want to deploy. model_endpoint contains the following subdirectories:

  • docker – This subdirectory contains the following:
      • Dockerfile – This is used to build the image for the Lambda function with all the artifacts (Lambda function code, model artifacts, and so on) in the right place so that they can be used without issues.
      • serving_api.tar.gz – This is a tarball that contains all the assets from the runtime folder that are necessary for building the Docker image. We discuss how to create the .tar.gz file later in this post.
  • runtime – This subdirectory contains the following:
      • serving_api – The code for the Lambda function and its dependencies, specified in the requirements.txt file.
      • custom_lambda_utils – This includes an inference script that loads the necessary model artifacts so that the model can be passed to serving_api, which then exposes it as an endpoint.

Additionally, we have the templates directory, which provides a template of folder structures and files where you can define your customized code and APIs, following the sample we went through earlier. The templates directory contains dummy code that you can use to create new Lambda functions:

  • dummy – Contains the code that implements the structure of an ordinary Lambda function using the Python runtime
  • api – Contains the code that implements a Lambda function that wraps a FastAPI endpoint around an existing API gateway
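For orientation, an ordinary Python Lambda function exposed through an API Gateway proxy integration only needs a handler that returns a status code, headers, and a string body. The following is a minimal sketch of that shape, not the contents of dummy.py; the handler name, the optional name query parameter, and the message are assumptions.

```python
import json


def handler(event, context):
    """Minimal Lambda handler returning an API Gateway proxy-integration response."""
    # API Gateway passes query string parameters in the event (None when there are none)
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical optional parameter

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The proxy integration expects the body as a string, so serialize it
        "body": json.dumps({"message": f"hello {name}"}),
    }
```

The api template instead wraps a FastAPI application, so routing happens inside FastAPI rather than in a hand-written handler like this one.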

Deploy the solution

By default, the code is deployed in the eu-west-1 Region. If you want to change the Region, you can change the DEPLOYMENT_REGION context variable in the cdk.json file.

Keep in mind, however, that the solution deploys a Lambda function on the arm64 architecture, and that this feature might not be available in all Regions. If it isn't available in your Region, change the architecture parameter in the fastapi_model_serving_stack.py file, as well as the first line of the Dockerfile inside the Docker directory, to host this solution on the x86 architecture.
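If you prefer to change the Region from a script rather than by hand, the following sketch rewrites the DEPLOYMENT_REGION context variable in cdk.json. It assumes the variable sits under the standard top-level "context" key of cdk.json; the helper name is hypothetical.

```python
import json
from pathlib import Path


def set_deployment_region(cdk_json_path, region):
    """Set context.DEPLOYMENT_REGION in a cdk.json file and return the previous value."""
    path = Path(cdk_json_path)
    config = json.loads(path.read_text())
    # The AWS CDK reads context values from the top-level "context" object in cdk.json
    context = config.setdefault("context", {})
    previous = context.get("DEPLOYMENT_REGION")
    context["DEPLOYMENT_REGION"] = region
    path.write_text(json.dumps(config, indent=2) + "\n")
    return previous
```

For example, set_deployment_region("cdk.json", "eu-central-1") switches the target Region. Inside a Python CDK app, a context value like this is typically read with app.node.try_get_context("DEPLOYMENT_REGION").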

To deploy the solution, complete the following steps:

  1. Run the following command to clone the GitHub repository:
    git clone https://github.com/aws-samples/lambda-serverless-inference-fastapi

    Because we want to showcase that the solution can work with model artifacts that you train locally, we include a sample model artifact of a pretrained DistilBERT model from the Hugging Face model hub for a question answering task in the serving_api.tar.gz file. The download can take around 3–5 minutes. Now, let’s set up the environment.
  2. Run the following command to download the pretrained model that will be deployed from the Hugging Face model hub into the ./model_endpoint/runtime/serving_api/custom_lambda_utils/model_artifacts directory. The command also creates a virtual environment and installs all required dependencies. You only need to run it once:
    make prep

    This command can take around 5 minutes (depending on your internet bandwidth) because it needs to download the model artifacts.
  3. Package the model artifacts inside a .tar.gz archive that will be used inside the Docker image built in the AWS CDK stack. Run this command whenever you change the model artifacts or the API itself, so that the most up-to-date version of your serving endpoint is always packaged:
    make package_model

    The artifacts are all in place. Now we can deploy the AWS CDK stack to your AWS account.
  4. Run cdk bootstrap if it’s your first time deploying an AWS CDK app into an environment (account + Region combination):
    make cdk_bootstrap

    The bootstrap stack includes resources that are needed for the toolkit’s operation. For example, it includes an Amazon Simple Storage Service (Amazon S3) bucket that is used to store templates and assets during the deployment process.

    Because we’re building Docker images locally in this AWS CDK deployment, we need to ensure that the Docker daemon is running before we can deploy this stack via the AWS CDK CLI.

  5. To check whether or not the Docker daemon is running on your system, use the following command:
    docker ps

    If you don’t get an error message, you should be ready to deploy the solution.

  6. Deploy the solution with the following command:
    make deploy

    This step can take around 5–10 minutes due to building and pushing the Docker image.
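According to the project description, serving_api.tar.gz bundles the assets from the runtime folder, and make package_model produces it for you. If you want to see what that packaging amounts to, the following sketch archives a runtime directory with Python's standard tarfile module. The function name and the archive layout (paths relative to the runtime directory) are assumptions, not the Makefile's actual behavior.

```python
import tarfile
from pathlib import Path


def package_runtime(runtime_dir, archive_path):
    """Archive every entry under runtime_dir into a gzip-compressed tarball.

    Paths inside the archive are stored relative to runtime_dir, so extracting
    it recreates the runtime layout (for example serving_api/...).
    Returns the sorted member names of the written archive.
    """
    runtime = Path(runtime_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for child in sorted(runtime.iterdir()):
            # tar.add recurses into directories; arcname drops the runtime_dir prefix
            tar.add(str(child), arcname=child.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

For example, package_runtime("fastapi_model_serving/model_endpoint/runtime", "fastapi_model_serving/model_endpoint/docker/serving_api.tar.gz") would place the archive next to the Dockerfile, as in the directory tree shown earlier.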

Troubleshooting

If you’re a Mac user, you may encounter an error when logging into Amazon Elastic Container Registry (Amazon ECR) with the Docker login, such as Error saving credentials ... not implemented. For example:

exited with error code 1: Error saving credentials: error storing credentials - err: exit status 1,...dial unix backend.sock: connect: connection refused

Before you can use Lambda on top of Docker containers inside the AWS CDK, you may need to change the ~/.docker/config.json file. More specifically, you might have to change the credsStore parameter in ~/.docker/config.json to osxkeychain. That solves Amazon ECR login issues on a Mac.
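If you prefer to script that change, the following sketch sets credsStore to osxkeychain in a Docker client config file, creating the file if needed, and leaves all other keys untouched. The function name is hypothetical, and you may want to back up your existing config first.

```python
import json
from pathlib import Path

DEFAULT_DOCKER_CONFIG = Path.home() / ".docker" / "config.json"


def set_creds_store(config_path=DEFAULT_DOCKER_CONFIG, store="osxkeychain"):
    """Set the credsStore key in a Docker client config file and return the new config."""
    path = Path(config_path)
    # Keep existing settings (auths, plugins, and so on); only credsStore changes
    config = json.loads(path.read_text()) if path.exists() else {}
    config["credsStore"] = store
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling set_creds_store() with no arguments edits ~/.docker/config.json in place.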

Run real-time inference

After your AWS CloudFormation stack is deployed successfully, go to the Outputs tab for your stack on the AWS CloudFormation console and open the endpoint URL. Now our model is accessible via the endpoint URL and we’re ready to run real-time inference.

Navigate to the URL and check that you see the “hello world” message, then add /docs to the address to check that the interactive Swagger UI page loads. There might be some cold start time, so you may need to wait or refresh a few times.

FastAPI Docs web page

On the FastAPI Swagger UI page, you can call the API through the root path / or through /question.

From /, you can run the API and get the “hello world” message.

From /question, you can run ML inference on the deployed model for a question answering case. For example, we use the question “What is the color of my car now?” and the context “My car used to be blue but I painted red”.

FastAPI web page question

When you choose Execute, the model answers the question based on the given context, as shown in the following screenshot.

Execute result

In the response body, you can see the answer with the confidence score from the model. You could also experiment with other examples or embed the API in your existing application.

Alternatively, you can run the inference via code. Here is one example written in Python, using the requests library:

import requests

# Replace the placeholders with the endpoint URL from your stack's Outputs tab
url = "https://<YOUR_API_GATEWAY_ENDPOINT_ID>.execute-api.<YOUR_ENDPOINT_REGION>.amazonaws.com/prod/question"

# requests URL-encodes the query string parameters for you
params = {
    "question": "What is the color of my car now?",
    "context": "My car used to be blue but I painted red",
}

response = requests.get(url, params=params, timeout=30)
print(response.text)

The code outputs a string similar to the following:

'{"score":0.6947233080863953,"start":38,"end":41,"answer":"red"}'


Clean up

Inside the root directory of your repository, run the following code to clean up your resources:

make destroy

Conclusion

In this post, we showed how you can use Lambda to deploy your trained ML model using your preferred web application framework, such as FastAPI. We provided a detailed code repository that you can deploy, and you retain the flexibility to switch to whichever trained model artifacts you want to serve. Performance can depend on how you implement and deploy the model.

You are welcome to try it out yourself, and we’re excited to hear your feedback!


About the Authors

Tingyi Li is an Enterprise Solutions Architect at AWS based in Stockholm, Sweden, supporting customers in the Nordics. She enjoys helping customers with the architecture, design, and development of cloud-optimized infrastructure solutions. She specializes in AI and machine learning and is interested in empowering customers with intelligence in their AI/ML applications. In her spare time, she is also a part-time illustrator who writes novels and plays the piano.

Demir Catovic is a Machine Learning Engineer at AWS based in Zurich, Switzerland. He engages with customers and helps them implement scalable and fully functional ML applications. He is passionate about building and productionizing machine learning applications for customers and is always keen to explore new trends and cutting-edge technologies in the AI/ML world.


How Light & Wonder built a predictive maintenance solution for gaming machines on AWS


This post is co-written with Aruna Abeyakoon and Denisse Colin from Light and Wonder (L&W).

Headquartered in Las Vegas, Light & Wonder, Inc. is the leading cross-platform global game company that provides gambling products and services. Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), which, when it reaches its full potential, will stream telemetry and machine health data from roughly half a million electronic gaming machines across its global casino customer base. Over 500 machine events are monitored in near-real time to give a full picture of machine conditions and their operating environments. Using data streamed through LnW Connect, L&W aims to create a better gaming experience for its end-users as well as bring more value to its casino customers.

Light & Wonder teamed up with the Amazon ML Solutions Lab to use events data streamed from LnW Connect to enable machine learning (ML)-powered predictive maintenance for slot machines. Predictive maintenance is a common ML use case for businesses with physical equipment or machinery assets. With predictive maintenance, L&W can get advanced warning of machine breakdowns and proactively dispatch a service team to inspect the issue. This will reduce machine downtime and avoid significant revenue loss for casinos. With no remote diagnostic system in place, issue resolution by the Light & Wonder service team on the casino floor can be costly and inefficient, while severely degrading the customer gaming experience.

The project was highly exploratory: this is the first attempt at predictive maintenance in the gaming industry. The Amazon ML Solutions Lab and L&W team embarked on an end-to-end journey, from formulating the ML problem and defining the evaluation metrics to delivering a high-quality solution. The final ML model combines a convolutional neural network (CNN) and a Transformer, state-of-the-art neural network architectures for modeling sequential machine log data. This post presents a detailed description of this journey, and we hope you enjoy it as much as we did!

In this post, we discuss the following:

  • How we formulated the predictive maintenance problem as an ML problem with a set of appropriate metrics for evaluation
  • How we prepared data for training and testing
  • Data preprocessing and feature engineering techniques we employed to obtain performant models
  • Performing a hyperparameter tuning step with Amazon SageMaker Automatic Model Tuning
  • Comparisons between the baseline model and the final CNN+Transformer model
  • Additional techniques we used to improve model performance, such as ensembling

Background

In this section, we discuss the issues that necessitated this solution.

Dataset

Slot machine environments are highly regulated and are deployed in an air-gapped environment. In LnW Connect, an encryption process was designed to provide a secure and reliable mechanism for bringing the data into an AWS data lake for predictive modeling. The aggregated files are encrypted, and the decryption key is only available in AWS Key Management Service (AWS KMS). The files are uploaded into Amazon Simple Storage Service (Amazon S3) through a cellular-based private network into AWS.

LnW Connect streams a wide range of machine events, such as start of game, end of game, and more. The system collects over 500 different types of events. As shown in the following table, each event is recorded along with a timestamp of when it happened and the ID of the machine recording the event. LnW Connect also records when a machine enters a non-playable state, and it will be marked as a machine failure or breakdown if it doesn’t recover to a playable state within a sufficiently short time span.

| Machine ID | Event Type ID | Timestamp |
| --- | --- | --- |
| 0 | E1 | 2022-01-01 00:17:24 |
| 0 | E3 | 2022-01-01 00:17:29 |
| 1000 | E4 | 2022-01-01 00:17:33 |
| 114 | E234 | 2022-01-01 00:17:34 |
| 222 | E100 | 2022-01-01 00:17:37 |

In addition to dynamic machine events, static metadata about each machine is also available. This includes information such as machine unique identifier, cabinet type, location, operating system, software version, game theme, and more, as shown in the following table. (All the names in the table are anonymized to protect customer information.)

| Machine ID | Cabinet Type | OS | Location | Game Theme |
| --- | --- | --- | --- | --- |
| 276 | A | OS_Ver0 | AA Resort & Casino | StormMaiden |
| 167 | B | OS_Ver1 | BB Casino, Resort & Spa | UHMLIndia |
| 13 | C | OS_Ver0 | CC Casino & Hotel | TerrificTiger |
| 307 | D | OS_Ver0 | DD Casino Resort | NeptunesRealm |
| 70 | E | OS_Ver0 | EE Resort & Casino | RLPMealTicket |

Problem definition

We treat the predictive maintenance problem for slot machines as a binary classification problem. The ML model takes in the historical sequence of machine events and other metadata and predicts whether a machine will encounter a failure in a 6-hour future time window. If a machine will break down within 6 hours, it is deemed a high-priority machine for maintenance. Otherwise, it is low priority. The following figure gives examples of low-priority (top) and high-priority (bottom) samples. We use a fixed-length look-back time window to collect historical machine event data for prediction. Experiments show that longer look-back time windows improve model performance significantly (more details later in this post).

low priority and high priority examples
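To make the labeling rule concrete, the following sketch assigns the binary label for one prediction time: the sample is high priority (1) if the machine has a failure within the next 6 hours, and low priority (0) otherwise. It also selects the events in the fixed-length look-back window used as model input. The function names, and the choice of a half-open window (t, t + 6h] for labels and [t - lookback, t) for inputs, are assumptions for illustration.

```python
from datetime import timedelta

HORIZON = timedelta(hours=6)  # future window that defines a high-priority sample


def label_sample(prediction_time, failure_times, horizon=HORIZON):
    """Return 1 if any failure falls in (prediction_time, prediction_time + horizon], else 0."""
    window_end = prediction_time + horizon
    return int(any(prediction_time < t <= window_end for t in failure_times))


def lookback_events(events, prediction_time, lookback):
    """Select (timestamp, event_type) pairs inside the look-back window [t - lookback, t)."""
    start = prediction_time - lookback
    return [(ts, ev) for ts, ev in events if start <= ts < prediction_time]
```

Using only events strictly before the prediction time keeps the inputs free of information from the label window.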

Modeling challenges

We faced a couple of challenges solving this problem:

  • We have a huge amount of event logs, around 50 million events a month (from approximately 1,000 game samples). Careful optimization is needed in the data extraction and preprocessing stage.
  • Event sequence modeling was challenging due to the extremely uneven distribution of events over time. A 3-hour window can contain anywhere from tens to thousands of events.
  • Machines are in a good state most of the time and the high-priority maintenance is a rare class, which introduced a class imbalance issue.
  • New machines are added continuously to the system, so we had to make sure our model can handle prediction on new machines that have never been seen in training.

Data preprocessing and feature engineering

In this section, we discuss our methods for data preparation and feature engineering.

Feature engineering

Slot machine feeds are streams of unequally spaced time series events; for example, the number of events in a 3-hour window can range from tens to thousands. To handle this unevenness, we used event frequencies instead of the raw sequence data. A straightforward approach is to aggregate the event frequency over the entire look-back window and feed it into the model. However, with this representation, the temporal information is lost, and the order of events is not preserved. We instead used temporal binning by dividing the time window into N equal sub-windows and calculating the event frequencies in each. The final features of a time window are the concatenation of all its sub-window features. Increasing the number of bins preserves more temporal information. The following figure illustrates temporal binning on a sample window.

temporal binning on a sample window

First, the sample time window is split into two equal sub-windows (bins); we use only two bins here to keep the illustration simple. Then, the counts of the events E1, E2, E3, and E4 are calculated in each bin. Lastly, they are concatenated and used as features.
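The binning procedure described above can be sketched in plain Python: split the look-back window into N equal sub-windows, count each event type per bin, and concatenate the per-bin counts in chronological order. The function name, the (timestamp, event_type) input format, and the fixed event vocabulary are assumptions; at the scale of this dataset you would vectorize this step rather than loop in Python.

```python
def temporal_binning(events, window_start, window_end, num_bins, vocabulary):
    """Return concatenated per-bin event counts for one look-back window.

    events: iterable of (timestamp, event_type) pairs
    vocabulary: ordered event types; each bin contributes len(vocabulary) counts
    """
    bin_width = (window_end - window_start) / num_bins
    index = {event_type: i for i, event_type in enumerate(vocabulary)}
    counts = [[0] * len(vocabulary) for _ in range(num_bins)]
    for ts, event_type in events:
        if not (window_start <= ts < window_end) or event_type not in index:
            continue  # ignore events outside the window or outside the vocabulary
        # min() guards against floating-point rounding at the last bin edge
        b = min(int((ts - window_start) / bin_width), num_bins - 1)
        counts[b][index[event_type]] += 1
    # Concatenate the sub-window features in chronological order
    return [c for bin_counts in counts for c in bin_counts]
```

With two bins and the four event types E1 to E4, each window becomes a vector of 8 counts; with 240 bins (the largest value in the tuning ranges), the same four types would give 960 features.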

Along with the event frequency-based features, we used machine-specific features like software version, cabinet type, game theme, and game version. Additionally, we added features related to the timestamps to capture the seasonality, such as hour of the day and day of the week.

Data preparation

To extract data efficiently for training and testing, we utilize Amazon Athena and the AWS Glue Data Catalog. The events data is stored in Amazon S3 in Parquet format and partitioned according to day/month/hour. This facilitates efficient extraction of data samples within a specified time window. We use data from all machines in the latest month for testing and the rest of the data for training, which helps avoid potential data leakage.
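The split rule itself is a simple time cut: everything from the most recent calendar month goes to the test set and everything earlier goes to training. The sketch below applies that rule to in-memory records; in the post, extraction runs through Athena over partitioned Parquet data, and the record format and function name here are assumptions.

```python
def split_by_latest_month(records):
    """Split (timestamp, payload) records into train/test, using the latest calendar month as test."""
    if not records:
        return [], []
    latest = max(ts for ts, _ in records)
    # First instant of the latest month; replace() keeps any tzinfo on the timestamp
    test_start = latest.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    train = [r for r in records if r[0] < test_start]
    test = [r for r in records if r[0] >= test_start]
    return train, test
```

Because every test record is later than every training record, the model never sees events from the evaluation period during training.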

ML methodology and model training

In this section, we discuss our baseline model with AutoGluon and how we built a customized neural network with SageMaker automatic model tuning.

Building a baseline model with AutoGluon

With any ML use case, it’s important to establish a baseline model to be used for comparison and iteration. We used AutoGluon to explore several classic ML algorithms. AutoGluon is an easy-to-use AutoML tool that provides automatic data processing, hyperparameter tuning, and model ensembling. The best baseline was achieved with a weighted ensemble of gradient boosted decision tree models. AutoGluon’s ease of use helped us navigate quickly and efficiently through a wide range of possible data and ML modeling directions in the discovery stage.

Building and tuning a customized neural network model with SageMaker automatic model tuning

After experimenting with different neural network architectures, we built a customized deep learning model for predictive maintenance. Our model surpassed the AutoGluon baseline model by 121% in recall at 80% precision. The final model ingests historical machine event sequence data, time features such as hour of the day, and static machine metadata. We use SageMaker automatic model tuning jobs to search for the best hyperparameters and model architectures.

The following figure shows the model architecture. We first normalize the binned event sequence data by average frequencies of each event in the training set to remove the overwhelming effect of high-frequency events (start of game, end of game, and so on). The embeddings for individual events are learnable, while the temporal feature embeddings (day of the week, hour of the day) are extracted using the package GluonTS. Then we concatenate the event sequence data with the temporal feature embeddings as the input to the model. The model consists of the following layers:

  • Convolutional layers (CNN) – Each CNN layer consists of two 1-dimensional convolutional operations with residual connections. The output of each CNN layer has the same sequence length as the input to allow for easy stacking with other modules. The total number of CNN layers is a tunable hyperparameter.
  • Transformer encoder layers (TRANS) – The output of the CNN layers is fed together with the positional encoding to a multi-head self-attention structure. We use TRANS to directly capture temporal dependencies instead of using recurrent neural networks. Here, binning of the raw sequence data (reducing length from thousands to hundreds) helps alleviate the GPU memory bottlenecks, while keeping the chronological information to a tunable extent (the number of the bins is a tunable hyperparameter).
  • Aggregation layers (AGG) – The final layer combines the metadata information (game theme type, cabinet type, locations) to produce the priority level probability prediction. It consists of several pooling layers and fully connected layers for incremental dimension reduction. The multi-hot embeddings of metadata are also learnable, and don’t go through CNN and TRANS layers because they don’t contain sequential information.

customized neural network model architecture
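The first preprocessing step described above, normalizing the binned counts by each event's average frequency in the training set, can be sketched as follows. The post doesn't say exactly how that average is computed; here we assume it is the mean count per bin over all training windows, plus a small epsilon so that events never seen in training don't cause a division by zero. The function names are hypothetical.

```python
def average_event_frequencies(train_windows, eps=1e-6):
    """Mean count of each event type per bin across all training windows.

    train_windows: list of windows; each window is a list of bins;
    each bin is a list of per-event counts in a fixed event order
    """
    num_events = len(train_windows[0][0])
    totals = [0.0] * num_events
    num_bins_seen = 0
    for window in train_windows:
        for bin_counts in window:
            num_bins_seen += 1
            for i, c in enumerate(bin_counts):
                totals[i] += c
    return [t / num_bins_seen + eps for t in totals]


def normalize_window(window, avg_freq):
    """Divide each event count by that event's average training frequency."""
    return [[c / f for c, f in zip(bin_counts, avg_freq)] for bin_counts in window]
```

After this rescaling, a very frequent event such as start of game sits around 1 like every other event, instead of dominating the input by orders of magnitude.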

We use the cross-entropy loss with class weights as tunable hyperparameters to adjust for the class imbalance. In addition, the numbers of CNN and TRANS layers are crucial hyperparameters whose possible values include 0, which means that specific layers may not exist in a given model architecture. This way, we have a unified framework in which model architectures are searched along with the other usual hyperparameters.
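As an illustration of a class-weighted loss, the following sketch computes a weighted binary cross-entropy in which one weight w (in the spirit of the loss_weight hyperparameter in the tuning ranges below) scales the high-priority class and 1 - w scales the low-priority class. This particular parameterization is an assumption; the post only states that class weights are tuned.

```python
import math


def weighted_bce(probs, labels, loss_weight=0.5, eps=1e-12):
    """Mean class-weighted binary cross-entropy.

    probs: predicted probabilities of the high-priority class
    labels: 1 for high priority, 0 for low priority
    loss_weight: weight on the positive (rare) class; the negative class gets 1 - loss_weight
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
        if y == 1:
            total += -loss_weight * math.log(p)
        else:
            total += -(1.0 - loss_weight) * math.log(1.0 - p)
    return total / len(labels)
```

Raising loss_weight makes a missed failure (a low score on a high-priority sample) cost more, which counteracts the rarity of the high-priority class.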

We use SageMaker automatic model tuning, also known as hyperparameter optimization (HPO), to efficiently explore model variations and the large search space of all hyperparameters. Automatic model tuning receives the customized algorithm, training data, and hyperparameter search space configurations, and searches for the best hyperparameters using strategies such as Bayesian optimization and Hyperband, with multiple GPU instances in parallel. After evaluating on a hold-out validation set, we obtained the best model architecture with two layers of CNN, one layer of TRANS with four heads, and an AGG layer.

We used the following hyperparameter ranges to search for the best model architecture:

from sagemaker.tuner import CategoricalParameter, ContinuousParameter, IntegerParameter

hyperparameter_ranges = {
    # Learning rate, searched on a log scale
    "learning_rate": ContinuousParameter(5e-4, 1e-3, scaling_type="Logarithmic"),
    # Class weight used in the cross-entropy loss
    "loss_weight": ContinuousParameter(0.1, 0.9),
    # Number of input bins
    "num_bins": CategoricalParameter([10, 40, 60, 120, 240]),
    # Dropout rate
    "dropout_rate": CategoricalParameter([0.1, 0.2, 0.3, 0.4, 0.5]),
    # Model embedding dimension
    "dim_model": CategoricalParameter([160, 320, 480, 640]),
    # Number of CNN layers (0 removes the CNN block)
    "num_cnn_layers": IntegerParameter(0, 10),
    # CNN kernel size
    "cnn_kernel": CategoricalParameter([3, 5, 7, 9]),
    # Number of transformer layers (0 removes the TRANS block)
    "num_transformer_layers": IntegerParameter(0, 4),
    # Number of transformer attention heads
    "num_heads": CategoricalParameter([4, 8]),
    # Number of RNN layers (optional)
    "num_rnn_layers": IntegerParameter(0, 10),
    # RNN input dimension size
    "dim_rnn": CategoricalParameter([128, 256]),
}

To further improve model accuracy and reduce model variance, we trained the model with multiple independent random weight initializations and averaged their predicted probabilities to obtain the final prediction. There is a trade-off between computing resources and model performance, and we observed that an ensemble of 5–10 models is reasonable for the current use case (results shown later in this post).
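The aggregation step is an element-wise mean of the probabilities produced by the independently initialized models. The following sketch shows it; the list-of-lists input format (one list of per-sample probabilities per model) and the function name are assumptions.

```python
from statistics import fmean


def ensemble_mean(member_predictions):
    """Average per-sample probabilities across ensemble members.

    member_predictions: one list of probabilities per model, all scoring the same samples
    """
    if len({len(p) for p in member_predictions}) != 1:
        raise ValueError("all ensemble members must score the same samples")
    # zip(*...) groups the predictions for each sample across models
    return [fmean(sample_probs) for sample_probs in zip(*member_predictions)]
```

Averaging probabilities rather than hard labels keeps the output usable for threshold-based metrics such as recall at a fixed precision.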

Model performance results

In this section, we present the model performance evaluation metrics and results.

Evaluation metrics

Precision is very important for this predictive maintenance use case. Low precision means reporting more false maintenance calls, which drives costs up through unnecessary maintenance. Because average precision (AP) doesn’t fully align with the high precision objective, we introduced a new metric named average recall at high precisions (ARHP). ARHP is equal to the average of recalls at 60%, 70%, and 80% precision points. We also used precision at top K% (K=1, 10), AUPR, and AUROC as additional metrics.
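To make ARHP concrete, the following sketch computes recall at a precision target as the best recall achievable at any score threshold whose precision meets that target, then averages over the 60%, 70%, and 80% targets. Taking the maximum recall over qualifying thresholds is our assumption about how recall at a given precision is read off the precision-recall curve, and the function names are hypothetical.

```python
def recall_at_precision(scores, labels, min_precision):
    """Highest recall over all score thresholds whose precision is >= min_precision (0.0 if none)."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Rank by descending score; each prefix of the ranking is one threshold
    # (tied scores are split by sort order, a simplification)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best, tp = 0.0, 0
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y
        if tp / k >= min_precision:
            best = max(best, tp / total_pos)
    return best


def arhp(scores, labels, precisions=(0.6, 0.7, 0.8)):
    """Average recall at high precisions: mean of recall_at_precision over the targets."""
    return sum(recall_at_precision(scores, labels, p) for p in precisions) / len(precisions)
```

For scores [0.9, 0.8, 0.7, 0.6, 0.5] with labels [1, 1, 0, 1, 0], recall is 1.0 at the 60% and 70% targets and 2/3 at the 80% target, so ARHP is 8/9.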

Results

The following table summarizes the results using the baseline and the customized neural network models, with 7/1/2022 as the train/test split point. Experiments show that increasing the window length and sample data size both improve model performance, because they contain more historical information to help with the prediction. Regardless of the data settings, the neural network model outperforms AutoGluon in all metrics. For example, in the 48H/1mm setting, recall at a fixed 80% precision increases by 121%, which enables you to quickly identify more malfunctioning machines with the neural network model.

| Model | Window length/Data size | AUROC | AUPR | ARHP | Recall@Prec0.6 | Recall@Prec0.7 | Recall@Prec0.8 | Prec@top1% | Prec@top10% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AutoGluon baseline | 12H/500k | 66.5 | 36.1 | 9.5 | 12.7 | 9.3 | 6.5 | 85 | 42 |
| Neural Network | 12H/500k | 74.7 | 46.5 | 18.5 | 25 | 18.1 | 12.3 | 89 | 55 |
| AutoGluon baseline | 48H/1mm | 70.2 | 44.9 | 18.8 | 26.5 | 18.4 | 11.5 | 92 | 55 |
| Neural Network | 48H/1mm | 75.2 | 53.1 | 32.4 | 39.3 | 32.6 | 25.4 | 94 | 65 |
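As a consistency check on the table, ARHP should equal the mean of the three Recall@Prec columns, and the 121% recall gain at 80% precision should be traceable to a specific pair of rows. The following snippet recomputes both using only values copied from the table.

```python
# (Recall@Prec0.6, Recall@Prec0.7, Recall@Prec0.8) and the reported ARHP, copied from the table
rows = {
    ("AutoGluon baseline", "12H/500k"): ((12.7, 9.3, 6.5), 9.5),
    ("Neural Network", "12H/500k"): ((25.0, 18.1, 12.3), 18.5),
    ("AutoGluon baseline", "48H/1mm"): ((26.5, 18.4, 11.5), 18.8),
    ("Neural Network", "48H/1mm"): ((39.3, 32.6, 25.4), 32.4),
}

for (model, setting), (recalls, reported_arhp) in rows.items():
    recomputed = sum(recalls) / len(recalls)
    # Reported values are rounded to one decimal place
    print(f"{model:<20} {setting:<9} ARHP recomputed={recomputed:.1f} reported={reported_arhp}")


def relative_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


gain_48h = relative_gain(25.4, 11.5)  # Recall@Prec0.8, 48H/1mm
gain_12h = relative_gain(12.3, 6.5)   # Recall@Prec0.8, 12H/500k
print(f"Recall@Prec0.8 gain: 48H/1mm {gain_48h:.0f}%, 12H/500k {gain_12h:.0f}%")
```

The recomputed ARHP values match the reported ones to within one-decimal rounding, and the Recall@Prec0.8 gain works out to about 121% for 48H/1mm (25.4 versus 11.5) and about 89% for 12H/500k (12.3 versus 6.5).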

The following figures illustrate the effect of using ensembles to boost the neural network model performance. All the metrics shown improve, with a higher mean (more accurate) and lower variance (more stable). Each box plot summarizes 12 repeated experiments, and the x-axis ranges from no ensemble to an ensemble of 10 models. Similar trends hold for the metrics other than the Prec@top1% and Recall@Prec80% shown here.

After factoring in the computational cost, we observe that using 5–10 models in ensembles is suitable for Light & Wonder datasets.

Conclusion

Our collaboration has resulted in the creation of a groundbreaking predictive maintenance solution for the gaming industry, as well as a reusable framework that could be used in a variety of predictive maintenance scenarios. Adopting AWS technologies such as SageMaker automatic model tuning helps Light & Wonder navigate new opportunities using near-real-time data streams. Light & Wonder is starting the deployment on AWS.

If you would like help accelerating the use of ML in your products and services, please contact the Amazon ML Solutions Lab program.


About the authors

Aruna Abeyakoon is the Senior Director of Data Science & Analytics at Light & Wonder Land-based Gaming Division. Aruna leads the industry-first Light & Wonder Connect initiative and supports both casino partners and internal stakeholders with consumer behavior and product insights to make better games, optimize product offerings, manage assets, and health monitoring & predictive maintenance.

Denisse Colin is a Senior Data Science Manager at Light & Wonder, a leading cross-platform global game company. She is a member of the Gaming Data & Analytics team helping develop innovative solutions to improve product performance and customers’ experiences through Light & Wonder Connect.

Tesfagabir Meharizghi is a Data Scientist at the Amazon ML Solutions Lab where he helps AWS customers across various industries such as gaming, healthcare and life sciences, manufacturing, automotive, and sports and media, accelerate their use of machine learning and AWS cloud services to solve their business challenges.

Mohamad Aljazaery is an applied scientist at Amazon ML Solutions Lab. He helps AWS customers identify and build ML solutions to address their business challenges in areas such as logistics, personalization and recommendations, computer vision, fraud prevention, forecasting and supply chain optimization.

Yawei Wang is an Applied Scientist at the Amazon ML Solutions Lab. He helps AWS business partners identify and build ML solutions to address their organization’s business challenges in real-world scenarios.

Yun Zhou is an Applied Scientist at the Amazon ML Solutions Lab, where he helps with research and development to ensure the success of AWS customers. He works on pioneering solutions for various industries using statistical modeling and machine learning techniques. His interests include generative models and sequential data modeling.

Panpan Xu is an Applied Science Manager with the Amazon ML Solutions Lab at AWS. She works on research and development of machine learning algorithms for high-impact customer applications in a variety of industrial verticals to accelerate their AI and cloud adoption. Her research interests include model interpretability, causal analysis, human-in-the-loop AI, and interactive data visualization.

Raj Salvaji leads Solutions Architecture in the Hospitality segment at AWS. He works with hospitality customers by providing strategic guidance, technical expertise to create solutions to complex business challenges. He draws on 25 years of experience in multiple engineering roles across Hospitality, Finance and Automotive industries.

Shane Rai is a Principal ML Strategist with the Amazon ML Solutions Lab at AWS. He works with customers across a diverse spectrum of industries to solve their most pressing and innovative business needs using AWS’s breadth of cloud-based AI/ML services.


Use the AWS CDK to deploy Amazon SageMaker Studio lifecycle configurations


Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting a new Studio notebook. You can use lifecycle configurations to automate customization for your Studio environment. This customization includes installing custom packages, configuring notebook extensions, preloading datasets, and setting up source code repositories. For example, as an administrator for a Studio domain, you may want to save costs by having notebook apps shut down automatically after long periods of inactivity.

The AWS Cloud Development Kit (AWS CDK) is a framework for defining cloud infrastructure through code and provisioning it through AWS CloudFormation stacks. A stack is a collection of AWS resources that can be programmatically updated, moved, or deleted. AWS CDK constructs are the building blocks of AWS CDK applications, representing the blueprint to define cloud architectures.

In this post, we show how to use the AWS CDK to set up Studio, use Studio lifecycle configurations, and enable access to them for data scientists and developers in your organization.

Solution overview

The modularity of lifecycle configurations allows you to apply them to all users in a domain or to specific users. This way, you can set up lifecycle configurations and reference them in the Studio kernel gateway or Jupyter server quickly and consistently. The kernel gateway is the entry point to interact with a notebook instance, whereas the Jupyter server represents the Studio instance. This enables you to apply DevOps best practices and meet safety, compliance, and configuration standards across all AWS accounts and Regions. For this post, we use Python as the main language, but the code can be easily changed to other AWS CDK supported languages. For more information, refer to Working with the AWS CDK.

Prerequisites

To get started, make sure you have the following prerequisites:

Clone the GitHub repository

First, clone the GitHub repository.

After you clone the repository, you can see that it's a classic AWS CDK project with the directory studio-lifecycle-config-construct, which contains the construct and resources required to create lifecycle configurations.

AWS CDK constructs

The file we want to inspect is aws_sagemaker_lifecycle.py. This file contains the SageMakerStudioLifeCycleConfig construct we use to set up and create lifecycle configurations.

The SageMakerStudioLifeCycleConfig construct provides the framework for building lifecycle configurations using a custom AWS Lambda function and shell code read in from a file. The construct contains the following parameters:

  • ID – The name of the current project.
  • studio_lifecycle_config_content – The base64-encoded content of the lifecycle shell script.
  • studio_lifecycle_config_name – The name of the lifecycle configuration.
  • studio_lifecycle_tags – Labels you assign to organize AWS resources. They are entered as key-value pairs and are optional for this configuration.
  • studio_lifecycle_config_app_type – JupyterServer is for the unique server itself, and KernelGateway corresponds to a running SageMaker image container.
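The construct expects studio_lifecycle_config_content to be base64-encoded, and the usage example later in this post passes only the placeholder "base64content". The following is a minimal sketch, using only Python's standard library, of how you might produce that value from a local shell script. The script path and the app-type check are illustrative assumptions, not part of the repository.

```python
import base64
from pathlib import Path

# App types named in the parameter list above.
VALID_APP_TYPES = {"JupyterServer", "KernelGateway"}


def encode_lifecycle_script(script_text: str) -> str:
    """Return the base64-encoded form of a lifecycle shell script."""
    return base64.b64encode(script_text.encode("utf-8")).decode("ascii")


def load_lifecycle_content(script_path: str, app_type: str) -> str:
    """Read a shell script from disk and base64-encode it.

    script_path is a hypothetical local file, for example "on-start.sh".
    """
    if app_type not in VALID_APP_TYPES:
        raise ValueError(f"app_type must be one of {sorted(VALID_APP_TYPES)}, got {app_type!r}")
    return encode_lifecycle_script(Path(script_path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    script = "#!/bin/bash\nset -eux\necho 'Studio lifecycle configuration ran'\n"
    encoded = encode_lifecycle_script(script)
    print(encoded)
    # Decoding recovers the original script text.
    assert base64.b64decode(encoded).decode("utf-8") == script
```

The resulting string is what you would pass as studio_lifecycle_config_content instead of the placeholder.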

For more information on the Studio notebook architecture, refer to Dive deep into Amazon SageMaker Studio Notebooks architecture.

The following is a code snippet of the Studio lifecycle config construct (aws_sagemaker_lifecycle.py):

from aws_cdk import CustomResource, Duration, Stack
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda as lambda_
from aws_cdk import custom_resources
from constructs import Construct


class SageMakerStudioLifeCycleConfig(Construct):
    def __init__(
        self,
        scope: Construct,
        id: str,
        studio_lifecycle_config_content: str,
        studio_lifecycle_config_app_type: str,
        studio_lifecycle_config_name: str,
        **kwargs,
    ):
        super().__init__(scope, id)
        self.studio_lifecycle_content = studio_lifecycle_config_content
        self.studio_lifecycle_config_name = studio_lifecycle_config_name
        self.studio_lifecycle_config_app_type = studio_lifecycle_config_app_type

        # A plain Construct scope has no region/account; resolve them from the enclosing stack.
        stack = Stack.of(self)

        # Role assumed by the Lambda function that backs the custom resource.
        lifecycle_config_role = iam.Role(
            self,
            "SmStudioLifeCycleConfigRole",
            assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"),
        )

        lifecycle_config_role.add_to_policy(
            iam.PolicyStatement(
                resources=[f"arn:aws:sagemaker:{stack.region}:{stack.account}:*"],
                actions=[
                    "sagemaker:CreateStudioLifecycleConfig",
                    "sagemaker:ListUserProfiles",
                    "sagemaker:UpdateUserProfile",
                    "sagemaker:DeleteStudioLifecycleConfig",
                    "sagemaker:AddTags",
                ],
            )
        )

        # Lambda function that creates and deletes the lifecycle configuration.
        create_lifecycle_script_lambda = lambda_.Function(
            self,
            "CreateLifeCycleConfigLambda",
            # The Python 3.8 Lambda runtime is deprecated; use a supported version.
            runtime=lambda_.Runtime.PYTHON_3_12,
            timeout=Duration.minutes(3),
            code=lambda_.Code.from_asset(
                "../mlsl-cdk-constructs-lib/src/studiolifecycleconfigconstruct"
            ),
            handler="onEvent.handler",
            role=lifecycle_config_role,
            environment={
                "studio_lifecycle_content": self.studio_lifecycle_content,
                "studio_lifecycle_config_name": self.studio_lifecycle_config_name,
                "studio_lifecycle_config_app_type": self.studio_lifecycle_config_app_type,
            },
        )

        # Provider framework that invokes the Lambda function on stack events.
        config_custom_resource_provider = custom_resources.Provider(
            self,
            "ConfigCustomResourceProvider",
            on_event_handler=create_lifecycle_script_lambda,
        )

        studio_lifecycle_config_custom_resource = CustomResource(
            self,
            "LifeCycleCustomResource",
            service_token=config_custom_resource_provider.service_token,
        )
        # ARN returned by the Lambda handler in its "Data" payload.
        self.studio_lifecycle_config_arn = studio_lifecycle_config_custom_resource.get_att_string(
            "StudioLifecycleConfigArn"
        )
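The post doesn't reproduce the Lambda handler (onEvent.handler in the studiolifecycleconfigconstruct asset directory). To show how the pieces fit together, the following is a minimal hypothetical sketch of such a handler, not the repository's code. It assumes the CDK Provider framework's contract: the on_event handler receives a RequestType of Create, Update, or Delete and returns a PhysicalResourceId plus an optional Data map, whose keys become the attributes the construct reads with get_att_string. It uses the boto3 SageMaker calls create_studio_lifecycle_config, describe_studio_lifecycle_config, and delete_studio_lifecycle_config. Note that the describe call needs sagemaker:DescribeStudioLifecycleConfig, which the role shown above does not grant. The client is injectable so the logic can be exercised with a stub.

```python
import os


def _sagemaker_client():
    # Imported lazily so the sketch can be exercised without boto3 installed.
    import boto3

    return boto3.client("sagemaker")


def handler(event, context, client=None):
    """Hypothetical on_event handler for the lifecycle configuration custom resource.

    Reads the environment variables the construct sets on the Lambda function and
    returns the ARN under the attribute name the construct reads.
    """
    client = client or _sagemaker_client()
    request_type = event["RequestType"]

    if request_type == "Create":
        name = os.environ["studio_lifecycle_config_name"]
        response = client.create_studio_lifecycle_config(
            StudioLifecycleConfigName=name,
            StudioLifecycleConfigContent=os.environ["studio_lifecycle_content"],
            StudioLifecycleConfigAppType=os.environ["studio_lifecycle_config_app_type"],
        )
        return {
            "PhysicalResourceId": name,
            "Data": {"StudioLifecycleConfigArn": response["StudioLifecycleConfigArn"]},
        }

    if request_type == "Update":
        # This sketch does not re-create the configuration on Update;
        # it only re-reports the ARN of the existing one.
        physical_id = event["PhysicalResourceId"]
        response = client.describe_studio_lifecycle_config(StudioLifecycleConfigName=physical_id)
        return {
            "PhysicalResourceId": physical_id,
            "Data": {"StudioLifecycleConfigArn": response["StudioLifecycleConfigArn"]},
        }

    if request_type == "Delete":
        physical_id = event["PhysicalResourceId"]
        client.delete_studio_lifecycle_config(StudioLifecycleConfigName=physical_id)
        return {"PhysicalResourceId": physical_id}

    raise ValueError(f"Unsupported RequestType: {request_type}")
```

Using the configuration name as the physical ID keeps Delete simple, because CloudFormation passes that ID back on later events.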

After you import and install the construct, you can use it. The following code snippet shows how to create a lifecycle config using the construct in a stack either in app.py or another construct:

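# Inside a Stack's __init__ (or another construct) after importing the construct.
# "base64content" is a placeholder for your base64-encoded shell script.
my_studio_lifecycle_config = SageMakerStudioLifeCycleConfig(
    self,
    "MLSLBlogPost",
    studio_lifecycle_config_content="base64content",
    studio_lifecycle_config_name="BlogPostTest",
    studio_lifecycle_config_app_type="JupyterServer",
)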

Deploy AWS CDK constructs

To deploy your AWS CDK stack, run the following commands in the location where you cloned the repository.

The command may be python instead of python3 depending on your path configurations.

  1. Create a virtual environment:
    1. For macOS/Linux, use python3 -m venv .cdk-venv.
    2. For Windows, use python3 -m venv .cdk-venv.
  2. Activate the virtual environment:
    1. For macOS/Linux, use source .cdk-venv/bin/activate.
    2. For Windows, use .cdk-venv\Scripts\activate.bat.
    3. For PowerShell, use .cdk-venv\Scripts\Activate.ps1.
  3. Install the required dependencies:
    1. pip install -r requirements.txt
    2. pip install -r requirements-dev.txt
  4. At this point, you can optionally synthesize the CloudFormation template for this code:
    cdk synth

  5. Deploy the solution with the following commands:
    1. aws configure
    2. cdk bootstrap
    3. cdk deploy
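The activation command is the step that differs most between platforms. As a small sketch, the following helper returns the activation command for each case, following the layout that Python's venv module creates (bin/activate on macOS and Linux, Scripts\activate.bat and Scripts\Activate.ps1 on Windows). VENV_DIR matches the directory name used in the steps above; the helper itself is illustrative only.

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath

VENV_DIR = ".cdk-venv"  # directory name used in the steps above


def activation_command(platform: str = sys.platform, shell: str = "") -> str:
    """Return the command that activates VENV_DIR.

    platform follows sys.platform values ("linux", "darwin", "win32");
    shell is only consulted on Windows, where "powershell" selects Activate.ps1.
    """
    if platform.startswith("win"):
        script = "Activate.ps1" if shell.lower() == "powershell" else "activate.bat"
        return str(PureWindowsPath(VENV_DIR, "Scripts", script))
    # On macOS and Linux the script must be sourced into the current shell.
    return f"source {PurePosixPath(VENV_DIR, 'bin', 'activate')}"


if __name__ == "__main__":
    print(activation_command())
```

Using pure path classes keeps the separators correct for the target platform regardless of where the helper runs.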

When the stack is successfully deployed, you should be able to view the stack on the CloudFormation console.

You will also be able to view the lifecycle configuration on the SageMaker console.

Choose the lifecycle configuration to view the shell code that runs as well as any tags you assigned.

Attach the Studio lifecycle configuration

There are multiple ways to attach a lifecycle configuration. In this section, we present two methods: using the AWS Management Console, and programmatically using the infrastructure provided.

Attach the lifecycle configuration using the console

To use the console, complete the following steps:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose the domain name you’re using and the current user profile, then choose Edit.
  3. Select the lifecycle configuration you want to use and choose Attach.

From here, you can also set it as default.

Attach the lifecycle configuration programmatically

You can also retrieve the ARN of the Studio lifecycle configuration created by the construct and attach it to the Studio construct programmatically. The following code shows the lifecycle configuration ARN being passed to a Studio construct:

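# Inside the stack that defines the Studio domain (sagemaker is aws_cdk.aws_sagemaker).
# Pass this value as default_user_settings to sagemaker.CfnDomain.
default_user_settings = sagemaker.CfnDomain.UserSettingsProperty(
    execution_role=self.sagemaker_role.role_arn,
    jupyter_server_app_settings=sagemaker.CfnDomain.JupyterServerAppSettingsProperty(
        default_resource_spec=sagemaker.CfnDomain.ResourceSpecProperty(
            instance_type="system",
            # ARN exposed by the lifecycle configuration construct
            lifecycle_config_arn=my_studio_lifecycle_config.studio_lifecycle_config_arn,
        )
    ),
)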
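The CDK code above sets the configuration for every user in the domain. The construct's role also grants sagemaker:ListUserProfiles and sagemaker:UpdateUserProfile, which points to attaching a configuration to individual user profiles through the SageMaker API. The following sketch builds keyword arguments for boto3's update_user_profile (DomainId, UserProfileName, UserSettings) using the API's JupyterServerAppSettings and KernelGatewayAppSettings fields. The helper name and placeholder values are hypothetical. We believe the LifecycleConfigArns list you send may replace the existing one, so merge in ARNs from describe_user_profile first if you want to keep them.

```python
SETTINGS_KEYS = {
    "JupyterServer": "JupyterServerAppSettings",
    "KernelGateway": "KernelGatewayAppSettings",
}


def build_update_user_profile_params(
    domain_id: str,
    user_profile_name: str,
    lifecycle_config_arn: str,
    app_type: str = "JupyterServer",
    make_default: bool = True,
) -> dict:
    """Build UpdateUserProfile arguments that attach one lifecycle configuration."""
    app_settings = {"LifecycleConfigArns": [lifecycle_config_arn]}
    if make_default:
        resource_spec = {"LifecycleConfigArn": lifecycle_config_arn}
        if app_type == "JupyterServer":
            # Mirrors instance_type="system" in the CDK snippet above.
            resource_spec["InstanceType"] = "system"
        app_settings["DefaultResourceSpec"] = resource_spec
    return {
        "DomainId": domain_id,
        "UserProfileName": user_profile_name,
        "UserSettings": {SETTINGS_KEYS[app_type]: app_settings},
    }


if __name__ == "__main__":
    params = build_update_user_profile_params(
        "d-example",  # placeholder domain ID
        "data-scientist-1",  # placeholder user profile name
        "arn:aws:sagemaker:us-east-1:111122223333:studio-lifecycle-config/BlogPostTest",  # placeholder
    )
    print(params)
    # With boto3 installed and credentials configured:
    # boto3.client("sagemaker").update_user_profile(**params)
```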

Clean up

Complete the steps in this section to clean up your resources.

Delete the Studio lifecycle configuration

To delete your lifecycle configuration, complete the following steps:

  1. On the SageMaker console, choose Studio lifecycle configurations in the navigation pane.
  2. Select the lifecycle configuration, then choose Delete.
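As we read the DeleteStudioLifecycleConfig API reference, a configuration can't be deleted while apps are running with it or while it is still referenced in domain or user profile settings. The following standard-library sketch checks user settings for remaining references before you delete. The input shape is an assumption: a mapping from profile name to the UserSettings dict that describe_user_profile returns for each profile listed by list_user_profiles.

```python
SETTINGS_KEYS = ("JupyterServerAppSettings", "KernelGatewayAppSettings")


def profiles_referencing(lifecycle_config_arn: str, user_settings_by_profile: dict) -> list:
    """Return the sorted names of profiles whose settings still reference the ARN."""
    referencing = []
    for profile_name, settings in user_settings_by_profile.items():
        for key in SETTINGS_KEYS:
            app_settings = settings.get(key, {})
            attached = app_settings.get("LifecycleConfigArns", [])
            default = app_settings.get("DefaultResourceSpec", {}).get("LifecycleConfigArn")
            if lifecycle_config_arn in attached or lifecycle_config_arn == default:
                referencing.append(profile_name)
                break  # one match is enough for this profile
    return sorted(referencing)


if __name__ == "__main__":
    arn = "arn:example:BlogPostTest"  # placeholder ARN
    settings = {
        "alice": {"JupyterServerAppSettings": {"LifecycleConfigArns": [arn]}},
        "bob": {"KernelGatewayAppSettings": {"LifecycleConfigArns": []}},
    }
    print(profiles_referencing(arn, settings))  # prints: ['alice']
```

An empty result suggests no user profile still references the configuration; domain-level default settings would need the same check.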

Delete the AWS CDK stack

When you’re done with the resources you created, you can destroy your AWS CDK stack by running the following command in the location where you cloned the repository:

cdk destroy

When asked to confirm the deletion of the stack, enter y.

You can also delete the stack on the AWS CloudFormation console with the following steps:

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Choose the stack that you want to delete.
  3. In the stack details pane, choose Delete.
  4. Choose Delete stack when prompted.

If you run into any errors, you may have to manually delete some resources depending on your account configuration.

Conclusion

In this post, we discussed how Studio serves as an IDE for ML workloads. Studio offers lifecycle configuration support, which allows you to set up custom shell scripts to perform automated tasks, or set up development environments at launch. We used AWS CDK constructs to build the infrastructure for the custom resource and lifecycle configuration. Constructs are synthesized into CloudFormation stacks that are then deployed to create the custom resource and lifecycle script that is used in Studio and the notebook kernel.

For more information, visit Amazon SageMaker Studio.


About the Authors

Cory Hairston is a Software Engineer with the Amazon ML Solutions Lab. He currently works on providing reusable software solutions.

Alex Chirayath is a Senior Machine Learning Engineer at the Amazon ML Solutions Lab. He leads teams of data scientists and engineers to build AI applications to address business needs.

Gouri Pandeshwar is an Engineering Manager at the Amazon ML Solutions Lab. He and his team of engineers are working to build reusable solutions and frameworks that help accelerate adoption of AWS AI/ML services for customers’ business use cases.

Boost agent productivity with Salesforce integration for Live Call Analytics

As a contact center agent, would you rather focus on having productive customer conversations or get distracted by having to look up customer information and knowledge articles that could exist in various systems? We’ve all been there. Having a productive conversation while multitasking is challenging. A single negative experience may put a dent in a customer’s perception of your brand.

The Live Call Analytics with Agent Assist (LCA) open-source solution addresses these challenges by providing features such as AI-powered agent assistance, call transcription, call summarization, and much more. As part of our effort to meet the needs of your agents, we strive to add features based on your feedback and our own experience helping contact center operators.

One of the features we added is the ability to write your own AWS Lambda hooks for the start of call and post-call stages, so you can apply custom processing to calls as they occur. This makes it easier to build custom integrations with the LCA architecture without complex modifications to the original source code. It also lets you update LCA stack deployments more easily and quickly than if you were modifying the code directly.

Today, we are excited to announce a feature that lets you integrate LCA with your Customer Relationship Management (CRM) system, built on top of the pre- and post-call Lambda hooks.

In this post, we walk you through setting up the LCA/CRM integration with Salesforce.

Solution overview

LCA now has two additional Lambda hooks:

  • Start of call Lambda hook – The LCA Call Event/Transcript Processor invokes this hook at the beginning of each call. This function can implement custom logic that applies to the beginning of call processing, such as retrieving call summary details logged into a case in a CRM.
  • Post-call summary Lambda hook – The LCA Call Event/Transcript Processor invokes this hook after the call summary is processed. This function can implement custom logic that’s relevant to postprocessing, for example, writing the call summary to a CRM system.
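The post describes what each hook does but not its input or output format. The following hypothetical sketch only illustrates the division of work between the two hooks. The CaseStore interface and the field names CustomerPhoneNumber, CallSummaryText, and CaseId are assumptions for illustration; the actual LCA hook contract and the Salesforce calls are defined in the LCA repository.

```python
from typing import Optional, Protocol


class CaseStore(Protocol):
    """Hypothetical CRM interface; not the Salesforce plugin's actual code."""

    def find_case_by_phone(self, phone_number: str) -> Optional[dict]: ...

    def update_case(self, case_id: str, summary: str) -> None: ...


def start_of_call_hook(call: dict, crm: CaseStore) -> dict:
    """Runs once when the call begins: look up the caller's case, if any."""
    case = crm.find_case_by_phone(call["CustomerPhoneNumber"])
    # Return a copy of the call record enriched with the case ID (None if not found).
    return {**call, "CaseId": case["Id"] if case else None}


def post_call_summary_hook(call: dict, crm: CaseStore) -> dict:
    """Runs after the call summary exists: write it back to the caller's case."""
    summary = call.get("CallSummaryText")
    if call.get("CaseId") and summary:
        crm.update_case(call["CaseId"], summary)
    return call
```

Keeping the CRM behind a small interface is one way to unit test hook logic without a live Salesforce org.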

The following diagram illustrates the start of call and post-call (summary) Lambda hooks that integrate with Salesforce to look up and update case records, respectively.

Here are the steps we walk you through:

  1. Set up Salesforce to allow the custom Lambda hooks to look up or update the case records.
  2. Deploy the LCA and Salesforce integration stacks.
  3. Update the LCA stack with the Salesforce integration Lambda hooks and perform validations.

Prerequisites

You need the following prerequisites:

Create a Salesforce connected app

To set up your Salesforce app, complete the following steps:

  1. Log in to your Salesforce org and go to Setup.
  2. Search for App Manager and choose App Manager.
  3. Choose New Connected App.
  4. For Connected App Name, enter a name.
  5. For Contact Email, enter a valid email.
  6. Select Enable OAuth Settings and enter a value for Callback URL.
  7. Under Available OAuth Scopes, choose Manage user data via APIs (api).
  8. Select Require Secret for Webserver Flow and Require Secret for Refresh Token Flow.
  9. Choose Save.
  10. Under API (Enable OAuth Settings), choose Manage Consumer Details.
  11. Verify your identity if prompted.
  12. Copy the consumer key and consumer secret.

You need these when deploying the AWS Serverless Application Model (AWS SAM) application.

Get your Salesforce access token

If you don’t already have an access token, you need to obtain one. Before doing this, make sure that you’re prepared to update any applications that are using an access token because this step creates a new one and may invalidate the prior tokens.

  1. Choose View profile at the top right, then choose Settings to find your personal information.
  2. Choose Reset My Security Token followed by Reset Security Token.
  3. Make note of the new access token that you receive via email.
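The post doesn't show how the integration's Lambda functions authenticate to Salesforce. The values it collects (user name, password, the token from the Reset My Security Token step, consumer key and secret, and host URL) match the inputs of Salesforce's OAuth 2.0 username-password flow, in which the security token is appended to the password. Assuming that flow is used, the following standard-library sketch builds the token request without sending it. The function name is hypothetical.

```python
from urllib.parse import urlencode


def build_password_grant_request(
    host_url: str,
    consumer_key: str,
    consumer_secret: str,
    username: str,
    password: str,
    security_token: str,
) -> tuple:
    """Return (token_url, form_body) for the OAuth 2.0 username-password flow."""
    token_url = host_url.rstrip("/") + "/services/oauth2/token"
    form_body = urlencode(
        {
            "grant_type": "password",
            "client_id": consumer_key,  # consumer key from the connected app
            "client_secret": consumer_secret,  # consumer secret from the connected app
            "username": username,
            # In this flow, the security token is appended to the password.
            "password": password + security_token,
        }
    )
    return token_url, form_body
```

Sending form_body as an application/x-www-form-urlencoded POST to token_url (for example, with urllib.request) returns JSON that includes access_token and instance_url.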

Create a Salesforce customer contact record for each caller

The Lambda function that performs case look-up and update matches the caller’s phone number with a contact record in Salesforce. To create a new contact, complete the following steps:

  1. Log in to your Salesforce org.
  2. Under App Launcher, search for and choose Service Console.
  3. On the Service Console page, choose Contacts from the drop-down list, then choose New.
  4. Enter a valid phone number under the Phone field of the New Contact page.
  5. Enter other contact details and choose Save.
  6. Repeat steps 1–5 for each caller whose phone number you want to use to test the integration.
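The post says only that the Lambda function matches the caller's phone number with a contact record. Because the Salesforce Phone field is free text, the same number can be stored as "(555) 123-4567" or "+1 555 123 4567". The following hypothetical sketch normalizes both sides before comparing. It assumes a bare 10-digit number is North American (country code 1) and is not the plugin's actual matching logic.

```python
import re
from typing import Iterable, Optional


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Strip formatting and return "+<country code><number>".

    Assumes a bare 10-digit number is North American (country code 1).
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits


def find_contact(caller_number: str, contacts: Iterable[dict]) -> Optional[dict]:
    """Return the first contact whose Phone field matches the caller, or None."""
    target = normalize_phone(caller_number)
    for contact in contacts:
        phone = contact.get("Phone")
        if phone and normalize_phone(phone) == target:
            return contact
    return None


if __name__ == "__main__":
    contacts = [{"Id": "003-example", "Name": "Jane Doe", "Phone": "(555) 123-4567"}]
    print(find_contact("+15551234567", contacts)["Name"])  # prints: Jane Doe
```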

Deploy the LCA stack

Complete the following steps to deploy the LCA stack:

  1. Follow the instructions under the Deploy the CloudFormation stack section of Live call analytics and agent assist for your contact center with Amazon language AI services.
  2. Make sure that you choose ANTHROPIC, SAGEMAKER, or LAMBDA for the End of Call Transcript Summary parameter. See Transcript Summarization for more details.

    The stacks take about 45 minutes to deploy.
  3. After the main stack shows CREATE_COMPLETE, on the Outputs tab, make a note of the Kinesis data stream ARN (CallDataStreamArn).

Deploy the Salesforce integration stack

To deploy the Salesforce integration stack, complete the following steps:

  1. Open a command-line terminal and run the following commands:
    git clone https://github.com/aws-samples/amazon-transcribe-live-call-analytics.git
    cd amazon-transcribe-live-call-analytics/plugins/salesforce-integration
    sam build
    sam deploy --guided

    Use the following list as a reference for parameter choices:
    • AWS Region – The Region where you deployed the LCA solution.
    • SalesforceUsername – The user name of your Salesforce organization that has permissions to read and create cases.
    • SalesforcePassword – The password associated with your Salesforce user name.
    • SalesforceAccessToken – The access token you obtained earlier.
    • SalesforceConsumerKey – The consumer key you copied earlier.
    • SalesforceConsumerSecret – The consumer secret you copied earlier.
    • SalesforceHostUrl – The login URL of your Salesforce organization.
    • SalesforceAPIVersion – The Salesforce API version (choose the default or v56.0).
    • LCACallDataStreamArn – The Kinesis data stream ARN (CallDataStreamArn) you noted earlier.
  2. After the stack successfully deploys, make a note of StartOfCallLambdaHookFunctionArn and PostCallSummaryLambdaHookFunctionArn from the outputs displayed on your terminal.

Update the LCA stack

Complete the following steps to update the LCA stack:

  1. On the AWS CloudFormation console, update the main LCA stack.
  2. Choose Use current template.
  3. For Lambda Hook Function ARN for Custom Start of Call Processing (existing), provide the StartOfCallLambdaHookFunctionArn that you obtained earlier.
  4. For Lambda Hook Function ARN for Custom Post Processing, after the Call Transcript Summary is processed (existing), provide the PostCallSummaryLambdaHookFunctionArn that you obtained earlier.
  5. Make sure that End of Call Transcript Summary is not DISABLED.

Validate the integration

Make a test call and make sure you can see the beginning of call AGENT ASSIST and post-call AGENT ASSIST transcripts. Refer to the Explore live call analysis and agent assist features section of the Live call analytics and agent assist for your contact center with Amazon language AI services post for guidance.

Clean up

To avoid incurring charges, clean up your resources by following these instructions when you are finished experimenting with this solution:

  1. On the AWS CloudFormation console, delete the LCA stacks that you deployed. This deletes resources that were created by deploying the solution. The recording S3 buckets, DynamoDB table, and CloudWatch log groups are retained after the stack is deleted to avoid deleting your data.
  2. On your terminal, run sam delete to delete the Salesforce integration Lambda functions.
  3. Follow the instructions in Deactivate a Developer Edition Org to deactivate your Salesforce Developer org.

Conclusion

In this post, we demonstrated how the Live Call Analytics sample project can accelerate your adoption of real-time contact center analytics and integration. Rather than building from scratch, we showed how to use the existing code base with the pre-built integration points provided by the start of call and post-call Lambda hooks. This enhances agent productivity by integrating with Salesforce to look up and update case records. Explore our open-source project and enhance the CRM pre- and post-call Lambda hooks to accommodate your use case.


About the Authors

Kishore Dhamodaran is a Senior Solutions Architect at AWS.

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.

Christopher Lott is a Senior Solutions Architect in the AWS AI Language Services team. He has 20 years of enterprise software development experience. Chris lives in Sacramento, California and enjoys gardening, aerospace, and traveling the world.

Babu Srinivasan is a Sr. Specialist SA – Language AI services in the World Wide Specialist organization at AWS, with over 24 years of experience in IT and the last 6 years focused on the AWS Cloud. He is passionate about AI/ML. Outside of work, he enjoys woodworking and entertains friends and family (sometimes strangers) with sleight of hand card magic.
