Build an AI-powered virtual agent for Genesys Cloud using QnABot and Amazon Lex

The rise of artificial intelligence technologies enables organizations to adopt and improve self-service capabilities in contact center operations to create a more proactive, timely, and effective customer experience. Voice bots, or conversational interactive voice response (IVR) systems, use natural language processing (NLP) to understand customers’ questions and provide relevant answers. Businesses can automate responses to frequently asked transactional questions by deploying bots that are available 24/7. As a result, customers benefit from reduced wait time and faster call resolution time, especially during peak hours.

In the post Enhancing customer service experiences using Conversational AI: Power your contact center with Amazon Lex and Genesys Cloud, we introduced Amazon Lex support on the Genesys Cloud platform and outlined the process of activating the integration. In this post, we demonstrate how to elevate traditional customer service FAQs with an interactive voice bot. We dive into a common self-service use case, explore Q&A interactions, and offer an automated approach using the QnABot on AWS solution, built on Amazon Lex, with Genesys Cloud.

Solution overview

Informational interactions are widely applicable, with examples such as hours of operation, policy information, school schedules, or other frequently asked questions that are high volume and straightforward. The solution discussed in this post enables customers to interact with a voice bot backed by a curated knowledge base in a natural and conversational manner. Customers can get answers without having to wait for a human customer service representative, thereby improving resolution time and customer satisfaction. You can also implement the same bot directly as a web client, or embed it into an existing site as a chat widget, expanding touch points through multiple channels and increasing overall engagement with customers.

For a demonstration of the experience of a customer dialing into a contact center and interacting with QnABot, check out the following video:

QnABot provides a preconfigured architecture that delivers a low-code experience, as shown in the following diagram. Behind the scenes, it uses Amazon Lex along with other AWS services. Non-technical users can deploy the solution with the click of a button, build their bot through a user-friendly interface, and integrate the voice bot into a Genesys Cloud call flow.

solution workflow

The solution workflow contains the following steps:

  1. The admin deploys the QnABot solution into their AWS account, opens the Content Designer UI, and uses Amazon Cognito to authenticate.
  2. After authentication, Amazon CloudFront and Amazon Simple Storage Service (Amazon S3) deliver the contents of the Content Designer UI.
  3. The admin configures questions and answers in the Content Designer, and the UI sends requests to Amazon API Gateway to save the questions and answers.
  4. The Content Designer AWS Lambda function saves the input in Amazon OpenSearch Service in a questions bank index.
  5. The admin activates the Amazon Lex integration on Genesys Cloud, exports a sample flow from the Content Designer UI, and imports this flow into Genesys Cloud using the Genesys Archy tool.
  6. The customer dials into Genesys Cloud and begins an interaction with QnABot. Genesys Cloud streams this audio to Amazon Lex, which converts the audio to text and calls the Bot Fulfillment Lambda function.
  7. The Bot Fulfillment function takes the user input and looks up the answer in OpenSearch Service. Alternatively, you can use Amazon Kendra if an index is configured and provided at the time of deployment. The answer is synthesized into voice by Amazon Polly and played back to the customer.
  8. User interactions with the Bot Fulfillment function generate logs and metrics data, which are sent to Amazon Kinesis Data Firehose then to Amazon S3 for later data analysis.

To implement this solution, we walk through the following steps:

  1. Enable Amazon Lex V2 integration with Genesys.
  2. Configure Archy, the Genesys Cloud Architect YAML processor.
  3. Export the Genesys call flow from the QnABot Content Designer.
  4. Import and publish the call flow with Archy.
  5. Import example questions to QnABot.
  6. Create a test call and interact with the bot.
  7. Customize the call flow in Genesys Architect.

Prerequisites

To get started, you need the following:

Enable Amazon Lex V2 integration with Genesys Cloud

The first step is to enable Amazon Lex V2 integration with Genesys Cloud. For instructions, refer to Enhancing customer service experiences using Conversational AI: Power your contact center with Amazon Lex and Genesys Cloud.

Configure Archy

We have prepared a sample inbound call flow to get you started with QnABot and Genesys Cloud. We use Archy, the Genesys Cloud Architect YAML processor tool, to publish this call flow. You must first generate an OAuth client ID and client secret, then you can download and configure Archy.

Generate an OAuth client ID and client secret

Archy requires either a client ID and secret pair or an authorization token. For more information about Archy’s OAuth requirements, refer to Prerequisites in the Archy installation documentation.

To generate a client ID and secret pair, complete the following steps:

  1. On the Genesys Cloud Admin page, navigate to Integrations, then choose OAuth.
  2. Choose Add Client.
  3. For App Name, enter QnABot.
  4. For Description, enter a description.
  5. For Grant Types, select Client Credentials.

A new Roles tab appears.

configure OAuth

  6. On the Roles tab, assign a role that has Architect > flow > publish permissions.

In the following screenshot, we’re assigning the admin role. You may also have to assign the Master Admin role.

  7. Choose Save.

set up admin role

  8. On the Client Details tab, copy the values for the client ID and client secret.

configure client credential

Download and configure Archy

Download and unzip the appropriate version of Archy for your operating system. Then navigate to the folder in a terminal and begin the setup process by running the following command:

./archy setup

welcome page for Archy

Continue through the Archy setup, and provide the client ID and client secret when prompted.

Export the call flow YAML from the QnABot Content Designer

Now that Archy is authorized to publish call flows, we export the preconfigured call flow from the QnABot Content Designer.

  1. Log in to the QnABot Content Designer.
  2. On the Tools menu, choose Genesys Cloud.

Genesys Cloud in QnABot Content Designer

  3. Choose Next until you reach the Download Call Flow section.
  4. Choose Download Inbound Call Flow.

download call flow

You download a file named QnABotFlow.yaml, which is a preconfigured Genesys call flow.

  5. Copy this file to the same folder where Archy is located.

Import and publish the call flow with Archy

To import and publish the call flow using Archy, run the following command:

./archy publish --file QnABotFlow.yaml

When complete, a new inbound call flow named QnABotFlow is available in Genesys Architect.

import call flow into Architect

To assign this call flow, on the Genesys Cloud Admin page, navigate to Routing and choose Call Routing.

The new QnABotFlow should appear in the list of call flows under Regular Routing. Assign the flow, then choose Save.

configure call routing

Import example questions to QnABot

Navigate back to the QnABot Content Designer, choose the Tools menu, and choose Import.

import sample questions

Expand Examples/Extensions, find the GenesysWizardQnA example, and choose Load.

load sample questions

If you navigate back to the main questions and answers page, you now have the GenesysHelper questions. These are a set of example questions and answers for you to get started.

sample question overview

Create a test phone call and interact with the bot

Back on the Genesys Cloud Admin page, make sure an inbound phone number is associated with the QnABotFlow call flow under Call Routing. We can now navigate to the agent desktop and make a test call to interact with the bot for the first time.

configure test call

QnABot is designed to answer questions based on the data preconfigured in the Content Designer. Let’s try the following:

  • What are your business hours?
  • What is the meaning of life?

Each time QnABot provides an answer, you have the option to ask another question, conclude the call by saying “Goodbye,” or ask to be connected to a human agent by saying “I would like to speak to an agent.”
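
Behind the Genesys integration, QnABot runs as an Amazon Lex V2 bot, so you can also exercise the same questions outside of a phone call. The following is a minimal boto3 sketch using the Lex V2 runtime API; the bot ID, alias ID, and sample question are placeholders you can replace with the values from your QnABot deployment.

import boto3

lex_runtime = boto3.client("lexv2-runtime")

# Placeholders: use the bot ID and alias ID created by your QnABot deployment
response = lex_runtime.recognize_text(
    botId="<qnabot-bot-id>",
    botAliasId="<qnabot-bot-alias-id>",
    localeId="en_US",
    sessionId="test-session-1",
    text="What are your business hours?",
)

# Print the bot's answer messages
for message in response.get("messages", []):
    print(message["content"])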

Customize the call flow with Genesys Architect

The Genesys call flow is preconfigured to enable specific Amazon Lex session attributes. For example, if you edit the question with ID GenesysHelper.Hours, the answer contains the following statement:

{{setSessionAttr 'genesys_nextPrompt' 'Do you want to know the hours for Seattle or Boston?'}}

This syntax is based on Handlebars and allows you to set values for session attributes. The exported Genesys Cloud CX call flow contains a block that reads back the value of the genesys_nextPrompt session attribute; this prompt is spoken only by the Genesys call flow, not by the bot itself.

To branch to a queue or another call flow, a QnABot answer can use setSessionAttr to set genesys_nextAction to a specific value. An example is in the question with ID GenesysHelper.Agent, where the answer has {{setSessionAttr 'nextAction' 'AGENT'}}. In the call flow’s QnABot reusable task, a switch block reads the value of this attribute to branch to a specific action. The example call flow contains branches for AGENT, MENU, and END. If the genesys_nextAction session attribute has no value, the call flow plays back any string found in genesys_nextPrompt, or the value of the defaultPrompt task variable defined at the beginning of the main flow, which by default prompts the caller to ask another question or say “return to main menu.”

The following diagram illustrates the main call flow.

primary call flow

The following diagram illustrates the flow of the reusable task.

reusable task

Clean up

To avoid incurring future charges, delete the resources created by the template: on the AWS CloudFormation console, select the QnABot stack and choose Delete. This removes all resources that the template created.

To remove the resources in Genesys Cloud, first remove the call flow from call routing. Then delete the call flow from Genesys Architect.

Conclusion

In this post, we walked through how to get started with QnABot and Genesys Cloud with an easy-to-deploy, readily usable solution to address a transactional interaction use case. This voice bot frees your customer service representatives to spend time with your customers on more complex tasks, and provides users with a better experience through self-service. Customer satisfaction increases, while costs decrease, because you’re consuming fewer connected minutes and maximizing agent utilization.

To get started, you can launch QnABot with a single click and go through the QnABot Workshop to learn about additional features. Amazon Lex integration is available on Genesys AppFoundry.


About the Authors

Christopher Lott is a Senior Solutions Architect in the AWS AI Language Services team. He has 20 years of enterprise software development experience. Chris lives in Sacramento, California, and enjoys gardening, aerospace, and traveling the world.

Jessica Ho is a Solutions Architect at Amazon Web Services, supporting ISV partners who build business applications on AWS. She is passionate about creating differentiated solutions that unlock customers for cloud adoption. Outside of work, she enjoys turning her garden into a mini jungle.


Set up enterprise-level cost allocation for ML environments and workloads using resource tagging in Amazon SageMaker

As businesses and IT leaders look to accelerate the adoption of machine learning (ML), there is a growing need to understand spend and cost allocation for your ML environment to meet enterprise requirements. Without proper cost management and governance, your ML spend may lead to surprises in your monthly AWS bill. Amazon SageMaker is a fully managed ML platform in the cloud that equips our enterprise customers with tools and resources to establish cost allocation measures and improve visibility into detailed cost and usage by your teams, business units, products, and more.

In this post, we share tips and best practices regarding cost allocation for your SageMaker environment and workloads. Across almost all AWS services, SageMaker included, applying tags to resources is a standard way to track costs. These tags can help you track, report, and monitor your ML spend through out-of-the-box solutions like AWS Cost Explorer and AWS Budgets, as well as custom solutions built on the data from AWS Cost and Usage Reports (CURs).

Cost allocation tagging

Cost allocation on AWS is a three-step process:

  1. Attach cost allocation tags to your resources.
  2. Activate your tags in the Cost allocation tags section of the AWS Billing console.
  3. Use the tags to track and filter for cost allocation reporting.

After you create and attach tags to resources, they appear in the AWS Billing console’s Cost allocation tags section under User-defined cost allocation tags. It can take up to 24 hours for tags to appear after they’re created. You then need to activate these tags for AWS to start tracking them for your resources. Typically, after a tag is activated, it takes about 24–48 hours for the tags to show up in Cost Explorer. The easiest way to check if your tags are working is to look for your new tag in the tags filter in Cost Explorer. If it’s there, then you’re ready to use the tags for your cost allocation reporting. You can then choose to group your results by tag keys or filter by tag values, as shown in the following screenshot.
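
You can also check an activated tag programmatically. The following boto3 sketch groups monthly SageMaker costs by an example tag key; the tag key, time period, and service filter are assumptions for illustration.

import boto3

ce = boto3.client("ce")

# Group SageMaker costs by the values of an activated cost allocation tag (example key: environment)
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2022-08-01", "End": "2022-09-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon SageMaker"]}},
    GroupBy=[{"Type": "TAG", "Key": "environment"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    print(group["Keys"], group["Metrics"]["UnblendedCost"]["Amount"])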

One thing to note: if you use AWS Organizations and have linked AWS accounts, tags can only be activated in the primary payer account. Optionally, you can also activate CURs for your AWS accounts, which deliver cost allocation reports as CSV files with your usage and costs grouped by your active tags. This gives you more detailed tracking of your costs and makes it easier to set up your own custom reporting solutions.

Tagging in SageMaker

At a high level, tagging SageMaker resources can be grouped into two buckets:

  • Tagging the SageMaker notebook environment, either Amazon SageMaker Studio domains and domain users, or SageMaker notebook instances
  • Tagging SageMaker-managed jobs (labeling, processing, training, hyperparameter tuning, batch transform, and more) and resources (such as models, work teams, endpoint configurations, and endpoints)

We cover these in more detail in this post and provide some solutions on how to apply governance control to ensure good tagging hygiene.

Tagging SageMaker Studio domains and users

Studio is a web-based, integrated development environment (IDE) for ML that lets you build, train, debug, deploy, and monitor your ML models. You can launch Studio notebooks quickly, and dynamically dial up or down the underlying compute resources without interrupting your work.

To automatically tag these dynamic resources, you need to assign tags to SageMaker domain and domain users who are provisioned access to those resources. You can specify these tags in the tags parameter of create-domain or create-user-profile during profile or domain creation, or you can add them later using the add-tags API. Studio automatically copies and assigns these tags to the Studio notebooks created in the domain or by the specific users. You can also add tags to SageMaker domains by editing the domain settings in the Studio Control Panel.

The following is an example of assigning tags to the profile during creation.

aws sagemaker create-user-profile --domain-id <domain id> --user-profile-name data-scientist-full --tags Key=studiouserid,Value=<user id> --user-settings ExecutionRole=arn:aws:iam::<account id>:role/SageMakerStudioExecutionRole_datascientist-full

To tag existing domains and users, use the add-tags API. The tags are then applied to any new notebooks. To have these tags applied to your existing notebooks, you need to restart the Studio apps (Kernel Gateway and Jupyter Server) belonging to that user profile. This won’t cause any loss of notebook data. Refer to Shut Down and Update SageMaker Studio and Studio Apps to learn how to delete and restart your Studio apps.
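
The following is a minimal boto3 sketch of tagging an existing domain with the add-tags API; the domain ID and tag values are placeholders for illustration.

import boto3

sm = boto3.client("sagemaker")

# Placeholder domain ID; you can find yours with list_domains()
domain_arn = sm.describe_domain(DomainId="d-xxxxxxxxxxxx")["DomainArn"]

# Tags added to the domain are copied to Studio notebooks created afterward
sm.add_tags(
    ResourceArn=domain_arn,
    Tags=[{"Key": "cost-center", "Value": "ML-Marketing"}],
)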

Tagging SageMaker notebook instances

In the case of a SageMaker notebook instance, tagging is applied to the instance itself. The tags are assigned to all resources running in the same instance. You can specify tags programmatically using the tags parameter in the create-notebook-instance API or add them via the SageMaker console during instance creation. You can also add or update tags anytime using the add-tags API or via the SageMaker console.
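
For example, the following boto3 sketch creates a notebook instance with tags attached at creation time; the instance name, instance type, and role ARN are placeholders for illustration.

import boto3

sm = boto3.client("sagemaker")

# Tags assigned at creation apply to everything running on this instance
sm.create_notebook_instance(
    NotebookInstanceName="cost-tagged-notebook",
    InstanceType="ml.t3.medium",
    RoleArn="arn:aws:iam::111122223333:role/SageMakerNotebookRole",
    Tags=[
        {"Key": "environment", "Value": "dev"},
        {"Key": "cost-center", "Value": "ML-Marketing"},
    ],
)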

Note that this excludes SageMaker managed jobs and resources such as training and processing jobs because they’re in the service environment rather than on the instance. In the next section, we go over how to apply tagging to these resources in greater detail.

Tagging SageMaker managed jobs and resources

For SageMaker managed jobs and resources, tagging must be applied to the tags attribute as part of each API request. An SKLearnProcessor example is illustrated in the following code. You can find more examples of how to assign tags to other SageMaker managed jobs and resources on the GitHub repo.

from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

processing_tags = [{'Key': 'cost-center', 'Value': 'TF2WorkflowProcessing'}]
sklearn_processor = SKLearnProcessor(framework_version='0.23-1',
                                     role=get_execution_role(),
                                     instance_type='ml.m5.xlarge',
                                     instance_count=2,
                                     tags=processing_tags)

Tagging SageMaker pipelines

In the case of SageMaker pipelines, you can tag the entire pipeline as a whole instead of each individual step. The SageMaker pipeline automatically propagates the tags to each pipeline step. You still have the option to add additional, separate tags to individual steps if needed. In the Studio UI, the pipeline tags appear in the metadata section.

To apply tags to a pipeline, use the SageMaker Python SDK:

pipeline_tags = [ {'Key': 'pipeline-type', 'Value': 'TF2WorkflowPipeline'}]
pipeline.upsert(role_arn=role, tags=pipeline_tags)
execution = pipeline.start()

Enforce tagging using IAM policies

Although tagging is an effective mechanism for implementing cloud management and governance strategies, enforcing the right tagging behavior can be challenging if you just leave it to the end-users. How do you prevent ML resource creation if a specific tag is missing, how do you ensure the right tags are applied, and how do you prevent users from deleting existing tags?

You can accomplish this using AWS Identity and Access Management (IAM) policies. The following code is an example of a policy that prevents SageMaker actions such as CreateDomain or CreateNotebookInstance if the request doesn’t contain the environment key and one of the list values. The ForAllValues modifier with the aws:TagKeys condition key indicates that only the key environment is allowed in the request. This stops users from including other keys, such as accidentally using Environment instead of environment.

"sagemaker:CreateTrainingJob"
      ],
      "{
      "Sid": "SageMakerEnforceEnvtOnCreate",
      "Action": [
        "sagemaker:CreateDomain",
        "sagemaker:CreateEndpoint",
        "sagemaker:CreateNotebookInstance",
        Effect": "Allow",
      "Resource": "*",
  "Condition": {
            "StringEquals": {
                "aws:RequestTag/environment": [
                    "dev","staging","production"
                ]
            },
            "ForAllValues:StringEquals": {"aws:TagKeys": "environment"}
        }
      }

Tag policies and service control policies (SCPs) can also be a good way to standardize creation and labeling of your ML resources. For more information about how to implement a tagging strategy that enforces and validates tagging at the organization level, refer to Cost Allocation Blog Series #3: Enforce and Validate AWS Resource Tags.

Cost allocation reporting

You can view costs by tag by filtering the views in Cost Explorer, viewing a monthly cost allocation report, or examining the CUR.

Visualizing tags in Cost Explorer

Cost Explorer is a tool that enables you to view and analyze your costs and usage. You can explore your usage and costs using the main graph: the Cost Explorer cost and usage reports. For a quick video on how to use Cost Explorer, check out How can I use Cost Explorer to analyze my spending and usage?

With Cost Explorer, you can filter how you view your AWS costs by tags. The Group by option lets you group results by tag keys such as Environment, Deployment, or Cost Center, and the tag filter lets you select the values you want, such as Production or Staging, regardless of the key. Keep in mind that you must run the resources after adding and activating tags; otherwise, Cost Explorer won’t have any usage data, and the tag value won’t be displayed as a filter or group-by option.

The following screenshot is an example of filtering by all values of the BusinessUnit tag.

Examining tags in the CUR

The Cost and Usage Report contains the most comprehensive set of cost and usage data available. The report contains line items for each unique combination of AWS product, usage type, and operation that your AWS account uses. You can customize the CUR to aggregate the information either by the hour or by the day. A monthly cost allocation report is one way to set up cost allocation reporting. You can set up a monthly cost allocation report that lists the AWS usage for your account by product category and linked account user. The report contains the same line items as the detailed billing report and additional columns for your tag keys. You can set it up and download your report by following the steps in Monthly cost allocation report.

The following screenshot shows how user-defined tag keys show up in the CUR. User-defined tag keys have the prefix user, such as user:Department and user:CostCenter. AWS-generated tag keys have the prefix aws.

Visualize the CUR using Amazon Athena and Amazon QuickSight

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. To integrate Athena with CURs, refer to Querying Cost and Usage Reports using Amazon Athena. You can then build custom queries to query CUR data using standard SQL. The following is an example query to filter all resources that have the value TF2WorkflowTraining for the cost-center tag.

select * from {$table_name} where resource_tags_user_cost-center= 'TF2WorkflowTraining'

In the following example, we’re trying to figure out which resources are missing values under the cost-center tag.

SELECT
 bill_payer_account_id, line_item_usage_account_id, DATE_FORMAT((line_item_usage_start_date), '%Y-%m-%d') AS day_line_item_usage_start_date, line_item_resource_id, line_item_usage_type, resource_tags_user_cost-center
FROM
{$table_name} 
WHERE
 resource_tags_user_cost-center IS NULL
AND line_item_product_code = 'AmazonSageMaker'

More information and example queries can be found in the AWS CUR Query Library.

You can also feed CUR data into Amazon QuickSight, where you can slice and dice it any way you’d like for reporting or visualization purposes. For instructions on ingesting CUR data into QuickSight, see How do I ingest and visualize the AWS Cost and Usage Report (CUR) into Amazon QuickSight.

Budget monitoring using tags

AWS Budgets is an excellent way to provide an early warning if spend spikes unexpectedly. You can create custom budgets that alert you when your ML costs and usage exceed (or are forecasted to exceed) your user-defined thresholds. With AWS Budgets, you can monitor your total monthly ML costs or filter your budgets to track costs associated with specific usage dimensions. For example, you can set the budget scope to include SageMaker resource costs tagged as cost-center: ML-Marketing, as shown in the following screenshot. For additional dimensions and detailed instructions on how to set up AWS Budgets, refer to the AWS Budgets documentation.
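
As a rough sketch, the following boto3 call creates a monthly cost budget scoped to a cost allocation tag; the budget name, amount, tag filter, and email address are assumptions for illustration.

import boto3

budgets = boto3.client("budgets")
account_id = boto3.client("sts").get_caller_identity()["Account"]

# Monthly cost budget limited to resources tagged cost-center: ML-Marketing,
# with an email alert at 80% of the budgeted amount
budgets.create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "ml-marketing-sagemaker-budget",
        "BudgetLimit": {"Amount": "500", "Unit": "USD"},
        "CostFilters": {"TagKeyValue": ["user:cost-center$ML-Marketing"]},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "ml-team@example.com"}],
        }
    ],
)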

With budget alerts, you can send notifications when your budget limits are (or are about to be) exceeded. These alerts can also be posted to an Amazon Simple Notification Service (Amazon SNS) topic. An AWS Lambda function subscribed to the SNS topic is then invoked and can take any action you implement programmatically.
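
The following is a minimal sketch of such a Lambda handler; the example response (stopping development notebook instances) is an assumption, and you would substitute whatever action fits your environment.

import boto3

sm = boto3.client("sagemaker")

def lambda_handler(event, context):
    # Each record carries the budget alert message published to the SNS topic
    for record in event["Records"]:
        print("Budget alert received:", record["Sns"]["Message"])

    # Example automated response: stop in-service notebook instances tagged environment=dev
    for nb in sm.list_notebook_instances(StatusEquals="InService")["NotebookInstances"]:
        tags = sm.list_tags(ResourceArn=nb["NotebookInstanceArn"])["Tags"]
        if {"Key": "environment", "Value": "dev"} in tags:
            sm.stop_notebook_instance(NotebookInstanceName=nb["NotebookInstanceName"])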

AWS Budgets also lets you configure budget actions, which are steps that you can take when a budget threshold is exceeded (actual or forecasted amounts). This level of control allows you to reduce unintentional overspending in your account. You can configure specific responses to cost and usage in your account that will be applied automatically or through a workflow approval process when a budget target has been exceeded. This is a really powerful solution to ensure that your ML spend is consistent with the goals of the business. You can select what type of action to take. For example, when a budget threshold is crossed, you can move specific IAM users from admin permissions to read-only. For customers using Organizations, you can apply actions to an entire organizational unit by moving them from admin to read-only. For more details on how to manage cost using budget actions, refer to How to manage cost overruns in your AWS multi-account environment – Part 1.

You can also set up a report to monitor the performance of your existing budgets on a daily, weekly, or monthly cadence and deliver that report to up to 50 email addresses. With AWS Budgets reports, you can combine all SageMaker-related budgets into a single report. This feature enables you to track your SageMaker footprint from a single location, as shown in the following screenshot. You can opt to receive these reports on a daily, weekly, or monthly cadence (we chose Weekly for this example), and choose the day of the week when you want to receive them.

This feature is useful to keep your stakeholders up to date with your SageMaker costs and usage, and help them see when spend isn’t trending as expected.

After you set up this configuration, you should receive an email similar to the following.

Conclusion

In this post, we showed how you can set up cost allocation tagging for SageMaker and shared tips on tagging best practices for your SageMaker environment and workloads. We then discussed different reporting options like Cost Explorer and the CUR to help you improve visibility into your ML spend. Lastly, we demonstrated AWS Budgets and the budget summary report to help you monitor the ML spend of your organization.

For more information about applying and activating cost allocation tags, see User-Defined Cost Allocation Tags.


About the authors

Sean Morgan is an AI/ML Solutions Architect at AWS. He has experience in the semiconductor and academic research fields, and uses his experience to help customers reach their goals on AWS. In his free time, Sean is an active open-source contributor and maintainer, and is the special interest group lead for TensorFlow Add-ons.

Brent Rabowsky focuses on data science at AWS, and leverages his expertise to help AWS customers with their own data science projects.

Nilesh Shetty is a Senior Technical Account Manager at AWS, where he helps enterprise support customers streamline their cloud operations on AWS. He is passionate about machine learning and has experience working as a consultant, architect, and developer. Outside of work, he enjoys listening to music and watching sports.

James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in marketing & advertising industries.


Index your Dropbox content using the Dropbox connector for Amazon Kendra

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.

Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to pull together data across several structured and unstructured repositories to index and search on.

One such data repository is Dropbox. Enterprise users use Dropbox to upload, transfer, and store documents to the cloud. Along with the ability to store documents, Dropbox offers Dropbox Paper, a coediting tool that lets users collaborate and create content in one place. Dropbox Paper can optionally use templates to add structure to documents. In addition to files and paper, Dropbox also allows you to store shortcuts to webpages in your folders.

We’re excited to announce that you can now use the Amazon Kendra connector for Dropbox to search information stored in your Dropbox account. In this post, we show how to index information stored in Dropbox and use the Amazon Kendra intelligent search function. In addition, Amazon Kendra’s ML-powered intelligent search can accurately find information in unstructured documents with natural language narrative content, for which keyword search is not very effective.

Solution overview

With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index a Dropbox repository or folder using the Amazon Kendra connector for Dropbox. The solution consists of the following steps:

  1. Configure an app on Dropbox and get the connection details.
  2. Store the details in AWS Secrets Manager.
  3. Create a Dropbox data source via the Amazon Kendra console.
  4. Index the data in the Dropbox repository.
  5. Run a sample query to get the information.

Prerequisites

To try out the Amazon Kendra connector for Dropbox, you need the following:

Configure a Dropbox app and gather connection details

Before we set up the Dropbox data source, we need a few details about your Dropbox repository. Let’s gather those in advance.

  1. Go to www.dropbox.com/developers.
  2. Choose App console.
  3. Sign in with your credentials (make sure you’re signing in to an Enterprise account).
  4. Choose Create app.
  5. Select Scoped access.
  6. Select Full Dropbox (or the name of the specific folder you want to index).
  7. Enter a name for your app.
  8. Choose Create app.

    You can see the configuration screen with a set of tabs.
  9. To set up permissions, choose the Permissions tab.
  10. Select a minimal set of permissions, as shown in the following screenshots.
  11. Choose Submit.

    A message appears saying that the permission change was successful.
  12. On the Settings tab, copy the app key.
  13. Choose Show next to App secret and copy the secret.
  14. Under Generated access token, choose Generate and copy the token.

Store these values in a safe place—we need to refer to these later.

The generated access token is valid for up to 4 hours. You have to generate a new token each time you index the content.

Store Dropbox credentials in Secrets Manager

To store your Dropbox credentials in Secrets Manager, complete the following steps:

  1. On the Secrets Manager console, choose Store a new secret.
  2. Choose Other type of secret.
  3. Create three key-value pairs for appKey, appSecret, and refreshToken and enter the values saved from Dropbox.
  4. Choose Save.
  5. For Secret name, enter a name (for example, AmazonKendra-dropbox-secret).
  6. Enter an optional description.
  7. Choose Next.
  8. In the Configure rotation section, keep all settings at their defaults and choose Next.
  9. On the Review page, choose Store.
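
As an alternative to the preceding console steps, you can create the secret with a short boto3 call. This is a minimal sketch; the secret name and key names mirror the ones used above, and the placeholder values come from the Dropbox App Console.

import json
import boto3

secrets = boto3.client("secretsmanager")

# Store the Dropbox app key, app secret, and token as a JSON secret
secrets.create_secret(
    Name="AmazonKendra-dropbox-secret",
    Description="Dropbox credentials for the Amazon Kendra connector",
    SecretString=json.dumps(
        {
            "appKey": "<your app key>",
            "appSecret": "<your app secret>",
            "refreshToken": "<your token>",
        }
    ),
)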

Configure the Amazon Kendra connector for Dropbox

To configure the Amazon Kendra connector, complete the following steps:

  1. On the Amazon Kendra console, choose Create an Index.
  2. For Index name, enter a name for the index (for example, my-dropbox-index).
  3. Enter an optional description.
  4. For Role name, enter an IAM role name.
  5. Configure optional encryption settings and tags.
  6. Choose Next.
  7. In the Configure user access control section, leave the settings at their defaults and choose Next.
  8. For Provisioning editions, select Developer edition.
  9. Choose Create.

    This creates and propagates the IAM role and then creates the Amazon Kendra index, which can take up to 30 minutes.
  10. Choose Data sources in the navigation pane.
  11. Under Dropbox, choose Add connector.
  12. For Data source name, enter a name (for example, my-dropbox-connector).
  13. Enter an optional description.
  14. Choose Next.
  15. For Type of authentication token, select Access Token (temporary use).
  16. For AWS Secrets Manager secret, choose the secret you created earlier.
  17. For IAM role, choose Create a new role.
  18. For Role name, enter a name (for example, AmazonKendra-dropbox-role).
  19. Choose Next.
  20. For Select entities or content types, choose your content types.
  21. For Frequency, choose Run on demand.
  22. Choose Next.
  23. Set any optional field mappings and choose Next.
  24. Choose Review and Create and choose Add data source.
  25. Choose Sync now.
  26. Wait for the sync to complete.

Test the solution

Now that you have ingested the content from your Dropbox account into your Amazon Kendra index, you can test some queries.

Go to your index and choose Search indexed content. Enter a sample search query and test out your search results (your query will vary based on the contents of your account).

The Dropbox connector also crawls local identity information from Dropbox. For users, it sets the user email ID as the principal; for groups, it sets the group ID as the principal. To filter search results by users or groups, go to the search console.

Choose Test query with user name or groups to expand it, then choose Apply user name or groups.

Enter the user and/or group names and choose Apply. Then enter your search query and press Enter. This returns a filtered set of results based on your criteria.
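
You can run the same kind of filtered query programmatically. The following boto3 sketch passes a user and group context with the query; the index ID, email address, group name, and query text are placeholders for illustration.

import boto3

kendra = boto3.client("kendra")

# Query the index as a specific user and group so access-controlled results are filtered
response = kendra.query(
    IndexId="<your-index-id>",
    QueryText="What is our remote work policy?",
    UserContext={
        "UserId": "jane.doe@example.com",
        "Groups": ["marketing"],
    },
)

for item in response["ResultItems"]:
    print(item["Type"], "-", item["DocumentTitle"]["Text"])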

Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your Dropbox account.

Generate permanent tokens for offline access

The instructions in this post walk you through creating, configuring, and using a temporary access token. Apps can also get long-term access by requesting offline access, in which case the app receives a refresh token that can be used to retrieve new short-lived access tokens as needed, without further manual user intervention. You can find more information in the Dropbox OAuth Guide and Dropbox authorization documentation. Use the following steps to create a permanent refresh token (for example, to set the sync to run on a schedule):

  1. Get the app key and app secret as before.
  2. In a new browser, navigate to https://www.dropbox.com/oauth2/authorize?token_access_type=offline&response_type=code&client_id=<appkey>.
  3. Accept the defaults and choose Submit.
  4. Choose Continue.
  5. Choose Allow.

    An access code is generated for you.
  6. Copy the access code.

    Now you get the refresh token from the access code.
  7. In a terminal window, run the following curl command:
    curl https://api.dropbox.com/oauth2/token -d code=<receivedcode> -d grant_type=authorization_code -u <appkey>:<appsecret>

You can store this refresh token along with the app key and app secret to configure a permanent token in the data source configuration for Amazon Kendra. Amazon Kendra generates the access token and uses it as needed for access.

Limitations

This solution has the following limitations:

  • File comments are not imported into the index
  • You don’t have the option to add custom metadata for Dropbox
  • Google Docs, Sheets, and Slides require a Google Workspace or Google account and are not indexed

Conclusion

With the Dropbox connector for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.

In this post, we introduced you to the basics, but there are many additional features that we didn’t cover. For example:

  • You can enable user-based access control for your Amazon Kendra index and restrict access to users and groups that you configure
  • You can specify allowedUsersColumn and allowedGroupsColumn so you can apply access controls based on users and groups, respectively
  • You can map additional fields to Amazon Kendra index attributes and enable them for faceting, search, and display in the search results
  • You can integrate the Dropbox data source with the Custom Document Enrichment (CDE) capability in Amazon Kendra to perform additional attribute mapping logic and even custom content transformation during ingestion

To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide.


About the author

Ashish Lagwankar is a Senior Enterprise Solutions Architect at AWS. His core interests include AI/ML, serverless, and container technologies. Ashish is based in the Boston, MA, area and enjoys reading, outdoors, and spending time with his family.


Provision and manage ML environments with Amazon SageMaker Canvas using AWS CDK and AWS Service Catalog

The proliferation of machine learning (ML) across a wide range of use cases is becoming prevalent in every industry. However, this outpaces the increase in the number of ML practitioners who have traditionally been responsible for implementing these technical solutions to realize business outcomes.

In today’s enterprise, there is a need for machine learning to be used by non-ML practitioners who are proficient with data, which is the foundation of ML. To make this a reality, the value of ML is being realized across the enterprise through no-code ML platforms. These platforms enable different personas, for example business analysts, to use ML without writing a single line of code and deliver solutions to business problems in a quick, simple, and intuitive manner. Amazon SageMaker Canvas is a visual point-and-click service that enables business analysts to use ML to solve business problems by generating accurate predictions on their own—without requiring any ML experience or having to write a single line of code. Canvas has expanded the use of ML in the enterprise with a simple-to-use intuitive interface that helps businesses implement solutions quickly.

Although Canvas has enabled democratization of ML, the challenge of provisioning and deploying ML environments in a secure manner still remains. Typically, this is the responsibility of central IT teams in most large enterprises. In this post, we discuss how IT teams can administer, provision, and manage secure ML environments using Amazon SageMaker Canvas, AWS Cloud Development Kit (AWS CDK) and AWS Service Catalog. The post presents a step-by-step guide for IT administrators to achieve this quickly and at scale.

Overview of the AWS CDK and AWS Service Catalog

The AWS CDK is an open-source software development framework to define your cloud application resources. It uses the familiarity and expressive power of programming languages for modeling your applications, while provisioning resources in a safe and repeatable manner.

AWS Service Catalog lets you centrally manage deployed IT services, applications, resources, and metadata. With AWS Service Catalog, you can create, share, organize and govern cloud resources with infrastructure as code (IaC) templates and enable fast and straightforward provisioning.

Solution overview

We enable provisioning of ML environments using Canvas in three steps:

  1. First, we share how you can manage a portfolio of resources necessary for the approved usage of Canvas using AWS Service Catalog.
  2. Then, we deploy an example AWS Service Catalog portfolio for Canvas using the AWS CDK.
  3. Finally, we demonstrate how you can provision Canvas environments on demand within minutes.

Prerequisites

To provision ML environments with Canvas, the AWS CDK, and AWS Service Catalog, you need to do the following:

  1. Have access to the AWS account where the Service Catalog portfolio will be deployed. Make sure you have the credentials and permissions to deploy the AWS CDK stack into your account. The AWS CDK Workshop is a helpful resource you can refer to if you need support.
  2. We recommend following certain best practices that are highlighted through the concepts detailed in the following resources:
  3. Clone this GitHub repository into your environment.

Provision approved ML environments with Amazon SageMaker Canvas using AWS Service Catalog

In regulated industries and most large enterprises, you need to adhere to the requirements mandated by IT teams to provision and manage ML environments. These may include a secure, private network, data encryption, controls to allow only authorized and authenticated users such as AWS Identity and Access Management (IAM) for accessing solutions such as Canvas, and strict logging and monitoring for audit purposes.

As an IT administrator, you can use AWS Service Catalog to create and organize secure, reproducible ML environments with SageMaker Canvas into a product portfolio. This is managed using IaC controls that are embedded to meet the requirements mentioned before, and can be provisioned on demand within minutes. You can also maintain control of who can access this portfolio to launch products.

The following diagram illustrates this architecture.

Example flow

In this section, we demonstrate an example of an AWS Service Catalog portfolio with SageMaker Canvas. The portfolio consists of different aspects of the Canvas environment that are part of the Service Catalog portfolio:

  • Studio domain – Canvas is an application that runs within Studio domains. The domain consists of an Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a range of security, application, policy, and Amazon Virtual Private Cloud (VPC) configurations. An AWS account is linked to one domain per Region.
  • Amazon S3 bucket – After the Studio domain is created, an Amazon Simple Storage Service (Amazon S3) bucket is provisioned for Canvas to allow importing datasets from local files, also known as local file upload. This bucket is in the customer’s account and is provisioned once.
  • Canvas user – Within the Studio domain, you add a user profile for each Canvas user, who can then import datasets, build and train ML models without writing code, and run predictions on the model.
  • Scheduled shutdown of Canvas sessions – Canvas users can log out from the Canvas interface when they’re done with their tasks. Alternatively, administrators can shut down Canvas sessions from the AWS Management Console as part of managing the Canvas sessions. In this part of the AWS Service Catalog portfolio, an AWS Lambda function is created and provisioned to automatically shut down Canvas sessions at defined scheduled intervals. This helps manage open sessions and shut them down when not in use.

This example flow can be found in the GitHub repository for quick reference.
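
For the scheduled shutdown component described above, the following is a minimal sketch of what such a Lambda function could look like; the actual product in the portfolio may scope the shutdown differently (for example, per domain or per user profile).

import boto3

sm = boto3.client("sagemaker")

def lambda_handler(event, context):
    # Find and delete in-service Canvas apps so idle sessions are shut down
    paginator = sm.get_paginator("list_apps")
    for page in paginator.paginate():
        for app in page["Apps"]:
            if app["AppType"] == "Canvas" and app["Status"] == "InService":
                sm.delete_app(
                    DomainId=app["DomainId"],
                    UserProfileName=app["UserProfileName"],
                    AppType=app["AppType"],
                    AppName=app["AppName"],
                )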

Deploy the flow with the AWS CDK

In this section, we deploy the flow described earlier using the AWS CDK. After it’s deployed, you can also do version tracking and manage the portfolio.

The portfolio stack can be found in app.py and the product stacks under the products/ folder. You can iterate on the IAM roles, AWS Key Management Service (AWS KMS) keys, and VPC setup in the studio_constructs/ folder. Before deploying the stack into your account, you can edit app.py to grant portfolio access to an IAM role of your choice.

You can manage access to the portfolio for the relevant IAM users, groups, and roles. See Granting Access to Users for more details.
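
The following is a rough sketch of what defining the portfolio and granting access could look like inside the portfolio stack, assuming the stable aws_servicecatalog module in AWS CDK v2; the construct IDs, display names, and role ARN are placeholders, and the app.py in the repository may organize this differently.

from aws_cdk import Stack, aws_iam as iam, aws_servicecatalog as servicecatalog
from constructs import Construct

class CanvasPortfolioStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Define the portfolio (placeholder names for illustration)
        portfolio = servicecatalog.Portfolio(
            self, "CanvasPortfolio",
            display_name="SageMaker Canvas ML environments",
            provider_name="Central IT",
        )

        # Grant launch access to an existing IAM role of your choice
        provisioning_role = iam.Role.from_role_arn(
            self, "ProvisioningRole",
            "arn:aws:iam::111122223333:role/CanvasProvisioningRole",
        )
        portfolio.give_access_to_role(provisioning_role)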

Deploy the portfolio into your account

You can now run the following commands to install the AWS CDK and make sure you have the right dependencies to deploy the portfolio:

npm install -g aws-cdk@2.27.0
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Run the following commands to deploy the portfolio into your account:

ACCOUNT_ID=$(aws sts get-caller-identity --query Account | tr -d '"')
AWS_REGION=$(aws configure get region)
cdk bootstrap aws://${ACCOUNT_ID}/${AWS_REGION}
cdk deploy --require-approval never

The first two commands get your account ID and current Region using the AWS Command Line Interface (AWS CLI) on your computer. Following this, cdk bootstrap and cdk deploy build assets locally, and deploy the stack in a few minutes.

The portfolio can now be found in AWS Service Catalog, as shown in the following screenshot.

On-demand provisioning

The products within the portfolio can be launched quickly and easily on demand from the Provisioning menu on the AWS Service Catalog console. A typical flow is to launch the Studio domain and the Canvas auto shutdown first because this is usually a one-time action. You can then add Canvas users to the domain. The domain ID and user IAM role ARN are saved in AWS Systems Manager and are automatically populated with the user parameters as shown in the following screenshot.

You can also use cost allocation tags that are attached to each user. For example, UserCostCenter is a sample tag where you can add the name of each user.

Key considerations for governing ML environments using Canvas

Now that we have provisioned and deployed an AWS Service Catalog portfolio focused on Canvas, we’d like to highlight a few considerations to govern the Canvas-based ML environments focused on the domain and the user profile.

The following are considerations regarding the Studio domain:

  • Networking for Canvas is managed at the Studio domain level, where the domain is deployed on a private VPC subnet for secure connectivity. See Securing Amazon SageMaker Studio connectivity using a private VPC to learn more.
  • A default IAM execution role is defined at the domain level. This default role is assigned to all Canvas users in the domain.
  • Encryption is done using AWS KMS by encrypting the EFS volume in the domain. For additional controls, you can specify your own managed key, also known as a customer managed key (CMK). See Protect Data at Rest Using Encryption to learn more.
  • The ability to upload files from your local disk is enabled by attaching a cross-origin resource sharing (CORS) policy to the S3 bucket used by Canvas. See Give Your Users Permissions to Upload Local Files to learn more.

The following are considerations regarding the user profile:

  • Authentication in Studio can be done both through single sign-on (SSO) and IAM. If you have an existing identity provider to federate users to access the console, you can assign a Studio user profile to each federated identity using IAM. See the section Assigning the policy to Studio users in Configuring Amazon SageMaker Studio for teams and groups with complete resource isolation to learn more.
  • You can assign IAM execution roles to each user profile. While using Studio, a user assumes the role mapped to their user profile that overrides the default execution role. You can use this for fine-grained access controls within a team.
  • You can achieve isolation using attribute-based access controls (ABAC) to ensure users can only access the resources for their team. See Configuring Amazon SageMaker Studio for teams and groups with complete resource isolation to learn more.
  • You can perform fine-grained cost tracking by applying cost allocation tags to user profiles.

Clean up

To clean up the resources created by the AWS CDK stack, navigate to the AWS CloudFormation console and delete the Canvas stacks. Alternatively, you can run cdk destroy from within the repository folder to do the same.

Conclusion

In this post, we shared how you can quickly and easily provision ML environments with Canvas using AWS Service Catalog and the AWS CDK. We discussed how you can create a portfolio on AWS Service Catalog, provision the portfolio, and deploy it in your account. IT administrators can use this method to deploy and manage users, sessions, and associated costs while provisioning Canvas.

Learn more about Canvas on the product page and the Developer Guide. For further reading, you can learn how to enable business analysts to access SageMaker Canvas using AWS SSO without the console. You can also learn how business analysts and data scientists can collaborate faster using Canvas and Studio.


About the Authors

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.

Sofian Hamiti is an AI/ML specialist Solutions Architect at AWS. He helps customers across industries accelerate their AI/ML journey by helping them build and operationalize end-to-end machine learning solutions.

Shyam Srinivasan is a Principal Product Manager on the AWS AI/ML team, leading product management for Amazon SageMaker Canvas. Shyam cares about making the world a better place through technology and is passionate about how AI and ML can be a catalyst in this journey.

Avi Patel works as a software engineer on the Amazon SageMaker Canvas team. His background consists of working full stack with a frontend focus. In his spare time, he likes to contribute to open source projects in the crypto space and learn about new DeFi protocols.

Jared Heywood is a Senior Business Development Manager at AWS. He is a global AI/ML specialist helping customers with no-code machine learning. He has worked in the AutoML space for the past 5 years and launched products at Amazon like Amazon SageMaker JumpStart and Amazon SageMaker Canvas.


New features for Amazon SageMaker Pipelines and the Amazon SageMaker SDK

Amazon SageMaker Pipelines allows data scientists and machine learning (ML) engineers to automate training workflows, which helps you create a repeatable process to orchestrate model development steps for rapid experimentation and model retraining. You can automate the entire model build workflow, including data preparation, feature engineering, model training, model tuning, and model validation, and catalog it in the model registry. You can configure pipelines to run automatically at regular intervals or when certain events are triggered, or you can run them manually as needed.

In this post, we highlight some of the enhancements to the Amazon SageMaker SDK and introduce new features of Amazon SageMaker Pipelines that make it easier for ML practitioners to build and train ML models.

Pipelines continues to innovate its developer experience, and with these recent releases, you can now use the service in a more customized way:

  • 2.99.0, 2.101.1, 2.102.0, 2.104.0 – Updated documentation on PipelineVariable usage for estimator, processor, tuner, transformer, and model base classes, Amazon models, and framework models. There will be additional changes coming with newer versions of the SDK to support all subclasses of estimators and processors.
  • 2.90.0 – Availability of ModelStep for integrated model resource creation and registration tasks.
  • 2.88.2 – Availability of PipelineSession for managed interaction with SageMaker entities and resources.
  • 2.88.2 – Subclass compatibility for workflow pipeline job steps so you can build job abstractions and configure and run processing, training, transform, and tuning jobs as you would without a pipeline.
  • 2.76.0 – Availability of FailStep to conditionally stop a pipeline with a failure status.

In this post, we walk you through a workflow using a sample dataset with a focus on model building and deployment to demonstrate how to implement Pipelines’s new features. By the end, you should have enough information to successfully use these newer features and simplify your ML workloads.

Features overview

Pipelines offers the following new features:

  • Pipeline variable annotation – Certain method parameters accept multiple input types, including PipelineVariables, and additional documentation has been added to clarify where PipelineVariables are supported in both the latest stable version of SageMaker SDK documentation and the init signature of the functions. For example, in the following TensorFlow estimator, the init signature now shows that model_dir and image_uri support PipelineVariables, whereas the other parameters do not. For more information, refer to TensorFlow Estimator.

    • Before:
      TensorFlow(
          py_version=None,
          framework_version=None,
          model_dir=None,
          image_uri=None,
          distribution=None,
          **kwargs,
      )

    • After:
      TensorFlow(
          py_version: Union[str, NoneType] = None,
          framework_version: Union[str, NoneType] = None,
          model_dir: Union[str, sagemaker.workflow.entities.PipelineVariable, NoneType] = None,
          image_uri: Union[str, sagemaker.workflow.entities.PipelineVariable, NoneType] = None,
          distribution: Union[Dict[str, str], NoneType] = None,
          compiler_config: Union[sagemaker.tensorflow.training_compiler.config.TrainingCompilerConfig, NoneType] = None,
          **kwargs,
      )

  • Pipeline sessionPipelineSession is a new concept introduced to bring unity across the SageMaker SDK and introduces lazy initialization of the pipeline resources (the run calls are captured but not run until the pipeline is created and run). The PipelineSession context inherits the SageMakerSession and implements convenient methods for you to interact with other SageMaker entities and resources, such as training jobs, endpoints, and input datasets stored in Amazon Simple Storage Service (Amazon S3).
  • Subclass compatibility with workflow pipeline job steps – You can now build job abstractions and configure and run processing, training, transform, and tuning jobs as you would without a pipeline.

    • For example, creating a processing step with SKLearnProcessor previously required the following:
          sklearn_processor = SKLearnProcessor(
              framework_version=framework_version,
              instance_type=processing_instance_type,
              instance_count=processing_instance_count,
              sagemaker_session=sagemaker_session, #sagemaker_session would be passed as an argument
              role=role,
          )
          step_process = ProcessingStep(
              name="{pipeline-name}-process",
              processor=sklearn_processor,
              inputs=[
                ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),  
              ],
              outputs=[
                  ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
                  ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
                  ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
              ],
              code=f"code/preprocess.py",
          )

    • As we see in the preceding code, ProcessingStep needs to do basically the same preprocessing logic as .run, just without initiating the API call to start the job. But with subclass compatibility now enabled with workflow pipeline job steps, we declare the step_args argument that takes the preprocessing logic with .run so you can build a job abstraction and configure it as you would use it without Pipelines. We also pass in the pipeline_session, which is a PipelineSession object, instead of sagemaker_session to make sure the run calls are captured but not called until the pipeline is created and run. See the following code:
      sklearn_processor = SKLearnProcessor(
          framework_version=framework_version,
          instance_type=processing_instance_type,
          instance_count=processing_instance_count,
          sagemaker_session=pipeline_session,#pipeline_session would be passed in as argument
          role=role,
      )
      
      processor_args = sklearn_processor.run(
          inputs=[
            ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),  
          ],
          outputs=[
              ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
              ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
              ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
          ],
          code=f"code/preprocess.py",
      )
      step_process = ProcessingStep(name="{pipeline-name}-process", step_args=processor_args)

  • Model step (a streamlined approach with model creation and registration steps) – Pipelines offers two step types to integrate with SageMaker models: CreateModelStep and RegisterModel. You can now achieve both using only the ModelStep type. Note that a PipelineSession is required to do so. This brings consistency between the pipeline steps and the SDK.

    • Before:
      step_register = RegisterModel(
              name="ChurnRegisterModel",
              estimator=xgb_custom_estimator,
              model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
              content_types=["text/csv"],
              response_types=["text/csv"],
              inference_instances=["ml.t2.medium", "ml.m5.large"],
              transform_instances=["ml.m5.large"],
              model_package_group_name=model_package_group_name,
              approval_status=model_approval_status,
              model_metrics=model_metrics,
      )

    • After:
      register_args = model.register(
          content_types=["text/csv"],
          response_types=["text/csv"],
          inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
          transform_instances=["ml.m5.xlarge"],
          model_package_group_name=model_package_group_name,
          approval_status=model_approval_status,
          model_metrics=model_metrics,
      )
      step_register = ModelStep(name="ChurnRegisterModel", step_args=register_args)

  • Fail step (conditional stop of the pipeline run) – FailStep allows a pipeline to be stopped with a failure status if a condition is met, such as if the model score is below a certain threshold.
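
To make the lazy initialization of PipelineSession concrete, the following is a minimal sketch (the same object is created again in the walkthrough later in this post):

from sagemaker.workflow.pipeline_context import PipelineSession

# Estimators, processors, and models created with sagemaker_session=pipeline_session
# defer their API calls: .fit(), .run(), .register(), and .create() return step
# arguments instead of starting jobs, and nothing runs until the pipeline itself runs.
pipeline_session = PipelineSession()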

Solution overview

In this solution, your entry point is the Amazon SageMaker Studio integrated development environment (IDE) for rapid experimentation. Studio offers an environment to manage the end-to-end Pipelines experience, so you can handle your entire workflow without leaving Studio for the AWS Management Console. For more information on managing Pipelines from within Studio, refer to View, Track, and Execute SageMaker Pipelines in SageMaker Studio.

The following diagram illustrates the high-level architecture of the ML workflow with the different steps to train and generate inferences using the new features.

The pipeline includes the following steps:

  1. Preprocess data to build features required and split data into train, validation, and test datasets.
  2. Create a training job with the SageMaker XGBoost framework.
  3. Evaluate the trained model using the test dataset.
  4. Check if the AUC score is above a predefined threshold.
    • If the AUC score is less than the threshold, stop the pipeline run and mark it as failed.
    • If the AUC score is greater than the threshold, create a SageMaker model and register it in the SageMaker model registry.
  5. Apply batch transform on the given dataset using the model created in the previous step.

Prerequisites

To follow along with this post, you need an AWS account with a Studio domain.

Pipelines is integrated directly with SageMaker entities and resources, so you don’t need to interact with any other AWS services. You also don’t need to manage any resources because it’s a fully managed service, which means that it creates and manages resources for you. For more information on the various SageMaker components that are both standalone Python APIs along with integrated components of Studio, see the SageMaker product page.

Before getting started, install SageMaker SDK version >= 2.104.0 and xlrd >=1.0.0 within the Studio notebook using the following code snippet:

import sys
!{sys.executable} -m pip install "sagemaker>=2.104.0"
!{sys.executable} -m pip install "xlrd>=1.0.0"

import sagemaker
print(sagemaker.__version__)

ML workflow

For this post, you use the following components:

  • Data preparation

    • SageMaker Processing – SageMaker Processing is a fully managed service allowing you to run custom data transformations and feature engineering for ML workloads.
  • Model building

  • Model training and evaluation

    • One-click training – SageMaker distributed training. SageMaker provides distributed training libraries for data parallelism and model parallelism. The libraries are optimized for the SageMaker training environment, help adapt your distributed training jobs to SageMaker, and improve training speed and throughput.
    • SageMaker Experiments – Experiments is a capability of SageMaker that lets you organize, track, compare, and evaluate your ML iterations.
    • SageMaker batch transform – Batch transform or offline scoring is a managed service in SageMaker that lets you predict on a larger dataset using your ML models.
  • Workflow orchestration

    • SageMaker Pipelines – A SageMaker pipeline is a series of interconnected steps defined by a JSON pipeline definition. It encodes a pipeline using a directed acyclic graph (DAG). The DAG gives information on the requirements for and relationships between each step of the pipeline, and its structure is determined by the data dependencies between steps. These dependencies are created when the properties of a step’s output are passed as the input to another step.
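
As an illustration of this property-passing mechanism, the following sketch shows a training input that references a processing step's output by property; it assumes step_process is an already defined ProcessingStep with a "train" output (the full version appears in the training step later in this post):

from sagemaker.inputs import TrainingInput

# Pipelines infers the DAG edge between the two steps from this property reference
train_input = TrainingInput(
    s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
    content_type="text/csv",
)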

The following diagram illustrates the different steps in the SageMaker pipeline (for a churn prediction use case) where the connections between the steps are inferred by SageMaker based on the inputs and outputs defined by the step definitions.

The next sections walk through creating each step of the pipeline and running the entire pipeline once created.

Project structure

Let’s start with the project structure:

  • /sm-pipelines-end-to-end-example – The project name

    • /data – The datasets
    • /pipelines – The code files for pipeline components

      • /customerchurn
        • preprocess.py
        • evaluate.py
    • sagemaker-pipelines-project.ipynb – A notebook walking through the modeling workflow using the new Pipelines features

Download the dataset

To follow along with this post, you need to download and save the sample dataset under the data folder within the project home directory, which saves the file in Amazon Elastic File System (Amazon EFS) within the Studio environment.

Build the pipeline components

Now you’re ready to build the pipeline components.

Import statements and declare parameters and constants

Create a Studio notebook called sagemaker-pipelines-project.ipynb within the project home directory. Enter the following code block in a cell, and run the cell to set up SageMaker and S3 client objects, create PipelineSession, and set up the S3 bucket location using the default bucket that comes with a SageMaker session:

import boto3
import pandas as pd
import sagemaker
from sagemaker.workflow.pipeline_context import PipelineSession
 
s3_client = boto3.resource('s3')
pipeline_name = "ChurnModelPipeline"
sagemaker_session = sagemaker.session.Session()
region = sagemaker_session.boto_region_name
role = sagemaker.get_execution_role()
pipeline_session = PipelineSession()
default_bucket = sagemaker_session.default_bucket()
model_package_group_name = "ChurnModelPackageGroup"

Pipelines supports parameterization, which allows you to specify input parameters at runtime without changing your pipeline code. You can use the classes available in the sagemaker.workflow.parameters module, such as ParameterInteger, ParameterFloat, and ParameterString, to specify pipeline parameters of various data types. Run the following code to set up multiple input parameters:

from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
)
auc_score_threshold = ParameterFloat(
    name="AUCScoreThreshold",
    default_value=0.75
)
base_job_prefix = "churn-example"
model_package_group_name = "churn-job-model-packages"
batch_data = ParameterString(
    name="BatchData",
    default_value="s3://{}/data/batch/batch.csv".format(default_bucket),
)

processing_instance_count = ParameterInteger(
    name="ProcessingInstanceCount",
    default_value=1
)
processing_instance_type = ParameterString(
    name="ProcessingInstanceType",
    default_value="ml.m5.xlarge"
)
training_instance_type = ParameterString(
    name="TrainingInstanceType",
    default_value="ml.m5.xlarge"
)
input_data = ParameterString(
    name="InputData",
    default_value="s3://{}/data/storedata_total.csv".format(default_bucket),
)

model_approval_status = ParameterString(
    name="ModelApprovalStatus", default_value="PendingManualApproval"
)

Generate a batch dataset

Generate the batch dataset, which you use later in the batch transform step:

def preprocess_batch_data(file_path):
    df = pd.read_csv(file_path)
    ## Convert to datetime columns
    df["firstorder"]=pd.to_datetime(df["firstorder"],errors='coerce')
    df["lastorder"] = pd.to_datetime(df["lastorder"],errors='coerce')
    ## Drop Rows with null values
    df = df.dropna()
    ## Create Column which gives the days between the last order and the first order
    df["first_last_days_diff"] = (df['lastorder']-df['firstorder']).dt.days
    ## Create Column which gives the days between when the customer record was created and the first order
    df['created'] = pd.to_datetime(df['created'])
    df['created_first_days_diff']=(df['created']-df['firstorder']).dt.days
    ## Drop Columns
    df.drop(['custid','created','firstorder','lastorder'],axis=1,inplace=True)
    ## Apply one hot encoding on favday and city columns
    df = pd.get_dummies(df,prefix=['favday','city'],columns=['favday','city'])
    return df
    
# convert the store_data file into csv format
store_data = pd.read_excel("data/storedata_total.xlsx")
store_data.to_csv("data/storedata_total.csv")
 
# preprocess batch data and save it into the data folder
# (use a new variable name so the batch_data pipeline parameter isn't overwritten)
batch_df = preprocess_batch_data("data/storedata_total.csv")
batch_df.pop("retained")
batch_sample = batch_df.sample(frac=0.2)
pd.DataFrame(batch_sample).to_csv("data/batch.csv",header=False,index=False)

Upload data to an S3 bucket

Upload the datasets to Amazon S3:

s3_client.Bucket(default_bucket).upload_file("data/batch.csv","data/batch/batch.csv")
s3_client.Bucket(default_bucket).upload_file("data/storedata_total.csv","data/storedata_total.csv")

Define a processing script and processing step

In this step, you prepare a Python script to do feature engineering, one hot encoding, and curate the training, validation, and test splits to be used for model building. Run the following code to build your processing script:

%%writefile pipelines/customerchurn/preprocess.py

import os
import tempfile
import numpy as np
import pandas as pd
import datetime as dt
if __name__ == "__main__":
    base_dir = "/opt/ml/processing"
    #Read Data
    df = pd.read_csv(
        f"{base_dir}/input/storedata_total.csv"
    )
    # convert created column to datetime
    df["created"] = pd.to_datetime(df["created"])
    #Convert firstorder and lastorder to datetime datatype
    df["firstorder"] = pd.to_datetime(df["firstorder"],errors='coerce')
    df["lastorder"] = pd.to_datetime(df["lastorder"],errors='coerce')
    #Drop Rows with Null Values
    df = df.dropna()
    #Create column which gives the days between the last order and the first order
    df['first_last_days_diff'] = (df['lastorder'] - df['firstorder']).dt.days
    #Create column which gives the days between the customer record was created and the first order
    df['created_first_days_diff'] = (df['created'] - df['firstorder']).dt.days
    #Drop columns
    df.drop(['custid', 'created','firstorder','lastorder'], axis=1, inplace=True)
    #Apply one hot encoding on favday and city columns
    df = pd.get_dummies(df, prefix=['favday', 'city'], columns=['favday', 'city'])
    # Split into train, validation and test datasets
    y = df.pop("retained")
    X_pre = df
    y_pre = y.to_numpy().reshape(len(y), 1)
    X = np.concatenate((y_pre, X_pre), axis=1)
    np.random.shuffle(X)
    # Split in Train, Test and Validation Datasets
    train, validation, test = np.split(X, [int(.7*len(X)), int(.85*len(X))])
    train_rows = np.shape(train)[0]
    validation_rows = np.shape(validation)[0]
    test_rows = np.shape(test)[0]
    train = pd.DataFrame(train)
    test = pd.DataFrame(test)
    validation = pd.DataFrame(validation)
    # Convert the label column to integer
    train[0] = train[0].astype(int)
    test[0] = test[0].astype(int)
    validation[0] = validation[0].astype(int)
    # Save the Dataframes as csv files
    train.to_csv(f"{base_dir}/train/train.csv", header=False, index=False)
    validation.to_csv(f"{base_dir}/validation/validation.csv", header=False, index=False)
    test.to_csv(f"{base_dir}/test/test.csv", header=False, index=False)

Next, run the following code block to instantiate the processor and the Pipelines step to run the processing script. Because the processing script is written in Pandas, you use a SKLearnProcessor. The processor's run method takes the input S3 location for the raw dataset, the output S3 locations to save the processed datasets, and the script location; the resulting step_args are then passed to the Pipelines ProcessingStep.

# Upload processing script to S3
s3_client.Bucket(default_bucket).upload_file("pipelines/customerchurn/preprocess.py","input/code/preprocess.py")

# Define Processing Step for Feature Engineering
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

framework_version = "1.0-1"

sklearn_processor = SKLearnProcessor(
    framework_version=framework_version,
    instance_type="ml.m5.xlarge",
    instance_count=processing_instance_count,
    base_job_name="sklearn-churn-process",
    role=role,
    sagemaker_session=pipeline_session,
)
processor_args = sklearn_processor.run(
    inputs=[
      ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),  
    ],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train",
                         destination=f"s3://{default_bucket}/output/train" ),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation",
                        destination=f"s3://{default_bucket}/output/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test",
                        destination=f"s3://{default_bucket}/output/test")
    ],
    code=f"s3://{default_bucket}/input/code/preprocess.py",
)
step_process = ProcessingStep(name="ChurnModelProcess", step_args=processor_args)

Define a training step

Set up model training using a SageMaker XGBoost estimator and the Pipelines TrainingStep function:

from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

model_path = f"s3://{default_bucket}/output"
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type="ml.m5.xlarge",
)
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type=training_instance_type,
    instance_count=1,
    output_path=model_path,
    role=role,
    sagemaker_session=pipeline_session,
)
xgb_train.set_hyperparameters(
    objective="reg:linear",
    num_round=50,
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
)

train_args = xgb_train.fit(
    inputs={
            "train": TrainingInput(
                s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                    "train"
                ].S3Output.S3Uri,
                content_type="text/csv",
            ),
            "validation": TrainingInput(
                s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                    "validation"
                ].S3Output.S3Uri,
                content_type="text/csv",
            ),
        },
)
from sagemaker.workflow.steps import TrainingStep

step_train = TrainingStep(
    name="ChurnModelTrain",
    step_args=train_args,
)

Define the evaluation script and model evaluation step

Run the following code block to evaluate the model once trained. This script encapsulates the logic to check if the AUC score meets the specified threshold.

%%writefile pipelines/customerchurn/evaluate.py

import json
import pathlib
import pickle
import tarfile
import joblib
import numpy as np
import pandas as pd
import xgboost
import datetime as dt
from sklearn.metrics import roc_curve,auc
if __name__ == "__main__":   
    #Read Model Tar File
    model_path = f"/opt/ml/processing/model/model.tar.gz"
    with tarfile.open(model_path) as tar:
        tar.extractall(path=".")
    model = pickle.load(open("xgboost-model", "rb"))
    #Read Test Data using which we evaluate the model
    test_path = "/opt/ml/processing/test/test.csv"
    df = pd.read_csv(test_path, header=None)
    y_test = df.iloc[:, 0].to_numpy()
    df.drop(df.columns[0], axis=1, inplace=True)
    X_test = xgboost.DMatrix(df.values)
    #Run Predictions
    predictions = model.predict(X_test)
    #Evaluate Predictions
    fpr, tpr, thresholds = roc_curve(y_test, predictions)
    auc_score = auc(fpr, tpr)
    report_dict = {
        "classification_metrics": {
            "auc_score": {
                "value": auc_score,
            },
        },
    }
    #Save Evaluation Report
    output_dir = "/opt/ml/processing/evaluation"
    pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)
    evaluation_path = f"{output_dir}/evaluation.json"
    with open(evaluation_path, "w") as f:
        f.write(json.dumps(report_dict))

Next, run the following code block to instantiate the processor and the Pipelines step to run the evaluation script. Because the evaluation script uses the XGBoost package, you use a ScriptProcessor along with the XGBoost image. The processor's run method takes the model artifacts and the test dataset as inputs and the S3 location for the evaluation report as output; the resulting step_args are then passed to the Pipelines ProcessingStep, along with a PropertyFile that lets later steps read values from the evaluation report.

#Upload the evaluation script to S3
s3_client.Bucket(default_bucket).upload_file("pipelines/customerchurn/evaluate.py","input/code/evaluate.py")
from sagemaker.processing import ScriptProcessor
# define model evaluation step to evaluate the trained model
script_eval = ScriptProcessor(
    image_uri=image_uri,
    command=["python3"],
    instance_type=processing_instance_type,
    instance_count=1,
    base_job_name="script-churn-eval",
    role=role,
    sagemaker_session=pipeline_session,
)
eval_args = script_eval.run(
    inputs=[
        ProcessingInput(
            source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/test",
        ),
    ],
    outputs=[
            ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation",
                             destination=f"s3://{default_bucket}/output/evaluation"),
        ],
    code=f"s3://{default_bucket}/input/code/evaluate.py",
)
from sagemaker.workflow.properties import PropertyFile
evaluation_report = PropertyFile(
    name="ChurnEvaluationReport", output_name="evaluation", path="evaluation.json"
)
step_eval = ProcessingStep(
    name="ChurnEvalModel",
    step_args=eval_args,
    property_files=[evaluation_report],
)

Define a create model step

Run the following code block to create a SageMaker model using the Pipelines model step. This step uses the output of the training step to package the model for deployment. Note that the instance type for model creation is specified directly in the model.create call.

from sagemaker import Model
from sagemaker.inputs import CreateModelInput
from sagemaker.workflow.model_step import ModelStep
# step to create model 
model = Model(
    image_uri=image_uri,        
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role,
)
step_create_model = ModelStep(
    name="ChurnCreateModel",
    step_args=model.create(instance_type="ml.m5.large", accelerator_type="ml.eia1.medium"),
)

Define a batch transform step

Run the following code block to run batch transformation using the trained model with the batch input created in the first step:

from sagemaker.transformer import Transformer
from sagemaker.inputs import TransformInput
from sagemaker.workflow.steps import TransformStep

transformer = Transformer(
    model_name=step_create_model.properties.ModelName,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path=f"s3://{default_bucket}/ChurnTransform",
    sagemaker_session=pipeline_session
)
                                 
step_transform = TransformStep(
    name="ChurnTransform", 
    step_args=transformer.transform(
                    data=batch_data,
                    content_type="text/csv"
                 )
)

Define a register model step

The following code registers the model within the SageMaker model registry using the Pipelines model step:

model = Model(
    image_uri=image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role,
)
from sagemaker.model_metrics import MetricsSource, ModelMetrics

model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri="{}/evaluation.json".format(
            step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
        ),
        content_type="application/json",
    )
)
register_args = model.register(
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
)
step_register = ModelStep(name="ChurnRegisterModel", step_args=register_args)

Define a fail step to stop the pipeline

The following code defines the Pipelines fail step to stop the pipeline run with an error message if the AUC score doesn’t meet the defined threshold:

from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import Join
step_fail = FailStep(
    name="ChurnAUCScoreFail",
    error_message=Join(on=" ", values=["Execution failed due to AUC Score <=", auc_score_threshold]),
    )

Define a condition step to check AUC score

The following code defines a condition step to check the AUC score and conditionally create a model and run a batch transformation and register a model in the model registry, or stop the pipeline run in a failed state:

from sagemaker.workflow.conditions import ConditionGreaterThan
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet
cond_gt = ConditionGreaterThan(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="classification_metrics.auc_score.value",
    ),
    right=auc_score_threshold,
)
step_cond = ConditionStep(
    name="CheckAUCScoreChurnEvaluation",
    conditions=[cond_gt],
    if_steps=[step_register, step_create_model, step_transform],
    else_steps=[step_fail],
)

Build and run the pipeline

After defining all of the component steps, you can assemble them into a Pipeline object. You don’t need to specify the order of the pipeline steps because Pipelines automatically infers the run order based on the dependencies between the steps.

import json
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_count,
        processing_instance_type,
        training_instance_type,
        model_approval_status,
        input_data,
        batch_data,
        auc_score_threshold,
    ],
    steps=[step_process, step_train, step_eval, step_cond],
) 
definition = json.loads(pipeline.definition())
print(definition)

Run the following code in a cell in your notebook. If the pipeline already exists, the code updates it; if it doesn’t exist, a new pipeline is created. The last line then starts a pipeline run.

# Create a new pipeline or update the existing pipeline
pipeline.upsert(role_arn=role)
# Start a pipeline run
pipeline.start()
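
Optionally, capture the object returned by pipeline.start() to monitor the run directly from the notebook. The following short sketch uses methods available on the returned execution object:

execution = pipeline.start()
execution.describe()    # returns the execution ARN, status, and other metadata
execution.wait()        # blocks until the pipeline run completes or fails
execution.list_steps()  # lists each step along with its status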

Conclusion

In this post, we introduced some of the new features now available with Pipelines and used them, along with other built-in SageMaker features and the XGBoost algorithm, to develop, iterate, and deploy a model for churn prediction. The solution can be extended with additional data sources to implement your own ML workflow. For more details on the steps available in the Pipelines workflow, refer to Amazon SageMaker Model Building Pipeline and SageMaker Workflows. The AWS SageMaker Examples GitHub repo has more examples around various use cases using Pipelines.


About the Authors

Jerry Peng is a software development engineer with AWS SageMaker. He focuses on building end-to-end large-scale MLOps systems, from training to model monitoring in production. He is also passionate about bringing the concept of MLOps to a broader audience.

Dewen Qi is a Software Development Engineer in AWS. She currently focuses on developing and improving SageMaker Pipelines. Outside of work, she enjoys practicing the cello.

Gayatri Ghanakota is a Sr. Machine Learning Engineer with AWS Professional Services. She is passionate about developing, deploying, and explaining AI/ML solutions across various domains. Prior to this role, she led multiple initiatives as a data scientist and ML engineer with top global firms in the financial and retail space. She holds a master’s degree in Computer Science with a specialization in Data Science from the University of Colorado, Boulder.

Rupinder Grewal is a Sr. AI/ML Specialist Solutions Architect with AWS. He currently focuses on model serving and MLOps on SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Ray Li is a Sr. Data Scientist with AWS Professional Services. He specializes in building and operationalizing AI/ML solutions for customers of varying sizes, from startups to enterprise organizations. Outside of work, Ray enjoys fitness and traveling.


Reduce the time taken to deploy your models to Amazon SageMaker for testing

Data scientists often train their models locally and look for a proper hosting service to deploy their models. Unfortunately, there’s no one set mechanism or guide to deploying pre-trained models to the cloud. In this post, we look at deploying trained models to Amazon SageMaker hosting to reduce your deployment time.

SageMaker is a fully managed machine learning (ML) service. With SageMaker, you can quickly build and train ML models and directly deploy them into a production-ready hosted environment. Additionally, you don’t need to manage servers. You get an integrated Jupyter notebook environment with easy access to your data sources. You can perform data analysis, train your models, and test them using your own algorithms or use SageMaker-provided ML algorithms that are optimized to run efficiently against large datasets spread across multiple machines. Training and hosting are billed by minutes of usage, with no minimum fees and no upfront commitments.

Solution overview

Data scientists sometimes train models locally using their IDE and either ship those models to the ML engineering team for deployment or just run predictions locally on powerful machines. In this post, we introduce a Python library that simplifies the process of deploying models to SageMaker for hosting on real-time or serverless endpoints.

This Python library gives data scientists a simple interface to quickly get started on SageMaker without needing to know any of the low-level SageMaker functionality.

If you have models trained locally using your preferred IDE and want to benefit from the scale of the cloud, you can use this library to deploy your model to SageMaker. With SageMaker, in addition to all the scaling benefits of a cloud-based ML platform, you have access to purpose-built training tools (distributed training, hyperparameter tuning), experiment management, model management, bias detection, model explainability, and many other capabilities that can help you in any aspect of the ML lifecycle. You can choose from the three most popular frameworks for ML: Scikit-learn, PyTorch, and TensorFlow, and can pick the type of compute you want. Defaults are provided along the way so users of this library can deploy their models without needing to make complex decisions or learn new concepts. In this post, we show you how to get started with this library and optimize deploying your ML models on SageMaker hosting.

The library can be found in the GitHub repository.

The SageMaker Migration Toolkit

The SageMakerMigration class is available through a Python library published to GitHub. Instructions to install this library are provided in the repository; make sure that you follow the README to properly set up your environment. After you install this library, the rest of this post talks about how you can use it.

The SageMakerMigration class consists of high-level abstractions over SageMaker APIs that significantly reduce the steps needed to deploy your model to SageMaker, as illustrated in the following figure. This is intended for experimentation so developers can quickly get started and test SageMaker. It is not intended for production migrations.

For Scikit-learn, PyTorch, and TensorFlow models, this library supports deploying trained models to a SageMaker real-time endpoint or serverless endpoint. To learn more about the inference options in SageMaker, refer to Deploy Models for Inference.

Real-time vs. serverless endpoints

Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support auto scaling.

SageMaker Serverless Inference is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless Inference is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies. This takes away the undifferentiated heavy lifting of selecting and managing servers.

Depending on your use case, you may want to quickly host your model on SageMaker without actually having an instance always on and incurring costs, in which case a serverless endpoint is a great solution.
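
For reference, when you work with the SageMaker Python SDK directly rather than through this toolkit, the serverless option is expressed through a ServerlessInferenceConfig. The following is a hedged sketch that assumes model is an already constructed sagemaker.model.Model:

from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,  # memory between 1024 MB and 6144 MB, in 1 GB increments
    max_concurrency=5,       # maximum concurrent invocations for the endpoint
)
# model is assumed to be an existing sagemaker.model.Model object
predictor = model.deploy(serverless_inference_config=serverless_config)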

Prepare your trained model and inference script

After you’ve identified the model you want to deploy on SageMaker, you must ensure the model is presented to SageMaker in the right format. SageMaker endpoints generally consist of two components: the trained model artifact (.pth, .pkl, and so on) and an inference script. The inference script is not always mandatory, but if not provided, the default handlers for the serving container that you’re using are applied. It’s essential to provide this script if you need to customize your input/output functionality for inference.

The trained model artifact is simply a saved Scikit-learn, PyTorch, or TensorFlow model. For Scikit-learn, this is typically a pickle file, for PyTorch this is a .pt or .pth file, and for TensorFlow this is a folder with assets, .pb files, and other variables.

Generally, you need to be able to control how your model processes input and performs inference, and to control the output format for your response. With SageMaker, you can provide an inference script to add this customization. Any inference script used by SageMaker must have one or more of the following four handler functions: model_fn, input_fn, predict_fn, and output_fn.

Note that these four functions apply to PyTorch and Scikit-learn containers specifically. TensorFlow has slightly different handlers because it’s integrated with TensorFlow Serving. For an inference script with TensorFlow, you have two model handlers: input_handler and output_handler. Again, these have the same preprocessing and postprocessing purpose that you can work with, but they’re configured slightly differently to integrate with TensorFlow Serving. For PyTorch models, model_fn is a compulsory function to have in the inference script.
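
For illustration, a minimal sketch of the TensorFlow Serving handler pair follows; the signatures match the SageMaker TensorFlow Serving container convention, while the passthrough logic itself is only an example:

def input_handler(data, context):
    # Preprocess the incoming request into the payload TensorFlow Serving expects
    if context.request_content_type == "application/json":
        return data.read().decode("utf-8")
    raise ValueError(f"Unsupported content type: {context.request_content_type}")

def output_handler(response, context):
    # Postprocess the TensorFlow Serving response and return (content, content_type)
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    return response.content, "application/json"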

model_fn

This is the function that is first called when you invoke your SageMaker endpoint. This is where you write your code to load the model. For example:

import os
import torch

def model_fn(model_dir):
    # Your_Model is a placeholder for your own model class
    model = Your_Model()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f))
    return model

Depending on the framework and type of model, this code may change, but the function must return an initialized model.

input_fn

This is the second function that is called when your endpoint is invoked. This function takes the data sent to the endpoint for inference and parses it into the format required for the model to generate a prediction. For example:

import torch
from io import BytesIO

def input_fn(request_body, request_content_type):
    """An input_fn that loads a pickled tensor"""
    if request_content_type == 'application/python-pickle':
        return torch.load(BytesIO(request_body))
    else:
        # Handle other content types here, or raise an exception
        # if the content type is not supported.
        pass

The request_body contains the data to be used for generating inference from the model and is parsed in this function so that it’s in the required format.

predict_fn

This is the third function that is called when your model is invoked. This function takes the preprocessed input data returned from input_fn and uses the model returned from model_fn to make the prediction. For example:

import torch

def predict_fn(input_data, model):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))

You can optionally add output_fn to parse the output of predict_fn before returning it to the client. The function signature is def output_fn(prediction, content_type).
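
For example, a minimal output_fn that serializes a PyTorch prediction tensor to JSON could look like the following (the serialization logic is illustrative only):

import json

def output_fn(prediction, content_type):
    if content_type == "application/json":
        return json.dumps(prediction.detach().cpu().numpy().tolist())
    raise ValueError(f"Unsupported content type: {content_type}")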

Move your pre-trained model to SageMaker

After you have your trained model file and inference script, you must put these files in a folder as follows:

#SKLearn Model

model_folder/
    model.pkl
    inference.py
    
# Tensorflow Model
model_folder/
    0000001/
        assets/
        variables/
        keras_metadata.pb
        saved_model.pb
    inference.py
    
# PyTorch Model
model_folder/
    model.pth
    inference.py

After your model and inference script have been prepared and saved in this folder structure, your model is ready for deployment on SageMaker. See the following code:

from sagemaker_migration import frameworks as fwk

if __name__ == "__main__":
    ''' '''
    sk_model = fwk.SKLearnModel(
        version = "0.23-1", 
        model_data = 'model.joblib',
        inference_option = 'real-time',
        inference = 'inference.py',
        instance_type = 'ml.m5.xlarge'
    )
    sk_model.deploy_to_sagemaker()

After deploying your endpoint, make sure to clean up any resources you no longer need, either via the SageMaker console or through the delete_endpoint Boto3 API call.
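
For example, the following Boto3 sketch deletes an endpoint along with its endpoint configuration and model; the resource names are hypothetical placeholders:

import boto3

sm_client = boto3.client("sagemaker")
# Delete the endpoint first, then the endpoint configuration and the model it references
sm_client.delete_endpoint(EndpointName="my-sklearn-endpoint")
sm_client.delete_endpoint_config(EndpointConfigName="my-sklearn-endpoint-config")
sm_client.delete_model(ModelName="my-sklearn-model")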

Conclusion

The goal of the SageMaker Migration Toolkit project is to make it easy for data scientists to onboard their models onto SageMaker to take advantage of cloud-based inference. The repository will continue to evolve and support more options for migrating workloads to SageMaker. The code is open source and we welcome community contributions through pull requests and issues.

Check out the GitHub repository to explore more on utilizing the SageMaker Migration Toolkit, and feel free to also contribute examples or feature requests to add to the project!


About the authors

Kirit Thadaka is an ML Solutions Architect working in the Amazon SageMaker Service SA team. Prior to joining AWS, Kirit worked at early-stage AI startups, followed by consulting in various roles across AI research, MLOps, and technical leadership.

Ram Vegiraju is an ML Architect with the SageMaker Service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.
