Amazon’s Joe Tighe on the major trends he sees in the field of computer vision.Read More
How Wix empowers customer care with AI capabilities using Amazon Transcribe
With over 200 million users worldwide, Wix is a leading cloud-based development platform for building fully personalized, high-quality websites. Wix makes it easy for anyone to create a beautiful and professional web presence. When Wix started, it was easy to understand user sentiment and identify product improvement opportunities because the user base was small. Such information could include the quality of support operations, product issues, and feature requests.
Thousands of Wix customer care experts support tens of thousands of calls a day in various languages from countries around the world. Wix previously used user surveys to measure user sentiment regarding the company brand, products, services, or interactions with customer care agents. At best, we managed to receive feedback on 12% of our calls. In addition, this process was manual and limited in coverage. We were losing sight of important information crucial to customer success. This is where machine learning (ML) can solve many of these challenges.
ML capabilities such as automatic speech recognition enables you to process 100% of your customer conversations and improve your ability to understand and serve your customers. With accurate call transcripts, you can unlock further insights such as sentiment, trending issues, and agent effectiveness at resolving calls. Sentiment analysis is the use of natural language processing (NLP), a subfield of ML that determines whether data is positive, negative, or neutral. This helps agents and supervisors better understand and anticipate customer needs while enabling them to make informed decisions using actionable insights.
Wix wanted to expand visibility of customer conversation sentiment to 100% with the help of ML. In this post, we explain how Wix used Amazon Transcribe, a speech to text service, accurately redacted personally identifiable information (PII) from phone calls and other customer interactions from other channels, to develop a sentiment analysis system that can effectively determine how users feel throughout an interaction with customer care agents.
How we integrated Amazon Transcribe
Building a sentiment analysis service requires three main components:
- A data store for audio calls and transcribed data. For our solution, we used Amazon Simple Storage Service (Amazon S3).
- An automatic speech recognition ML model (Amazon Transcribe) for converting audio into text transcriptions.
- A sentiment analysis ML model for predicting sentiment.
For transcription (speech to text), we evaluated three leading vendors. The predominant parameters were accuracy, ease of use, and features for the call center use case (such as PII redaction). We found Amazon Transcribe to be the leading solution. The following are some of the differentiating capabilities:
- Custom vocabulary, is a list of specific words that you want Amazon Transcribe to recognize in your audio input. These are generally domain-specific words, phrases, or proper nouns that Amazon Transcribe isn’t recognizing. Custom vocabularies worked well to capture Wix’s specific terminology and phrases, such as the company name. The following is an example of the vocabulary we used:
Phrase | Sounds Like |
Wix | weeks |
Wix | picks |
Wix.com | weeks-dot-com |
Wix.com | wix-that-come |
Wix Professionals | Wix-affection-als |
After you upload your custom vocabulary list, you can use it for a transcription job.
- Channel identification in which Amazon Transcribe takes an audio file or stream that has multiple channels, transcribes each channel, and distinguishes between two different speakers (such as the agent and caller) automatically.
- Automatic redaction of PII data from output and a blocklist of words and phrases.
- Custom language models allow you to submit training data (a corpus of text data) to train custom language models that target domain-specific use cases and improve transcription accuracy. For example, you can provide Amazon Transcribe with industry-specific terms or acronyms that it might not otherwise recognize.
Custom language models are more powerful than custom vocabularies, because they can utilize a larger corpus of data, allow for tuning data, and understand individual terms as well as context. Because of the additional data and training involved, custom language models can produce significant accuracy improvements. To supercharge your accuracy, you can combine custom vocabularies with custom language models.
With these customization features, boosted the accuracy of Amazon Transcribe to specifically understand how users interact with Wix products and services. We first used a custom language model to produce transcriptions, then used custom vocabulary to replace words (as seen in the preceding examples). Then we trained with additional labeled data such as manually labeled transcriptions from real calls and knowledge base articles related to various vertical domains such as stores and payments.
Word error rate (WER) is the most common way to measure accuracy. WER counts how many words need to be changed to reach 100% accuracy. After we completed our model training with the customization features mentioned, we managed to increase the transcription accuracy (in US English) to 92% (8% WER).
92% is great, but we’re not done yet; we will continue to improve our transcription accuracy.
For sentiment analysis, we decided to develop a proprietary sentiment model that was tailored to identify sentiment regarding specific Wix features and data, and enabled custom integrations across various internal services.
Architecture overview
The following diagram illustrates our solution architecture and workflow.
We start the process by listening for events (via a webhook) of calls that are completed. For every incoming new call, we download the call in audio format (.mp3) and save it to Amazon S3 with call metadata such as user ID and job ID.
When the audio download is complete, we start an asynchronous Amazon Transcribe transcription job. We receive a response (JSON) that consists of a list where each transcribed word is defined as a row containing additional metadata. We can then aggregate sentences based on stopwords or gaps in given timestamps between words.
Amazon Transcribe can have a response time of 1–10 minutes for a call lasting 30–120 minutes. To tackle this issue, we built a service that manages the asynchronous jobs and maintains consistency and synchronization of predefined steps. For example, we define the order of what steps need to be completed in the job before others can.
After the transcription is complete and returned from Amazon Transcribe, we save it to Amazon S3 for future use, and pass it on to our sentiment analysis model for processing. The response for sentiment ranges on a scale of 0–1 (0 being positive and 1 being negative). Finally, we save and log the results for future use.
Conclusion and next steps
Going forward, we want not only to better predict the sentiment of calls and chats, but to also understand and predict the root cause. This approach of combining predictive analytics with proactive care requires innovation and is yet to be tackled at scale.
With sentiment analysis, we can detect and trigger proactive care based on negative sentiment. We can also utilize the findings to improve visibility for our product managers on how our users feel about certain products and features, including negative trends related to specific releases.
Sentiment analysis is just one example of the many use cases that we can achieve with Amazon Transcribe.
In the future, we plan to use Amazon Transcribe to understand not just how users feel, but what topics they’re talking about. This can help us reach an even greater depth of what is needed to increase user success. For example, we can determine which products and features need urgent care, how to improve our customer care interactions across channels, and how to predict and prevent escalations from even happening.
We encourage you to try Amazon Transcribe and review the Developer Guide for more details.
About the Authors
Assaf Elovic is Head of R&D at Wix, leading all customer care engineering efforts in the fields of virtual assistants, predictive analysis, and proactive care. Prior to this role, Assaf was an entrepreneur specializing in conversational user interfaces and NLP. Assaf holds a B.Sc. in Computer Science and B.A. in Economics from IDC Hertzylia.
Mykhailo Ulianchenko is an Engineering Manager at Customer Care, Wix. His teams are responsible for delivering data-driven products that help Wix to provide the best customer care service. Prior to the managerial position, Mykhailo was working as a software engineer in server, mobile and front-end areas. He is a big fan of extreme sports and Brazilian jiu-jitsu.
Vitalii Kloz is a Software Engineer at Wix.com, working on building flexible and resilient applications to automate data pipelines at Wix and, particularly, to enhance users’ experience with Wix Customer Care by providing data-driven insights using Machine Learning. Vitalii holds B.Sc in Computer Science from Kyiv University and is currently studying for M.Sc.
Yaniv Vaknin is a Machine Learning Specialist at Amazon Web Services. Prior to AWS, Yaniv held leadership positions with AI startups and Enterprise including co-founder and CEO of Dipsee.ai. Yaniv works with AWS customers to harness the power of Machine Learning to solve real world tasks and derive value. In his spare time, Yaniv enjoys playing soccer with his boys.
Gili Nachum is a senior AI/ML Specialist Solutions Architect in the AWS EMEA ML team. Gili is passionate about ML and in specific the cost and performance challenges of training deep learning models. Previously Gili was a SW architect working on Big Data, and Search.
How to approach conversation design with Amazon Lex: Building and testing (Part 3)
In parts one and two of our guide to conversation design with Amazon Lex, we discussed how to gather requirements for your conversational AI application and draft conversational flows. In this post, we help you bring all the pieces together. You’ll learn how draft an interaction model to deliver natural conversational experiences, and how to test and tune your application.
This is the final post in a series on conversational design.
|
Draft an interaction model
In the second post of this series, you identified some use cases that you wanted to automate and wrote sample interactions between a user and your application. In this post, we use these use cases to build an Amazon Lex framework, called an interaction model, but first, let’s review some important definitions.
An interaction model defines the tasks that your application can help a user accomplish in a structure that a bot can understand. In your interaction model, you define the five major components for Amazon Lex: intents, sample utterances, slot names, slot values, and slot synonyms. You can make an interaction model (also called a bot definition) in whatever format works best for you and your team, such as in a spreadsheet or directly on the Amazon Lex console.
We start by defining intents. An intent is a user’s goal—what is the user trying to accomplish when they interact with your bot? Continuing with the banking bot example, a user goal might be something like blocking a card or opening a bank account. For example, in the following dialog, the user’s intent is to set up a recurring payment.
Turn | Participant | Dialog | Notes |
1 | Amazon Lex | Thanks for contacting Your Favorite Loan Company. What can I help you with today? | |
1 | Customer | I wanna set up a recurring payment. | |
2 | Amazon Lex | Your current balance is [$200.00]. That includes an overdue balance of [$100.00]. A payment of [$200.00] is due on October 15th. To set up a recurring payment, you’ll need the routing number and the account number. Do you have that information ready? | Can customers change their debit date? Or only the account? |
2 | Customer | No | |
3 | Amazon Lex | Okay. I can wait. Let me know when you have the information. | |
3 | Customer |
… I’m ready. |
|
4 | Amazon Lex | To start, what’s the 9-digit routing number? | |
4 | Customer | 999-999-999 | |
5 | Amazon Lex | And what’s the account number? | |
5 | Customer | 1234 567 890 | |
… | … | … |
Each intent should come directly from a use case that you previously established in the earlier steps. We go into more detail about how to define intents later in this post.
Continuing on, a prompt label is a value defined by the application’s designer that maps to something the bot says.
A sample utterance is something the user says to the bot that is defined in the interaction model to help the bot classify customer intent. For example, if you’re creating an intent for opening a bank account, you’d likely want to include utterances like “open an account,” “help with opening an account,” or “How can I open a bank account?” The idea behind sample utterances is that by defining a class of utterances with similar semantic content, the bot can use these to make an educated guess about what the user’s goal is. Even if you don’t define every possible utterance (and you shouldn’t), the bot can guess what the user is trying to do.
A slot is a piece of information that the user provides in order to accomplish their goal. For example, if a customer wants to open a bank account, we need to know the type of account. We can use a slot to collect those account types, and name it something that builders will understand, like AccountType
. Slots can be either required or optional, depending on the use case. For example, you might need a required slot like BirthDate
to authenticate your user, but collect an optional slot type of AccountType
to disambiguate between the different accounts a user might have. Slot values are the pieces of information that you want the bot to recognize as a slot, like “checking” or “savings.” Synonyms are alternate ways of saying a slot value, like “ISA” or “deposit account.”
Finally, a slot corresponding utterance is an utterance that contains a slot value, but doesn’t contain an intent, such as “to my savings account” or “it’s for my savings account.” In these utterances, you can’t tell what the user is trying to accomplish without the context of the rest of the conversation, but they do contain valuable slot information that you need the bot to capture.
The bot also has some available actions, ElicitIntent
and ElicitSlot
, which mean that the bot is either trying to capture the user’s intent or the bot is looking to gather some slot information.
Now that you’ve defined the values for all these components and put all those pieces together, you’ve created the first draft of an interaction model. Here’s an example, complete with the bot’s available actions.
Turn user stories into intents and slots
From your user stories, you’ve identified the use cases that you want your application to be able to help your users fulfill, such as blocking a bank card due to fraud or opening a new credit card. Make a complete list of all use cases that you developed. Now, it’s time to work backwards from the use cases to create user intents.
Start by getting a group of people together from all different teams of your organization—business analysts, technical pros, and leadership team members should all be present. Ask each person to create a list of possible things that they might reasonably say to a human agent or to an AI application for help with their use case. For example, if your use case is to open an account, you might list things like “I’d like to open a bank account,” “Can you help me open a new account?” or “Opening a savings account.” Be flexible with what you write. Have each person write 10–20 utterances per use case. Keep in mind the variety available in human language:
- Verb variation – Open, start, begin, get started, establish
- Noun variation – Account, savings, first credit card, new customer
- Phrase or full sentence – Open account, I’d like to open an account
- Statements versus questions – I want to open an account, Can you help me with getting started with a new account?
- Implicit understanding – I’m a new customer, Help for new customers
- Tone (formal or informal) – I need some assistance with opening a new high-yield savings account, I wanna open a new card
Now, compare with each other. Combine all the utterances into a team-wide list, and organize them with the most frequent utterances first. You can use these as a head start on your sample utterances. Try to classify each utterance into a single use case. You might think that this seems easy or obvious, since you just created these utterances directly from a list of use cases, but you might be surprised by how ambiguous human language can be.
Now that you have your utterances and your use cases, decide on which ones you want to turn into intents for your bot. Again, this requires input from your team to complete successfully, but here are some basic strategies. For each use case that you created utterances for and classified utterances for, you should turn that into an intent. If you find that you’re running into lots of ambiguity and having trouble classifying utterances, you should make a judgment call with your team about how you want to handle those tricky cases. You can merge use cases into a single intent if you find that there is too much similarity between the utterances, or you can split use cases up into more fine-grained intents if a single intent has too much variety in the utterances to classify them successfully.
Another strategy for dealing with these ambiguous utterances is to use slots. If you have an assortment of similarly defined intents, like OpenACreditCard
and OpenADebitCard
, you might find that utterances like “open a card” cause confusion in the model. After all, as a human being, it’s tough to decide just from that utterance whether the card is credit or debit card without more information. You can use slots to help by defining them in the model as a required piece of information, so that the bot looks for the words “credit” or “debit” in the utterance. Then, if those slots aren’t filled, use that information to surface a disambiguation prompt like “Would you like to open a new credit card or a new debit card?” to help get the necessary information. You should keep a running list of utterances that are difficult to classify and use them for testing to see how users navigate these tricky situations.
Remember that design is an iterative process and that no single interaction model will be perfect on the first try. This is why we continue with the next steps of prototyping and testing in order to build a successful conversational application.
Prototype your design
Given the often ambiguous nature of designing a conversational AI system, prototyping your design is crucial. Prototyping is a great way to gather meaningful feedback from real users in realistic contexts. In a design prototype, you want to build a simple way to test your design and gather feedback, without investing too much time building the software, because the design isn’t even finalized yet.
Following our example from earlier, we can build out a simple prototype to evaluate our user experience and amend our design as needed. Let’s build a mini-prototype with two intents: ReportCreditCardFraudIntent
and OpenANewCreditCardAccountIntent
.
ReportCreditCardFraudIntent |
Unknown charge on my account |
I think someone stole my card |
Credit card fraud department |
Fraudulent charges on my account |
OpenANewCreditCardAccountIntent |
Open a new account |
Help with opening a credit card |
Open a credit card account |
I want to open a credit card |
Before we even build these intents on the Amazon Lex console, we can make a prototype to make sure that we’re covering the most common utterances that a user might say. One simple way to do this would be to engage a few potential end-users, provide them with a scenario like pretending their card was stolen, and have them provide a few utterances. You can match this against what you’ve outlined and collected with your team, and use this data to help enhance your design. You might find that users are very unlikely to just say “credit card” at the open menu, or you might find that it’s the most common utterance. Gathering information from a likely pool of users helps you understand your customers better to make your design more robust.
The preceding example is a quick way to test your initial designs without much code. Other examples for prototyping your design would be Wizard of Oz (where the designer plays the role of the bot opposite a user who doesn’t necessarily know they’re talking to a human) or visual prototypes to help visualize the best experience (like a video simulating a chat window).
Test and tune your bot
Now that you’ve gathered all the different elements of your design, and the experience has been built and integrated, you can start testing.
The first step is to test against the design documentation you’ve put together (the sample dialogs, conversation flows, and interaction model). Thoroughly test all the different intents, slots, slot values, paths, and error handling flows that you’ve designed, going step by step through each one. The following is an example list of things to test:
- Intent classification – Is the bot correctly predicting the intent for all utterances?
- Slot values – Is the bot correctly recognizing all the possible slot values? For example, if you’re using a slot with phone numbers over voice, does the bot recognize both “one zero zero” and “one hundred” as valid inputs?
- Error handling – Are there places in the flow where you get stuck in a loop? Does the bot correctly recover if some kind of error occurs?
- Prompts – Are the prompts eliciting the expected response? Is the wording clear and understandable for all users?
The following is a sample test plan for a call center bot that you can use to guide your own testing.
Test ID | Scenario | Steps to test | Utterance | Successful? |
Sample_100 | You notice a fraudulent charge on your account | Call number | yes | |
Sample_100 | Say “credit card fraud” | Credit card fraud | yes | |
Sample_100 | Say or enter date of birth when prompted | January 1, 1980 | yes |
After testing, you may find that your bot requires some tuning. Go through your interaction model and add in any commonly missed utterances, new intents or slots, or change the wording in problematic prompts that are losing users along the way. This is a great place to explore an automated testing framework to expedite the testing process, but manual testing offers different insights about the user experience that can help alert you to any usability defects before launch.
Finally, you should also provide your users with a way to test what you’ve built against the business requirements that you defined in part one of this series. You need to make sure that before you launch your application to production it handles all customer requests and fulfills the business requirements that you received. Before beginning user testing, define the test plan with all stakeholders so it’s clear to everyone on the team how you define success. Make sure that at this point, you’ve developed your application in an environment that is as close as possible to the production environment, so that any feedback from this testing provides insight for production. Provide testers with the test plan and clearly document the results, so that it’s easy to use the data from testing to make decisions about how best to move forward.
After you’ve launched your application, the work isn’t done! Design is an iterative experience and continually requires fresh perspectives to improve. As part of the business requirements, you should define how you’ll monitor the health of the system in order to identify issues, such as missed utterances. For example, you might want to explore an analytics framework dashboard or a business intelligence dashboard to help spot gaps in utterance coverage or places where users exit early. Use this information to improve your interaction model, test the new design, and ultimately, tune your application.
Conclusion
In this series, we covered all the important basics for creating a great conversational experience using Amazon Lex. We encourage you to test and iterate through your design multiple times to ensure the best possible customer experience. Keeping these best practices in mind, we hope you explore all the different and creative ways that humans interface with the technology around us.
And remember that we at AWS Professional Services and our extensive AWS Partner Network are available to help you and your team through the process. Whether you’re only in need of consultation and advice, or whether you need full access to a designer, our goal is to help you achieve the best conversational interface for you and your customers.
About the Authors
Nancy Clarke is a Conversation Designer with the AWS Professional Services Natural Language AI team. When she’s not at her desk, you’ll find her gardening, hiking, or re-reading the Lord of the Rings for the billionth time.
Rosie Connolly is a Conversation Designer with the AWS Professional Services Natural Language AI team. A linguist by training, she has worked with language in some form for over 15 years. When she’s not working with customers, she enjoys running, reading, and dreaming of her future on American Ninja Warrior.
Claire Mitchell is a Design Strategy Lead with the AWS Professional Services AWS Professional Services Emerging Technologies Intelligence Practice—Solutions team. Occasionally she spends time exploring speculative design practices, textiles, and playing the drums.
Deploying ML models using SageMaker Serverless Inference (Preview)
Amazon SageMaker Serverless Inference (Preview) was recently announced at re:Invent 2021 as a new model hosting feature that lets customers serve model predictions without having to explicitly provision compute instances or configure scaling policies to handle traffic variations. Serverless Inference is a new deployment capability that complements SageMaker’s existing options for deployment that include: SageMaker Real-Time Inference for workloads with low latency requirements in the order of milliseconds, SageMaker Batch Transform to run predictions on batches of data, and SageMaker Asynchronous Inference for inferences with large payload sizes or requiring long processing times.
Serverless Inference means that you don’t need to configure and manage the underlying infrastructure hosting your models. When you host your model on a Serverless Inference endpoint, simply select the memory and max concurrent invocations. Then, SageMaker will automatically provision, scale, and terminate compute capacity based on the inference request volume. SageMaker Serverless Inference also means that you only pay for the duration of running the inference code and the amount of data processed, not for idle time. Moreover, you can scale to zero to optimize your inference costs.
Serverless Inference is a great choice for customers that have intermittent or unpredictable prediction traffic. For example, a document processing service used to extract and analyze data on a periodic basis. Customers that choose Serverless Inference should make sure that their workloads can tolerate cold starts. A cold start can occur when your endpoint doesn’t receive traffic for a period of time. It can also occur when your concurrent requests exceed the current request usage. The cold start time will depend on your model size, how long it takes to download, and your container startup time.
Let’s look at how it works from a high level view.
How it works
A Serverless Inference endpoint can be setup using the AWS Management Console, any standard AWS SDKs, or the AWS CLI. Because Serverless Inference uses the same APIs as SageMaker Hosting persistent endpoints to configure and deploy endpoints, the steps to create a Serverless Inference endpoint are identical. The only modification required is changes to configuration parameters that are setup on your endpoint configuration.
To create a Serverless Inference endpoint, you perform three basic steps:
Step 1: Create a SageMaker Model that packages your model artifacts for deployment on SageMaker using the CreateModel API. This step can also be done via AWS CloudFormation using the AWS::SageMaker::Model resource.
Step 2: Create an endpoint configuration using the CreateEndpointConfig API and the new configuration ServerlessConfig options, or selecting the serverless option in the AWS Management Console as shown in the following image. Note that this step can also be done via AWS CloudFormation using the AWS::SageMaker::EndpointConfig resource. You must specify the Memory Size which, at a minimum, should be as big as your runtime model object, and the Max Concurrency, which represents the max concurrent invocations for a single endpoint.
Step 3: Finally, using the endpoint configuration that you created in Step 2, create your endpoint using either the AWS Management Console, or programmatically using the CreateEndpoint API. This step can also be done via AWS CloudFormation using the AWS::SageMaker::Endpoint resource.
That’s it! Then, SageMaker creates an HTTPS URL that you can use to invoke your endpoint through your client applications using the existing runtime client and the invoke_endpoint request.
Deep Dive
Next, we’ll dive deeper into the high-level steps above by showcasing a detailed how-to for creating a new SageMaker Serverless Inference endpoint.
Setup and training
For preview the following regions are supported so make sure to create a SageMaker Notebook Instance or SageMaker Studio Notebook in one of these regions: us-east-1, us-east-2, us-west-2, eu-west-1, ap-northeast-1, and ap-southeast-2. For this example, we’ll be using the Amazon provided XGBoost Algorithm to solve a regression problem with the Abalone dataset. The notebook code can be found in the sagemaker-examples repository.
First, we must setup the appropriate SDK clients and retrieve the public dataset for model training. Note that an upgrade to the SDK may be required if you are running on an older version.
After setting up the clients and downloading the data that will be used to train the model, we can now prepare for model training using SageMaker Training Jobs. In the following, we are performing the steps to configure and fit our model that will be deployed to a serverless endpoint.
Model creation
Next, we must package our model for deployment on SageMaker. For the Model Creation step, we need two parameters: Image and ModelDataUrl.
Image points to the container image for inference. Because we are using a SageMaker managed container, we retrieved this for training under the variable image_uri, and we can use the same image for inference. If you are bringing your own custom container, then you must supply your own container image that is compatible for hosting on SageMaker as you would today for hosting a SageMaker Hosting persistent endpoint.
ModelDataUrl points to the Amazon Simple Storage Service (S3) URL for the trained model artifact that we will pull from the training estimator.
We can now use our created model to work with creating an Endpoint Configuration, which is where you will add a serverless configuration.
Endpoint configuration creation
Up until now, the steps look identical to if you were deploying a SageMaker Hosting endpoint. This next step is the same. However, you’ll take advantage of a new serverless configuration option in your endpoint configuration. There are two inputs required, and they can be configured to meet your use case:
- MaxConcurrency: This can be set from 1 to 50.
- Memory Size: This can be the following values: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.
The configuration above indicates that this endpoint should be deployed as a serverless endpoint because we specified configuration options in ServerlessConfig.
Endpoint creation and invocation
Next, we use the Endpoint Configuration to create our endpoint using the create_endpoint function.
The following step should take a few minutes to deploy successfully.
The created endpoint should display the Serverless Configuration that you provided in the previous step.
Now we can invoke the endpoint with a sample data point from the Abalone dataset.
Monitoring
Serverless Inference emits metrics to Amazon CloudWatch. These metrics include the metrics that are emitted for SageMaker Hosting persistent endpoints, such as MemoryUtilization and Invocations, as well as a new metric called ModelSetupTime. This new metric tracks the time that it takes to launch new compute resources for your serverless endpoint.
Conclusion
In this post, we covered the high level steps for using Serverless Inference, as well as a deep dive on a specific example to help you get started with the new feature using the example provided in SageMaker examples on GitHub. Serverless Inference is currently launched in preview, so we don’t yet recommend it for production workloads. There are some features that Serverless Inference doesn’t support yet, such as SageMaker Model Monitor, Multi-Model Endpoints, and Serial Inference Pipelines.
Please check out the Feature Exclusions portion of the documentation for additional information. The SageMaker Serverless Inference Documentation is also a great resource for diving deeper into Serverless Inference capabilities, and we’re excited to start getting customer feedback!
About the Authors
Ram Vegiraju is a ML Architect with the SageMaker Service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.
Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She holds six AWS certifications and has been in technology for 23 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background to deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee co-founded the Denver chapter of Women in Big Data.
Michael Pham is a Software Development Engineer in the Amazon SageMaker team. His current work focuses on helping developers efficiently host machine learning models. In his spare time he enjoys Olympic weightlifting, reading, and playing chess.
Rishabh Ray Chaudhury is a Senior Product Manager with Amazon SageMaker, focusing on Machine Learning inference. He is passionate about innovating and building new experiences for Machine Learning customers on AWS to help scale their workloads. In his spare time, he enjoys traveling and cooking. You can find him on LinkedIn.
Build and visualize a real-time fraud prevention system using Amazon Fraud Detector
We’re living in a world of everything-as-an-online-service. Service providers from almost every industry are in the race to feature the best user experience for their online channels like web portals and mobile applications. This raises a new challenge. How do we stop illegal and fraudulent behaviors without impacting typical legitimate interactions? This challenge is even greater for organizations that offer paid services. These organizations need to validate payment transactions against fraudulent behaviors in your customer-facing applications. Although subsequent checks are performed by financial entities such as card networks and banks that run the payment transaction, the service providers remain responsible for the end-to-end payment process.
Organizations from all around the world have long implemented rule-based fraud detection systems. The following is an example of a sample rule:
Although these systems are easy to implement, they’re not scalable for everyday new fraud trends, because fraudsters are constantly looking for new loopholes to exploit and ways to hijack those static rules. As a result, new rules must be added every day. This can lead to thousands of rules, making the system difficult to maintain.
More advanced ways are needed to detect and stop losses from fraud that may be damaging organizations’ revenue and brand reputation. In this post, we discuss how to create a real-time fraud prevention system using Amazon Fraud Detector.
Solution overview
Emerging technologies like AI and machine learning (ML) can provide a solution that shifts from enforcing rule-based validations to using validations based on learning from examples and trends directly found in the transaction data. By specifying the key features that may contribute to fraudulent behavior, such as customer-related information (card number, email, IP address, and location) and transaction-related information (time, amount, and currency). An ML model can utilize statistical algorithms to identify trends such as the customer’s frequency of purchases, spending patterns, points of interest, and how long their account has been active.
AWS offers AI and ML services to help you achieve this. Amazon Fraud Detector is a scalable, fully managed service that makes it easy to use ML to detect online fraud in real time. It helps you build, deploy, and manage fraud detection models that can also combine ML and rules to ensure successful onboarding for your existing rules that can effectively stop fraudulent scenarios.
Although Amazon Fraud Detector helps you detect fraudulent behaviors, we still need to make sure this is happening without impacting legitimate interactions. To do so, we need two additional components to reduce the processing latency and handle failures: an event store and event processor.
The first component that we need to introduce is an event store to centrally manage and exchange event messages. Apache Kafka is a scalable, durable, and highly available event store for mission-critical applications. It’s designed to support high throughput of thousands of messages per second while providing milliseconds latency. It also decouples the transaction’s producers from consumers by buffering the data so that each consumer can consume the data at their own pace. This is useful if we experience a sudden increase in traffic. For example, let’s assume that on average, your website has tens of payment transactions per second. Then you release a new product that becomes very popular. You start having thousands of checkouts per second. If you’re not using a buffer like Apache Kafka, this traffic spike can overwhelm your backend applications, and potentially lead to downtime.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a low-cost, fully managed Apache Kafka service that we use as a temporary durable store for our payment transactions
The second component that we need is a mission-critical stream processor, that we can use to apply fraud detection logic in real time within the E2E payment transaction journey. This stream processor must be scalable to deal with massive amounts of transactions, and reliable to process transactions with a very low latency, while being able to gracefully recover from a failure as if the failure had never happened.
Apache Flink is a popular open-source framework and distributed processing engine for transforming and analyzing data streams in real time. Apache Flink has been designed to perform computations at in-memory speed and at scale. Applications can run continuously with minimal downtime; it uses a recovery mechanism that is based on consistent checkpoints of an application’s state. In case of a failure, the application is restarted and its state is loaded from the latest checkpoint. Furthermore, Apache Flink provides a powerful API to transform, aggregate, and enrich events, and supports exactly-once semantics. Therefore, Apache Flink is a great fit for our stream processing requirements.
Amazon Kinesis Data Analytics is a fully managed service that provides the underlying infrastructure for your Apache Flink applications. It enables you to quickly build and run those applications with low operational overhead. For our solution, we use it to consume payment transactions stored in Amazon MSK and coordinate with Amazon Fraud Detector to detect the fraudulent transactions in real time.
Solution details
The solution in this post provides two use cases that are built on top of the Transaction Fraud Insights model created in the post Detect online transaction fraud with Amazon Fraud Detector.
The first use case demonstrates fraud prevention by identifying fraudulent transactions, flagging them to be blocked, and sending an alert notification. The second, writes all transactions in real time to Amazon OpenSearch Service (successor to Amazon Elasticsearch Service), this enables real-time transaction reporting using OpenSearch Dashboards.
The following architecture diagram illustrates the overall flow.
In the following subsections, we provide details about each step in the architecture and the two use cases. The steps are as follows:
- Schedule the transactions producer.
- Generate payment transactions.
- Process the input transactions.
- Get fraud predictions.
- Sink the fraud outcome.
- Send email notifications.
- Visualize real time dashboard.
In subsequent sections, we walk through the steps to deploy the solution with AWS CloudFormation, enable the solution, and visualize the data in OpenSearch Dashboards.
Schedule the transactions producer
The transaction producer runs as an AWS Lambda function. The function is scheduled to run every minute using an Amazon EventBridge rule.
Generate payment transactions
We use a Lambda function that generates synthetic transactions. Each transaction is defined by two sets of data: entities and events.
An entity represents who is performing the transaction such as customer’s details. To enhance the accuracy of the fraud detection model, we use a reference dataset that contains entities used earlier while training the model.
An event represents the transaction-related metrics such as amount and currency. For this, we use faker and random Python libraries.
Each transaction is written into an input Amazon MSK topic called transactions
. The following is a sample transaction record:
Process the input transactions
To process the payment transactions in real time, Apache Flink provides the Table API, which allows intuitive processing using relational operators such as selection, filter, and join. For this post, we use the PyFlink Table API running as a Kinesis data analytics application.
The application does the following:
- Reads the transactions from the input topic
transactions
. - Calls Amazon Fraud Detector APIs to get fraud predictions.
- Writes the results to an output topic on the same MSK cluster.
To read data from and write data into an Amazon MSK topics, we use the out-of-the-box Kafka connector provided by Apache Flink.
Get fraud predictions
The Kinesis data analytics application calls the Amazon Fraud Detector GetEventPrediction API to get the predictions in real time. Because this is considered a custom logic, we use Python user-defined functions (UDFs) to call this API.
For detection, we use a Transaction Fraud Insights model that uses feature engineering to dynamically calculate information about your customers, such as their frequency of purchases, spending patterns, and how long their account has been active. Those aggregates are calculated during training and inference. Because Amazon Fraud Detector aggregates data on entities, it’s useful if the inference data contain entities that are already known to the model. This is because in the online transactions’ context, models indicate lower fraud risk for entities with a high number of legitimate transactions.
Apart from that, to improve model accuracy in production, typically, we frequently retrain the model with a more recent dataset. By default, Amazon Fraud Detector automatically stores event data when you generate predictions. These events are available for future model trainings. We then deploy a new detector version from the newly trained model. This new detector version can be published and become the active version, and therefore all requests to GetEventPrediction
API go to this new version. To avoid any downtime in our Kinesis data analytics application, we don’t specify a detector version in our call. When the version is not specified, the detector’s active version is used. This allows us to change the detector version while being fully transparent from our Kinesis data analytics application.
Sink the fraud outcome
The Kinesis data analytics application writes the output containing the transaction outcome (fraud prediction) into an output Amazon MSK topic called processed_transactions
. Writing the output back to Kafka gives us the benefits we discussed earlier. Moreover, it enables us to consume the same output by different use cases concurrently.
Apache Flink supports different guarantee models: exactly-once, at-most-once, and at-least-once. In our solution, we use Flink’s Kafka sink connector to sink the results to the output topic. This connector supports at-least-once (default) or exactly-once. For this post, we use at-least-once, but you can easily enable exactly-once using the connector options. However, setting the consistency guarantees to exactly-once has an impact on latency because Flink uses two-phase commits and Kafka transactions to guarantee exactly-once. For more information, see An Overview of End-to-End Exactly-Once Processing in Apache Flink.
Send email notifications
To notify downstream services about suspicious transactions, the solution uses a Lambda function to consume records from the processed_transactions
topic. The function evaluates the outcome of each transaction and if the outcome is block, it triggers an Amazon Simple Notification Service (Amazon SNS) notification to notify you by email.
Visualize real-time dashboard
To power real-time dashboards, the solution uses Kafka Connect to sink the data in real time to an Amazon OpenSearch Service domain. This makes the data available for visualization as soon as it is indexed in OpenSearch. Kafka Connect is a scalable and reliable framework to stream data between a Kafka cluster and external systems such as databases, Amazon Simple Storage Service (Amazon S3), and OpenSearch.
Amazon MSK Connect, is a feature of Amazon MSK, enables you to run fully managed Apache Kafka Connect workloads on AWS. MSK Connect is fully compatible with Kafka Connect, enabling you to lift and shift your Kafka Connect applications with zero code changes.
The connector used simply creates an index in Amazon OpenSearch Service with the same name as the output topic in Amazon MSK. If throughput is very high, you need to roll over your indices periodically to stay within the recommended shard size (10–50 GB). Alternatively, you can write the data into an OpenSearch data stream by creating an index template and then configuring the connector to use it. Data streams simplify this process and enforce a setup that best suits append-only time-series data. Because our use case doesn’t have the volume you would normally get with time-series data, we write the output to an index instead.
Each event is indexed into a different document in OpenSearch. The document ID is set to topic+partition+offset
. Therefore, if the same Kafka record is written twice to OpenSearch, the same document will be updated because the document ID will have the same offset. This ensures exactly-once delivery.
Prerequisites
The solution builds on top of the post Detect online transaction fraud with new Amazon Fraud Detector features. We use the same schema as the sample dataset used in the post.
The solution code is available in our GitHub repo. Before proceeding, complete the following prerequisites:
- Create a Transaction Fraud Insights model and a publish a detector as per the steps in that post.
- Follow the instruction on GitHub to package and upload the solution artifacts to an Amazon S3 bucket. The newly created S3 bucket should have 4 artifacts,
- Lambda functions code –
lambda-functions.zip
- Flink code –
RealTimeFraudPrevention.zip
- Kafka connector –
confluentinc-kafka-connect-elasticsearch-11.1.3.zip
- Pre-created OpenSearch dashboard NDJSON file –
dashboard.ndjson
- Lambda functions code –
Deploy the solution using AWS CloudFormation
You use CloudFormation templates to create all the necessary resources for the data pipeline. Complete the following steps:
- Choose Launch Stack and navigate to the Region where the Amazon Fraud Detector model is deployed.
- Choose Next.
- For Stack name, enter a name for your stack. The stack name must satisfy the regular expression pattern: [a-z][a-z0-9-]+ and must be fewer than 15 characters long. The default is fraud-prevention.
- Enter the following parameters:
- For BucketName, enter the bucket name where the solution artifacts are stored.
- For S3SourceCodePath, enter the S3 key for the Lambda functions .zip file, the default is
lambda-functions.zip
- For S3connectorPath, enter the S3 key for the Kafka connector .zip file, the default is
confluentinc-kafka-connect-elasticsearch-11.1.6.zip
- For YourEmail, enter the email that receives Amazon SNS notifications.
- For KafkaInputTopic, enter the input topic name, the default is
transactions
- For KafkaOutputTopic, enter the output topic name. We recommend keeping the default value because we use it later in the pre-created OpenSearch dashboard, the default is
processed_transactions
- For FraudDetectorName, enter the detector name, the default is
transaction_fraud_detector
- For FraudDetectorEventName, enter the Amazon Fraud Detector event resource name, the default is
transaction_event
- For FraudDetectorEntityType, enter the Amazon Fraud Detector entity type resource name, the default is
customer
- For OpenSearchMasterUsername, enter the username of the OpenSearch Service domain, the default is
admin
- For OpenSearchMasterPassword, enter the password of the OpenSearch Service domain. The password must meet the following requirements:
- Minimum 8 characters long.
- Contains at least one uppercase letter, one lowercase letter, one digit, and one special character.
- Follow the wizard to create the stack.
Enable the solution
After the stack is successfully created, you can see that the status of the MSK cluster is Updating. The reason for this is that we used a custom resource in the CloudFormation template to change the configuration of the MSK cluster. For the purpose of this post, we set the auto.create.topics.enable to true
. This setting enables automatic creation of topics on the server.
After the status of the MSK cluster changes to Active
, complete the following steps to enable the solution:
- On the AWS Cloud9 console, you should see an AWS Cloud9 environment provisioned by the CloudFormation template.
- Choose Open IDE.
- On the AWS CloudFormation console, navigate to the stack you deployed and choose the Outputs tab.
- Copy the value of the
EnableEventRule
key and run it in your AWS Cloud9 terminal. It should follow the following format: - Go back to the CloudFormation stack Outputs tab and copy the value of the
EnableEventSourceMapping
key and run it in your AWS Cloud9 terminal. It should follow the following format:
Visualize the data in OpenSearch Dashboards
Now that that data is flowing through the system, we can create a simple dashboard to visualize this data in real time. To save you development time and effort, we pre-created a sample dashboard that you can import directly into OpenSearch Dashboards. The dashboard file creates all the necessary objects required by the dashboard, including index patterns, visuals, and the dashboard.
The pre-created template uses an OpenSearch index pattern of processed_transactions
*, which is the same prefix as the default Amazon MSK output topic name. Complete the following steps to import the dashboards:
- On the AWS CloudFormation console, navigate to the stack you deployed and choose the Outputs tab.
- Take note of the OpenSearch dashboard link including the trailing
/_dashboards
.
- In the AWS Cloud9 terminal, download
dashboard.ndjson
(the Amazon OpenSearch Service dashboard object NDJSON file): - Use curl to run the following command to generate the appropriate authorization cookies needed to import the dashboards:
- Run the following command to import all objects defined in the NDJSON file:
Now the dashboard is immediately available in OpenSearch Dashboards. However, because the Amazon OpenSearch Service domain is provisioned in a private VPC, you must have VPN access to the VPC or use a bastion host be able to access OpenSearch Dashboards..
- Follow the instruction on GitHub to access OpenSearch Dashboards.
- After logging in to OpenSearch you will find a new sample fraud detection dashboard, which is updated in real time.
You’ve now created a sample dashboard.
Clean up
To clean up after using this solution, complete the following steps:
- Stop and delete the EC2 bastion instance.
- Delete the CloudFormation stack.
- Delete the detector.
Conclusion
In this post, we showcased a simple, cost-effective, and efficient solution to detect and stop fraud. The solution uses open-source frameworks and tools like Apache Kafka, Apache Flink, and OpenSearch coupled with ML-based fraud detection mechanism using Amazon Fraud Detector. The solution is designed to process transactions (and identify fraud) in the range of milliseconds, and therefore has no negative impact on the experience of legitimate customers.
You can integrate this solution with your current transaction processing application to protect revenue losses that occur from fraud. This can be achieved by modifying the source code available on GitHub to replace the Lambda producer and consumer with your own application microservices.
About the Authors
Ahmed Zamzam is a Specialist Solutions Architect for Analytics AWS. He supports SMB customers in the UK in their digital transformation and cloud journey to AWS, and specializes in streaming and search. Outside of work, he loves traveling, playing tennis, and cycling.
Karim Hammouda is a Specialist Solutions Architect for Analytics at AWS with a passion for data integration, data analysis, and BI. He works with AWS customers to design and build analytics solutions that contribute to their business growth. In his free time, he likes to watch TV documentaries and play video games with his son.
Using NLU labels to improve an ASR rescoring model
Second-pass language models that rescore automatic-speech-recognition hypotheses benefit from multitask training on natural-language-understanding objectives.Read More
Take advantage of advanced deployment strategies using Amazon SageMaker deployment guardrails
Deployment guardrails in Amazon SageMaker provide a new set of deployment capabilities allowing you to implement advanced deployment strategies that minimize risk when deploying new model versions on SageMaker hosting. Depending on your use case, you can use a variety of deployment strategies to release new model versions. Each of these strategies relies on a mechanism to shift inference traffic to one or more versions of a deployed model. The chosen strategy depends on your business requirements for your machine learning (ML) use case. However, any strategy should include the ability to monitor the performance of new model versions and automatically roll back to a previous version as needed to minimize potential risk of introducing a new model version with errors. Deployment guardrails offer new advanced deployment capabilities and as of this writing supports two new traffic shifting policies, canary and linear, as well as the ability to automatically roll back when issues are detected.
As part of your MLOps strategy to create repeatable and reliable mechanisms to deploy your models, you should also ensure that the chosen deployment strategy is implemented as part of your automated deployment pipeline. Deployment guardrails use the existing SageMaker CreateEndpoint and UpdateEndpoint APIs, so you can modify your existing deployment pipeline configurations to take advantage of the new deployment capabilities.
In this post, we show you how to use the new deployment guardrail capabilities to deploy your model versions using both a canary and linear deployment strategy.
Solution overview
Amazon SageMaker inference provides managed deployment strategies for testing new versions of your models in production. We cover two new traffic shifting policies in this post: canary and linear. For each of these traffic shifting modes, two HTTPS endpoints are provisioned. Two endpoints are provisioned to reduce deployment risk as traffic is shifted from the original endpoint variant to the new endpoint variant. You configure the endpoints to contain one or more compute instances to deploy your trained model and perform inference requests. SageMaker manages the routing of traffic between the two endpoints. You define Amazon CloudWatch metrics and alarms to monitor metrics on the new endpoint, when traffic is shifted, for a set baking period. If a CloudWatch alarm is triggered, SageMaker performs an auto-rollback to route all traffic to the original endpoint variant. If no CloudWatch alarms are triggered, the original endpoint variant is stopped and the new endpoint variant continues to receive all traffic. The following diagrams illustrate shifting traffic to the new endpoint.
Let’s dive deeper into examples of the canary and linear traffic shifting policies.
We go over the following high-level steps as part of the deployment procedure:
- Create the model and endpoint configurations required for the three scenarios: the baseline, the update containing the incompatible model version, and the update with the correct model version.
- Invoke the baseline endpoint prior to the update.
- Specify the CloudWatch alarms used to trigger the rollbacks.
- Update the endpoint to trigger a rollback using either the canary or linear strategy.
First, let’s start with canary deployment.
Canary deployment
The canary deployment option lets you shift one small portion of your traffic (a canary) to the green fleet and monitor it for a baking period. If the canary succeeds on the green fleet, the rest of the traffic is shifted from the blue fleet to the green fleet before stopping the blue fleet.
To demonstrate canary deployments and the auto-rollback feature, we update an endpoint with an incompatible model version and deploy it as a canary fleet, taking a small percentage of the traffic. Requests sent to this canary fleet result in errors, which trigger a rollback using preconfigured CloudWatch alarms. We also demonstrate a success scenario where no alarms are tripped and the update succeeds.
Create and deploy the models
First, we upload our pre-trained models to Amazon Simple Storage Service (Amazon S3). These models were trained using the XGBoost churn prediction notebook in SageMaker. You can also use your own pre-trained models in this step. If you already have a pre-trained model in Amazon S3, you can add it by specifying the s3_key
.
The models in this example are used to predict the probability of a mobile customer leaving their current mobile operator. The dataset we use is publicly available and was mentioned in the book Discovering Knowledge in Data by Daniel T. Larose.
Upload the models with the following code:
Next, we create our model definitions. We start with deploying the pre-trained churn prediction models. Here, we create the model objects with the image and model data. The three URIs correspond to the baseline version, the update containing the incompatible version, and the update containing the correct model version:
Now that the three models are created, we create the three endpoint configs:
We then deploy the baseline model to a SageMaker endpoint:
Invoke the endpoint
This step invokes the endpoint with sample data with a maximum invocations count and waiting intervals. See the following code:
For a full list of metrics, see Monitor Amazon SageMaker with Amazon CloudWatch.
Then we plot graphs to show the metrics Invocations
, Invocation4XXErrors
, Invocation5XXErrors
, ModelLatency
, and OverheadLatency
against the endpoint over time.
You can observe a flat line for Invocation4XXErrors
and Invocation5XXErrors
because we’re using the correct version model version and configs. Additionally, ModelLatency
and OverheadLatency
start decreasing over time.
Create CloudWatch alarms to monitor endpoint performance
We create CloudWatch alarms to monitor endpoint performance with the metrics Invocation5XXErrors
and ModelLatency
.
We use metric dimensions EndpointName
and VariantName
to select the metric for each endpoint config and variant. See the following code:
Update the endpoint with deployment configurations
We define the following deployment configuration to perform a blue/green update strategy with canary traffic shifting from the old to the new stack. The canary traffic shifting option can reduce the blast ratio of a regressive update to the endpoint. In contrast, for the all-at-once traffic shifting option, the invocation requests start faulting at 100% after flipping the traffic. In canary mode, invocation requests are shifted to the new version of model gradually, preventing errors from impacting 100% of the traffic. Additionally, the auto-rollback alarms monitor the metrics during the canary stage.
The following diagram illustrates the workflow of our rollback use case.
We update the endpoint with an incompatible model version to simulate errors and trigger a rollback:
When we invoke the endpoint, we encounter errors because of the incompatible version of the model (ep_config_name2
), and this leads to the rollback to a stable version of the model (ep_config_name1
). This is reflected in the following graphs as Invocation5XXErrors
and ModelLatency
increase during this rollback phase.
The following diagram shows a success case where we use the same canary deployment configuration but a valid endpoint configuration.
We update the endpoint configuration to a valid version (using the same canary deployment config as the rollback case):
We plot graphs to show the Invocations
, Invocation5XXErrors
, and ModelLatency
metrics against the endpoint. When the new endpoint config-3
(correct model version) starts getting deployed, it takes over endpoint config-2
(incompatible due to model version) without any errors. We can see this in the graphs as Invocation5XXErrors
and ModelLatency
decrease during this transition phase.
Next, let’s see how linear deployments are configured and how it works.
Linear deployment
The linear deployment option provides even more customization over how many traffic-shifting steps to make and what percentage of traffic to shift for each step. Whereas canary shifting lets you shift traffic in two steps, linear shifting extends this to n linearly spaced steps.
To demonstrate linear deployments and the auto-rollback feature, we update an endpoint with an incompatible model version and deploy it as a linear fleet, taking a small percentage of the traffic. Requests sent to this linear fleet result in errors, which triggers a rollback using preconfigured CloudWatch alarms. We also demonstrate a success scenario where no alarms are tripped and the update succeeds.
The steps to create the models, invoke the endpoint, and create the CloudWatch alarms are the same as with the canary method.
We define the following deployment configuration to perform a blue/green update strategy with linear traffic shifting from old to new stack. The linear traffic shifting option can reduce the blast ratio of a regressive update to the endpoint. In contrast, for the all-at-once traffic shifting option, the invocation requests start faulting at 100% after flipping the traffic. In linear mode, invocation requests are shifted to the new version of the model gradually, with a controlled percentage of traffic shifting for each step. You can use the auto-rollback alarms to monitor the metrics during the linear traffic shifting stage.
The following diagram shows the workflow for our linear rollback case.
We update the endpoint with an incompatible model version to simulate errors and trigger a rollback:
When we invoke the endpoint, we encounter errors because of the incompatible version of the model (ep_config_name2
), which leads to the rollback to a stable version of the model (ep_config_name1
). We can see this in the following graphs as the Invocation5XXErrors
and ModelLatency metrics increase during this rollback phase.
Let’s look at a success case where we use the same linear deployment configuration but a valid endpoint configuration. The following diagram illustrates our workflow.
We update the endpoint to a valid endpoint configuration version with the same linear deployment configuration:
Then we plot graphs to show the Invocations
, Invocation5XXErrors
, and ModelLatency
metrics against the endpoint.
As the new endpoint config-3
(correct model version) starts getting deployed, it takes over endpoint config-2
(incompatible due to model version) without any errors. We can see this in the following graphs as Invocation5XXErrors
and ModelLatency
decrease during this transition phase.
Considerations and best practices
Now that we’ve walked through a comprehensive example, let’s recap some best practices and considerations:
-
Pick the right health check – The CloudWatch alarms determine whether the traffic shift to the new endpoint variant succeeds. In our example, we used
Invocation5XXErrors
(caused by the endpoint failing to return a valid result) andModelLatency
, which measure how long the model takes to return a response. You can consider other built-in metrics in some cases, likeOverheadLatency
, which accounts for other causes of latency, such as unusually large response payloads. You can also have your inference code record custom metrics, and you can configure the alarm measurement evaluation interval. For more information about available metrics, see SageMaker Endpoint Invocation Metrics. - Pick the most suitable traffic shifting policy – The all-at-once policy is a good choice if you just want to make sure that the new endpoint variant is healthy and able to serve traffic. The canary policy is useful if you want to avoid affecting too much traffic if the new endpoint variant has a problem, or if you want to evaluate a custom metric on a small percentage of traffic before shifting over. For example, perhaps you want to emit a custom metric that checks for inference response distribution, and make sure it falls within expected ranges. The linear policy is a more conservative and more complex take on the canary pattern.
- Monitor the alarms – The alarms you use to trigger rollback should also cause other actions, like notifying an operations team.
- Use the same deployment strategy in multiple environments – As part of an overall MLOps pipeline, use the same deployment strategy in test as well as production environments, so that you become comfortable with the behavior. This consideration implies that you can inject realistic load onto your test endpoints.
Conclusion
In this post, we introduced SageMaker inference’s new deployment guardrail options, which let you manage deployment of a new model version in a safe and controlled way. We reviewed the new traffic shifting policies, canary and linear, and showed how to use them in a realistic example. Finally, we discussed some best practices and considerations. Get started today with deployment guardrails on the SageMaker console or, for more information, review Deployment Guardrails.
About the Authors
Raghu Ramesha is an ML Solutions Architect with the Amazon SageMaker Services SA team. He focuses on helping customers migrate ML production workloads to SageMaker at scale. He specializes in machine learning, AI, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.
Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She has been in technology for 24 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background into the domain of MLOps to help customers deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee is a co-creator and instructor of the Practical Data Science specialization on Coursera. She is also the Co-Director of Women In Big Data (WiBD), Denver chapter. In her spare time, she likes to spend time with her family, friends, and overactive dogs.
Randy DeFauw is a Principal Solutions Architect. He’s an electrical engineer by training who’s been working in technology for 23 years at companies ranging from startups to large defense firms. A fascination with distributed consensus systems led him into the big data space, where he discovered a passion for analytics and machine learning. He started using AWS in his Hadoop days, where he saw how easy it was to set up large complex infrastructure, and then realized that the cloud solved some of the challenges he saw with Hadoop. Randy picked up an MBA so he could learn how business leaders think and talk, and found that the soft skill classes were some of the most interesting ones he took. Lately, he’s been dabbling with reinforcement learning as a way to tackle optimization problems, and re-reading Martin Kleppmann’s book on data intensive design.
Lauren Mullennex is a Solutions Architect based in Denver, CO. She works with customers to help them architect solutions on AWS. In her spare time, she enjoys hiking and cooking Hawaiian cuisine.
Train graph neural nets for millions of proteins on Amazon SageMaker and Amazon DocumentDB (with MongoDB compatibility)
There are over 180,000 unique proteins with 3D structures determined, with tens of thousands new structures resolved every year. This is only a small fraction of the 200 million known proteins with distinctive sequences. Recent deep learning algorithms such as AlphaFold can accurately predict 3D structures of proteins using their sequences, which help scale the protein 3D structure data to the millions. Graph neural network (GNN) has emerged as an effective deep learning approach to extract information from protein structures, which can be represented by graphs of amino acid residues. Individual protein graphs usually contain a few hundred nodes, which is manageable in size. Tens of thousands of protein graphs can be easily stored in serialized data structures such as TFrecord for training GNNs. However, training GNN on millions of protein structures is challenging. Data serialization isn’t scalable to millions of protein structures because it requires loading the entire terabyte-scale dataset into memory.
In this post, we introduce a scalable deep learning solution that allows you to train GNNs on millions of proteins stored in Amazon DocumentDB (with MongoDB compatibility) using Amazon SageMaker.
For illustrative purposes, we use publicly available experimentally determined protein structures from the Protein Data Bank and computationally predicted protein structures from the AlphaFold Protein Structure Database. The machine learning (ML) problem is to develop a discriminator GNN model to distinguish between experimental and predicted structures based on protein graphs constructed from their 3D structures.
Overview of solution
We first parse the protein structures into JSON records with multiple types of data structures, such as an n-dimensional array and nested object, to store the proteins’ atomic coordinates, properties, and identifiers. Storing a JSON record for a protein’s structure takes 45 KB on average; we project storing 100 million proteins would take around 4.2 TB. Amazon DocumentDB storage automatically scales with the data in your cluster volume in 10 GB increments, up to 64 TB. Therefore, the support for JSON data structure and scalability makes Amazon DocumentDB a natural choice.
We next build a GNN model to predict protein properties using graphs of amino acid residues constructed from their structures. The GNN model is trained using SageMaker and configured to efficiently retrieve batches of protein structures from the database.
Finally, we analyze the trained GNN model to gain some insights into the predictions.
We walk through the following steps for this tutorial:
- Create resources using an AWS CloudFormation template.
- Prepare protein structures and properties and ingest the data into Amazon DocumentDB.
- Train a GNN on the protein structures using SageMaker.
- Load and evaluate the trained GNN model.
The code and notebooks used in this post are available in the GitHub repo.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- An AWS account
- Permissions to create AWS resources including SageMaker, Amazon DocumentDB, Amazon Virtual Private Cloud (Amazon VPC), and AWS Secrets Manager
- Knowledge in ML model training and validation
Running this tutorial for an hour should cost no more than $2.00.
Create resources
We provide a CloudFormation template to create the required AWS resources for this post, with a similar architecture as in the post Analyzing data stored in Amazon DocumentDB (with MongoDB compatibility) using Amazon SageMaker. For instructions on creating a CloudFormation stack, see the video Simplify your Infrastructure Management using AWS CloudFormation.
The CloudFormation stack provisions the following:
- A VPC with three private subnets for Amazon DocumentDB and two public subnets intended for the SageMaker notebook instance and ML training containers, respectively.
- An Amazon DocumentDB cluster with three nodes, one in each private subnet.
- A Secrets Manager secret to store login credentials for Amazon DocumentDB. This allows us to avoid storing plaintext credentials in our SageMaker instance.
- A SageMaker notebook instance to prepare data, orchestrate training jobs, and run interactive analyses.
When creating the CloudFormation stack, you need to specify the following:
- Name for your CloudFormation stack
- Amazon DocumentDB user name and password (to be stored in Secrets Manager)
- Amazon DocumentDB instance type (default db.r5.large)
- SageMaker instance type (default ml.t3.xlarge)
It should take about 15 minutes to create the CloudFormation stack. The following diagram shows the resource architecture.
Prepare protein structures and properties and ingest the data into Amazon DocumentDB
All the subsequent code in this section is in the Jupyter notebook Prepare_data.ipynb in the SageMaker instance created in your CloudFormation stack.
This notebook handles the procedures required for preparing and ingesting protein structure data into Amazon DocumentDB.
- We first download predicted protein structures from AlphaFold DB in PDB format and the matching experimental structures from the Protein Data Bank.
For demonstration purposes, we only use proteins from the thermophilic archaean Methanocaldococcus jannaschii, which has the smallest proteome of 1,773 proteins for us to work with. You are welcome to try using proteins from other species.
- We connect to an Amazon DocumentDB cluster by retrieving the credentials stored in Secrets Manager:
- After we set up the connection to Amazon DocumentDB, we parse the PDB files into JSON records to ingest into the database.
We provide utility functions required for parsing PDB files in pdb_parse.py. The parse_pdb_file_to_json_record
function does the heavy lifting of extracting atomic coordinates from one or multiple peptide chains in a PDB file and returns one or a list of JSON documents, which can be directly ingested into the Amazon DocumentDB collection as a document. See the following code:
After we ingest the parsed protein data into Amazon DocumentDB, we can update the contents of the protein documents. For instance, it makes our model training logistics easier if we add a field indicating whether a protein structure should be used in the training, validation, or test sets.
- We first retrieve the all the documents with the field
is_AF
to stratify documents using an aggregation pipeline:
- Next, we use the
update_many
function to store the split information back to Amazon DocumentDB:
Train a GNN on the protein structures using SageMaker
All the subsequent code in this section is in the Train_and_eval.ipynb notebook in the SageMaker instance created in your CloudFormation stack.
This notebook trains a GNN model on the protein structure datasets stored in the Amazon DocumentDB.
We first need to implement a PyTorch dataset class for our protein dataset capable of retrieving mini-batches of protein documents from Amazon DocumentDB. It’s more efficient to retrieve batches documents by the built-in primary id (_id
).
- We use the iterable-style dataset by extending the IterableDataset, which pre-fetches the
_id
and labels of the documents at initialization:
- The
ProteinDataset
performs a database read operation in the__iter__
method. It tries to evenly split the workload if there are multiple workers:
- The preceding
__iter__
method also converts the atomic coordinates of proteins into DGLGraph objects after they’re loaded from Amazon DocumentDB via theconvert_to_graph
function. This function constructs a k-nearest neighbor (kNN) graph for the amino acid residues using the 3D coordinates of the C-alpha atoms and adds one-hot encoded node features to represent residue identities:
- With the
ProteinDataset
implemented, we can initialize instances for train, validation, and test datasets and wrap the training instance withBufferedShuffleDataset
to enable shuffling. - We further wrap them with
torch.utils.data.DataLoader
to work with other components of the SageMaker PyTorch Estimator training script. - Next, we implement a simple two-layered graph convolution network (GCN) with a global attention pooling layer for ease of interpretation:
- Afterwards, we can train this GCN on the
ProteinDataset
instance for a binary classification task of predicting whether a protein structure is predicted by AlphaFold or not. We use binary cross entropy as the objective function and Adam optimizer for stochastic gradient optimization. The full training script can be found in src/main.py.
Next, we set up the SageMaker PyTorch Estimator to handle the training job. To allow the managed Docker container initiated by SageMaker to connect to Amazon DocumentDB, we need to configure the subnet and security group for the Estimator.
- We retrieve the subnet ID where the Network Address Translation (NAT) gateway resides, as well as the security group ID of our Amazon DocumentDB cluster by name:
Load and evaluate the trained GNN model
When the training job is complete, we can load the trained GCN model and perform some in-depth evaluation.
The codes for the following steps are also available in the notebook Train_and_eval.ipynb.
SageMaker training jobs save the model artifacts into the default S3 bucket, the URI of which can be accessed from the estimator.model_data
attribute. We can also navigate to the Training jobs page on the SageMaker console to find the trained model to evaluate.
- For research purposes, we can load the model artifact (learned parameters) into a PyTorch state_dict using the following function:
- Next, we perform quantitative model evaluation on the full test set by calculating accuracy:
We found our GCN model achieved an accuracy of 74.3%, whereas the dummy baseline model making predictions based on class priors only achieved 56.3%.
We’re also interested in interpretability of our GCN model. Because we implement a global attention pooling layer, we can compute the attention scores across nodes to explain specific predictions made by the model.
- Next, we compute the attention scores and overlay them on the protein graphs for a pair of structures (AlphaFold predicted and experimental) from the same peptide:
The preceding codes produce the following protein graphs overlaid with attention scores on the nodes. We find the model’s global attentive pooling layer can highlight certain residues in the protein graph as being important for making the prediction of whether the protein structure is predicted by AlphaFold. This indicates that these residues may have distinctive graph topologies in predicted and experimental protein structures.
In summary, we showcase a scalable deep learning solution to train GNNs on protein structures stored in Amazon DocumentDB. Although the tutorial only uses thousands of proteins for training, this solution is scalable to millions of proteins. Unlike other approaches such as serializing the entire protein dataset, our approach transfers the memory-heavy workloads to the database, making the memory complexity for the training jobs O
(batch_size
), which is independent of the total number of proteins to train.
Clean up
To avoid incurring future charges, delete the CloudFormation stack you created. This removes all the resources you provisioned using the CloudFormation template, including the VPC, Amazon DocumentDB cluster, and SageMaker instance. For instructions, see Deleting a stack on the AWS CloudFormation console.
Conclusion
We described a cloud-based deep learning architecture scalable to millions of protein structures by storing them in Amazon DocumentDB and efficiently retrieving mini-batches of data from SageMaker.
To learn more about the use of GNN in protein property predictions, check out our recent publication LM-GVP, A Generalizable Deep Learning Framework for Protein Property Prediction from Sequence and Structure.
About the Authors
Zichen Wang, PhD, is an Applied Scientist in the Amazon Machine Learning Solutions Lab. With several years of research experience in developing ML and statistical methods using biological and medical data, he works with customers across various verticals to solve their ML problems.
Selvan Senthivel is a Senior ML Engineer with the Amazon ML Solutions Lab at AWS, focusing on helping customers on machine learning, deep learning problems, and end-to-end ML solutions. He was a founding engineering lead of Amazon Comprehend Medical and contributed to the design and architecture of multiple AWS AI services.
How deep learning is reducing Amazon’s packaging waste
A combination of deep learning, natural language processing, and computer vision enables Amazon to hone in on the right amount of packaging for each product.Read More
Identity verification using Amazon Rekognition
In-person user identity verification is slow to scale, costly, and high friction for users. Machine learning (ML) powered facial recognition technology can enable online user identity verification. Amazon Rekognition offers pre-trained facial recognition capabilities that you can quickly add to your user onboarding and authentication workflows to verify opted-in users’ identities online. No ML expertise is required. With Amazon Rekognition, you can onboard and authenticate users in seconds while detecting fraudulent or duplicate accounts. As a result, you can grow users faster, reduce fraud, and lower user verification costs.
In this post, we describe a typical identity verification workflow and show how to build an identity verification solution using various Amazon Rekognition APIs. We provide a complete sample implementation in our GitHub repository.
User registration workflow
The following figure shows a sample workflow of a new user registration. Typical steps in this process are:
- User captures selfie image and the image of a government-issued identity document.
- Quality check of the selfie image and optional liveness detection of the user face.
- Comparison of the selfie image with the identity document face image.
- Check of the selfie against a database of existing user faces.
You can customize the flow according to the business process. It often contains some or all of the steps presented in the preceding diagram. You can choose to run all the steps synchronously (wait for one step to complete before moving on to the next step). Alternately, you can run some of the steps highlighted in orange asynchronously (don’t wait for that step to complete) to speed up the user registration process and improve the customer experience. If the steps aren’t successful, you must roll back the user registration.
In addition to new user registration, another common flow is an existing or returning user login. In this flow, a check of the user face (selfie) is performed against a previously registered face. Typical steps in this process include user face capture (selfie), check of the selfie image quality, and search and compare of the selfie against the faces database. The following diagram shows a possible flow.
You can customize the steps of the process according to your business needs, and choose to include or exclude the liveness detection.
Solution overview
The following reference architecture shows how you can use Amazon Rekognition, along with other AWS services, to implement identity verification.
The architecture includes the following components:
- Applications invoke Amazon API Gateway to route requests to the correct AWS Lambda function depending on the user flow. There are four major actions in this solution: authenticate, register, register with ID card, and update.
- API Gateway uses a service integration to run the AWS Step Functions express state machine corresponding to the specific path called from API Gateway. Within each step, Lambda functions are responsible for triggering the correct set of calls to and from Amazon DynamoDB and Amazon Simple Storage Service (Amazon S3), along with the relevant Amazon Rekognition APIs.
- DynamoDB holds face IDs (
face-id
), S3 path URIs, and unique IDs (for example employee ID number) for eachface-id
. Amazon S3 stores all the face images. - The final major component of the solution is Amazon Rekognition. Each flow (authenticate, register, register with ID card, and update) calls different Amazon Rekognition APIs depending on the task.
Before we deploy the solution, it’s important to know the following concepts and API descriptions:
-
Collections – Amazon Rekognition stores information about detected faces in server-side containers known as collections. You can use the facial information that’s stored in a collection to search for known faces in images, stored videos, and streaming videos. You can use collections in a variety of scenarios. For example, you might create a face collection to store scanned badge images by using the
IndexFaces
operation. When an employee enters the building, an image of the employee’s face is captured and sent to theSearchFacesByImage
operation. If the face match produces a sufficiently high similarity score (say 99%), you can authenticate the employee. - DetectFaces API – This API detects faces within an image provided as input and returns information about faces. In a user registration workflow, this operation may help you screen images before moving to the next step. For example, you can check if a photo contains a face, if the person identified is in the right orientation, and if they’re not wearing any face blocker such as sunglasses or a cap.
- IndexFaces API – This API detects faces in the input image and adds them to the specified collection. This operation is used to add a screened image to a collection for future queries.
- SearchFacesByImage API – For a given input image, the API first detects the largest face in the image, and then searches the specified collection for matching faces. The operation compares the features of the input face with face features in the specified collection.
- CompareFaces API – This API compares a face in the source input image with each of the 100 largest faces detected in the target input image. If the source image contains multiple faces, the service detects the largest face and compares it with each face detected in the target image. For our use case, we expect both the source and target image to contain a single face.
- DeleteFaces API – This API deletes faces from a collection. You specify a collection ID and an array of face IDs to remove.
Prerequisites
Before you get started, complete the following prerequisites:
- Create an AWS account.
- Clone the sample repo on your local machine:
We use the test client in this repository to test the various workflows.
- Install Python 3.6+ on your local machine.
Deploy the solution
Choose the appropriate AWS CloudFormation stack to provision the solution into your AWS account in your preferred Region:
As we discussed earlier, this solution uses API Gateway integrated with Step Functions and Amazon Rekognition APIs to run the identity verification workflows. To test the solution, follow the steps in the code repository to use the provided test client.
The following sections describe the various workflows implemented via Step Functions.
New user registration
The following image illustrates the Step Functions definition for new user registration. The steps are defined in the register_user.py
file.
Three functions are called in this workflow: detect-faces, search-faces, and index-faces. The detect-faces function calls the Amazon Rekognition DetectFaces API to determine if a face is detected in an image and is usable. Some of the quality checks include determining that only the face is present in the image, ensuring the face isn’t obscured by sunglasses or a hat, and confirming that the face isn’t rotated by using the pose dimension. If the image passes the quality check, the search-faces function searches for an existing face match in the Amazon Rekognition collections by confirming the FaceMatchThreshold confidence score meets your threshold objective. For more information, refer to the section on using similarity thresholds to match faces. If the face image doesn’t exist in the collections, the index-faces function is called to index the face into the collection. The face image metadata is stored in the DynamoDB table and the face images are stored in an S3 bucket.
To register a new user, run the app.py script (test-client
) by running the following code:
If the new user registration succeeds, the face image attribute information is added in DynamoDB.
New user registration with ID card
The steps to register a new user with an ID card are similar to the steps for registering a new user. The following image illustrates the steps, which are defined in the register_idcard.py
file.
The same three functions that we used to register a user (detect-faces
, search-faces
, and index-faces
) are called in for this workflow. First, the customer captures an image of their ID and a live image of their face. The face image is checked to confirm it meets our defined quality standards using the DetectFaces
API. If the image meets the quality standards, the live face image is compared to the face in the ID to determine if they’re a match. If the images don’t match, the user receives an error and the process ends. If the images match, we check if the face already exists in the Amazon Rekognition collections using the SearchFacesByImage
API. The search results are compared to the user’s current face image. If the user already exists, the user isn’t registered. If the user doesn’t exist in the collections, the relevant properties are extracted from the ID card. You can extract key-value pairs from identity documents using the newly launched Amazon Textract AnalyzeID
API. The extracted properties from the ID card are merged and the user’s face is indexed in the DynamoDB table. After the image is indexed, the new user ID registration process is complete.
Existing user authentication
The following image illustrates the workflow for authenticating an existing user. The steps are defined in the auth.py
file.
This Step Function workflow calls three functions: detect-faces
, compare-faces
, and search-faces
. After the detect-faces
function verifies that the captured face image is valid, the compare-faces
function checks the DynamoDB table for a face image that matches an existing user. If a match is found in DynamoDB, the user authenticates successfully. If a match isn’t found, the search-faces
function is called to search for the face image in the collections. The user is verified and the authentication process completes if their face image exists in the collections. Otherwise, the user’s access is denied.
To test authenticating an existing user, run the app.py
script (test-client
) by running the following code::
Existing user login with a request for photo update
The following image illustrates the workflow to update an existing user’s photo. The steps are defined in the update.py
file.
This workflow calls four functions: detect-faces
, compare-faces
, search-faces
, and index-faces
. The steps are similar to the steps in the existing user authentication workflow. After the user captures their face image and the image quality is checked, we check for a matching face image in DynamoDB using the compare-faces
function. If a match for the user is found, their user profile is updated, their new face image is indexed by calling the index-faces
function, and the update process completes. Alternatively, if a match isn’t found, the search-faces function is called to search for the face image in the Amazon Rekognition collections. If the face image is found in the collection, the user’s profile is updated and their new face image is indexed. The user’s access is denied if their image isn’t found in the collections.
To update an existing user’s photo, run the app.py script (test-client
) by running the following code:
Clean up
To prevent accruing additional charges in your AWS account, delete the resources you provisioned by navigating to the AWS CloudFormation console and deleting the Riv-Prod
stack.
Deleting the stack doesn’t delete the S3 bucket you created. This bucket stores all the face images. If you choose to delete the S3 bucket, navigate to the Amazon S3 console, empty the bucket, and then confirm you want to permanently delete it.
Conclusion
Amazon Rekognition makes it easy to add image analysis to your identity verification applications using proven, highly scalable, deep learning technology that requires no ML expertise to use. Amazon Rekognition provides face detection and comparison capabilities. With a combination of the DetectFaces, CompareFaces, IndexFaces, and SearchFacesByImage APIs, you can implement the common flows around new user registration and existing user logins.
Amazon Rekognition collections provide a method to store information about detected faces in server-side containers. You can then use the facial information stored in a collection to search for known faces in images. When using collections, you don’t need to store original photos after you index faces in the collection. Amazon Rekognition collections don’t persist actual images. Instead, the underlying detection algorithm detects the faces in the input image, extracts facial features into a feature vector for each face, and stores it in the collection.
To start your journey towards identity verification, visit Identity Verification using Amazon Rekognition.
About the Authors
Nate Bachmeier is an AWS Senior Solutions Architect that nomadically explores New York, one cloud integration at a time. He specializes in migrating and modernizing applications. Besides this, Nate is a full-time student and has two kids.
Anthony Pasquariello is an Enterprise Solutions Architect based in New York City. He provides technical consultation to customers during their cloud journey, especially around security best practices. He has an MS and BS in electrical and computer engineering from Boston University. In his free time, he enjoys ramen, writing non-fiction, and philosophy.
Lauren Mullennex is a Solutions Architect based in Denver, CO. She works with customers to help them architect solutions on AWS. In her spare time, she enjoys hiking and cooking Hawaiian cuisine.
Amit Gupta is a Senior AI Services Solutions Architect at AWS. He is passionate about enabling customers with well-architected machine learning solutions at scale.