AWS customers rely on Amazon Lex bots to power their Amazon Connect self-service conversational experiences on telephone and other channels. With Amazon Lex, callers (or customers, in Amazon Connect terminology) can get their questions answered conveniently regardless of agent availability. What architecture patterns can you use to make a bot resilient to service availability issues? In this post, we describe a cross-Region approach that yields higher availability by deploying Amazon Lex bots in multiple Regions.
Architecture overview
In this solution, Amazon Connect flows can achieve business continuity with minimal disruptions in the event of service availability issues with Amazon Lex. The architecture pattern uses the following components:
- Two Amazon Lex bots, each in a different Region.
- An Amazon Connect flow integrated with both bots, which invokes one of them based on the result from the region check AWS Lambda function.
- A Lambda function to check the health of the bot.
- A Lambda function to read the Amazon DynamoDB table for the primary bot’s Region for a given Amazon Connect Region.
- A DynamoDB table to store a Region mapping between Amazon Connect and Amazon Lex. The health check function updates this table. The region check function reads this table for the most up-to-date primary Region mapping for Amazon Connect and Amazon Lex.
The goal of having identical Amazon Lex bots in two Regions is to promote the bot in the secondary Region to primary in the event of an outage in the primary Region.
Multi-region pattern for Amazon Lex
The next two sections describe how an Amazon Connect flow integrated with an Amazon Lex bot can recover quickly in case of a service failure or outage in the primary Region and start servicing calls using Amazon Lex in the secondary Region.
The health check function calls one of two Amazon Lex Runtime APIs, PutSession or PostText, depending on the TEST_METHOD Lambda environment variable. You can choose either one based on your preference and use case. The PutSession API call doesn’t incur any extra Amazon Lex costs, but it doesn’t test any natural language understanding (NLU) features of Amazon Lex. The PostText API lets you check the NLU functionality of Amazon Lex, but includes a minor cost.
The health check function updates the lexRegion column of the DynamoDB table (lexDR) with the Region name in which the test passed. If the health check passes the test in the primary Region, lexRegion gets updated with the name of the primary Region. If the health check fails, the function issues a call to the corresponding Runtime API based on the TEST_METHOD environment variable in the secondary Region. If the test succeeds, the lexRegion column in the DynamoDB table gets updated to the secondary Region; otherwise, it gets updated with err, which indicates both Regions have an outage.
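The update rule described above reduces to a small decision function. The following is a minimal sketch of that logic only, not the Lambda code used later in this post; `choose_lex_region` and its parameter names are hypothetical and exist purely for illustration.

```python
def choose_lex_region(primary_ok, secondary_ok, primary, secondary):
    """Return the value the health check would write to the lexRegion column.

    primary_ok / secondary_ok are the outcomes of the runtime API test in
    each Region. In the real function the secondary Region is only tested
    when the primary test fails; here both results are passed in for clarity.
    """
    if primary_ok:
        return primary
    if secondary_ok:
        return secondary
    # Both Regions failed the test.
    return "err"


if __name__ == "__main__":
    print(choose_lex_region(False, True, "us-east-1", "us-west-2"))
```

Keeping the decision separate from the API calls makes the three outcomes (primary, secondary, err) easy to reason about and test.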
On every call that Amazon Connect receives, it issues a region check function call to get the active Amazon Lex Region for that particular Amazon Connect Region. The primary Region returned by the region check function is the last entry written to the DynamoDB table by the health check function. Amazon Connect invokes the respective Get Customer Input Block configured with the Amazon Lex bot in the Region returned by the region check function. If the function returns the same Region as the Amazon Connect Region, it indicates that the health check has passed, and Amazon Connect calls the Amazon Lex bot in its same Region. If the function returns the secondary Region, Amazon Connect invokes the bot in the secondary Region.
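The branching that the flow performs on the region check result can be sketched as follows. This is an illustration under assumptions: the `primaryCode` key matches the region check function shown later in this post, while `select_branch`, the branch labels, and the handling of `err` as a separate error branch are hypothetical.

```python
def select_branch(region_check_response, connect_region):
    """Map the region check result to the branch a flow might take.

    Returns "primary-bot" when the active Lex Region equals the Amazon
    Connect Region, "secondary-bot" for any other Region, and "error" when
    the table holds 'err' (both Regions failed) or no value was returned.
    """
    lex_region = region_check_response.get("primaryCode")
    if lex_region is None or lex_region == "err":
        return "error"
    if lex_region == connect_region:
        return "primary-bot"
    return "secondary-bot"


if __name__ == "__main__":
    response = {"statusCode": 200, "primaryCode": "us-west-2"}
    print(select_branch(response, "us-east-1"))
```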
Deploying Amazon Lex bots
You need to create an identical bot in both your primary and secondary Regions. In this post, we use us-east-1 as the primary Region and us-west-2 as the secondary Region. Begin by creating the bot in your primary Region, us-east-1.
- On the Amazon Lex console, click Create.
- In the Try a Sample section, select OrderFlowers. Set COPPA to No.
- Leave all other settings at their default value and click Create.
- The bot is created and will start to build automatically.
- After your bot is built (in 1–2 minutes), choose Publish.
- Create an alias with the name ver_one.
Repeat the above steps for us-west-2. You should now have a working Amazon Lex bot in both us-east-1 and us-west-2.
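Before moving on, you may want to confirm that the alias exists in both Regions. The following is a minimal sketch, assuming the boto3 `lex-models` client and its `get_bot_alias` call; the helper names `alias_exists` and `check_regions` are hypothetical, and the client is passed in through a factory so the logic can be exercised without AWS access.

```python
def alias_exists(client, bot_name, alias):
    """Return True if the alias exists, False if Lex reports it as not found."""
    try:
        client.get_bot_alias(name=alias, botName=bot_name)
        return True
    except Exception as exc:
        # boto3 errors carry the service error code in exc.response.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "NotFoundException":
            return False
        raise


def check_regions(make_client, regions, bot_name="OrderFlowers", alias="ver_one"):
    """Check the alias in each Region; make_client(region) returns a client."""
    return {region: alias_exists(make_client(region), bot_name, alias)
            for region in regions}
```

With boto3 installed and credentials configured, a call might look like `check_regions(lambda r: boto3.client("lex-models", region_name=r), ["us-east-1", "us-west-2"])`.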
Creating a DynamoDB table
Make sure your AWS Region is us-east-1.
- On the DynamoDB console, choose Create.
- For Table name, enter lexDR.
- For Primary key, enter connectRegion with type String.
- Leave everything else at their default and choose Create.
- On the Items tab, choose Create item.
- Set the connectRegion value to us-east-1, then append a new column of type String called lexRegion and set its value to us-east-1.
- Click Save.
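For readers who prefer scripting over the console, the table and its seed item can be described as request parameters. This is a sketch under assumptions: the helper names are hypothetical, and `BillingMode` of `PAY_PER_REQUEST` is an illustrative choice rather than what the console steps above select by default.

```python
def table_definition(table_name="lexDR"):
    """Parameters for DynamoDB create_table with connectRegion as the key."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "connectRegion", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "connectRegion", "KeyType": "HASH"},
        ],
        # Assumption: on-demand capacity, chosen here only for simplicity.
        "BillingMode": "PAY_PER_REQUEST",
    }


def seed_item(connect_region, lex_region, table_name="lexDR"):
    """Parameters for DynamoDB put_item that map a Connect Region to a Lex Region."""
    return {
        "TableName": table_name,
        "Item": {
            "connectRegion": {"S": connect_region},
            "lexRegion": {"S": lex_region},
        },
    }
```

With a boto3 DynamoDB client in us-east-1, these could be passed as `client.create_table(**table_definition())` followed by `client.put_item(**seed_item("us-east-1", "us-east-1"))` once the table is active.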
Creating IAM roles for Lambda functions
In this step, you create an AWS Identity and Access Management (IAM) role for both Lambda functions to use.
- On the IAM console, click on Access management and select Policies.
- Click on Create Policy.
- Click on JSON.
- Paste the following custom IAM policy that allows read and write access to the DynamoDB table, lexDR. Replace the “xxxxxxxxxxxx” in the policy definition with your AWS account number.

    {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:UpdateItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:xxxxxxxxxxxx:table/lexDR"
        }]
    }

- Click on Review Policy.
- Give it the name DynamoDBReadWrite and click on Create Policy.
- On the IAM console, click on Roles under Access management and then click on Create Role.
- Select Lambda for the service and click Next.
- Attach the following permissions policies:
  - AWSLambdaBasicExecutionRole
  - AmazonLexRunBotsOnly
  - DynamoDBReadWrite
- Click Next: Tags. Skip the Tags page by clicking Next: Review.
- Name the role lexDRRole. Click Save.
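The custom policy from the steps above can also be generated rather than edited by hand, which avoids leaving the account placeholder in place. The following is a small sketch; `dynamodb_policy` is a hypothetical helper and `123456789012` is a placeholder account number.

```python
import json


def dynamodb_policy(account_id, region="us-east-1", table="lexDR"):
    """Build the read/write policy document for the Region mapping table."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account numbers are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:UpdateItem"],
            "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table}",
        }],
    }


if __name__ == "__main__":
    # Placeholder account number; substitute your own.
    print(json.dumps(dynamodb_policy("123456789012"), indent=2))
```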
Deploying the region check function
You first create a Lambda function that reads the DynamoDB table to determine which Region's Amazon Lex bot is currently active for the Region of the Amazon Connect instance. This function is later called by Amazon Connect or by your application that uses the bot.
- On the Lambda console, choose Create function.
- For Function name, enter lexDRGetRegion.
- For Runtime, choose Python 3.8.
- Under Permissions, choose Use an existing role.
- Choose the role lexDRRole.
- Choose Create function.
- In the Lambda code editor, enter the following code (downloaded from lexDRGetRegion.zip):

    import json
    import boto3
    import os
    import logging

    dynamo_client = boto3.client('dynamodb')
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)

    def getCurrentPrimaryRegion(key):
        # Look up the active Amazon Lex Region for the given Amazon Connect Region.
        result = dynamo_client.get_item(
            TableName=os.environ['TABLE_NAME'],
            Key={"connectRegion": {"S": key}}
        )
        logger.debug(result['Item']['lexRegion']['S'])
        return result['Item']['lexRegion']['S']

    def lambda_handler(event, context):
        logger.debug(event)
        # The flow passes its Region as the "region" parameter.
        region = event["Details"]["Parameters"]["region"]
        return {
            'statusCode': 200,
            'primaryCode': getCurrentPrimaryRegion(region)
        }

- In the Environment variables section, choose Edit.
- Add an environment variable with Key as TABLE_NAME and Value as lexDR.
- Click Save to save the environment variable.
- Click Save to save the Lambda function.
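To exercise the region check logic without calling AWS, you can feed it an event shaped like the one Amazon Connect sends and a stand-in for the DynamoDB client. This is a minimal sketch: `connect_event`, `StubDynamo`, and `region_check` are hypothetical names, the event contains only the fields the function reads, and `region_check` mirrors the handler above with the client injected.

```python
def connect_event(region):
    """Minimal subset of the event Amazon Connect passes to a Lambda function."""
    return {
        "Details": {"ContactData": {}, "Parameters": {"region": region}},
        "Name": "ContactFlowEvent",
    }


class StubDynamo:
    """Stand-in for the DynamoDB client, backed by a plain dict."""

    def __init__(self, mapping):
        self.mapping = mapping

    def get_item(self, TableName, Key):
        region = Key["connectRegion"]["S"]
        return {"Item": {"connectRegion": {"S": region},
                         "lexRegion": {"S": self.mapping[region]}}}


def region_check(event, dynamo, table_name="lexDR"):
    """Same lookup as the handler above, with the client passed in."""
    region = event["Details"]["Parameters"]["region"]
    result = dynamo.get_item(TableName=table_name,
                             Key={"connectRegion": {"S": region}})
    return {"statusCode": 200, "primaryCode": result["Item"]["lexRegion"]["S"]}


if __name__ == "__main__":
    stub = StubDynamo({"us-east-1": "us-west-2"})
    print(region_check(connect_event("us-east-1"), stub))
```

The same event body, serialized as JSON, can serve as a test event on the Lambda console.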
Deploying the health check function
Create another Lambda function in us-east-1 to implement the health check functionality.
- On the Lambda console, choose Create function.
- For Function name, enter lexDRTest.
- For Runtime, choose Python 3.8.
- Under Permissions, choose Use an existing role.
- Choose lexDRRole.
- Choose Create function.
- In the Lambda code editor, enter the following code (downloaded from lexDRTest.zip):

    import json
    import boto3
    import sys
    import os

    dynamo_client = boto3.client('dynamodb')
    primaryRegion = os.environ['PRIMARY_REGION']
    secondaryRegion = os.environ['SECONDARY_REGION']
    tableName = os.environ['TABLE_NAME']
    primaryRegion_client = boto3.client('lex-runtime', region_name=primaryRegion)
    secondaryRegion_client = boto3.client('lex-runtime', region_name=secondaryRegion)

    def getCurrentPrimaryRegion():
        # Read the Lex Region currently recorded for this Amazon Connect Region.
        result = dynamo_client.get_item(
            TableName=tableName,
            Key={'connectRegion': {'S': primaryRegion}}
        )
        return result['Item']['lexRegion']['S']

    def updateTable(region):
        # Record the Region that passed the check, or 'err' if both failed.
        dynamo_client.update_item(
            TableName=tableName,
            Key={'connectRegion': {'S': primaryRegion}},
            UpdateExpression='set lexRegion = :region',
            ExpressionAttributeValues={':region': {'S': region}}
        )

    # Health check methods, selected by the TEST_METHOD environment variable.
    def put_session(botname, botalias, user, region):
        print(region, botname, botalias)
        client = primaryRegion_client
        if region == secondaryRegion:
            client = secondaryRegion_client
        try:
            response = client.put_session(botName=botname, botAlias=botalias, userId=user)
            if (response['ResponseMetadata']
                    and response['ResponseMetadata']['HTTPStatusCode']
                    and response['ResponseMetadata']['HTTPStatusCode'] != 200) or (not response['sessionId']):
                return 501
            else:
                # Call the function; comparing the function object itself never matches.
                if getCurrentPrimaryRegion() != region:
                    updateTable(region)
                return 200
        except Exception:
            print('ERROR: {}'.format(sys.exc_info()[0]))
            return 501

    def send_message(botname, botalias, user, region):
        print(region, botname, botalias)
        client = primaryRegion_client
        if region == secondaryRegion:
            client = secondaryRegion_client
        try:
            message = os.environ['SAMPLE_UTTERANCE']
            expectedOutput = os.environ['EXPECTED_RESPONSE']
            response = client.post_text(botName=botname, botAlias=botalias, userId=user, inputText=message)
            if response['message'] != expectedOutput:
                print('ERROR: Expected_Response=' + expectedOutput + ', Response_Received=' + response['message'])
                return 500
            else:
                if getCurrentPrimaryRegion() != region:
                    updateTable(region)
                return 200
        except Exception:
            print('ERROR: {}'.format(sys.exc_info()[0]))
            return 501

    def lambda_handler(event, context):
        print(event)
        botName = os.environ['BOTNAME']
        botAlias = os.environ['BOT_ALIAS']
        testUser = os.environ['TEST_USER']
        testMethod = os.environ['TEST_METHOD']
        if testMethod == 'send_message':
            primaryRegion_response = send_message(botName, botAlias, testUser, primaryRegion)
        else:
            primaryRegion_response = put_session(botName, botAlias, testUser, primaryRegion)
        # A 500 or 200 means a session was created, so clean it up.
        if primaryRegion_response != 501:
            primaryRegion_client.delete_session(botName=botName, botAlias=botAlias, userId=testUser)
        if primaryRegion_response != 200:
            # Primary failed; repeat the same test in the secondary Region.
            if testMethod == 'send_message':
                secondaryRegion_response = send_message(botName, botAlias, testUser, secondaryRegion)
            else:
                secondaryRegion_response = put_session(botName, botAlias, testUser, secondaryRegion)
            if secondaryRegion_response != 501:
                secondaryRegion_client.delete_session(botName=botName, botAlias=botAlias, userId=testUser)
            if secondaryRegion_response != 200:
                updateTable('err')
        return {'statusCode': 200, 'body': 'Success'}

- In the Environment variables section, choose Edit, and add the following environment variables:
  - BOTNAME – OrderFlowers
  - BOT_ALIAS – ver_one
  - SAMPLE_UTTERANCE – I would like to order some flowers. (The example utterance you want to use to send a message to the bot.)
  - EXPECTED_RESPONSE – What type of flowers would you like to order? (The expected response from the bot when it receives the above sample utterance.)
  - PRIMARY_REGION – us-east-1
  - SECONDARY_REGION – us-west-2
  - TABLE_NAME – lexDR
  - TEST_METHOD – put_session or send_message
    - send_message: This method calls the Amazon Lex Runtime PostText function, which takes an utterance and maps it to one of the trained intents. PostText tests the natural language understanding capability of Amazon Lex. You also incur a small charge of $0.00075 per request.
    - put_session: This method calls the Amazon Lex Runtime PutSession function, which creates a new session for the user. PutSession does not test the natural language understanding capability of Amazon Lex.
  - TEST_USER – test
- Click Save to save the environment variable.
- In the Basic Settings section, update the Timeout value to 15 seconds.
- Click Save to save the Lambda function.
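Because the health check function reads all of its settings from environment variables, a missing or mistyped value only surfaces at run time. The following is a minimal sketch of a pre-deployment check; `config_problems` is a hypothetical helper. Note that it is stricter than the function above, which treats any TEST_METHOD value other than send_message as put_session.

```python
REQUIRED = ("BOTNAME", "BOT_ALIAS", "PRIMARY_REGION", "SECONDARY_REGION",
            "TABLE_NAME", "TEST_METHOD", "TEST_USER")
# Only read by the function when TEST_METHOD is send_message.
SEND_MESSAGE_ONLY = ("SAMPLE_UTTERANCE", "EXPECTED_RESPONSE")


def config_problems(env):
    """Return a list of human-readable problems found in the settings dict."""
    problems = [f"missing {key}" for key in REQUIRED if not env.get(key)]
    method = env.get("TEST_METHOD")
    if method and method not in ("put_session", "send_message"):
        problems.append(f"unknown TEST_METHOD {method!r}")
    if method == "send_message":
        problems += [f"missing {key}" for key in SEND_MESSAGE_ONLY
                     if not env.get(key)]
    primary = env.get("PRIMARY_REGION")
    if primary and primary == env.get("SECONDARY_REGION"):
        problems.append("PRIMARY_REGION and SECONDARY_REGION are identical")
    return problems


if __name__ == "__main__":
    settings = {
        "BOTNAME": "OrderFlowers", "BOT_ALIAS": "ver_one",
        "PRIMARY_REGION": "us-east-1", "SECONDARY_REGION": "us-west-2",
        "TABLE_NAME": "lexDR", "TEST_METHOD": "put_session",
        "TEST_USER": "test",
    }
    print(config_problems(settings))
```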
Creating an Amazon CloudWatch rule
To trigger the health check function to run every 5 minutes, you create an Amazon CloudWatch rule.
- On the CloudWatch console, under Events, choose Rules.
- Choose Create rule.
- Under Event Source, change the option to Schedule.
- For Fixed rate of, enter 5 and select Minutes.
- Under Targets, choose Add target.
- Choose Lambda function as the target.
- For Function, choose lexDRTest.
- Under Configure input, choose Constant (JSON text), and enter {}.
- Choose Configure details.
- Under Rule definition, for Name, enter lexHealthCheckRule.
- Choose Create rule.
You should now have a lexHealthCheckRule CloudWatch rule scheduled to invoke your lexDRTest function every 5 minutes. This checks if your primary bot is healthy and updates the DynamoDB table accordingly.
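The schedule configured above corresponds to a rate expression. The following sketch builds that expression and the parameters for the CloudWatch Events `put_rule` call; `rate_expression` and `rule_request` are hypothetical helpers. One detail worth encoding is that the unit is singular when the value is 1.

```python
def rate_expression(value, unit="minute"):
    """Build a schedule expression such as rate(5 minutes)."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be minute, hour, or day")
    # The unit is singular for a value of 1 and plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def rule_request(name="lexHealthCheckRule", minutes=5):
    """Parameters for the CloudWatch Events put_rule call."""
    return {
        "Name": name,
        "ScheduleExpression": rate_expression(minutes),
        "State": "ENABLED",
    }


if __name__ == "__main__":
    print(rule_request())
```

With a boto3 `events` client, this could be passed as `client.put_rule(**rule_request())`. Creating the rule this way does not add the Lambda target or grant the rule permission to invoke the function; the console steps above handle both.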
Creating your Amazon Connect instance
You now create an Amazon Connect instance to test the multi-region pattern for the bots in the same Region where you created the lexDRTest function.
- Create an Amazon Connect instance if you don’t already have one.
- On the Amazon Connect console, choose the instance alias where you want the Amazon Connect flow to be.
- Choose Contact flows.
- Under Amazon Lex, select the OrderFlowers bot from us-east-1 and click Add Lex Bot.
- Select the OrderFlowers bot from us-west-2 and click Add Lex Bot.
- Under AWS Lambda, select lexDRGetRegion and click Add Lambda Function.
- Log in to your Amazon Connect instance by clicking Overview in the left panel and clicking the login link.
- Click Routing in the left panel, and then click Contact Flows in the drop down menu.
- Click the Create Contact Flow button.
- Click the down arrow button next to the Save button, and click on Import Flow.
- Download the contact flow Flower DR Flow. Upload this file in the Import Flow dialog.
- In the contact flow, click on the Invoke AWS Lambda Function block to open a properties panel on the right.
- Select the lexDRGetRegion function and click Save.
- Click on the Publish button to publish the contact flow.
Associating a phone number with the contact flow
Next, you will associate a phone number with your contact flow, so you can call in and test the OrderFlowers bot.
- Click on the Routing option in the left navigation panel.
- Click on Phone Numbers.
- Click on Claim Number.
- Select your country code and select a Phone Number.
- In the Contact flow/IVR select box, select the contact flow Flower DR Flow imported in the earlier step.
- Wait for a few minutes, and then call into that number to interact with the OrderFlowers bot.
Testing your integration
To test this solution, you can simulate a failure in the us-east-1 Region by implementing the following:
- Open the Amazon Lex console in the us-east-1 Region.
- Select the OrderFlowers bot.
- Click on Settings.
- Delete the bot alias ver_one.
The next time the health check runs, it tries to communicate with the Amazon Lex bot in the us-east-1 Region. The call fails because the bot alias no longer exists, so the function then calls the secondary Region, us-west-2. Upon receiving a successful response, it updates the lexRegion column in the lexDR DynamoDB table with us-west-2.
After this, all subsequent calls to Amazon Connect in us-east-1 interact with the Amazon Lex bot in us-west-2. This automatic switchover demonstrates how this architectural pattern can help achieve business continuity in the event of a service failure.
Between the time you delete the bot alias and the next health check run, calls to Amazon Connect fail. After the health check runs, business operations resume automatically. The shorter the interval between health check runs, the shorter the outage. You can change the interval by editing the Amazon CloudWatch rule, lexHealthCheckRule.
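The trade-off between the health check interval, the exposure window, and the cost of the send_message method can be estimated with simple arithmetic. The sketch below is a rough estimate under assumptions: `health_check_tradeoff` is a hypothetical helper, a month is taken as 30 days, the worst-case gap is approximated by the interval alone (ignoring the function's own run time), and the cost counts one PostText request per run at the $0.00075 rate quoted earlier, which undercounts runs where the secondary Region is also tested.

```python
def health_check_tradeoff(interval_minutes, price_per_request=0.00075, days=30):
    """Estimate monthly runs, worst-case detection gap, and PostText cost."""
    if interval_minutes < 1:
        raise ValueError("interval_minutes must be at least 1")
    runs = days * 24 * 60 / interval_minutes
    return {
        "runs": runs,
        # A failure just after a run is not noticed until the next run.
        "approx_worst_case_gap_minutes": interval_minutes,
        "post_text_cost_usd": round(runs * price_per_request, 2),
    }


if __name__ == "__main__":
    for minutes in (1, 5, 15):
        print(minutes, health_check_tradeoff(minutes))
```

At the 5-minute interval used in this post, that works out to 8,640 runs over 30 days and about $6.48 in PostText requests; with put_session there is no additional Amazon Lex charge.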
To make the health check pass in us-east-1 again, recreate the ver_one alias of the OrderFlowers bot in us-east-1.
Cleanup
To avoid incurring any charges in the future, delete all the resources created above.
- The Amazon Lex bot OrderFlowers created in us-east-1 and us-west-2
- The CloudWatch rule lexHealthCheckRule
- The DynamoDB table lexDR
- The Lambda functions lexDRTest and lexDRGetRegion
- The IAM role lexDRRole
- The contact flow Flower DR Flow
Conclusion
Coupled with Amazon Lex for self-service, Amazon Connect allows you to easily create intuitive customer service experiences. This post offers a multi-region approach for high availability so that, if a bot or the supporting fulfillment APIs are under pressure in one Region, resources from a different Region can continue to serve customer demand.
About the Authors
Shanthan Kesharaju is a Senior Architect in the AWS ProServe team. He helps our customers with their Conversational AI strategy, architecture, and development. Shanthan has an MBA in Marketing from Duke University, an MS in Management Information Systems from Oklahoma State University, and a Bachelors in Technology from Kakatiya University in India. He is also currently pursuing his third Masters in Analytics from Georgia Tech.
Soyoung Yoon is a Conversation A.I. Architect at AWS Professional Services, where she works with customers across multiple industries to develop specialized conversational assistants that help these customers provide their users with faster and more accurate information through natural language. Soyoung has an M.S. and a B.S. in Electrical and Computer Engineering from Carnegie Mellon University.