Create a multi-region Amazon Lex bot with Amazon Connect for high availability

AWS customers rely on Amazon Lex bots to power their Amazon Connect self service conversational experiences on telephone and other channels. With Lex, callers (or customers, in Amazon Connect terminology) can get their questions conveniently answered regardless of agent availability. What architecture patterns can you use to make a bot resilient to service availability issues? In this post, we describe a cross-regional approach to yield higher availability by deploying Amazon Lex bots in multiple Regions.

Architecture overview

In this solution, Amazon Connect flows can achieve business continuity with minimal disruptions in the event of service availability issues with Amazon Lex. The architecture pattern uses the following components:

  • Two Amazon Lex bots, each in a different Region.
  • An Amazon Connect flow integrated with the bots triggered based on the result from the region check AWS Lambda function.
  • A Lambda function to check the health of the bot.
  • A Lambda function to read the Amazon DynamoDB table for the primary bot’s Region for a given Amazon Connect Region.
  • A DynamoDB table to store a Region mapping between Amazon Connect and Amazon Lex. The health check function updates this table. The region check function reads this table for the most up-to-date primary Region mapping for Amazon Connect and Amazon Lex.

The goal of having identical Amazon Lex Bots in two Regions is to bring up the bot in the secondary Region and make it the primary in the event of an outage in the primary Region.

Mulit-region pattern for Amazon Lex in Amazon Connect

Multi-region pattern for Amazon Lex

The next two sections describe how an Amazon Connect flow integrated with an Amazon Lex bot can recover quickly in case of a service failure or outage in the primary Region and start servicing calls using Amazon Lex in the secondary Region.

The health check function calls one of the two Amazon Lex Runtime API calls—PutSession or PostText, depending on the TEST_METHOD Lambda environment variable. You can choose either one based on your preference and use case. The PutSession API call doesn’t have any extra costs associated with Amazon Lex, but it doesn’t test any natural language understanding (NLU) features of Amazon Lex. The PostTextAPI allows you to check the NLU functionality of Amazon Lex, but includes a minor cost.

The health check function updates the lexRegion column of the DynamoDB table (lexDR) with the Region name in which the test passed. If the health check passes the test in the primary Region, lexRegion gets updated with the name of the primary Region. If the health check fails, the function issues a call to the corresponding Runtime API based on the TEST_METHOD environment variable in the secondary Region. If the test succeeds, the lexRegion column in the DynamoDB table gets updated to the secondary Region; otherwise, it gets updated with err, which indicates both Regions have an outage.

On every call that Amazon Connect receives, it issues a region check function call to get the active Amazon Lex Region for that particular Amazon Connect Region. The primary Region returned by the region check function is the last entry written to the DynamoDB table by the health check function. Amazon Connect invokes the respective Get Customer Input Block configured with the Amazon Lex bot in the Region returned by the region check function. If the function returns the same Region as the Amazon Connect Region, it indicates that the health check has passed, and Amazon Connect calls the Amazon Lex bot in its same Region. If the function returns the secondary Region, Amazon Connect invokes the bot in the secondary Region.

Deploying Amazon Lex bots

You need to create an identical bot in both your primary and secondary Region. In this blog post, we selected us-east-1 as the primary and us-west-2 secondary Region. Begin by creating the bot in your primary Region, us-east-1.

  1. On the Amazon Lex console, click Create.
  2. In the Try a Sample section, select OrderFlowers.  Select COPPA to No
  3. Leave all other settings at their default value and click Create.
  4. The bot is created and will start to build automatically.
  5. After your bot is built (in 1–2 minutes), choose Publish.
  6. Create an alias with the name ver_one.

Repeat the above steps for us-west-2.  You should now have a working Amazon Lex bot in both us-east-1 and us-west-2.

Creating a DynamoDB table

Make sure your AWS Region is us-east-1.

  1. On the DynamoDB console, choose Create.
  2. For Table name, enter lexDR.
  3. For Primary key, enter connectRegion with type String.
  4. Leave everything else at their default and choose Create.
  5. On the Items tab, choose Create item.
  6. Set the connectRegion value to us-east-1 , and Append a new column of type String called lexRegion and set its value to us-east-1.
    Appending additional column to the Dynamo Table
  7. Click Save.
    Dynamo DB Table Entry showing Connect and Lex mapping

Creating IAM roles for Lambda functions

In this step, you create an AWS Identity and Access Management (IAM) for both Lambda functions to use.

  1. On the IAM console, click on Access management and select Policies.
  2. Click on Create Policy.
  3. Click on JSON.
  4. Paste the following custom IAM policy that allows read and write access to the DynamoDB table, lexDR. Replace the “xxxxxxxxxxxx” in the policy definition with your AWS Account Number.
    {
    	"Version": "2012-10-17",
    	"Statement": [{
    		"Sid": "VisualEditor0",
    		"Effect": "Allow",
    		"Action": ["dynamodb:GetItem", "dynamodb:UpdateItem"],
    		"Resource": "arn:aws:dynamodb:us-east-1:xxxxxxxxxxxx:table/lexDR"
    	}]
    }
  5. Click on Review Policy.
  6. Give it a name DynamoDBReadWrite and click on Create Policy.
  7. On the IAM console, click on Roles  under Access management  and then click on Create Role.
  8. Select Lambda for the service and click Next.
  9. Attach the following permissions policies:
    1. AWSLambdaBasicExecutionRole
    2. AmazonLexRunBotsOnly
    3. DynamoDBReadWrite
  10. Click Next: Tags. Skip the Tags page by clicking Next: Review.
  11. Name the role lexDRRole. Click Save.

Deploying the region check function

You first create a Lambda function to read from the DynamoDB table to decide which Amazon Lex bot is in the same Region as the Amazon Connect instance. This function is later called by Amazon Connect or your application that’s using the bot.

  1. On the Lambda console, choose Create function.
  2. For Function name, enter lexDRGetRegion.
  3. For Runtime, choose Python 3.8.
  4. Under Permissions, choose Use an existing role.
  5. Choose the role lexDRRole.
  6. Choose Create function.
  7. In the Lambda code editor, enter the following code (downloaded from lexDRGetRegion.zip):
    import json
    import boto3
    import os
    import logging
    dynamo_client=boto3.client('dynamodb')
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
     
    def getCurrentPrimaryRegion(key):
        result = dynamo_client.get_item(
            TableName=os.environ['TABLE_NAME'],
            Key = { 
                "connectRegion": {"S": key } 
            }
        )
        logger.debug(result['Item']['lexRegion']['S'] )
        return result['Item']['lexRegion']['S'] 
     
    def lambda_handler(event, context):
        logger.debug(event)
        region = event["Details"]["Parameters"]["region"]
        return {
            'statusCode': 200,
            'primaryCode': getCurrentPrimaryRegion(region)
        }

  8. In the Environment variables section, choose Edit.
  9. Add an environment variable with Key as TABLE_NAME and Value as lexDR.
  10. Click Save to save the environment variable.
  11. Click Save to save the Lambda function.

Environment Variables section in Lambda Console

Deploying the health check function

Create another Lambda function in us-east-1 to implement the health check functionality.

  1. On the Lambda console, choose Create function.
  2. For Function name, enter lexDRTest.
  3. For Runtime, choose Python 3.8.
  4. Under Permissions, choose Use an existing role.
  5. Choose lexDRRole.
  6. Choose Create function.
  7. In the Lambda code editor, enter the following code (downloaded from lexDRTest.zip):
    import json
    import boto3
    import sys
    import os
    
    dynamo_client = boto3.client('dynamodb')
    primaryRegion = os.environ['PRIMARY_REGION']
    secondaryRegion = os.environ['SECONDARY_REGION']
    tableName = os.environ['TABLE_NAME']
    primaryRegion_client = boto3.client('lex-runtime',region_name=primaryRegion)
    secondaryRegion_client = boto3.client('lex-runtime',region_name=secondaryRegion)
    
    def getCurrentPrimaryRegion():
        result = dynamo_client.get_item(
            TableName=tableName,
            Key={  
                'connectRegion': {'S': primaryRegion}  
            }
        )
        return result['Item']['lexRegion']['S'] 
        
    def updateTable(region):
        result = dynamo_client.update_item( 
            TableName= tableName,
            Key={  
                'connectRegion': {'S': primaryRegion } 
            },  
            UpdateExpression='set lexRegion = :region',
            ExpressionAttributeValues={
            ':region': {'S':region}
            }
        )
        
    #SEND MESSAGE/PUT SESSION ENV VA
    def put_session(botname, botalias, user, region):
        print(region,botname, botalias)
        client = primaryRegion_client
        if region == secondaryRegion:
            client = secondaryRegion_client
        try:
            response = client.put_session(botName=botname, botAlias=botalias, userId=user)
            if (response['ResponseMetadata'] and response['ResponseMetadata']['HTTPStatusCode'] and response['ResponseMetadata']['HTTPStatusCode'] != 200) or (not response['sessionId']):  
                return 501
            else:
                if getCurrentPrimaryRegion != region:
                    updateTable(region)
            return 200
        except:
            print('ERROR: {}',sys.exc_info()[0])
            return 501
    
    def send_message(botname, botalias, user, region):
        print(region,botname, botalias)
        client = primaryRegion_client
        if region == secondaryRegion:
            client = secondaryRegion_client
        try:
            message = os.environ['SAMPLE_UTTERANCE']
            expectedOutput = os.environ['EXPECTED_RESPONSE']
            response = client.post_text(botName=botname, botAlias=botalias, userId=user, inputText=message)
            if response['message']!=expectedOutput:
                print('ERROR: Expected_Response=Success, Response_Received='+response['message'])
                return 500
            else:
                if getCurrentPrimaryRegion != region:
                    updateTable(region)
                return 200
        except:
            print('ERROR: {}',sys.exc_info()[0])
            return 501
    
    def lambda_handler(event, context):
        print(event)
        botName = os.environ['BOTNAME']
        botAlias = os.environ['BOT_ALIAS']
        testUser = os.environ['TEST_USER']
        testMethod = os.environ['TEST_METHOD']
        if testMethod == 'send_message':
            primaryRegion_response = send_message(botName, botAlias, testUser, primaryRegion)
        else:
            primaryRegion_response = put_session(botName, botAlias, testUser, primaryRegion)
        if primaryRegion_response != 501:
            primaryRegion_client.delete_session(botName=botName, botAlias=botAlias, userId=testUser)
        if primaryRegion_response != 200:
            if testMethod == 'send_message':
                secondaryRegion_response = send_message(botName, botAlias, testUser, secondaryRegion)
            else:
                secondaryRegion_response = put_session(botName, botAlias, testUser, secondaryRegion)
            if secondaryRegion_response != 501:
                secondaryRegion_client.delete_session(botName=botName, botAlias=botAlias, userId=testUser)
            if secondaryRegion_response != 200:
                updateTable('err')
        #deleteSessions(botName, botAlias, testUser)
        return {'statusCode': 200,'body': 'Success'}
    

  8. In the Environment variables section, choose Edit, and add the following environment variables:
    • BOTNAMEOrderFlowers
    • BOT_ALIASver_one
    • SAMPLE_UTTERANCEI would like to order some flowers.
      (The example utterance you want to use to send a message to the bot.)
    • EXPECTED_RESPONSE What type of flowers would you like to order?
      (The expected response from the bot when it receives the above sample utterance.)
    • PRIMARY_REGIONus-east-1
    • SECONDARY_REGIONus-west-2
    • TABLE_NAMElexDR
    • TEST_METHODput_session or send_message
      • send_message : This method calls the Lex Runtime function postText function which takes an utterance and maps it to one of the trained intents. postText will test the Natural Language Understanding capability of Lex. You will also iincur a small charge of $0.00075 per request)
      • put_session: This method calls the Lex Runtime function put_session function which creates a new session for the user. put_session will NOT test the Natual Language Understanding capability of Lex.)
    • TEST_USERtest
  9. Click Save to save the environment variable.
  10. In the Basic Settings section, update the Timeout value to 15 seconds.
  11. Click Save to save the Lambda function.

Environment Variables section in Lambda Console

Creating an Amazon CloudWatch rule

To trigger the health check function to run every 5 minutes, you create an Amazon CloudWatch rule.

  1. On the CloudWatch console, under Events, choose Rules.
  2. Choose Create rule.
  3. Under Event Source, change the option to Schedule.
  4. Set the Fixed rate of to 5 minutes
  5. Under Targets, choose Add target.
  6. Choose Lambda function as the target.
  7. For Function, choose lexDRTest.
  8. Under Configure input, choose Constant(JSON text), and enter {}
  9. Choose Configure details.
  10. Under Rule definition, for Name, enter lexHealthCheckRule.
  11. Choose Create rule.

You should now have a lexHealthCheckRule CloudWatch rule scheduled to invoke your lexDRTest function every 5 minutes. This checks if your primary bot is healthy and updates the DynamoDB table accordingly.

Creating your Amazon Connect instance

You now create an Amazon Connect instance to test the multi-region pattern for the bots in the same Region where you created the lexDRTest function.

  1. Create an Amazon Connect instance if you don’t already have one.
  2. On the Amazon Connect console, choose the instance alias where you want the Amazon Connect flow to be.
  3. Choose Contact flows.
  4. Under Amazon Lex, select OrderFlowers bot from us-east-1 and click Add Lex Bot
  5. Select OrderFlowers bot from us-west-2 and click Add Lex Bot
    Adding Lex Bots in Connect Contact Flows
  6. Under AWS Lambda, select lexDRGetRegion and click Add Lambda Function.
  7. Log in to your Amazon Connect instance by clicking Overview in the left panel and clicking the login link.
  8. Click Routing in the left panel, and then click Contact Flows in the drop down menu.
  9. Click the Create Contact Flow button.
  10. Click the down arrow button next to the Save button, and click on Import Flow.
  11. Download the contact flow Flower DR Flow. Upload this file in the Import Flow dialog.
    Amazon Connect Contact Flow
  12. In the Contact Flow, Click on the Inovke AWS Lambda Function block, and it will open a properties panel on the right.
  13. Select the lexDRGetRegion function and click Save.
  14. Click on the Publish button to publish the contact flow.

Associating a phone number with the contact flow

Next, you will associate a phone number with your contact flow, so you can call in and test the OrderFlowers bot.

  1. Click on the Routing option in the left navigation panel.
  2. Click on Phone Numbers.
  3. Click on Claim Number.
  4. Select your country code and select a Phone Number.
  5. In the Contact flow/IVR select box, select the contact flow Flower DR Flow imported in the earlier step.
  6. Wait for a few minutes, and then call into that number to interact with the OrderFlowers bot.

Testing your integration

To test this solution, you can simulate a failure in the us-east-1 Region by implementing the following:

  1. Open Amazon Lex Console in us-east-1 Region
  2. Select the OrderFlowers bot.
  3. Click on Settings.
  4. Delete the bot alias ver_one

When the health check runs the next time, it will try to communicate with the Lex Bot in us-east-1 region. It will fail in getting a successful response, as the bot alias no longer exists. So, it will then make the call to the secondary Region, us-west-2. Upon receiving a successful response. Upon receiving this response, it will update the lexRegion column in the lexDR, DynamoDB table with us-west-2.

After this, all subsequent calls to Connect in us-east-1 will start interacting with the Lex Bot in us-west-2. This automatic switch over demonstrates how this architectural pattern can help achieve business continuity in the event of a service failure.
Between the time you delete the bot alias, and the next health check run, calls to Amazon Connect will receive a failure. However, after the health check runs, you will see a continuity in business operational automatically. The smaller the duration between your health check runs, the shorter the outage you will have. The duration between your health check runs can be changed by editing the Amazon CloudWatch rule, lexHealthCheckRule .

To make the health check pass in us-east-1 again, recreate the ver_one alias of the OrderFlowers bot in us-east-1.

Cleanup

To avoid incurring any charges in the future, delete all the resources created above.

  1. Amazon Lex bot OrderFlowers created in us-east-1 and us-west-2
  2. The Cloudwatch rule lexHealthCheckRule
  3. The DynamoDB Table lexDR
  4. The Lambda functions lexDRTest and lexDRGetRegion
  5. Delete the IAM role lexDRRole
  6. Delete the Contact Flow Flower DR Flow

Conclusion

Coupled with Amazon Lex for self-service, Amazon Connect allows you to easily create intuitive customer service experiences. This post offers a multi-region approach for high availability so that, if a bot or the supporting fulfillment APIs are under pressure in one Region, resources from a different Region can continue to serve customer demand.


About the Authors

Shanthan Kesharaju is a Senior Architect in the AWS ProServe team. He helps our customers with their Conversational AI strategy, architecture, and development. Shanthan has an MBA in Marketing from Duke University, MS in Management Information Systems from Oklahoma State University, and a Bachelors in Technology from Kakaitya University in India. He is also currently pursuing his third Masters in Analytics from Georgia Tech.

 

 

Soyoung Yoon is a Conversation A.I. Architect at AWS Professional Services where she works with customers across multiple industries to develop specialized conversational assistants which have helped these customers provide their users faster and accurate information through natural language. Soyoung has M.S. and B.S. in Electrical and Computer Engineering from Carnegie Mellon University.

 

 

 

Read More