Using genetic algorithms on AWS for optimization problems
Machine learning (ML)-based solutions are capable of solving complex problems, from voice recognition to finding and identifying faces in video clips or photographs. Usually, these solutions use large amounts of training data, which results in a model that processes input data and produces numeric output that can be interpreted as a word, face, or classification category. For many types of problems, this approach works very well.
But what if you have a problem that doesn’t have training data available, or doesn’t fit within the concept of a classification or regression? For example, what if you need to find an optimal ordering for a given set of worker tasks with a given set of conditions and constraints? How do you solve that, especially if the number of tasks is very large?
This post describes genetic algorithms (GAs) and demonstrates how to use them on AWS. GAs are unsupervised ML algorithms used to solve general types of optimization problems, including:
- Optimal data orderings – Examples include creating work schedules, determining the best order to perform a set of tasks, or finding an optimal path through an environment
- Optimal data subsets – Examples include finding the best subset of products to include in a shipment, or determining which financial instruments to include in a portfolio
- Optimal data combinations – Examples include finding an optimal strategy for a task that is composed of many components, where each component is a choice of one of many options
For many optimization problems, the number of potential solutions (good and bad) is very large, so GAs are often considered a type of search algorithm, where the goal is to efficiently search through a huge solution space. GAs are especially advantageous when the fitness landscape is complex and non-convex, so that classical optimization methods such as gradient descent are an ineffective means to find a global solution. Finally, GAs are often referred to as heuristic search algorithms because they don’t guarantee finding the absolute best solution, but they do have a high probability of finding a sufficiently good solution to the problem in a short amount of time.
GAs use concepts from evolution such as survival of the fittest, genetic crossover, and genetic mutation to solve problems. Rather than trying to create a single solution, those evolutionary concepts are applied to a population of different problem solutions, each of which is initially random. The population goes through a number of generations, literally evolving solutions through mechanisms like reproduction (crossover) and mutation. After a number of generations of evolution, the best solution found across all the generations is chosen as the final problem solution.
As a prerequisite to using a GA, you must be able to do the following:
- Represent each potential solution in a data structure.
- Evaluate that data structure and return a numeric fitness score that accurately reflects the solution quality. For example, imagine a fitness score that measures the total time to perform a set of tasks. In that case, the goal would be to minimize that fitness score in order to perform the tasks as quickly as possible.
Each member of the population has a different solution stored in its data structure, so the fitness function must return a score that can be used to compare two candidates against each other. That’s the “survival of the fittest” part of the algorithm—one candidate is evaluated as better than another, and that fitter candidate’s information is passed on to future generations.
One note about terminology: because many of the ideas behind a genetic algorithm come from the field of genetics, the data representation that each member of a population uses is sometimes called a genome. That’s simply another way to refer to the data used to represent a particular solution.
Use case: Finding an optimal route for a delivery van
As an example, let’s say that you work for a company that ships lots of packages all over the world, and your job is focused on the final step, which is delivering a package by truck or van to its final destination.
A given delivery vehicle might have up to 100 packages at the start of a day, so you’d like to calculate the shortest route to deliver all the packages and return the truck to the main warehouse when done. This is a version of a classic optimization problem called The Travelling Salesman Problem, originally formulated in 1930. In the following visualization of the problem, displayed as a top-down map of a section of a city, the warehouse is shown as a yellow dot, and each delivery stop is shown as a red dot.
To keep things simple for this demonstration, we assume that when traveling from one delivery stop to another, there are no one-way roads. Under this assumption, the distance traveled from one stop to the next is the difference in X coordinates plus the difference in Y coordinates (often called the Manhattan distance).

If the problem had a slightly different form (like traveling via airplane rather than driving through city streets), we might instead use the Euclidean distance given by the Pythagorean theorem: the square root of the squared difference in X coordinates plus the squared difference in Y coordinates. For this use case, however, we stick with the Manhattan distance, because that matches how a truck travels to deliver the packages, assuming two-way streets.
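To make the distinction concrete, here is a minimal sketch of both distance calculations (the coordinates are made up for illustration):

```python
import math

def manhattan_dist(a, b):
    # grid travel on two-way streets: sum of the axis differences
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def euclidean_dist(a, b):
    # straight-line travel, via the Pythagorean theorem
    return math.hypot(a[0] - b[0], a[1] - b[1])

warehouse, stop = (0, 0), (3, 4)
print(manhattan_dist(warehouse, stop))  # 7
print(euclidean_dist(warehouse, stop))  # 5.0
```

A truck driving a street grid always covers at least the Manhattan distance between two stops, which is why it's the appropriate measure here.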
Next, let’s get a sense of how challenging this problem is. In other words, how many possible routes are there with 100 stops where you visit each stop only once? In this case, the math is simple: there are 100 possible first stops multiplied by 99 possible second stops, multiplied by 98 possible third stops, and so on—100 factorial (100!), in other words. That’s 9.3 × 10^157 possibilities, which definitely counts as a large solution space and rules out any thoughts of using a brute force approach. After all, with that volume of potential solutions, there really is no way to iterate through all the possible solutions in any reasonable amount of time.
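As a quick check of that number, you can compute 100! directly with Python's standard library:

```python
import math

num_routes = math.factorial(100)  # number of possible orderings of 100 stops
print(f"{num_routes:.4e}")  # 9.3326e+157
```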
Given that, it seems that a GA could be a good approach for this problem, because GAs are effective at finding good-quality solutions within very large solution spaces. Let’s develop a GA to see how that works.
Representation and a fitness function
As mentioned earlier, the first step in writing a GA is to determine the data structure for a solution. Suppose that we have a list of all 100 destinations with their associated locations. A useful data representation for this problem is to have each candidate store a list of 100 location indexes that represent the order the delivery van must visit each location. The X and Y coordinates found in the lookup table could be latitude and longitude coordinates or other real-world data.
To implement our package delivery solution, we use a Python script, although almost any modern computer language like Java or C# works well. Open-source packages like inspyred also create the general structure of a GA, allowing you to focus on just the parts that vary from project to project. However, for the purposes of introducing the ideas behind a GA, we write the code without relying on third-party libraries.
As a first step, we represent a potential solution as the following code:
```python
class CandidateSolution(object):
    def __init__(self):
        self.fitness_score = 0
        num_stops = len(delivery_stop_locations)  # a list of (X,Y) tuples
        self.path = list(range(num_stops))
        random.shuffle(self.path)
```
The class has a `fitness_score` field and a `path` field. The path is a list of indexes into `delivery_stop_locations`, which is a list of (X,Y) coordinates for each delivery stop. That list is loaded from a database elsewhere in the code. We also use `random.shuffle()`, which ensures that each potential solution is a randomly shuffled list of indexes into the `delivery_stop_locations` list. GAs always start with a population of completely random solutions, and then rely on evolution to home in on the best solution possible.
With this data structure, the fitness function is straightforward. We start at the warehouse, then travel to the first location in our list, then the second location, and so on until we’ve visited all delivery stop locations, and then we return to the warehouse. In the end, the fitness function simply totals up the distance traveled over that entire trip. The goal of this GA is to minimize that distance, so the smaller the fitness score, the better the solution. We use the following code to implement the fitness function:
```python
def dist(location_a, location_b):
    xdiff = abs(location_a['X'] - location_b['X'])
    ydiff = abs(location_a['Y'] - location_b['Y'])
    return xdiff + ydiff

def calc_score_for_candidate(candidate):
    # start with the distance from the warehouse to the first stop
    warehouse_location = {'X': STARTING_WAREHOUSE_X, 'Y': STARTING_WAREHOUSE_Y}
    total_distance = dist(warehouse_location, delivery_stop_locations[candidate.path[0]])
    # then travel to each stop
    for i in range(len(candidate.path) - 1):
        total_distance += dist(
            delivery_stop_locations[candidate.path[i]],
            delivery_stop_locations[candidate.path[i + 1]])
    # then travel back to the warehouse
    total_distance += dist(warehouse_location, delivery_stop_locations[candidate.path[-1]])
    return total_distance
```
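As a quick sanity check, we can run the fitness calculation on a tiny made-up map with just two delivery stops. The helper below is a pared-down version that takes a bare path list rather than a candidate object, and the coordinates are illustrative only:

```python
STARTING_WAREHOUSE_X, STARTING_WAREHOUSE_Y = 0, 0
delivery_stop_locations = [{'X': 2, 'Y': 1}, {'X': 5, 'Y': 3}]

def dist(location_a, location_b):
    # Manhattan distance between two stops
    return abs(location_a['X'] - location_b['X']) + abs(location_a['Y'] - location_b['Y'])

def calc_score_for_path(path):
    warehouse = {'X': STARTING_WAREHOUSE_X, 'Y': STARTING_WAREHOUSE_Y}
    total = dist(warehouse, delivery_stop_locations[path[0]])
    for i in range(len(path) - 1):
        total += dist(delivery_stop_locations[path[i]],
                      delivery_stop_locations[path[i + 1]])
    return total + dist(warehouse, delivery_stop_locations[path[-1]])

# warehouse -> (2,1) -> (5,3) -> warehouse: 3 + 5 + 8 = 16
print(calc_score_for_path([0, 1]))  # 16
```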
Now that we have a representation and a fitness function, let’s look at the overall flow of a genetic algorithm.
Program flow for a genetic algorithm
When you have a data representation and a fitness function, you’re ready to create the rest of the GA. The standard program flow is as follows:
- Generation 0 – Initialize the entire population with completely random solutions.
- Fitness – Calculate the fitness score for each member of the population.
- Completion check – Take one of the following actions:
- If the best fitness score found in the current generation is better than any seen before, save it as a potential solution.
- If you go through a certain number of generations without any improvement (no better solution has been found), then exit this loop, returning the best found to date.
- Elitism – Create a new generation, initially empty. Take a small percentage (like 5%) of the best-scoring candidates from the current generation and copy them unchanged into the new generation.
- Selection and crossover – To populate the remainder of the new generation, repeatedly select two good candidate solutions from the current generation and combine them to form a new child candidate that gets added to the next generation.
- Mutation – On rare occasions (like 2%, for example), mutate a newly created child candidate by randomly perturbing its data.
- Replace the current generation with the next generation and return to step 2.
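Putting those steps together, the skeleton of the main loop might look like the following sketch. The genome and fitness function here are toy stand-ins (sorting a short list of indexes), and the crossover is a plain copy, so the point is the overall structure rather than the operators:

```python
import random

POPULATION_SIZE = 50
ELITISM_RATE = 0.05
MUTATION_RATE = 0.02
PATIENCE = 20  # generations without improvement before we stop

def make_candidate():
    path = list(range(10))  # toy genome: a shuffled list of indexes
    random.shuffle(path)
    return path

def fitness(path):
    # toy fitness, lower is better: distance of each gene from sorted order
    return sum(abs(i - gene) for i, gene in enumerate(path))

def tourney_select(population):
    return min(random.sample(population, 2), key=fitness)

def crossover(parent_one, parent_two):
    # placeholder: the real GA would use partially mapped crossover here
    return parent_one[:], parent_two[:]

def mutate(path):
    i, j = random.sample(range(len(path)), 2)
    path[i], path[j] = path[j], path[i]  # swap mutation

def run_ga():
    population = [make_candidate() for _ in range(POPULATION_SIZE)]  # generation 0
    best, stale_generations = None, 0
    while stale_generations < PATIENCE:                  # completion check
        population.sort(key=fitness)
        if best is None or fitness(population[0]) < fitness(best):
            best, stale_generations = population[0][:], 0
        else:
            stale_generations += 1
        next_gen = population[:int(ELITISM_RATE * POPULATION_SIZE)]  # elitism
        while len(next_gen) < POPULATION_SIZE:           # selection and crossover
            for child in crossover(tourney_select(population), tourney_select(population)):
                if random.random() < MUTATION_RATE:      # mutation
                    mutate(child)
                next_gen.append(child)
        population = next_gen[:POPULATION_SIZE]
    return best

best = run_ga()
print(sorted(best) == list(range(10)))  # True: the result is still a valid permutation
```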
When the algorithm exits the main loop, the best solution found during that run is used as the problem’s final solution. However, it’s important to realize that because there is so much randomness in a GA—from the initially completely random candidates to randomized selection, crossover, and mutation—each time you run a GA, you almost certainly get a different result. Because of that randomness, a best practice when using a GA is to run it multiple times to solve the same problem, keeping the very best solutions found across all the runs.
Using a genetic algorithm on AWS via Amazon SageMaker Processing
Due to the inherent randomness that comes with a GA, it’s usually a good idea to run the code multiple times, using the best result found across those runs. This can be accomplished using Amazon SageMaker Processing, which is an Amazon SageMaker managed service for running data processing workloads. In this case, we use it to launch the GA so that multiple instances of the code run in parallel.
Before we start, we need to set up a couple of AWS resources that our project needs, like database tables to store the delivery stop locations and GA results, and an AWS Identity and Access Management (IAM) role to run the GA. Use the AWS CloudFormation template included in the associated GitHub repo to create these resources, and make a note of the resulting ARN of the IAM role. Detailed instructions are included in the README file found in the GitHub repo.
After you create the required resources, populate the Amazon DynamoDB table `DeliveryStops` (containing the coordinates for each delivery stop) using the Python script `create_delivery_stops.py`, which is included in the code repo. You can run this code from a SageMaker notebook or directly from a desktop computer, assuming you have Python and Boto3 installed. See the README in the repo for detailed instructions on running this code.
We use DynamoDB for storing the delivery stops and the results. DynamoDB is a reasonable choice for this use case because it’s highly scalable and reliable, and requires no maintenance because it’s a fully managed service. DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second, although this use case doesn’t require anywhere near that kind of volume.
After you create the IAM role and DynamoDB tables, you’re ready to set up the GA code and run it using SageMaker.
- To start, create a notebook in SageMaker.
Be sure to use a notebook instance rather than SageMaker Studio, because we need a kernel with Docker installed.
To use SageMaker Processing, we first need to create a Docker image that we use to provide a runtime environment for the GA.
- Upload `Dockerfile` and `genetic_algorithm.py` from the code repo into the root folder for your Jupyter notebook instance.
- Open `Dockerfile` and ensure that the `ENV AWS_DEFAULT_REGION` line refers to the AWS Region that you’re using.

The default Region in the file from the repo is `us-east-2`, but you can use any Region you wish.
- Create a cell in your notebook and enter the following code:
```python
import boto3

print("Building container...")

region = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')
ecr_repository = 'sagemaker-processing-container-for-ga'
tag = ':latest'
base_uri = '{}.dkr.ecr.{}.amazonaws.com'.format(account_id, region)
repo_uri = '{}/{}'.format(base_uri, ecr_repository + tag)

# Create ECR repository and push docker image
!docker build -t $ecr_repository docker
!aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $base_uri
!aws ecr create-repository --repository-name $ecr_repository
!docker tag {ecr_repository + tag} $repo_uri
!docker push $repo_uri

print("Container Build done")

iam_role = 'ARN_FOR_THE_IAM_ROLE_CREATED_EARLIER'
```
Be sure to fill in the `iam_role` ARN, which is displayed on the Outputs page of the CloudFormation stack that you created earlier. You can also change the name of the Docker image if you wish, although the default value of `sagemaker-processing-container-for-ga` is reasonable.
Running that cell creates a Docker image that supports Python with the Boto3 package installed, and then registers it with Amazon Elastic Container Registry (Amazon ECR), which is a fully-managed Docker registry that handles everything required to scale or manage the storage of Docker images.
Add a new cell to your notebook and enter and run the following code:
```python
from sagemaker.processing import ScriptProcessor

processor = ScriptProcessor(image_uri=repo_uri,
                            role=iam_role,
                            command=['python3'],
                            instance_count=1,
                            instance_type="ml.m5.xlarge")
processor.run(code='./genetic_algorithm.py')
```
When the cell runs, the processing job launches, and the GA's log output is displayed below the cell as it does its processing.
The `ScriptProcessor` class is used to create a container that the GA code runs in. We don’t include the code for the GA in the container itself, because the `ScriptProcessor` class is designed to be used as a generic container (preloaded with all required software packages), and the run command chooses a Python file to run within that container. Although the GA Python code is located on your notebook instance, SageMaker Processing copies it to an Amazon Simple Storage Service (Amazon S3) bucket in your account so that it can be referenced by the processing job. Because of that, the IAM role we use must include a read-only permission policy for Amazon S3, along with other required permissions related to services like DynamoDB and Amazon ECR.
Calculating fitness scores is something that can and should be done in parallel, because fitness calculations tend to be fairly slow and each candidate solution is independent of all the others. The GA code for this demonstration uses multiprocessing to calculate multiple fitness scores at the same time, which dramatically increases the speed at which the GA runs. We also specify the instance type in the `ScriptProcessor` constructor. In this case, we chose ml.m5.xlarge in order to use a processor with 4 vCPUs. Choosing an instance type with more vCPUs results in faster runs of the GA, at a higher price per hour. There is no benefit to using an instance type with GPUs for a GA, because all of the work is done on the CPU.
Finally, the `ScriptProcessor` constructor also specifies the number of instances to run. If you specify a number of instances greater than 1, the same code runs in parallel, which is exactly what we want for a GA. Each instance is a complete run of the GA, run in its own container. Because each instance is completely self-contained, we can run multiple instances at once, and each instance does its calculations and writes its results into the DynamoDB results table.
To review, we’re using two different forms of parallelism for the GA: one is through running multiple instances at once (one per container), and the other is through having each container instance use multiprocessing in order to effectively calculate fitness scores for multiple candidates at the same time.
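The per-container multiprocessing can be sketched with Python's standard library. The fitness function below is a stand-in for the real route-distance calculation, not the code from the repo:

```python
from multiprocessing import Pool

def fitness(path):
    # stand-in for the real fitness function, which totals route distance
    return sum(path)

def score_population(population, num_workers=4):
    # one fitness call per candidate, spread across worker processes
    with Pool(num_workers) as pool:
        return pool.map(fitness, population)

if __name__ == '__main__':
    print(score_population([[1, 2, 3], [4, 5, 6]]))  # [6, 15]
```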
The following diagram illustrates the overall architecture of this approach.
The Docker image defines the runtime environment, which is stored in Amazon ECR. That image is combined with a Python script that runs the GA, and SageMaker Processing uses one or more containers to run the code. Each instance reads configuration data from DynamoDB and writes results into DynamoDB.
Genetic operations
Now that we know how to run a GA using SageMaker, let’s dive a little deeper into how we can apply a GA to our delivery problem.
Selection
When we select two parents for crossover, we want a balance between good quality and randomness, which can be thought of as genetic diversity. If we only pick candidates with the best fitness scores, we miss candidates that have elements that might eventually help find a great solution, even though the candidate’s current fitness score isn’t the best. On the other hand, if we completely ignore quality when selecting parents, the evolutionary process doesn’t work very well—we’re ignoring survival of the fittest.
There are a number of approaches for selection, but the simplest is called tournament selection. With a tournament of size 2, you randomly select two candidates from the population and keep the best one. The same applies to a tournament of size 3 or more—you simply use the one with the best fitness score. The larger the number you use, the better quality candidate you get, but at a cost of reduced genetic diversity.
The following code shows the implementation of tournament selection:
```python
def tourney_select(population):
    selected = random.sample(population, TOURNEY_SIZE)
    best = min(selected, key=lambda c: c.fitness_score)
    return best

def select_parents(population):
    # using Tourney selection, get two candidates and make sure they're distinct
    while True:
        candidate1 = tourney_select(population)
        candidate2 = tourney_select(population)
        if candidate1 != candidate2:
            break
    return candidate1, candidate2
```
Crossover
After we select two candidates, how can we combine them to form one or two children? If both parents are simply lists of numbers and we can’t duplicate or leave out any numbers from the list, combining the two can be challenging.
One approach is called partially mapped crossover. It works as follows:
- Copy each parent, creating two children.
- Randomly select a starting and ending point for crossover within the genome. We use the same starting and ending points for both children.
- For each child, iterate from the starting crossover point to the ending crossover point and perform the following actions on each gene in the child at the current point:
- Find the corresponding gene in the other parent (the one that wasn’t copied into the current child), using the same crossover point. If that gene matches what’s already in the child at that point, continue to the next point, because no crossover is required for the gene.
- Otherwise, find the gene from the alternate parent and swap it with the current gene within the child.
The following diagram illustrates the first step, making copies of both parents.
Each child is crossed over with the alternate parent. The following diagram shows the randomly selected start and end points, with the thick arrow indicating which gene is crossed over next.
In the first swap position, the parent contributes the value 8. Because the current gene value in the child is 4, the 4 and 8 are swapped within the child.
That swap has the effect of taking the gene with value 8 from the parent and placing it within the child at the corresponding position. When the swap is complete, the large arrow moves to the next gene to cross over.
At this point, the sequence is repeated. In this case, both gene values in the current position are the same (6), so the crossover position advances to the next position.
The gene value from the parent is 7 in this case, so the swap occurs within the child.
The following diagram shows the final result, with the arrows indicating how the genes were crossed over.
Crossover isn’t a mandatory step, and most GAs use a crossover rate parameter to control how often crossover happens. If two parents are selected but crossover isn’t used, both parents are copied unchanged into the next generation.
We used the following code for the crossover in this solution:
```python
def crossover_parents_to_create_children(parent_one, parent_two):
    child1 = copy.deepcopy(parent_one)
    child2 = copy.deepcopy(parent_two)

    # sometimes we don't cross over, so use copies of the parents
    if random.random() >= CROSSOVER_RATE:
        return child1, child2

    num_genes = len(parent_one.path)
    # pick a point between 0 and the end - 2, so we cross at least 1 stop
    start_cross_at = random.randint(0, num_genes - 2)
    num_remaining = num_genes - start_cross_at
    end_cross_at = random.randint(num_genes - num_remaining + 1, num_genes - 1)

    for index in range(start_cross_at, end_cross_at + 1):
        child1_stop = child1.path[index]
        child2_stop = child2.path[index]

        # if the same, skip it since there is no crossover needed at this gene
        if child1_stop == child2_stop:
            continue

        # find within child1 and swap
        first_found_at = child1.path.index(child1_stop)
        second_found_at = child1.path.index(child2_stop)
        child1.path[first_found_at], child1.path[second_found_at] = child1.path[second_found_at], child1.path[first_found_at]

        # and the same for the second child
        first_found_at = child2.path.index(child1_stop)
        second_found_at = child2.path.index(child2_stop)
        child2.path[first_found_at], child2.path[second_found_at] = child2.path[second_found_at], child2.path[first_found_at]

    return child1, child2
```
Mutation
Mutation is a way to add genetic diversity to a GA, which is often desirable. However, too much mutation causes the GA to lose its way, so it’s best to use it in moderation if it’s needed at all.
You can approach mutation for this problem in two different ways: swapping and displacement.
A swap mutation is just what it sounds like—two randomly selected locations (genes) are swapped within a genome (see the following diagram).
The following code performs the swap:
```python
def swap_mutation(candidate):
    indexes = range(len(candidate.path))
    pos1, pos2 = random.sample(indexes, 2)
    candidate.path[pos1], candidate.path[pos2] = candidate.path[pos2], candidate.path[pos1]
```
A displacement mutation randomly selects a gene, randomly selects an insertion point, and moves the selected gene into the selected insertion point, shifting other genes as needed to make space (see the following diagram).
The following code performs the displacement:
```python
def displacement_mutation(candidate):
    num_stops = len(candidate.path)
    stop_to_move = random.randint(0, num_stops - 1)
    insert_at = random.randint(0, num_stops - 1)

    # make sure it's moved to a new index within the path, so it's really different
    while insert_at == stop_to_move:
        insert_at = random.randint(0, num_stops - 1)

    stop_index = candidate.path[stop_to_move]
    del candidate.path[stop_to_move]
    candidate.path.insert(insert_at, stop_index)
```
Elitism
An optional part of any GA is elitism, which is done when populating a new generation of candidates. When used, elitism copies a certain percentage of the best-scoring candidates from the current generation into the next generation. Elitism is a method for ensuring that the very best candidates always remain in the population. See the following code:
num_elites = int(ELITISM_RATE * POPULATION_SIZE)
current_generation.sort(key=lambda c: c.fitness_score)
next_generation = [current_generation[i] for i in range(num_elites)]
Results
It’s helpful to compare the results from our GA to those from a baseline algorithm. One common non-GA approach to solving this problem is known as the Nearest Neighbor algorithm, which you can apply in this manner:
- Set our current location to be the warehouse.
- While there are unvisited delivery stops, perform the following:
- Find the unvisited delivery stop that is closest to our current location.
- Move to that stop, making it the current location.
- Return to the warehouse.
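For reference, the Nearest Neighbor baseline takes only a few lines to implement. This sketch uses the same Manhattan distance as the GA, with made-up tuple coordinates:

```python
def nearest_neighbor_route(warehouse, stops):
    # stops: list of (x, y) tuples; returns the visiting order (as indexes)
    # and the total distance, including the return to the warehouse
    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    unvisited = set(range(len(stops)))
    current, route, total = warehouse, [], 0
    while unvisited:
        # greedily move to the closest unvisited stop
        nxt = min(unvisited, key=lambda i: dist(current, stops[i]))
        total += dist(current, stops[nxt])
        route.append(nxt)
        current = stops[nxt]
        unvisited.remove(nxt)
    total += dist(current, warehouse)  # return to the warehouse
    return route, total

route, total = nearest_neighbor_route((0, 0), [(5, 5), (1, 1), (2, 2)])
print(route, total)  # [1, 2, 0] 20
```

The greedy choice at each step is what makes this algorithm fast, and also what makes it fragile: an early "closest" pick can strand the route far from the remaining stops.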
The following table summarizes the head-to-head results, using varying numbers of stops.
| # delivery stops | Nearest Neighbor distance | Genetic algorithm distance |
| --- | --- | --- |
| 10 | 142 | 124 |
| 25 | 202 | 170 |
| 50 | 268 | 252 |
| 75 | 370 | 318 |
| 100 | 346 | 368 |
The Nearest Neighbor algorithm performs well in situations where many locations are clustered tightly together, but can perform poorly when dealing with locations that are more widely distributed. The Nearest Neighbor path calculated for 75 delivery stops is significantly longer than the one it calculated for 100 delivery stops—this is an example of how the results can vary widely depending on the data. We would need a deeper statistical analysis using a broader set of sample data to thoroughly compare the results of the two algorithms.
On the other hand, for the majority of test cases, the GA solution finds the shorter path, even though it could admittedly be improved with tuning. Like other ML methodologies, genetic algorithms benefit from hyperparameter tuning. The following table summarizes the hyperparameters used in our runs, and we could further tune them to improve the GA’s performance.
| Hyperparameter | Value used |
| --- | --- |
| Population size | 5,000 |
| Crossover rate | 50% |
| Mutation rate | 10% |
| Mutation method | 50/50 split between swap and displacement |
| Elitism rate | 10% |
| Tournament size | 2 |
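In code, these hyperparameters typically live as module-level constants. The following sketch mirrors the table above (the constant names are assumptions, not necessarily the names used in the repo):

```python
POPULATION_SIZE = 5000
CROSSOVER_RATE = 0.50
MUTATION_RATE = 0.10
SWAP_MUTATION_SHARE = 0.50  # remaining mutations use displacement
ELITISM_RATE = 0.10
TOURNEY_SIZE = 2
```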
Conclusion and resources
Genetic algorithms are a powerful tool to solve optimization problems, and running them using SageMaker Processing allows you to leverage the power of multiple containers at once. Additionally, you can select instance types that have useful characteristics, like multiple virtual CPUs to optimize running jobs.
If you’d like to learn more about GAs, see Genetic algorithm on Wikipedia, which contains a number of useful links. Although several GA frameworks exist, the code for a GA tends to be relatively simple (because there’s very little math) and you may be able to write the code yourself, or use the accompanying code in our GitHub repo, which includes the CloudFormation template that creates the required AWS infrastructure. Be sure to shut down the CloudFormation stack when you’re done, in order to avoid running up charges.
Although optimization problems are relatively rare compared to other ML applications like classification or regression, when you need to solve one, a genetic algorithm is usually a good option, and SageMaker Processing makes it easy.
About the Author
Greg Sommerville is a Prototyping Architect on the AWS Envision Engineering Americas Prototyping team, where he helps AWS customers implement innovative solutions to challenging problems with machine learning, IoT and serverless technologies. He lives in Ann Arbor, Michigan and enjoys practicing yoga, catering to his dogs, and playing poker.
Creating a BankingBot on Amazon Lex V2 Console with support for English and Spanish
Amazon Lex is a service for building conversational interfaces into any application. The new Amazon Lex V2 Console and APIs make it easier to build, deploy, and manage bots. In this post, you learn about the three main benefits of the Amazon Lex V2 Console and API, basic bot building concepts, and how to create a simple BankingBot on the Amazon Lex V2 Console.
The new Amazon Lex V2 Console and API have three main benefits:
- You can add a new language to a bot at any time and manage all the languages through the lifecycle of design, test, and deployment as a single resource. The new console dashboard allows you to quickly move between different languages to compare and refine your conversations.
- The Amazon Lex V2 API follows a simplified information architecture (IA) where intent and slot types are scoped to a specific language. Versioning is performed at the bot level so that resources such as intents and slot types don’t have to be versioned individually.
- The Amazon Lex V2 Console and API provide additional builder productivity tools and capabilities that give you more flexibility and control over your bot design process. For example, you can now save partially completed work as you script, test, and tune your configuration. You can also use the Conversation flow section to view the utterances and slot types for each intent.
You can access the new Amazon Lex V2 Console from the AWS Management Console, the AWS Command Line Interface (AWS CLI), or via APIs. With the enhanced console and revised APIs, you can expedite building virtual agents, conversational IVR systems, self-service chatbots, or informational bots.
Basic bot concepts
Amazon Lex enables you to add self-service, natural language chatbots to your applications or devices. You can build bots to perform automated tasks such as scheduling an appointment or to find answers to frequent customer queries such as return policies. Depending on your user base, you can also configure your bot to converse in multiple languages.
In this post, you learn the basic concepts needed to create a simple BankingBot that can handle requests such as checking account balances, making bill payments, and transferring funds. When building conversational interfaces, you need to understand five main concepts:
- Intents – An intent represents an action that the user wants to perform. This enables the bot to understand and classify what task a user is trying to accomplish. Your bot can support one or more related intents, and intents are scoped to individual languages. For this post, our BankingBot is configured to understand intents in English and Spanish, such as CheckBalance, which allows your users to check the balance in their accounts, or TransferFunds for paying bills.
- Utterances – Utterances are phrases that are used to trigger your intent. Each intent can be trained by providing a set of sample utterances. Based on these utterances, Amazon Lex can identify and invoke an intent based on natural language user input.
- Slots and slot types – Slots are input data that a bot needs to complete an action or fulfill an intent. For the CheckBalance intent, the bot needs to know which account to check and the user’s date of birth to verify the user’s identity. This data is captured as slots, which are used to fulfill the intent. Amazon Lex has two types of slots:
  - Built-in slots – These slots provide a definition of how the data is recognized and handled. For example, Amazon Lex has the built-in slot type AMAZON.DATE, which recognizes words or phrases that represent a date and converts them into a standard date format (for example, “tomorrow,” “the fifth of November,” or “22 December”).
  - Custom slots – These slots allow you to define and manage a custom catalog of items. You can define a custom slot by providing a list of values. Amazon Lex uses these values to train the natural language understanding model used for recognizing values for the slot. For example, you can define a slot type accountType with values such as Checking, Savings, and Credit. You can also add synonyms for each value, such as defining Visa as a synonym for your Credit account.
- Prompts and responses – These are bot messages that can be used to get information, acknowledge what the user said earlier, or confirm an action with the user before completing a transaction.
- Fulfilling the user request – As part of fulfilling the user’s request, you can configure the bot to respond with a closing response. Optionally, you can enable code hooks such as AWS Lambda functions to run business logic.
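The five concepts above can be sketched as a plain data structure. The following is illustrative only (it is not an Amazon Lex API payload); the field names are our own shorthand for how the pieces of the CheckBalance intent fit together:

```python
# Illustrative sketch of how intents, utterances, slots, prompts, and
# fulfillment relate for the CheckBalance intent described in this post.
check_balance_intent = {
    "name": "CheckBalance",                      # intent: the action the user wants
    "sampleUtterances": [                        # utterances: phrases that trigger it
        "What's the balance in my account?",
        "How much do I have in {accountType}?",
    ],
    "slots": {
        "accountType": "accountType",            # custom slot type
        "dateofBirth": "AMAZON.DATE",            # built-in slot type
    },
    "prompts": {                                 # messages that elicit slot values
        "accountType": "Sure. For which account would you like your balance?",
    },
    "fulfillment": "Lambda",                     # code hook that runs business logic
}
```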
Creating the bot
Now that you know about the basic building blocks of a bot, let’s get started. We configure the BankingBot to understand five intents in English and four intents in Spanish. We start off with a basic Welcome intent and then increase the complexity of the intents by adding custom slots, Lambda functions, and context management. The following table provides an overview of our intents.
| Intent | Built-in Slots | Custom Slots | Context | Prompts/Responses | App Integration |
| --- | --- | --- | --- | --- | --- |
| Welcome | | | | x | |
| CheckBalance | x | x | | | Lambda |
| FollowupCheckBalance* | x | x | x | | Lambda |
| TransferFunds | x | x | | x | |
| FallbackIntent | | | | x | |
*As of this writing, context management is only supported in US English.
To create your bot, complete the following steps:
- On the Amazon Lex V2 Console, choose Bots. If you’re in the Lex V1 Console, choose Switch to the new Lex V2 Console in the left-hand menu.
- Choose Create bot.
- For Creation method, select Create.
- For Bot name, enter BankingBot.
- Optionally, enter a description.
- For Runtime role, select Create a new role with basic Amazon Lex permissions.
- Because this bot is only for demo purposes, it’s not subject to COPPA, so select No.
- Leave the Idle session timeout and Advanced settings at their default.
- Choose Next.
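The console steps above can also be scripted with the AWS SDK. The following is a minimal sketch of the equivalent boto3 `lexv2-models` request; the role ARN is a placeholder you would replace with the role created with basic Amazon Lex permissions:

```python
# Request parameters mirroring the console steps above. The role ARN is a
# placeholder for the runtime role with basic Amazon Lex permissions.
CREATE_BOT_REQUEST = {
    "botName": "BankingBot",
    "description": "Banking bot to check balances and transfer funds",
    "roleArn": "arn:aws:iam::123456789012:role/LexV2BotRole",  # placeholder
    "dataPrivacy": {"childDirected": False},   # "No" to the COPPA question
    "idleSessionTTLInSeconds": 300,            # default idle session timeout
}

def create_bot(lex_client):
    """Create the bot; lex_client is boto3.client('lexv2-models')."""
    return lex_client.create_bot(**CREATE_BOT_REQUEST)
```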
Adding languages
This sample BankingBot is configured for both US English and US Spanish. Let’s first add US English.
- For Select language, choose English (US).
If you’re building a voice-based bot, Amazon Lex comes pre-integrated with the neural speech-to-text voices from Amazon Polly. Try them out and see what voice fits your bot.
- Choose Add another language.
- For Select language, choose Spanish (US).
- Choose Done.
Congratulations, you have successfully created your BankingBot! Now, let’s bring it to life.
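If you prefer the SDK, the two languages added above correspond to `create_bot_locale` calls on the DRAFT bot version. The confidence threshold and neural Amazon Polly voices below are illustrative choices, not values prescribed by the post:

```python
# Locale parameters for the two languages added above. Threshold and
# voice selections are illustrative defaults.
LOCALES = [
    {"localeId": "en_US", "nluIntentConfidenceThreshold": 0.40,
     "voiceSettings": {"voiceId": "Joanna", "engine": "neural"}},
    {"localeId": "es_US", "nluIntentConfidenceThreshold": 0.40,
     "voiceSettings": {"voiceId": "Lupe", "engine": "neural"}},
]

def add_languages(lex_client, bot_id):
    """Add each language to the DRAFT version of the bot."""
    for locale in LOCALES:
        lex_client.create_bot_locale(
            botId=bot_id, botVersion="DRAFT", **locale)
```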
Creating intents and slots
In this section, we walk you through how to create five intents and related slots for your bot.
Intent 1: Welcome
At this point, the console automatically takes you into the Intent editor page, where a NewIntent is ready for you to configure. The BankingBot is a friendly bot, so let’s start by creating a simple Welcome intent to greet users.
- Scroll to Intent details and, for Intent name, replace NewIntent with Welcome.
- Under Sample utterances, choose the Plain Text tab and add the following:
Hi Hello I need help Can you help me?
- Under Closing responses, for Message, enter:
Hi! I’m BB, the BankingBot. How can I help you today?
- Choose Save intent.
- After your intent is saved, choose Build.
- Now that you’ve successfully built your first intent, choose Test and give it a try.
Intent 2: CheckBalance
Now let’s get a bit fancier with a CheckBalance intent. This intent allows a user to check an account balance. The bot first validates the user by requesting their date of birth and then asks which account they want to check. This intent requires you to create a custom slot type, set up the intent, and finally set up a Lambda function for fulfillment.
Creating a custom slot
Before creating the intent, you need to create a custom slot that can capture a user’s account type with valid values such as Checking, Savings, and Credit. To create a custom slot, follow these steps:
- In the navigation pane, drill down to the English (US) version of your bot.
- Under English (US), choose Slot types.
- On the Add slot type menu, choose Add blank slot type.
- For Slot type name, enter accountType.
- Choose Add.
- For Slot value resolution, select Restrict to slot values.
- Under Slot type values, add values for Checking, Savings, and Credit.
You’ve now created a custom slot type for accountType. You can also add synonyms in the second column to help the bot recognize additional references to the Credit slot, such as credit card, Visa, and Mastercard.
- Choose Save slot type.
Congratulations! You now have your first custom slot type.
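The same slot type can be created programmatically with `create_slot_type`. This is a sketch; we assume the console’s Restrict to slot values option corresponds to the `TopResolution` resolution strategy:

```python
# Slot type parameters mirroring the accountType slot type created above,
# including synonyms for the Credit value. Assumption: the console's
# "Restrict to slot values" maps to the TopResolution strategy.
ACCOUNT_TYPE_SLOT = {
    "slotTypeName": "accountType",
    "valueSelectionSetting": {"resolutionStrategy": "TopResolution"},
    "slotTypeValues": [
        {"sampleValue": {"value": "Checking"}},
        {"sampleValue": {"value": "Savings"}},
        {"sampleValue": {"value": "Credit"},
         "synonyms": [{"value": "credit card"}, {"value": "Visa"},
                      {"value": "Mastercard"}]},
    ],
}

def create_account_type(lex_client, bot_id):
    """Create the slot type in the English (US) locale of the DRAFT bot."""
    return lex_client.create_slot_type(
        botId=bot_id, botVersion="DRAFT", localeId="en_US",
        **ACCOUNT_TYPE_SLOT)
```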
Creating the intent
Now let’s create the CheckBalance intent. This intent allows a user to check an account balance. The bot first validates the user by requesting their date of birth and then asks which account they want to check. This intent uses the accountType custom slot and a Lambda function for fulfillment.
- In the navigation pane, under English (US), choose Intents.
- Choose New intent.
- For Intent name, enter CheckBalance.
- Choose Add.
- Under Intent details, for Description, add a description.
- Under Sample utterances, choose the Plain Text tab and enter the following utterances:
What’s the balance in my account? Check my account balance What’s the balance in {accountType}? How much do I have in {accountType}? I want to check the balance Can you help me with account balance? Balance in {accountType}
- Choose Save intent.
In the chatbot lifecycle, this component can be leveraged to expand the chatbot’s understanding of its users by providing additional utterances. The phrases don’t need to be an exact match for user inputs, but should be representative of real-world natural language queries.
- Under Slots, choose Add slot.
For the CheckBalance intent, we set up two slots: account type and date of birth.
- For Name, enter accountType.
- For Slot type, choose accountType.
- For Prompts, enter:
Sure. For which account would you like your balance?
- Choose Add.
- Choose Add slot.
- For Name, enter dateofBirth.
- For Slot type, choose AMAZON.Date.
- For Prompts, enter: For verification purposes, what is your date of birth?
- Choose Add.
- Choose Save intent.
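The intent and its two slots can also be created through the SDK. The following is a sketch of the equivalent `create_intent` and `create_slot` calls; the `elicitation` helper is our own simplification, and the slot type ID for accountType comes from the `create_slot_type` response:

```python
# Intent parameters mirroring the CheckBalance setup above (abbreviated
# utterance list).
CHECK_BALANCE_INTENT = {
    "intentName": "CheckBalance",
    "sampleUtterances": [
        {"utterance": "What's the balance in my account?"},
        {"utterance": "How much do I have in {accountType}?"},
        {"utterance": "Balance in {accountType}"},
    ],
}

def elicitation(prompt):
    """Build a required-slot elicitation setting with one plain-text prompt."""
    return {
        "slotConstraint": "Required",
        "promptSpecification": {
            "maxRetries": 2,
            "messageGroups": [
                {"message": {"plainTextMessage": {"value": prompt}}}],
        },
    }

def create_check_balance(lex, bot_id, account_type_slot_type_id):
    """Create the intent, then attach its accountType and dateofBirth slots."""
    scope = {"botId": bot_id, "botVersion": "DRAFT", "localeId": "en_US"}
    intent = lex.create_intent(**scope, **CHECK_BALANCE_INTENT)
    lex.create_slot(**scope, intentId=intent["intentId"],
                    slotName="accountType",
                    slotTypeId=account_type_slot_type_id,
                    valueElicitationSetting=elicitation(
                        "Sure. For which account would you like your balance?"))
    lex.create_slot(**scope, intentId=intent["intentId"],
                    slotName="dateofBirth", slotTypeId="AMAZON.Date",
                    valueElicitationSetting=elicitation(
                        "For verification purposes, what is your date of birth?"))
```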
Preparing for intent 3: FollowupCheckBalance with context
Understanding the direction and context of an ever-evolving conversation is beneficial to building natural, human-like conversational interfaces. Being able to classify utterances as the conversation develops requires managing context across multiple turns. Consider when a user wants to follow up and check their account balance in a different account. You don’t want the bot to ask the user for their date of birth again. You want the bot to understand the context of the question and carry over the date of birth slot value from this intent into the follow-up intent.
To prepare for the third BankingBot intent, FollowupCheckBalance, you need to preserve this CheckBalance context as an output for future use.
- Under Contexts, for Output contexts, choose New Context tag.
- For Context tag name, enter contextCheckBalance.
- Choose Add.
Now your context is stored for future use.
- Under Code hooks, select Use a Lambda function for fulfillment. To create the Lambda function, please follow the instructions in Appendix B below.
- Choose Save Intent.
- Choose Build.
- After the bot building process is complete, you can test the intent by choosing Test.
You can also use the Conversation flow section to view the current state of your conversation flow and links to help you quickly get to that specific utterance, slot, or prompt.
You learn how to create an intent with prompts and closing responses in the fourth intent, TransferFunds.
Intent 3: FollowupBalance
Next, we create a FollowupBalance intent, where the user might ask what the balance is for a different account. With this intent, you use the context management feature and the context that you set up earlier with the CheckBalance intent.
- On the Intent editor page, under Intents, choose Add.
- Choose Add empty intent.
- For Intent name, enter FollowupBalance.
- Choose Add.
- For Description, enter:
Intent to provide detail of expenses made for an account over a period of time.
- In the Contexts section, for Input contexts, choose the context you just created in the CheckBalance intent.
- Under Sample utterances, on the Plain text tab, enter the following sample utterances:
How about my {accountType} account What about {accountType} And in {accountType}?
- In the Slots section, choose Add slot.
- For Name, enter accountType.
- For Slot type, choose accountType.
- For Prompts, enter:
You’d like the balance for which account?
- Choose Add.
Next, you create a second slot.
- Choose Add slot.
- For Name, enter dateofBirth.
- For Slot type, choose AMAZON.Date.
- For Prompts, enter:
For verification purposes. What is your date of birth?
- Choose Add.
- In the Slots section, open the dateofBirth slot and choose Advanced options.
- Under Default values, enter the context and slot value for the CheckBalance intent (contextCheckBalance.dateofBirth).
- Choose Add default value.
- Choose Save.
- In the Code hooks section, select Use a Lambda function for fulfillment.
- Choose Save intent.
- Choose Build.
- When your BankingBot is built, choose Test, try the FollowupBalance intent, and see whether the dateofBirth slot from the CheckBalance intent is used.
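Programmatically, the context wiring above maps onto three request fragments: an output context on CheckBalance, an input context on FollowupBalance, and a default value on the dateofBirth slot. This is a sketch; the time-to-live values are illustrative:

```python
# Output context emitted by CheckBalance (passed as outputContexts to
# create_intent). TTL values below are illustrative, not from the post.
OUTPUT_CONTEXTS = [
    {"name": "contextCheckBalance",
     "timeToLiveInSeconds": 90, "turnsToLive": 5},
]

# Input context consumed by FollowupBalance (passed as inputContexts).
INPUT_CONTEXTS = [{"name": "contextCheckBalance"}]

# Default value for the dateofBirth slot, pulled from the saved context
# (goes into the slot's valueElicitationSetting as
# "defaultValueSpecification").
DATE_OF_BIRTH_DEFAULT = {
    "defaultValueList": [
        {"defaultValue": "#contextCheckBalance.dateofBirth"},
    ]
}
```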
Intent 4: TransferFunds
The TransferFunds intent offers the functionality of moving funds from one account to a target account. In this intent, you learn how to create two different slots using the same slot type and how to configure confirmation prompts and declines.
- On the Intent editor page, under Intents, choose Add.
- Choose Add empty intent.
- For Name, enter TransferFunds.
- Choose Add.
- For the intent description, enter:
Help user transfer funds between bank accounts
- Under Sample Utterances, on the Plain Text tab, enter the following:
I want to transfer funds Can I make a transfer? I want to make a transfer I'd like to transfer {transferAmount} from {sourceAccountType} to {targetAccountType} Can I transfer {transferAmount} to my {targetAccountType} Would you be able to help me with a transfer? Need to make a transfer
Next, we create the transferAmount slot.
- Choose Add slot.
- For Name, enter transferAmount.
- For Slot type, choose AMAZON.Number.
- For Prompts, enter:
How much would you like to transfer?
- Choose Add.
Next, create the sourceAccountType slot.
- Choose Add slot.
- For Name, enter sourceAccountType.
- For Slot type, choose accountType.
- For Prompts, enter:
Which account would you like to transfer from?
- Choose Add.
Next, create the targetAccountType slot.
- Choose Add slot.
- For Name, enter targetAccountType.
- For Slot type, choose accountType.
- For Prompts, enter:
Which account are we transferring to?
- Choose Add.
- Under Prompts, for Confirmation prompts, enter:
Got it. So we are transferring {transferAmount} from {sourceAccountType} to {targetAccountType}. Can I go ahead with the transfer?
- For Decline responses, enter:
The transfer has been cancelled.
- Under Closing responses, for Message, enter:
The transfer is complete. {transferAmount} should now be available in your {targetAccountType} account.
- Choose Save intent.
- Choose Build.
- Choose Test.
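The confirmation prompt and decline response above map onto an `intentConfirmationSetting` payload for `create_intent`. This is a sketch; the `message` helper is our own shorthand:

```python
def message(text):
    """Wrap a plain-text message in the Lex V2 message-group shape."""
    return {"message": {"plainTextMessage": {"value": text}}}

# Confirmation and decline messages for TransferFunds, as entered above
# (passed to create_intent as intentConfirmationSetting).
TRANSFER_CONFIRMATION = {
    "promptSpecification": {
        "maxRetries": 2,
        "messageGroups": [message(
            "Got it. So we are transferring {transferAmount} from "
            "{sourceAccountType} to {targetAccountType}. "
            "Can I go ahead with the transfer?")],
    },
    "declinationResponse": {
        "messageGroups": [message("The transfer has been cancelled.")],
    },
}
```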
Intent 5: FallbackIntent
Your last intent is the fallback intent, which is used when the bot can’t understand or identify a specific intent. It serves as a catchall intent and can also be used to route the conversation to a human agent for more assistance.
- On the Intents list, choose FallbackIntent.
- Under Closing responses, for Message, enter:
Sorry I am having trouble understanding. Can you describe what you'd like to do in a few words? I can help you find your account balance, transfer funds and make a payment.
- Choose Save intent.
- Choose Build.
Configuring the bot for Spanish
The Amazon Lex V2 Console also allows you to add multiple languages to a bot. Each language has an independent set of intents and slot types. Each intent follows the same structure as its English counterpart.
Intent 1: Welcome (Spanish)
To create the Welcome intent in Spanish, complete the following steps:
- In the navigation pane, under Spanish (US), choose Intents.
- Choose NewIntent.
- For Intent name, enter Welcome.
- Choose Add.
- Under Sample utterances, on the Plain text tab, enter the following:
Hola Necesito ayuda Me podría ayudar?
- Under Closing responses, for Message, enter:
Bienvenido! Puedo ayudarle con tareas como chequear balance o realizar un pago. Cómo puedo ayudarle hoy?
- Choose Save intent.
- Choose Build.
Intent 2: CheckBalance (Spanish)
To create the CheckBalance intent, you first need to create the accountType custom slot type as you did for the English bot.
- Under Spanish (US), choose Slot types.
- On the Add slot type menu, choose Add a blank slot type.
- For Slot type name, enter accountType.
- Choose Add.
- In the Slot value resolution section, select Restrict to slot values.
- Add the Spanish slot values Cheques (Checking), Ahorro (Savings), and Crédito (Credit).
- Choose Save slot type.
You have now created the accountType custom slot.
- In the navigation pane, under Spanish (US), choose Intents.
- On the Add intent menu, choose Add empty intent.
- For Intent name, enter CheckBalance.
- Choose Add.
- For Description, enter:
Intent to check balance in the specified account
- Under Sample utterances, on the Plain text tab, enter the following:
Cuál es el balance en mi cuenta? Verificar balance en mi cuenta Cuál es el balance en la cuenta {accountType} Cuál es el balance en {accountType} Cuánto hay en {accountType} Quiero verificar el balance Me podría ayudar con el balance de mi cuenta? Balance en {accountType}
- Choose Add slot.
- For Name, enter accountType.
- For Slot type, choose accountType.
- For Prompts, enter:
Por supuesto. De qué cuenta le gustaría conocer el balance?
- Choose Add slot.
- For Name, enter dateofBirth.
- For Slot type, choose AMAZON.Date.
- For Prompts, enter:
Por supuesto. Por motivos de verificación. Podría por favor compartir su fecha de nacimiento?
- Choose Add.
- Under Code hooks, select Use a Lambda function for fulfillment. For the Spanish version of your bot, you will need a new Lambda function for Spanish. To create a Lambda function, please follow the instructions in Appendix B below.
- Choose Save intent.
- Choose Build.
Intent 3: TransferFunds (Spanish)
Like the English version, the TransferFunds intent offers the functionality of moving funds from one account to another. This intent allows you to work with two slots of the same type and configure confirmations and prompts.
- Create a new intent and name it TransferFunds.
- For the intent description, enter:
Intent to transfer funds between checking and savings accounts.
- Under Sample utterances, on the Plain Text tab, enter the following:
Quisiera transferir fondos Puedo realizar una transferencia? Necesito hacer una transferencia. Quisiera transferir {transferAmount} desde {sourceAccountType} hacia {targetAccountType} Puedo transferir {transferAmount} hacia {targetAccountType} ? Necesito ayuda con una transferencia. Me ayudaría a realizar una transferencia? Necesito realizar una transferencia.
Next, create the transferAmount slot.
- Choose Add slot.
- For Name, enter transferAmount.
- For Slot Type, choose AMAZON.Number.
- For Prompts, enter:
Qué monto desea transferir?
- Choose Add.
Next, create the sourceAccountType slot.
- Choose Add slot.
- For Name, enter sourceAccountType.
- For Slot type, choose accountType.
- For Prompts, enter:
Desde qué cuenta desea iniciar la transferencia?
- Choose Add.
Next, create the targetAccountType slot.
- Choose Add slot.
- For Name, enter targetAccountType.
- For Slot type, choose accountType.
- For Prompts, enter:
Hacia qué cuenta desea realizar la transferencia?
- Choose Add.
- Under Prompts, for Confirmation prompts, enter:
Usted desea transferir {transferAmount} dólares desde la cuenta {sourceAccountType} hacia la cuenta {targetAccountType}. Puedo realizar la transferencia?
- For Decline responses, enter:
No hay problema. La transferencia ha sido cancelada.
- Under Closing responses, for Message, enter:
La transferencia ha sido realizada. {transferAmount} deberían estar disponibles en su cuenta {targetAccountType}.
- Choose Save intent.
- Choose Build.
Conclusion
Congratulations! You have successfully built a BankingBot that can check balances, transfer funds, and properly greet a customer. You’ve also seen how easy it is to add and manage new languages. Additionally, the Conversation flow section lets you view and jump to different parameters of the conversation as you build and refine the dialogue for each of your intents.
To learn more about Amazon Lex V2 Console and APIs, check out the following resources:
- Amazon Lex Introduces an Enhanced Console Experience and new V2 APIs
- Amazon Lex V2 Console Developer Guide
- How to deliver natural conversations with Amazon Lex streaming APIs
Also, you could give your bot the ability to reply to natural language questions by integrating it with Amazon Kendra. For more information, see Integrate Amazon Kendra and Amazon Lex using a search intent.
Appendix A: Bot configuration
This example bot contains five intents that allow a user to interact with the financial institution and perform the following tasks:
- Welcome – Intent to greet users
- CheckBalance – Intent to check balance in the specified account
- FollowupBalance – Intent to provide detail of expenses made for an account over a period of time
- TransferFunds – Intent to transfer funds between checking and savings accounts
- FallbackIntent – Default intent to respond when no other intent matches user input
Intents details: English
- Welcome configuration
- Description: Intent to greet users
- Sample utterances:
- Hi
- Hello
- I need help
- Can you help me?
- Closing response: Hi! I’m BB, the BankingBot. How can I help you today?
- CheckBalance configuration
- Description: Intent to check balance in the specified account
- Sample utterances:
- What’s the balance in my account?
- Check my account balance
- What’s the balance in {accountType}?
- How much do I have in {accountType}?
- I want to check the balance
- Can you help me with account balance?
- Balance in {accountType}
- Slots:
- accountType:
- Custom slot type: accountType
- Prompt: For which account would you like to check the balance?
- dateofBirth:
- Built-in slot type: AMAZON.Date
- Prompt: For verification purposes, what is your date of birth?
- accountType:
- Output context tag: contextCheckBalance
- Closing response: Response comes from the fulfillment Lambda function.
- FollowupBalance configuration
- Description: Intent to provide detail of expenses made for an account over a period of time.
- Sample utterances:
- How about my {accountType} account
- What about {accountType}
- And in {accountType}?
- how about {accountType}
- Slots:
- dateofBirth:
- Built-in slot type: AMAZON.Date (default: #contextCheckBalance.dateofBirth)
- Prompt: For verification purposes. What is your date of birth?
- accountType:
- Custom slot type: accountType
- Prompt: Which account do you need the balance details for?
- Closing response: Response comes from the fulfillment Lambda function.
- TransferFunds configuration
- Description: Intent to transfer funds between checking and savings accounts.
- Sample utterances:
- I want to transfer funds
- Can I make a transfer?
- I want to make a transfer
- I’d like to transfer {transferAmount} from {sourceAccountType} to {targetAccountType}
- Can I transfer {transferAmount} to my {targetAccountType}
- Would you be able to help me with a transfer?
- Need to make a transfer
- Slots:
- sourceAccountType:
- Custom slot type: accountType
- Prompt: Which account would you like to transfer from?
- targetAccountType:
- Custom slot type: accountType
- Prompt: Which account are we transferring to?
- transferAmount:
- Built-in slot type: AMAZON.Number
- Prompt: What amount are we transferring today?
- sourceAccountType:
- Confirmation prompt: Got it. So we are transferring {transferAmount} dollars from {sourceAccountType} to {targetAccountType}. Can I go ahead with the transfer?
- Decline response: Sure. The transfer has been cancelled.
- Closing response: The transfer is complete. {transferAmount} should now be available in your {targetAccountType} account.
- FallbackIntent configuration
- Description: Default intent to respond when no other intent matches user input.
- Closing response: Sorry I am having trouble understanding. Can you describe what you’d like to do in a few words? I can help you with account balance, transfer funds and payments.
Intent details: Spanish
- CheckBalance configuration
- Description: Intent to check balance in the specified account.
- Sample utterances:
- Cuál es el balance en mi cuenta?
- Verificar balance en mi cuenta
- Cuál es el balance en la cuenta {accountType}
- Cuál es el balance en {accountType}
- Cuánto hay en {accountType}
- Quiero verificar el balance
- Me podría ayudar con el balance de mi cuenta?
- Balance en {accountType}
- Slots:
- accountType:
- Custom slot type: Restrict values to Cheques, Ahorros, and Crédito
- Prompt: Por supuesto. De qué cuenta le gustaría conocer el balance?
- dateofBirth:
- Built-in slot type: AMAZON.Date (default: #CheckBalance.dateOfBirth)
- Prompt: Por supuesto. Por motivos de verificación. Podría por favor compartir su fecha de nacimiento?
- accountType:
- Closing response: Response comes from the fulfillment Lambda function.
- TransferFunds configuration
- Description: Intent to transfer funds between checking and savings accounts.
- Sample utterances:
- Quisiera transferir fondos
- Puedo realizar una transferencia?
- Necesito hacer una transferencia.
- Quisiera transferir {transferAmount} desde {sourceAccountType} hacia {targetAccountType}
- Puedo transferir {transferAmount} hacia {targetAccountType} ?
- Necesito ayuda con una transferencia.
- Me ayudaría a realizar una transferencia?
- Necesito realizar una transferencia.
- Slots:
- sourceAccountType:
- Custom slot type: Restrict Values to Checking, Savings, and Credit
- Prompt: Desde qué cuenta desea iniciar la transferencia?
- targetAccountType:
- Custom slot type: Restrict Values to Checking, Savings, and Credit
- Prompt: Hacia qué cuenta desea realizar la transferencia?
- transferAmount:
- Built-in slot type: AMAZON.Number
- Prompt: Qué monto desea transferir?
- sourceAccountType:
- Confirmation prompt: Entendido. Usted desea transferir {transferAmount} dólares desde la cuenta {sourceAccountType} hacia la cuenta {targetAccountType}. Puedo realizar la transferencia?
- Decline response: No hay problema. La transferencia ha sido cancelada.
- Closing response: La transferencia ha sido realizada. {transferAmount} deberían estar disponibles en su cuenta {targetAccountType}
- FallbackIntent configuration
- Description: Default intent to respond when no other intent matches user input
- Closing response: Lo siento, no he entendido. En pocas palabras, podría describir que necesita hacer? Puedo ayudarlo con balance de cuenta, transferir fondos y pagos.
Appendix B: Creating a Lambda function
In the Amazon Lex V2 Console, you use a single Lambda function as the fulfillment mechanism for all of the intents in a language. This function is configured on the alias language support page. You need to create a separate Lambda function for each language you have specified for your bot. For this example bot, you create one function for English (US) and one for Spanish (US).
- In the top left corner, click on the Services drop down menu and select Lambda in the Compute section.
- On the Lambda console, choose Functions.
- Choose Create function.
- Select Author from scratch.
- For Function name, enter a name: for the English version, enter BankingBotEnglish; for the Spanish version, enter BankingBotSpanish.
. - For Runtime, choose Python 3.8.
- Choose Create function.
- In the Function code section, choose lambda_function.py.
- Download the Lambda BankingBotEnglish or BankingBotSpanish code for the specific language and open it in a text editor.
- Copy the code and replace the current function code with the BankingBotEnglish code or BankingBotSpanish code for the respective language.
- Choose Deploy.
Adding the Lambda function to your language
Now you have set up your Lambda function. In Amazon Lex V2 Console, Lambda functions are defined at the bot alias level. Follow these steps to set up your bot to use a Lambda function:
- On the Amazon Lex V2 Console, in the navigation pane, under your bot, choose Aliases.
- Choose TestBotAlias.
- For Languages, select English (US) or Spanish (US) depending on which fulfillment Lambda function you are creating.
- For Source, choose BankingBotEnglish or BankingBotSpanish as your source, depending on which language you are configuring.
as your source depending on which language you are configuring. - For Lambda function version or alias, choose your function.
- Choose Save.
Now your Lambda function is ready to work with your BankingBot intents.
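The alias configuration above corresponds to the `botAliasLocaleSettings` payload of `update_bot_alias`. A sketch follows; the Lambda ARNs are placeholders for the functions created in Appendix B:

```python
# Per-locale alias settings attaching a fulfillment Lambda to each
# language. The ARNs are placeholders.
def locale_code_hook(lambda_arn):
    """Enable a locale and attach its fulfillment Lambda code hook."""
    return {
        "enabled": True,
        "codeHookSpecification": {
            "lambdaCodeHook": {
                "lambdaARN": lambda_arn,
                "codeHookInterfaceVersion": "1.0",
            }
        },
    }

BOT_ALIAS_LOCALE_SETTINGS = {
    "en_US": locale_code_hook(
        "arn:aws:lambda:us-east-1:123456789012:function:BankingBotEnglish"),
    "es_US": locale_code_hook(
        "arn:aws:lambda:us-east-1:123456789012:function:BankingBotSpanish"),
}
```

This dict would be passed as `botAliasLocaleSettings` when calling `update_bot_alias` on the TestBotAlias.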
About the Author
Juan Pablo Bustos is an AI Services Specialist Solutions Architect at Amazon Web Services, based in Dallas, TX. Outside of work, he loves spending time writing and playing music as well as trying random restaurants with his family.
As a Product Manager on the Amazon Lex team, Harshal Pimpalkhute spends his time trying to get machines to engage (nicely) with humans.
Esther Lee is a Product Manager for AWS Language AI Services. She is passionate about the intersection of technology and education. Out of the office, Esther enjoys long walks along the beach, dinners with friends and friendly rounds of Mahjong.
Using Amazon Translate to provide language support to Amazon Kendra
Amazon Kendra is a highly accurate and easy-to-use intelligent search service powered by machine learning (ML). Amazon Kendra supports English. This post provides a set of techniques to provide non-English language support when using Amazon Kendra.
We demonstrate these techniques within the context of a question-answer chatbot use case (Q&A bot) where a user can submit a question in any language that Amazon Translate supports through the chatbot. Amazon Kendra searches across a number of documents and returns a result in the language of that query. Amazon Comprehend and Amazon Translate are essential to providing non-English language support.
Our Q&A bot implementation relies on Amazon Simple Storage Service (Amazon S3) to store the documents prior to their ingestion into Amazon Kendra, Amazon Comprehend to detect the query’s dominant language to enable proper query and response translation, Amazon Translate to translate the query and response to and from English, and Amazon Lex to build the conversational user interface and provide the conversational interactions.
All queries, except for English, are translated from their native language into English before being submitted to Amazon Kendra. The Amazon Kendra responses a user sees are also translated. We have stored predefined Spanish response translations while performing real-time translation on all other languages. We use metadata attributes associated with each ingested document to point to the predefined Spanish translations.
We use three use cases to illustrate these techniques and assume that all the languages needing to be translated are supported by Amazon Translate. First, for Spanish language users, each document (we use small documents for the Q&A bot scenario) is translated by Amazon Translate into Spanish and has human vetting. This pre-translation is relevant as a description for Amazon Kendra document ranking model results.
Second, on-the-fly translation of the reading comprehension model responses occurs for all language responses except for English. On-the-fly translation occurs for the document ranking model results except for English and Spanish. We go into more detail on how to implement on-the-fly translation for Amazon Kendra’s different models later in this post.
Third, for English speaking users, translation doesn’t occur, allowing both the query and Amazon Kendra’s responses to be passed to and from Amazon Kendra without change.
The following exchange illustrates the three use cases. We start with English followed by Spanish, French, and Italian.
Translation considerations and prerequisites
We perform the following steps on the document:
- Run the document through Amazon Translate to get a Spanish language version of the document as well as the title.
- Manually review the translation and make any changes desired.
- Create a metadata file where one of the attributes is the Spanish translation of the document.
- Ingest the English language document and the associated metadata file into Kendra.
The following code is the metadata file for the document:
{
"Attributes": {
"_created_at": "2020-10-28T16:48:26.059730Z",
"_source_uri": "https://aws.amazon.com/kendra/faqs/",
"spanish_text": "R: Amazon Kendra es un servicio de búsqueda empresarial muy preciso y fácil de usar que funciona con Machine Learning.
"spanish_title": "P: ¿Qué es Amazon Kendra?"
},
"Title": "Q: What is Amazon Kendra?",
"ContentType": "PLAIN_TEXT"
}
In this case, we have some predefined attributes, such as _created_at and _source_uri, as well as custom attributes such as spanish_text and spanish_title.
In the case of queries in Spanish, you use these attributes to build the response to send back to the user. The fact that the title of the document is in itself a possible user query allows you to have control over the translations.
If your documents are in another language, you need to run Amazon Translate to translate the documents into English before ingestion into Amazon Kendra.
We have not tried translation in other scenarios where the document types and answers can vary widely. However, we believe that the techniques shown in this post allow you to try translation in other scenarios and evaluate the accuracy.
Amazon Kendra processing overview
Now that we have the documents squared away, we build a chatbot using Amazon Lex. The chatbot identifies the language using Amazon Comprehend, translates the query from the user’s language to English, submits a query to the Amazon Kendra index, and translates the result back to the language the query was in. You can apply this approach to any language that Amazon Translate supports.
We use the Amazon Kendra built-in Amazon S3 connector to ingest documents and the Amazon Kendra FAQ ingestion process for getting question-answer pairs into Amazon Kendra. The ingested documents are in English. We manually created a description of each document in Spanish and attached that Spanish description as a metadata attribute. Ideally, all the documents that you use are in English.
If these documents have an overview section, you can use Amazon Translate as the method of generating this metadata description attribute. The following diagram illustrates our architecture.
Setting up your resources
In this section, we discuss the steps needed to implement this solution; see the appendix for the specifics of each step. The AWS Lambda function is critical because it determines where and how translation happens. We go into further detail on the translation specifics in the next section.
- Download the documents and metadata files, decompress the archive, and store them in an S3 bucket. You use this bucket as the source for your Amazon Kendra S3 connector.
- Set up Amazon Kendra:
- Create an Amazon Kendra index. For instructions, see Getting started with the Amazon Kendra SharePoint connector.
- Create an Amazon Kendra S3 data source.
- Add attributes.
- Ingest the example data source from Amazon S3 into Amazon Kendra.
- Set up the fulfillment Lambda function.
- Set up the chatbot.
Understanding translation in the fulfillment Lambda function
The Lambda function has been structured into three main sections to process and respond to the user’s query: language detection, submitting a query, and returning the translated result.
Language detection
In the first section, you use Amazon Comprehend to detect the dominant language. For this post, we obtain the user input from the inputTranscript key of the event submitted by Amazon Lex. Also, if Amazon Comprehend doesn't have enough confidence in the detected language, we default to English. See the following code:
query = event['inputTranscript']
response = comprehend.detect_dominant_language(Text = query)
confidence = response["Languages"][0]['Score']
if confidence > 0.50:
    language = response["Languages"][0]['LanguageCode']
else:
    #Default to English if there isn't enough confidence
    language = "en"
Submitting a query
Amazon Kendra currently supports documents and queries in English, so in order to submit your query, you have to translate it.
In the provided example code, after identifying the dominant language, you translate the query to English if needed. A simple check of whether or not the language is English would suffice, but for illustration purposes, we include the option of matching Spanish separately from other languages.
if language == "en":
    pass
elif language == "es":
    translated_query = translate.translate_text(Text=query, SourceLanguageCode="es", TargetLanguageCode="en")
    query = translated_query['TranslatedText']
else:
    try:
        translated_query = translate.translate_text(Text=query, SourceLanguageCode=language, TargetLanguageCode="en")
        query = translated_query['TranslatedText']
    except Exception as e:
        return(str(e))
Now that your query is in English, you can submit the query to Amazon Kendra:
response = kendra.query(
    QueryText = query,
    IndexId = index_id)
There are several options for how to work with the result from Amazon Kendra. For more information, see Analyzing the results in the Amazon Kendra Essentials Workshop. Because this is a chatbot use case, we only work with the first result.
If the first result is from the reading comprehension model (result type Answer) and the language code is different than en (English), you translate the DocumentExcerpt value, which is the value to be returned. See the following code:
answer_text = query_result['DocumentExcerpt']['Text']
if language == "en":
    pass
else:
    result = translate.translate_text(Text=answer_text, SourceLanguageCode="en", TargetLanguageCode=language)
    answer_text = result['TranslatedText']
If the first result is from the document ranking model (result type Document), recall that we pre-translated the Spanish language results and stored them in the document metadata for Spanish language documents.
The following code shows that:
- If the language code is es (Spanish), the pre-translated content stored in the metadata field spanish_text is returned.
- If the language code is en (English), the DocumentExcerpt value returned by Amazon Kendra is returned as is.
- If the language code is neither es nor en, the content of DocumentExcerpt is translated to the detected language and returned.

if language == "es":
    if key['Key'] == 'spanish_text':
        synopsis = key['Value']['StringValue']
        answer_text = synopsis
    if key['Key'] == 'spanish_title':
        document_title = key['Value']['StringValue']
        print('Title: ' + document_title)
elif language == "en":
    document_title = query_result['DocumentTitle']['Text']
    answer_text = query_result['DocumentExcerpt']['Text']
else:
    #Placeholder to translate the title if needed
    #document_title = query_result['DocumentTitle']['Text']
    #result = translate.translate_text(Text=document_title, SourceLanguageCode="en", TargetLanguageCode=language)
    #document_title = result['TranslatedText']
    answer_text = query_result['DocumentExcerpt']['Text']
    result = translate.translate_text(Text=answer_text, SourceLanguageCode="en", TargetLanguageCode=language)
    answer_text = result['TranslatedText']
response = answer_text
return response
Returning the result
At this point, if you obtained a result, you should have it in the language that the question was asked in. The last portion of the Lambda function returns the result to Amazon Lex for it to be passed on to the user's conversational user interface:
if result == "":
    no_matches = "I'm sorry, I couldn't find matches for your query"
    result = translate.translate_text(Text=no_matches, SourceLanguageCode="en", TargetLanguageCode=language)
    result = result['TranslatedText']
else:
    #Truncate text
    if len(result) > 340:
        result = result[:340]
        result = result.rsplit(' ', 1)
        result = result[0]+"..."
response = {
    "dialogAction": {
        "type": "Close",
        "fulfillmentState": "Fulfilled",
        "message": {
            "contentType": "PlainText",
            "content": result
        },
    }
}
Conclusion
We have demonstrated a few techniques that you can use to enable Amazon Kendra to provide support for languages other than English. We recommend doing a small pilot and accuracy POC on ground truth questions and answers to determine if these techniques can enable your non-English language use cases.
To follow an interactive tutorial that can help you get started with Amazon Kendra visit our Amazon Kendra Essentials+ Workshop. You can also visit the Amazon Kendra website to dive deep on features, connectors, videos and more.
Appendix
In the sections above, we covered translation in Amazon Kendra for the reading comprehension and document ranking models. Below, we cover translation in Amazon Kendra for FAQ matching.
Translations for the FAQ model
For Amazon Kendra FAQ matching, you can use either real-time or pre-translated responses. Pre-translated responses with human vetting likely provide better results. For pre-translated responses, complete the following steps:
- Create one row per language desired for each question.
- Create a language attribute that specifies what language the answer is in.
- Place the pre-translated response into the FAQ answer column.
- Use the language attribute as a query filter.
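The last step, filtering FAQ matches by a language attribute, can be sketched as follows. This is a minimal illustration; the attribute name language is an assumption (use whatever name you gave your language attribute), and `build_faq_query` is our own helper around the Amazon Kendra Query API parameters:

```python
def build_faq_query(index_id, query_text, language_code):
    """Build Amazon Kendra query parameters that return only FAQ
    (question-answer) results whose 'language' attribute matches the
    detected language. 'language' is an assumed custom attribute name."""
    return {
        "IndexId": index_id,
        "QueryText": query_text,
        "QueryResultTypeFilter": "QUESTION_ANSWER",
        "AttributeFilter": {
            "EqualsTo": {
                "Key": "language",
                "Value": {"StringValue": language_code},
            }
        },
    }

# With a boto3 client:
# response = kendra.query(**build_faq_query(index_id, query, "es"))
```

The filter ensures a Spanish-speaking user is matched only against the Spanish rows of your FAQ file.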
Pre-translation considerations
This chatbot use case has documents with a small amount of text. This allows us to place the pre-translated document into an attribute. For larger files, we place pre-translated document summaries into the attribute instead. This allows us to return vetted summaries in the native language for each document ranking result. We can continue to use real-time translation for the reading comprehension model passages and suggested answers.
Pre-translation is only effective for the document ranking model and the FAQ model. The reading comprehension model doesn’t return associated attributes. The lack of attributes prevents the use of pre-translated content with the reading comprehension model and requires instead that you use on-the-fly translation for the reading comprehension model results.
Creating an Amazon Kendra data source and adding attributes
For this use case, we use two custom attributes that contain the revised Spanish translations. These attributes are called spanish_title and spanish_text.
To add them into your index, follow these steps:
- On the Amazon Kendra console, on your new index, under Data management, choose Facet definition.
- Choose Add field.
- For Field name, enter the field name (spanish_text).
- For Data type, choose String.
- For Usage types, select Displayable.
- Choose Add.
- Repeat the process for the field spanish_title.
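If you prefer to script these console steps, the same two fields can be described for the Amazon Kendra UpdateIndex API. This is a sketch under the settings used above (Displayable only, string type); `spanish_field_updates` is our own illustrative helper:

```python
def spanish_field_updates():
    """Displayable string field definitions matching the console steps above."""
    return [
        {
            "Name": name,
            "Type": "STRING_VALUE",
            "Search": {
                "Displayable": True,
                "Facetable": False,
                "Searchable": False,
            },
        }
        for name in ("spanish_text", "spanish_title")
    ]

# With a boto3 client:
# kendra.update_index(Id=index_id,
#                     DocumentMetadataConfigurationUpdates=spanish_field_updates())
```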
Ingesting the example dataset
Now that you have an Amazon Kendra index, the custom index fields, and the sample documents in your S3 bucket, you can create an S3 data source.
- On the Amazon Kendra console, on your new index, under Data management, choose Data sources.
- Choose Add connector.
- For My data source name, enter a name (for example, MyS3Connector).
- Choose Next.
- For Enter the data source location, enter the location of your S3 bucket.
- For IAM role, choose Create a new role.
- For Role name, enter a name for your role.
- For Frequency, choose Run on demand.
- Choose Next.
- Validate your settings and choose Add data source.
- When the process is complete, you can sync your data source by choosing Sync now.
At this point, you can test a sample query on the search console. For example, the following screenshot shows the results for the question “what is Amazon Kendra?”
Setting up the fulfillment Lambda function
For this use case, the multilingual chatbot requires a Lambda function to query the index as well as perform the translations if needed.
- On the Lambda console, choose Create function.
- Select Author from scratch.
- For Function name, enter a name.
- For Runtime, choose the latest Python version available.
- For Execution role, select Create a new role with basic Lambda permissions.
- Choose Create function.
- After creating the function, on the Permissions tab, choose your role to edit it.
- On the IAM console, choose Add inline policy.
- On the JSON tab, update the following policy to include your Amazon Kendra index ID (you can obtain it on the Amazon Kendra console in the Index section):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "KendraQueries",
            "Effect": "Allow",
            "Action": "kendra:Query",
            "Resource": "arn:aws:kendra:<YOUR_REGION>:<YOUR_AWS_ACCOUNT_ID>:index/<YOUR_AMAZON_KENDRA_INDEX_ID>"
        },
        {
            "Sid": "ComprehendTranslate",
            "Effect": "Allow",
            "Action": [
                "comprehend:DetectDominantLanguage",
                "translate:TranslateText"
            ],
            "Resource": "*"
        }
    ]
}
- Choose Review policy.
- For Name, enter a name.
- Choose Create policy.
- In the Lambda configuration, enter the following code into the function code (update your index_id). The code is also available to download.

"""
Lexbot Lambda handler.
"""
from urllib.request import Request, urlopen
import json
import boto3

kendra = boto3.client('kendra')

#Define your Index ID
index_id = "<YOUR_AMAZON_KENDRA_INDEX_ID>"
region = 'us-east-1'

translate = boto3.client(service_name='translate', region_name=region, use_ssl=True)
comprehend = boto3.client(service_name='comprehend', region_name=region, use_ssl=True)

def query_index(query, language):
    print("Query: "+query)
    if language == "en":
        pass
    elif language == "es":
        translated_query = translate.translate_text(Text=query, SourceLanguageCode="es", TargetLanguageCode="en")
        query = translated_query['TranslatedText']
    else:
        try:
            translated_query = translate.translate_text(Text=query, SourceLanguageCode=language, TargetLanguageCode="en")
            query = translated_query['TranslatedText']
        except Exception as e:
            return(str(e))
    response = kendra.query(
        QueryText = query,
        IndexId = index_id)
    print(response)
    #Return just the first result
    for query_result in response['ResultItems']:
        #Reading comprehension result
        if query_result['Type'] == 'ANSWER':
            url = query_result['DocumentURI']
            answer_text = query_result['DocumentExcerpt']['Text']
            if language == "en":
                pass
            else:
                result = translate.translate_text(Text=answer_text, SourceLanguageCode="en", TargetLanguageCode=language)
                answer_text = result['TranslatedText']
            response = answer_text
            return response
        #Document ranking result
        if query_result['Type'] == 'DOCUMENT':
            if query_result['ScoreAttributes']['ScoreConfidence'] == "LOW":
                response = ""
                return(response)
            else:
                synopsis = ""
                document_title = ""
                answer_text = ""
                url = ""
                for key in query_result['DocumentAttributes']:
                    if language == "es":
                        if key['Key'] == 'spanish_text':
                            synopsis = key['Value']['StringValue']
                            answer_text = synopsis
                        if key['Key'] == 'spanish_title':
                            document_title = key['Value']['StringValue']
                            print('Title: ' + document_title)
                    elif language == "en":
                        document_title = query_result['DocumentTitle']['Text']
                        answer_text = query_result['DocumentExcerpt']['Text']
                    else:
                        #Placeholder to translate the title if needed
                        #document_title = query_result['DocumentTitle']['Text']
                        #result = translate.translate_text(Text=document_title, SourceLanguageCode="en", TargetLanguageCode=language)
                        #document_title = result['TranslatedText']
                        answer_text = query_result['DocumentExcerpt']['Text']
                        result = translate.translate_text(Text=answer_text, SourceLanguageCode="en", TargetLanguageCode=language)
                        answer_text = result['TranslatedText']
                response = answer_text
                return response

def lambda_handler(event, context):
    if(len(event['inputTranscript']) < 3):
        result = "Please try again"
    else:
        query = event['inputTranscript']
        response = comprehend.detect_dominant_language(Text = query)
        confidence = response["Languages"][0]['Score']
        if confidence > 0.50:
            language = response["Languages"][0]['LanguageCode']
        else:
            #Default to English if there isn't enough confidence
            language = "en"
        result = query_index(query, language)
    if result == "":
        no_matches = "I'm sorry, I couldn't find matches for your query"
        result = translate.translate_text(Text=no_matches, SourceLanguageCode="en", TargetLanguageCode=language)
        result = result['TranslatedText']
    else:
        #Truncate text
        if len(result) > 340:
            result = result[:340]
            result = result.rsplit(' ', 1)
            result = result[0]+"..."
    response = {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": result
            },
        }
    }
    print('result = ' + str(response))
    return response
- Choose Deploy.
Setting up the chatbot
The chatbot that you create for this use case uses Lambda to fulfill the requests. Essentially, you create a fallback intent and pass the user input to the Lambda function.
To set up a chatbot on the console, complete the following steps:
- On the Amazon Lex console, under Bots, choose Create.
- Choose Custom bot.
- For Bot name, enter a name.
- For Language, choose English (US).
- Leave the other options at their defaults.
- Choose Create.
For this post, we use the fallback intent to process the queries sent to Amazon Kendra. First we need to create an intent.
- Choose Create intent.
- Enter a name for your intent and choose Add.
- Under Sample utterances, enter some sample utterances.
- Under Response, enter an example answer.
- Choose Save Intent.
Now you can build and test your bot (see the following screenshot).
- To import the fallback intent, next to Intents, choose the + icon.
- Choose Search existing intents.
- Search for and choose the built-in intent AMAZON.FallbackIntent.
- Enter a name.
- Choose Add.
- For Fulfillment, select AWS Lambda function.
- For Lambda function, choose the function you created.
- Choose Save Intent.
Now you disable the clarification questions so you can use the fallback intent on the first attempt.
- Under Error handling, deselect Clarification prompts.
- Choose Save.
- Choose Build.
Testing
After the bot building process is complete, you can test your bot directly on the Amazon Lex console.
Now we issue the same query in French (“Qu’est-ce qu’Amazon Kendra?”) and we get the response back in French.
If you want to test your chatbot as a standalone web application, see Sample Amazon Lex Web Interface on GitHub.
You can also test the Amazon Lex integration with Slack or Facebook Messenger.
About the Author
Juan Bustos is an AI Services Specialist Solutions Architect at Amazon Web Services, based in Dallas, TX. Outside of work, he loves spending time writing and playing music as well as trying random restaurants with his family.
David Shute is a Senior ML GTM Specialist at Amazon Web Services focused on Amazon Kendra. When not working, he enjoys hiking and walking on a beach.
Using the AWS DeepRacer new Soft Actor Critic algorithm with continuous action spaces
AWS DeepRacer is the fastest way to get started with machine learning (ML). You can train reinforcement learning (RL) models by using a 1/18th scale autonomous vehicle in a cloud-based virtual simulator and compete for prizes and glory in the global AWS DeepRacer League.
We’re excited to bring you two new features available on the AWS DeepRacer console: a new RL algorithm called Soft Actor Critic (SAC) and a new way of defining your action space called continuous action space. Understanding how SAC and continuous action space work will let you come up with new strategies to top the AWS DeepRacer League. This post walks you through the unique features of the SAC algorithm and how to use it with continuous action space. By the end, you will learn how to use continuous action space and be ready to train your first SAC RL model on the AWS DeepRacer console.
Reviewing the fundamentals
Let’s first review some fundamental RL concepts that give us a foundation to dive deeper into SAC. The objective of RL models is to maximize total reward, which is done by exploring the environment. In the case of AWS DeepRacer, the environment is the track that you choose to train your model on. The agent, which for AWS DeepRacer is the car, explores the environment by following a policy. A policy determines the action the agent takes after observing the environment (for example, turning left, going straight, or turning right). AWS DeepRacer observes the environment by using image data or a combination of image and LIDAR data.
As the agent explores the environment, the agent learns a value function. We can think of the value function as a way to judge how good an action taken is, after observing the environment. The value function uses the reward function that you write in the AWS DeepRacer console to score the action. For example, if we choose the “follow the center line” sample reward function in the AWS DeepRacer console, a good action keeps the agent near the center of the track and is scored higher than a bad action, which moves the agent away from the center of the track.
Over time, the value function helps us learn policies that increase the total reward. To learn the optimal or best policy, we balance the amount of time we spend exploring the environment versus the amount of time we spend exploiting what our policy has learned over time. For example, if we consider the “follow the center line” sample reward function, we first take random actions to explore the environment, meaning that our agent doesn’t do a very good job at staying in the center of the track. Over time, the agent learns which actions keep it near the center of the track, but if we keep taking random actions, it takes a long time to learn how to stay at the center of the track for the entire lap. So as the policy begins to learn the good actions, we begin to use those actions instead of taking random actions. However, if we always use or exploit the good actions, we never learn anything new because we fail to explore the environment. This trade-off is often referred to as the “exploration vs. exploitation” problem in RL.
What’s new with SAC?
Now that we have the fundamental RL concepts down, let’s look at how SAC works and how it compares to the other algorithm available on the AWS DeepRacer console, Proximal Policy Optimization (PPO).
There are three main differences between PPO and SAC. The first is that the implementation of SAC on the AWS DeepRacer console only allows you to select continuous action space (covered later in this post).
The second and sharper contrast between PPO and SAC is in how they leverage the information learned by the policy while exploring the environment between training iterations. PPO uses on-policy learning, which means that we learn the value function from observations made by the current policy exploring the environment. SAC, on the other hand, uses off-policy learning, which means it can use observations made by previous policies exploring the environment.
The trade-off between off-policy and on-policy learning tends to be stability vs. data efficiency. On-policy algorithms tend to be more stable but are more data-hungry, whereas off-policy algorithms tend to be more unstable but more data efficient, where stability in this context refers to how the model performs in between training iterations. A stable model tends to have consistent performance between training iterations, meaning that if we’re training our model to follow the center of the track, we see it get better and better at staying in the center of the track with each training iteration. Because of the consistent performance, we tend to see the total reward consistently increase between training iterations.
Unstable models tend to have more random performance between training iterations, which means that our model may come closer to following the middle of the track in one training iteration and then be completely unable to stay on the track the next training iteration. This leads to total reward between training iterations that looks noisier than on-policy methods, particularly at the start of training.
The third and final difference is how PPO and SAC use entropy. In this case, entropy is a measure of the uncertainty in the policy, so it can be interpreted as a measure of how confident a policy is at choosing an action for a given observation. A policy with low entropy is very confident at choosing an action, whereas a policy with high entropy is unsure of which action to choose.
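This notion of entropy is concrete enough to compute. The following sketch contrasts a confident policy with an uncertain one using the Shannon entropy of the action distribution (the example distributions are ours, for illustration):

```python
import math

def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution:
    -sum(p * log(p)). Higher means the policy is less sure of its action."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = policy_entropy([0.98, 0.01, 0.01])   # low entropy: almost certain
uncertain = policy_entropy([0.34, 0.33, 0.33])   # high entropy: nearly uniform
```

A near-uniform distribution has entropy close to the maximum (log of the number of actions), whereas a distribution concentrated on one action has entropy close to zero.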
As mentioned earlier, exploration vs. exploitation is a key challenge in RL. To confront this issue, the PPO algorithm uses entropy regularization. Entropy regularization encourages the agent to explore by preventing it from settling on a specific policy.
Let’s once again use the “follow the center line” sample reward function. If we don’t have entropy regularization, after various training iterations we may end up with a policy that causes the agent to jitter around the center line. The jitter behavior occurs because the policy has a hard time deciding whether the best action is to stay forward or turn slightly left or right after making an observation of the environment. This behavior keeps us close to the center line, we just jitter around the center line by slightly turning left and right as the agent moves around the track. This means that this jitter policy has a high total reward because it keeps us close to the center line. The entropy of this policy is also relatively high, because the policy is unsure of what the best action is for a given observation of the environment.
At this point, without using entropy as a regularizer and the total reward being high, the algorithm starts producing policies with the same jitter behavior on every training iteration, effectively meaning that the algorithm has converged. By adding entropy as a regularizer on each training iteration, the algorithm requires the total reward to be high and the entropy to be low. If we end up in a training iteration where the total reward and entropy are both high, the algorithm produces a new policy with new behavior as opposed to producing another “jitter” policy. Because entropy regularization causes a new policy to be produced, we say that it encourages exploration, because the new policy likely takes different actions than the previous “jitter” policy when observing the environment.
For SAC, instead of using entropy as a regularizer, we change the objective of the RL model to maximize not only total reward but also entropy. This entropy maximization makes SAC a unique RL algorithm. Entropy maximization has similar benefits to using the entropy as a regularizer, such as incentivizing wider exploration and avoiding convergence to a bad policy.
Entropy maximization has one unique advantage: the algorithm tends to give up on policies that choose unpromising behavior. This happens because the policies produced by SAC on each training iteration choose actions that maximize both total reward and entropy when observing the environment. High entropy means the policy is unsure which action to take, so SAC policies tend to explore the environment more; but because the policy also maximizes total reward, those uncertain actions still stay close to the desired behavior. SAC is an off-policy algorithm, which means we can use observations from policies produced in different training iterations. When the algorithm looks at the observations of previous policies, which have high entropy and therefore explored the environment more, it can pick out the promising behavior and give up on the unpromising behavior.
You can tune the amount of entropy to use in SAC with the hyperparameter SAC alpha, with a value between 0.0 and 1.0. The maximum value of the SAC alpha uses the whole entropy value of the policy and favors exploration. The minimum value of SAC alpha recovers the standard RL objective and there is no entropy bonus to incentivize the exploration. A good SAC alpha value to kick off your first model is 0.5. Then you can tune this hyperparameter accordingly as you iterate on your models.
The ins and outs of action spaces
Now let’s look at how action spaces work on the AWS DeepRacer console and introduce the new continuous action space, which allows you to define a range of actions instead of a discrete set of actions. To begin, let’s review how discrete action spaces work in AWS DeepRacer.
The AWS DeepRacer console uses a neural network to model the policy learned by both PPO and SAC. The output of the policy is a discrete set of values. For discrete action spaces, which is what the PPO algorithm available on the AWS console has traditionally used, the discrete values returned from the neural network are interpreted as a probability distribution and are mapped to a set of actions. The set of actions is defined by the user by specifying the maximum steering angle, speed values, and their respective granularities to generate the corresponding combinations of speed and steering actions. Therefore, the policy returns a discrete distribution of actions.
For example, if we select a maximum steering angle of 15 degrees, a maximum speed of 1 m/s, and corresponding granularities of 3 and 1, our discrete action space has three values mapped to the following steering angle and speed pairs: (-15 degrees, 1 m/s), (0 degrees, 1m/s), and (15 degrees, 1m/s). A policy may return the following discrete distribution [0.50, 0.25, 0.25] for a given observation in the environment, which can loosely be interpreted as the policy being 50% certain that action 1, (-15 degrees, 1 m/s), is the action most likely to maximize total reward for a given observed state.
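The enumeration of that action set can be sketched as follows. This is our own reconstruction of how maximum values and granularities combine (speeds are evenly spaced fractions of the maximum; steering angles are evenly spaced across the symmetric range), not the console's exact code:

```python
def build_action_space(max_steering, max_speed, steering_gran, speed_gran):
    """Enumerate (steering angle, speed) pairs from maximum values and
    granularities, mirroring the discrete action space described above."""
    # Steering angles evenly spaced over [-max_steering, +max_steering]
    step = 2 * max_steering / (steering_gran - 1) if steering_gran > 1 else 0
    steering = [-max_steering + i * step for i in range(steering_gran)]
    # Speeds as evenly spaced fractions of the maximum speed
    speeds = [max_speed * i / speed_gran for i in range(1, speed_gran + 1)]
    return [(angle, v) for angle in steering for v in speeds]

actions = build_action_space(15, 1.0, 3, 1)
# -> [(-15.0, 1.0), (0.0, 1.0), (15.0, 1.0)]
```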
During training, we sample the action space distribution to encourage exploration, meaning that if we have this discrete distribution, we have a 50% chance of picking action 1, a 25% chance of picking action 2, and a 25% chance of picking action 3. This means that during training, until our policy is very sure about which action to take for a given observed state, we always have the chance to explore the benefits of a new action.
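Sampling the discrete distribution looks something like the following sketch (the probabilities are the example values above; the seed is arbitrary):

```python
import random
from collections import Counter

action_probs = [0.50, 0.25, 0.25]   # policy output for one observed state

rng = random.Random(0)
samples = rng.choices(range(3), weights=action_probs, k=10000)
counts = Counter(samples)
# Action 1 is drawn about half the time and actions 2 and 3 about a quarter
# each, so less likely actions still get explored during training.
```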
For continuous action space, the policy only outputs two discrete values. These values are interpreted to be the mean and standard deviation of a continuous normal distribution. You define a range for speed and steering angle. The action for an observed state is chosen from this user-defined range of speed and steering by sampling the normal distribution, defined by the mean and standard deviation returned from the policy.
For example, we can define the following ranges for steering angle and speed: [-20 degrees, 20 degrees] and [0.75 m/s, 4 m/s]. This means that the policy can explore all combinations specified in this range, as opposed to the discrete action space case, where it could only explore three combinations. Continuous action spaces tend to produce agents that exhibit less zig-zag motion when navigating the environment. This is because policies tend to learn smooth changes in steering angle and speed as opposed to discrete changes. The trade-off is that continuous action spaces are more sensitive to choices in reward function and steering angle and speed ranges. Depending on these choices, continuous action spaces may increase the amount of time it takes to train.
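Sampling a continuous action can be sketched as drawing from the policy's normal distributions and constraining the result to the user-defined ranges. Note this is an illustration only: the means and standard deviations here are made up, and constraining by clipping is our simplification of whatever bounding the actual implementation uses:

```python
import random

STEERING_RANGE = (-20.0, 20.0)   # degrees, the example range above
SPEED_RANGE = (0.75, 4.0)        # m/s

def sample_action(rng, steer_mean, steer_std, speed_mean, speed_std):
    """Draw one (steering, speed) action from normal distributions and
    clip it into the user-defined ranges (illustrative sketch)."""
    steer = min(max(rng.gauss(steer_mean, steer_std), STEERING_RANGE[0]), STEERING_RANGE[1])
    speed = min(max(rng.gauss(speed_mean, speed_std), SPEED_RANGE[0]), SPEED_RANGE[1])
    return steer, speed

rng = random.Random(0)
steer, speed = sample_action(rng, steer_mean=0.0, steer_std=5.0,
                             speed_mean=2.0, speed_std=0.5)
```

Because the action is a pair of real numbers rather than an index into a fixed table, nearby actions differ only slightly, which is what produces the smoother driving behavior described above.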
Although continuous action spaces are required for SAC, you can also use them for PPO. The AWS DeepRacer console now supports training PPO models that can use either continuous or discrete action spaces. Let’s look at how to set up a continuous action space on the AWS DeepRacer console.
Creating a new vehicle using continuous action space
In this section, we walk you through the steps to create a new vehicle in the My Garage section of the console with continuous action space. All you need to do is sign up for an AWS account (if you don’t already have one) and go to the AWS DeepRacer console:
- On the AWS DeepRacer console, choose Your garage.
In the list of vehicles, you should see a new vehicle, The Original DeepRacer (continuous action space). This vehicle is provided by default to all users for training models with continuous action space. It uses a single camera and has a speed range of [0.5 : 1] m/s and a steering angle range of [-30 : 30] degrees.
- Choose Build new vehicle to build your own vehicle with a new configuration.
In this example, we build a vehicle with stereo cameras.
- For Sensor modifications, select Stereo camera.
- Choose Next.
- For Choose your action space type, select Continuous.
For this post, we choose the action space range [0.5 : 2 ] m/s and [-30 : 30 ] degrees.
- For Right steering angle range, enter -30.
- For Left steering angle range, enter 30.
- For Minimum speed, enter 0.5.
- For Maximum speed, enter 2.
- Choose Next.
- Customize your vehicle appearance and name your vehicle.
- Choose Done.
The vehicle is now available to choose when creating a model.
Training a Soft Actor Critic model on the console
In this section, we walk you through how to create a new Soft Actor Critic model:
- On the AWS DeepRacer console, choose Your models.
- Choose Create model.
- For Model name, enter the name of one of your models.
- Optionally, for Training job description, enter a description.
- For Choose a track, select your track (for this post, we select European Seaside Circuit (Buildings)).
- Choose Next.
The next section allows you to customize the desired training environment, select an algorithm along with its hyperparameters, and choose the virtual car that contains your desired action spaces.
- For Race type, select the type (for this post, we select Time trial).
- For Training algorithm and hyperparameters, select SAC.
- Under Hyperparameters, configure your hyperparameters.
SAC Alpha is the hyperparameter that determines the relative importance of the entropy term against the reward.
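As a rough sketch of how this hyperparameter enters the picture, SAC maximizes an entropy-regularized return. This is the standard SAC formulation from the literature, not anything DeepRacer-specific; the symbols follow the usual reinforcement learning conventions:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

A larger alpha rewards higher-entropy (more random) policies and therefore more exploration; a smaller alpha weights the environment reward more heavily.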
- Lastly, choose your virtual car to use, which contains your desired action spaces. For this post, we chose My_DeepRacer_Continuous.
- Choose Next.
Lastly, you can write a reward function to guide the agent to your desired behavior and configure your desired time of training.
- In Code editor, write your reward function.
SAC is sensitive to the scaling of the reward signal, so it's important to tune the reward values carefully. If reward magnitudes are too small, the policy may perform poorly because it tends to become uniform and fails to exploit the reward signal. If they're too large, the model learns quickly at first, but the policy converges to a poor local minimum due to lack of exploration. Tuning the right reward scaling is therefore key to training a successful SAC model.
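As a starting point, here is a minimal centerline-following reward function in the shape AWS DeepRacer expects (a Python function named `reward_function` taking a `params` dict; `track_width` and `distance_from_center` are standard DeepRacer input parameters). The `REWARD_SCALE` constant is our own illustrative knob for experimenting with the scaling discussion above, not a DeepRacer setting:

```python
# REWARD_SCALE is an illustrative knob for exploring SAC's sensitivity to
# reward magnitude; it is not a DeepRacer parameter.
REWARD_SCALE = 1.0

def reward_function(params):
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Three bands around the centerline: the closer to center, the higher
    # the reward. Keeping magnitudes modest (at most REWARD_SCALE) makes it
    # easier to tune the overall reward scale for SAC.
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track

    return float(reward * REWARD_SCALE)
```

Adjusting `REWARD_SCALE` up or down while keeping the band structure fixed is one simple way to probe the small-versus-large reward behavior described above.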
- After writing your reward function, choose Validate to verify your reward function is compatible with AWS DeepRacer.
- Under Stop conditions, for Maximum time, set the desired duration of training time in minutes.
- Choose Create model to start training.
When the training starts, the model dashboard shows the progress of training along with the live streaming of the simulator.
Conclusion
With AWS DeepRacer, you can now get hands-on experience with the Soft Actor Critic algorithm. Finding the right hyperparameter values, choosing appropriate action spaces, and writing a custom reward function are the keys to improving your SAC models.
You’re now ready to train your first SAC model. Sign in to the AWS DeepRacer console to get started.
About the Author
Eddie Calleja is an SDM for AWS DeepRacer. He is the manager of the AWS DeepRacer simulation application and device software stacks. As a former physicist he spends his spare time thinking about applying AI techniques to modern day physics problems.
Scheduling work meetings in Slack with Amazon Lex
Imagine being able to schedule a meeting or get notified about updates in your code repositories without leaving your preferred messaging platform. This could save you time and increase productivity. With the advent of chatbots, these mundane tasks are now easier than ever. Amazon Lex, a service for building chatbots, offers native integration with popular messaging applications such as Slack to offer a simple, yet powerful user experience. In a previous post, we explored how to schedule an appointment in Office 365 using an Amazon Lex bot and a custom web application to book meetings with a single user via email. In this post, we take advantage of Slack APIs to schedule meetings with multiple users by referencing their information in the Slack workspace. The Meeting Scheduler Slack Bot takes care of comparing calendars, finding open timeslots, and scheduling the actual meeting all without leaving the Slack workspace.
To accomplish this integration, we use a combination of AWS services (specifically Amazon Lex and AWS Lambda) and schedule the actual meetings in Outlook. The chatbot gathers the information needed to schedule a meeting by referencing users in the Slack workspace, because those same users also exist in Office 365.
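To make the Lambda side of this concrete, here is a hedged sketch of a Lex (V1) fulfillment handler. The intent and slot names (`ScheduleMeeting`, `Duration`, `MeetingDate`) are illustrative placeholders, not the actual names used by the Meeting Scheduler bot; the `dialogAction` response shape, however, is what Lex V1 expects from a fulfillment function:

```python
# Illustrative Lex V1 fulfillment handler. The intent and slot names are
# hypothetical; the dialogAction response format is the Lex V1 contract.
def lambda_handler(event, context):
    intent = event['currentIntent']['name']
    slots = event['currentIntent']['slots']

    if intent == 'ScheduleMeeting':
        message = (
            "Booking a {} minute meeting on {}."
            .format(slots.get('Duration'), slots.get('MeetingDate'))
        )
    else:
        message = "Sorry, I can't handle that request."

    # Lex V1 expects this dialogAction shape to close the conversation.
    return {
        'dialogAction': {
            'type': 'Close',
            'fulfillmentState': 'Fulfilled',
            'message': {'contentType': 'PlainText', 'content': message},
        }
    }
```

In the real bot, the branch for the scheduling intent would call the Slack and Microsoft Graph APIs before returning a confirmation message.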
The following diagram illustrates the architecture of our solution.
Prerequisites
Before getting started, make sure you have the following prerequisites:
- An Office 365 account. If you don’t have an existing account, you can use the free trial of Office 365 Business Premium.
- Approval from your Azure Active Directory administrator for the Office 365 application registration.
Estimated cost
You incur AWS usage charges when deploying resources and interacting with the Amazon Lex bot. For more information, see Amazon Lex pricing and AWS Lambda pricing. Depending on the licenses selected for your Office 365 and Slack accounts, you may incur additional charges.
Deployment steps
In the following sections, we walk you through the deployment for the Meeting Scheduler Slack Bot. The steps are as follows:
- Register an application within your Microsoft account. This generates the keys that are required to call the Office 365 APIs.
- Configure the Slack application. This creates the keys that the Amazon Lex bot and fulfillment Lambda function use to call Slack APIs.
- Launch the AWS CloudFormation template to generate AWS resources. You need the keys and URLs from the previous two steps.
- Connect Amazon Lex to the Slack channel.
- Test your Meeting Scheduler Slack Bot by typing a message into your Slack application.
Registering an application within your Microsoft account
To register your application in your Microsoft account, complete the following steps:
- Log in to your Azure portal and navigate to App registrations.
- Choose New registration.
- For Name, enter a name for your application.
- For Redirect URL, enter http://localhost/myapp.
The redirect URL is required to make Microsoft Graph API calls. You also use this as the value for the RedirectUri parameter of your CloudFormation stack.
- Choose Create.
- Choose Certificates & secrets.
- Choose New client secret.
- Enter a name for your secret.
- Choose Save.
Before navigating away from this page, take note of the secret value (which you use as the ApplicationPassword parameter for the CloudFormation stack). This is the only time you can view the secret.
- Choose API permissions.
- Choose Add permission.
- Choose Microsoft Graph.
- For Select permissions, under Calendars, select Calendars.ReadWrite.
You need your Active Directory administrator to grant access to these permissions in order for the bot to be successful. These permissions give the application the ability to use service credentials to run certain actions, such as reading a calendar (to find available times) and writing (to schedule the meetings).
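With these permissions granted, the bot can use Microsoft Graph's findMeetingTimes API to locate open timeslots. The helper below only builds the request body for that call; the attendee emails and time window are placeholders, and you should consult the Graph documentation for the exact schema before relying on it:

```python
# Sketch of a findMeetingTimes request body for Microsoft Graph.
# Field names follow the Graph docs as best understood here; verify the
# exact schema (e.g. the timeConstraint structure) against the official
# API reference before use.
def build_find_meeting_times_body(attendee_emails, start_iso, end_iso,
                                  duration_minutes):
    return {
        'attendees': [
            {'type': 'required', 'emailAddress': {'address': email}}
            for email in attendee_emails
        ],
        'timeConstraint': {
            'timeslots': [{
                'start': {'dateTime': start_iso, 'timeZone': 'UTC'},
                'end': {'dateTime': end_iso, 'timeZone': 'UTC'},
            }],
        },
        # Graph expects an ISO 8601 duration, e.g. PT30M for 30 minutes.
        'meetingDuration': 'PT{}M'.format(duration_minutes),
    }
```

The bot's fulfillment function would POST a body like this (with a bearer token from the app registration above) and parse the suggested timeslots out of the response.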
- In addition to the application secret you captured earlier, you also need the following information from your registered app:
- Application (client) ID – For the CloudFormation stack parameter ApplicationId
- Directory (tenant) ID – For the CloudFormation stack parameter AzureActiveDirectoryId
Configuring the Slack application
To configure your Slack application, complete the following steps:
- Sign up for a Slack account and create a Slack team. For instructions, see Using Slack.
In the next step, you create a Slack application, which any Slack team can install. If you already have a Slack team set up, you may move on to the next step.
- Create a Slack application.
- Under OAuth and Permissions, for Bot Token Scopes, add the following:
- chat:write – Allows the bot to send messages with the given user handle
- team:read – Allows the bot to view the name, email domain, and icons for Slack workspaces the chatbot is connected to
- users:read – Allows the bot to see people in the Slack workspace
- users:read.email – Allows the bot to see the emails of people in the Slack workspace
- Choose Install App to Workspace.
- Choose Allow when prompted.
- Copy the Bot OAuth User Token, which you need when deploying the CloudFormation template in the next steps (for the parameter SlackBotToken).
- Save the information found in the Basic Information section for a later step.
Deploying the CloudFormation template
The following CloudFormation template creates the necessary chatbot resources into your AWS account. The resources consist of the following:
- BotFulfillmentLambdaLayer – The Lambda layer that contains the libraries necessary to run the function
- LambdaExecutionRole – A basic Lambda execution role that allows the fulfillment function to get secrets from AWS Secrets Manager
- HelperLambdaExecutionRole – The Lambda execution role that allows the helper function to create Amazon Lex bots
- BotFulfillmentLambda – The Lambda function that handles fulfillment of the bot
- HelperLambda – The Lambda function that generates the bot
- SlackAppTokens – Secrets in Secrets Manager for using Slack APIs
- O365Secrets – Secrets in Secrets Manager for using Office 365 APIs
- HelperLambdaExecute – A custom CloudFormation resource that runs the HelperLambda to generate the bot upon complete deployment of the template
The HelperLambda function runs automatically after the CloudFormation template has finished deploying. This function generates a bot definition, slot types, utterances, and Lambda fulfillment connections in the Amazon Lex bot. The template takes approximately 10 minutes to deploy.
To deploy your resources, complete the following steps:
- On the AWS CloudFormation console, choose Create stack.
- For Upload a template file, upload the template.
- Choose Next.
- For Stack name, enter a name (for example, MeetingScheduler).
- Under Parameters, provide the parameters that you recorded in the previous steps:
- ApplicationId – Client ID
- ApplicationPassword – Client secret
- AzureActiveDirectoryId – Directory ID
- CodeBucket – S3 bucket created to store the .zip files
- RedirectUri – Redirect URI; if you didn't change it from the example (http://localhost/myapp), leave this as is
- SlackBotToken – Bot OAuth user token
- Choose Next.
- Choose Next.
- Select the I acknowledge that AWS CloudFormation might create IAM resources check box.
This allows AWS CloudFormation to create the AWS Identity and Access Management (IAM) resources necessary to run our application. This includes the Lambda function execution roles and giving Amazon Lex the permissions to call those functions.
- Choose Create stack.
- Wait for the stack creation to complete.
You can monitor the status on the AWS CloudFormation console. Stack creation should take approximately 5 minutes.
Connecting the Amazon Lex bot to the Slack channel
To connect your bot to Slack, complete the following steps:
- On the Amazon Lex console, choose your newly deployed bot.
- On the Settings tab, create a dev alias and select Latest as the version.
- Choose the + button to create the alias.
- On the Channels tab, choose Slack.
- For Channel Name, enter a name.
- For Alias, choose dev.
- Enter values for Client Id, Client Secret, Verification Token, and Success Page URL from the Basic Information page in your Slack app.
- Choose Activate.
- Complete your Slack integration. (You can skip step 2C, because we already completed it).
- Under Settings, choose Manage distribution.
- Choose Add to Slack.
- Authorize the bot to respond to messages.
Testing the Meeting Scheduler Slack Bot
To test your bot, complete the following steps:
- Navigate to the Slack workspace where you installed your application.
You should see the application under Apps.
- To schedule a meeting with your bot, try entering Schedule a meeting.
The following screenshot shows the bot’s response. You’re presented with the next five available work days to choose from.
- Choose your desired date for the meeting.
If there are no times available on the day you selected, you can choose a different date.
- Enter how long you want the meeting to last.
- When asked who to invite to the meeting, enter your team member’s Slack handle.
The user must have their Active Directory email address associated with their Slack profile.
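This handle-to-email lookup is what the `users:read` and `users:read.email` scopes granted earlier enable. The helper below resolves a handle against a Slack `users.list` API response; the response shape follows the Slack Web API, but the helper itself is an illustrative sketch rather than the bot's actual code:

```python
# Illustrative helper: resolve a Slack handle to the email on the user's
# profile, given a parsed users.list response (requires the users:read and
# users:read.email scopes). Not the actual Meeting Scheduler code.
def email_for_handle(users_list_response, handle):
    handle = handle.lstrip('@')
    for member in users_list_response.get('members', []):
        if member.get('name') == handle:
            return member.get('profile', {}).get('email')
    return None
```

If a user's Slack profile has no email (or the email doesn't match their Active Directory address), this lookup fails, which is why the constraint above matters.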
- Choose your desired time of day for the meeting.
- Confirm the details of your scheduled meeting.
Success! You’ve just scheduled your first meeting using your Slack bot!
Cleaning up
To avoid incurring future charges, delete the resources by deleting the CloudFormation stack. Upon completion, delete the files uploaded to the S3 bucket, then delete the bucket itself.
Conclusion
Using Amazon Lex with Slack can help improve efficiency for daily tasks. This post shows how you can combine AWS services to create a chatbot that assists in scheduling meetings. It shows how to grant permissions, interact with Amazon Lex, and use external APIs to deliver powerful functionality and further boost productivity. The contents of this post and solution can be applied to other common workloads such as querying a database, maintaining a Git repo, or even interacting with other AWS services.
By integrating AWS with APIs like Office 365 and Slack, you can achieve even more automated functionality and improve the user experience. To get more hands on with building and deploying chatbots with Amazon Lex, check out these tutorials:
- A Question and Answer Bot Using Amazon Lex and Amazon Alexa
- Build a customer service chatbot with Amazon Lex
- CoffeeBot chat bot
About the Authors
Kevin Wang is a Solutions Architect for AWS, and passionate about building new applications on the latest AWS services. With a background in investment finance, Kevin loves to blend financial analysis with new technologies to find innovative ways to help customers. An inquisitive and pragmatic developer at heart, he loves community-driven learning and sharing of technology.
Kim Wendt is a Solutions Architect at AWS, responsible for helping global media & entertainment companies on their journey to the cloud. Prior to AWS, she was a Software Developer for the US Navy, and uses her development skills to build solutions for customers. She has a passion for continuous learning and is currently pursuing a masters in Computer Science with a focus in Machine Learning.