Synergizing habits and goals with variational Bayes: A new framework for biological and artificial embodied agents

Diagrams showing features of habitual behavior (e.g., eating snack when focusing on work) and goal-directed behavior (planning a meal to lose weight). Left: habitual behavior with features like automatic, model-free, and fast; Right: goal-directed behavior with features like thoughtful, model-based, and slow.

In the intertwined worlds of psychology, cognitive neuroscience, and artificial intelligence, scientists continue to pursue the elusive goal of decoding and mimicking human and animal behavior. One of the most intriguing aspects of this research is the interplay between two types of behaviors: habitual and goal-directed. Traditionally, these behaviors are believed to be managed by two distinct systems within the brain: habitual behaviors are fast and automatic, while goal-directed behaviors are slow and flexible. However, a recent paper in Nature Communications, “Synergizing Habits and Goals with Variational Bayes,” by researchers from Microsoft Research Asia and collaborators from Okinawa Institute of Science and Technology, introduces a groundbreaking theoretical framework that challenges this traditional view. Instead of treating the two separately, it integrates these two types of behaviors using variational Bayesian methods, which involve statistical techniques for updating beliefs or probabilities based on new evidence. In this context, the use of variational Bayesian methods suggests a novel approach to understanding how habitual and goal-directed behaviors interact and influence the decision-making processes of biological and artificial embodied agents (hereinafter referred to as “agent”).

Figure 1: features of habitual behavior (e.g., eating snack when focusing on work) and goal-directed behavior (planning a meal to lose weight). 

The core idea

The paper proposes the Bayesian behavior framework, which aims to enhance the understanding of behavior in sensorimotor tasks. At its core, this framework harnesses variational Bayesian methods to model human and animal actions. The key innovation is the introduction of a pivotal concept: the Bayesian intention variable, designed to bridge habitual behavior and goal-directed behavior. Habitual behaviors are driven by a prior distribution of intention shaped by sensory cues rather than explicit goals. In contrast, goal-directed behaviors are guided by a posterior distribution of intention conditioned on specific goals, which is inferred through the minimization of variational free energy.

The authors argue that habitual and goal-directed behaviors should not be treated independently. Instead, these behaviors share neural pathways and can build on each other’s strengths. For example, habitual behaviors, while inflexible, offer finely honed motor skills that goal-directed behaviors can leverage for more complex planning. This synergistic approach comes to fruition through two key mechanisms: first, by minimizing the divergence between the habitual and goal-directed intentions, and second, by combining the prior and posterior intentions into a unified, synergized intention via inverse variance-weighted averaging. This consolidated intention then empowers the agent to effectively engage with its environment. 
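To make the second mechanism concrete, here is a minimal sketch of inverse variance-weighted averaging for a single scalar intention. It assumes both the prior (habitual) and posterior (goal-directed) intentions are Gaussian with known variances. The function name `synergize` and the combined-variance formula (the standard result for fusing two independent Gaussian estimates) are illustrative assumptions, not code from the paper.

```python
def synergize(prior_mean, prior_var, post_mean, post_var):
    """Inverse variance-weighted average of two scalar Gaussian intentions.

    Assumption: each intention is summarized by a mean and a variance.
    The more precise (lower-variance) intention receives the larger weight.
    """
    prior_precision = 1.0 / prior_var
    post_precision = 1.0 / post_var
    total_precision = prior_precision + post_precision
    mean = (prior_precision * prior_mean + post_precision * post_mean) / total_precision
    variance = 1.0 / total_precision  # assumed fusion rule for independent estimates
    return mean, variance


# Early in training: the habitual prior is vague, so the goal-directed posterior dominates.
early = synergize(prior_mean=0.0, prior_var=10.0, post_mean=1.0, post_var=1.0)

# After many trials: the habitual prior is precise, so it dominates instead.
late = synergize(prior_mean=0.0, prior_var=0.01, post_mean=1.0, post_var=1.0)
```

Under these assumptions, a shrinking habitual variance shifts the synergized intention toward the habitual mean without any explicit switch between the two systems.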

Diagrams showing a: an overview of the Bayesian behavior framework; b: the framework in learning; c: the framework in behaving.
Figure 2: (a) an overview of the Bayesian behavior framework. (b) and (c): diagrams of the framework in learning and behaving. 

Simulation experiments

The framework was tested through simulations in vision-based sensorimotor tasks, specifically using a T-maze environment. The results replicated observations from neuroscience and psychology experiments.

1. Transition from goal-directed to habitual behavior: The simulations demonstrated that with repetitive trials, an agent’s behavior naturally transitions from slow, goal-directed behavior to faster, habitual behavior. This transition is driven by the increasing precision of habitual intentions, reducing the computational burden on goal-directed processes. 

2. Behavior change after reward devaluation: The study also explored how agents adapt their behaviors when the reward values change, mirroring the concept of outcome devaluation in psychology. Agents with extensive training showed more resistance to behavior change, reflecting the robust nature of habitual behaviors.

3. Zero-shot goal-directed planning: The framework demonstrated the ability to tackle new goals without additional training. By leveraging existing habitual behaviors, the agent could efficiently plan and execute new tasks.

Diagrams illustrating the trained agent performing goal-directed planning for unseen goals. a: Illustration of the experimental setting. Unlike the previous habitization experiment, the rewards are the same for the left and right exits. After stage 2 (adaptation), the model is fixed, and we test the agent’s goal-directed planning capacity (stage 3); b: An example agent behavior (movement trajectories of 10 trials in each plot, aerial view) during stage 2; c: Statistics of policy diversity using purely habitual behavior (actions computed by prior intention). In total, 12 agents, trained with different random seeds, are each tested for 60 trials; d: Statistics of success rate in planning (tested using 12 agents and 10 episodes for each agent in each case) with different kinds of goals; e: Examples of movement trajectories and internal predictions of current and future observations in goal-directed planning.
Figure 3: the trained agent (a-c) can perform goal-directed planning for unseen goals (d,e). 

Key insights for cognitive neuroscience

1. How does an agent arbitrate between model-free, habitual behavior and model-based, goal-directed behavior?

 The paper proposes that the agent uses a synergized intention, calculated as an inverse variance-weighted average of habitual and goal-directed intentions. This approach inherently measures the uncertainty of behaviors by analyzing the statistical variance of the intention distribution. The framework allows the agent to dynamically and autonomously adjust this balance during training by minimizing free energy and reinforcement learning loss. 

2. How does an agent autonomously transfer from slow, goal-directed behavior to fast, habitual behavior with repetitive trials?

The simulations demonstrate that the variance of habitual intention is initially high when adapting to a new task but decreases with repeated trials due to the simplicity of model-free decisions. As the variance decreases, the balance shifts progressively toward habitual intention. A mechanism is introduced to early-stop goal-directed active inference when the synergized intention is precise enough, conserving computational resources while maintaining high behavior precision. This explains why extensive training results in a transition from goal-directed to habitual behavior. 

3. How does an agent perform goal-directed planning for a novel goal that has not been trained to accomplish?

The agent should have an internal predictive model of the environment to perform a mental search for motor patterns. The goal-directed intention is inferred with a constraint from habitual intention, using the KL-divergence term in active inference. This constraint ensures effective goal-directed planning by leveraging the well-developed low-level motor skills formed in the habitual intention and the shared policy network. Consequently, the framework allows the agent to efficiently generalize its behavior to novel goals. These answers provide a comprehensive understanding of the dynamic interaction between habitual and goal-directed behaviors, and the mechanisms enabling efficient and flexible behavior in agents.

Broader implications

The implications of this research extend beyond theoretical modeling. In machine learning and AI, this framework can inform the design of more efficient and adaptable systems. For instance, combining reinforcement learning with active inference could enhance the decision-making capabilities of autonomous agents in complex environments.


The paper marks a significant advancement in our understanding of behavior in the context of cognitive science. By integrating habitual and goal-directed behavior through a Bayesian framework, it offers a comprehensive model that balances efficiency and flexibility. This research not only advances theoretical knowledge but also provides new insights for practical applications in AI and robotics.

For those interested in the intricate details and mathematical foundations of this framework, we strongly encourage an in-depth read of the full paper. As the fields of cognitive science and AI continuously evolve, Microsoft researchers remain committed to embracing innovative perspectives through interdisciplinary endeavors.

Maximize your Amazon Translate architecture using strategic caching layers

Amazon Translate is a neural machine translation service that delivers fast, high quality, affordable, and customizable language translation. Amazon Translate supports 75 languages and 5,550 language pairs. For the latest list, see the Amazon Translate Developer Guide. A key benefit of Amazon Translate is its speed and scalability. It can translate a large body of content or text passages in batch mode or translate content in real-time through API calls. This helps enterprises get fast and accurate translations across massive volumes of content including product listings, support articles, marketing collateral, and technical documentation. When content sets have phrases or sentences that are often repeated, you can optimize cost by implementing a cache-aside caching layer. For example, product descriptions for items contain many recurring terms and specifications. This is where implementing a translation cache can significantly reduce costs. The caching layer stores source content and its translated text. Then, when the same source content needs to be translated again, the cached translation is simply reused instead of paying for a brand-new translation.

In this post, we explain how setting up a cache for frequently accessed translations can benefit organizations that need scalable, multi-language translation across large volumes of content. You’ll learn how to build a simple caching mechanism for Amazon Translate to accelerate turnaround times.

Solution overview

The caching solution uses Amazon DynamoDB to store translations from Amazon Translate. DynamoDB functions as the cache layer. When a translation is required, the application code first checks the cache—the DynamoDB table—to see if the translation is already cached. If a cache hit occurs, the stored translation is read from DynamoDB with no need to call Amazon Translate again.

If the translation isn’t cached in DynamoDB (a cache miss), the Amazon Translate API is called to perform the translation. The source text is passed to Amazon Translate, and the translated result is returned to the caller and stored in DynamoDB, populating the cache for the next time that translation is requested.
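The hit and miss paths described above can be sketched as follows. This is a minimal illustration, not the code from the solution: an in-memory dictionary stands in for the DynamoDB table, and `fake_translate` is a hypothetical stand-in for the Amazon Translate call so that the example runs without AWS credentials.

```python
from typing import Callable, Dict


def translate_with_cache(text: str, cache: Dict[str, str],
                         translate: Callable[[str], str]) -> str:
    """Return a translation, calling `translate` only on a cache miss."""
    cached = cache.get(text)
    if cached is not None:
        return cached                 # cache hit: no translation call needed
    translated = translate(text)      # cache miss: perform the translation
    cache[text] = translated          # populate the cache for next time
    return translated


calls = []


def fake_translate(text: str) -> str:
    """Hypothetical stand-in for the translation service; records each call."""
    calls.append(text)
    return text.upper()


cache: Dict[str, str] = {}
first = translate_with_cache("two-year warranty", cache, fake_translate)
second = translate_with_cache("two-year warranty", cache, fake_translate)
# `calls` now holds a single entry: the second request was served from the cache.
```

In a deployed version, the dictionary lookups would presumably become DynamoDB reads and writes, and `translate` would wrap the Amazon Translate API.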

For this blog post, we use Amazon API Gateway to expose a REST API for translation that integrates with AWS Lambda to perform backend logic. An Amazon Cognito user pool is used to control who can access your translation REST API. You can also use other mechanisms to control authentication and authorization to API Gateway based on your use case.

Amazon Translate caching architecture

  1. When a new translation is needed, the user or application makes a request to the translation REST API.
  2. Amazon Cognito verifies the identity token in the request to grant access to the translation REST API.
  3. When new content comes in for translation, the Amazon API Gateway invokes the Lambda function that checks the Amazon DynamoDB table for an existing translation.
  4. If a match is found, the translation is retrieved from DynamoDB.
  5. If no match is found, the content is sent to Amazon Translate to perform a custom translation using parallel data. The translated content is then stored in DynamoDB along with a new entry for hit rate percentage.

These high-value translations are periodically post-edited by human translators and then added as parallel data for machine translation. This improves the quality of future translations performed by Amazon Translate.

We will use a simple schema in DynamoDB to store the cache entries. Each item will contain the following attributes:

  • src_text: The original source text
  • target_locale: The target language to translate to
  • translated_text: The translated text
  • src_locale: The original source language
  • hash: The primary key of the table

The primary key will be constructed from the src_locale, target_locale, and src_text to uniquely identify cache entries. When retrieving translations, items will be looked up by their primary key.
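As an illustration of how such a key could be built, the sketch below hashes the three fields with SHA-256. The source does not say which hash function or field separator the solution uses, so both are assumptions here, as is the function name `cache_key`.

```python
import hashlib


def cache_key(src_locale: str, target_locale: str, src_text: str) -> str:
    """Build a deterministic cache key from the locales and the source text.

    Assumptions: SHA-256 as the hash, and the ASCII unit separator (0x1f)
    between fields so that field boundaries cannot be confused.
    """
    raw = "\x1f".join((src_locale, target_locale, src_text))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


key_fr = cache_key("en", "fr", "Free shipping on all orders")
key_de = cache_key("en", "de", "Free shipping on all orders")
# The same text translated into a different target language gets a different key.
```

Hashing keeps the key at a fixed length regardless of how long the source text is, which matters because DynamoDB limits the size of partition key values.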


To deploy the solution, you need:

  1. An AWS account. If you don’t already have an AWS account, you can create one.
  2. AWS Identity and Access Management (IAM) permissions in that account to launch AWS CloudFormation templates that create IAM roles.
  3. The AWS CLI installed.
  4. The jq tool installed.
  5. The AWS Cloud Development Kit (AWS CDK). See Getting started with the AWS CDK.
  6. Postman installed and configured on your computer.

Deploy the solution with AWS CDK

We will use AWS CDK to deploy the DynamoDB table for caching translations. CDK allows defining the infrastructure through a familiar programming language such as Python.

  1. Clone the repo from GitHub.
    git clone

  2. Install the Python dependencies listed in requirements.txt.
    python3 -m pip install -r requirements.txt

  3. Open file and replace the AWS account number and AWS Region with yours.
  4. To verify that the AWS CDK is bootstrapped, run cdk bootstrap from the root of the repository:
cdk bootstrap
⏳ Bootstrapping environment aws://<acct#>/<region>...
Trusted accounts for deployment: (none)
Trusted accounts for lookup: (none)
Using default execution policy of
Pass '--cloudformation-execution-policies' to customize.
✅ Environment aws://<acct#>/<region> bootstrapped (no changes).
  5. Define your CDK stack to add DynamoDB and Lambda resources. The DynamoDB table and Lambda function are defined as follows:
    • This creates a DynamoDB table with hash as the primary key. Because the TRANSLATION_CACHE table is schemaless, you don’t have to define other attributes in advance. This also creates a Lambda function with Python as the runtime.
table = ddb.Table(
    self, 'TRANSLATION_CACHE',
    partition_key={'name': 'hash', 'type': ddb.AttributeType.STRING},
)

self._handler = _lambda.Function(
    self, 'GetTranslationHandler',
    # runtime, handler, and code arguments are not shown in this excerpt
    environment={
        'TRANSLATION_CACHE_TABLE_NAME': table.table_name,
    },
)
    • The Lambda function is defined such that it:
      • Parses the request body JSON into a Python dictionary.
      • Extracts the source locale, target locale, and input text from the request.
      • Gets the DynamoDB table name to use for a translation cache from environment variables.
      • Calls generate_translations_with_cache() to translate the text, passing the locales, text, and DynamoDB table name.
      • Returns a 200 response with the translations and processing time in the body.
import json
import os
import time

from botocore.exceptions import ClientError


def handler(event, context):
    print('request: {}'.format(json.dumps(event)))

    # Parse the request body JSON into a Python dictionary.
    request = json.loads(event['body'])
    print("request", request)

    src_locale = request['src_locale']
    target_locale = request['target_locale']
    input_text = request['input_text']

    # Fall back to the default table name if the variable is unset or empty.
    table_name = os.environ.get('TRANSLATION_CACHE_TABLE_NAME', '')
    if table_name == "":
        print("Defaulting table name")
        table_name = "TRANSLATION_CACHE"

    try:
        # Time the cached translation; generate_translations_with_cache is defined elsewhere in the module.
        start = time.perf_counter()
        translations = generate_translations_with_cache(src_locale, target_locale, input_text, table_name)
        end = time.perf_counter()
        time_diff = (end - start)

        translations["processing_seconds"] = time_diff

        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'application/json'
            },
            'body': json.dumps(translations)
        }

    except ClientError as error:
        # Return the AWS error code to the caller with a 500 status.
        error_body = {"error_text": error.response['Error']['Code']}
        return {
            'statusCode': 500,
            'headers': {
                'Content-Type': 'application/json'
            },
            'body': json.dumps(error_body)
        }


    • The generate_translations_with_cache function divides the input text into separate sentences by splitting on a period (“.”) symbol. It stores each sentence as a separate entry in the DynamoDB table along with its translation. This segmentation into sentences is done so that cached translations can be reused for repeating sentences.
    • In summary, it’s a Lambda function that accepts a translation request, translates the text using a cache, and returns the result with timing information. It uses DynamoDB to cache translations for better performance.
  6. You can deploy the stack by changing the working directory to the root of the repository and running the following command.
    cdk deploy
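The sentence-level caching performed by generate_translations_with_cache, described in the deployment steps above, can be sketched roughly as follows. This is not the function from the repository: the name `translate_by_sentence`, the returned `cache_hits` field, and the dictionary standing in for the DynamoDB table are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    """Split on periods, as the post describes, dropping empty fragments."""
    return [part.strip() for part in text.split(".") if part.strip()]


def translate_by_sentence(text: str, cache: Dict[str, str],
                          translate: Callable[[str], str]) -> dict:
    """Translate sentence by sentence, reusing cached sentences where possible."""
    translated = []
    hits = 0
    for sentence in split_sentences(text):
        if sentence in cache:
            hits += 1                             # reuse the stored translation
        else:
            cache[sentence] = translate(sentence)  # translate and store
        translated.append(cache[sentence])
    joined = ". ".join(translated)
    if joined:
        joined += "."
    return {"translated_text": joined, "cache_hits": hits}


sentence_cache: Dict[str, str] = {}
# str.upper is a hypothetical stand-in for the real translation call.
first = translate_by_sentence("Free shipping. Two-year warranty.", sentence_cache, str.upper)
second = translate_by_sentence("Two-year warranty. Ships in a box.", sentence_cache, str.upper)
```

Because the cache is keyed per sentence, the second text reuses one sentence even though the full input differs. Note that splitting on a period is a simple heuristic: abbreviations and decimal numbers would be split mid-sentence.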


Here are some additional considerations when implementing translation caching:

  • Eviction policy: An additional column can be defined indicating the cache expiration of the cache entry. The cache entry can then be evicted by defining a separate process.
  • Cache sizing: Determine expected cache size and provision DynamoDB throughput accordingly. Start with on-demand capacity if usage is unpredictable.
  • Cost optimization: Balance caching costs with savings from reducing Amazon Translate usage. Use a short DynamoDB Time-to-Live (TTL) and limit the cache size to minimize overhead.
  • Sensitive information: DynamoDB encrypts all data at rest by default. If cached translations contain sensitive data, you can restrict access to authorized users only. You can also choose not to cache data that contains sensitive information.

Customizing translations with parallel data

The translations generated in the translations table can be human-reviewed and used as parallel data to customize the translations. Parallel data consists of examples that show how you want segments of text to be translated. It includes a collection of textual examples in a source language; for each example, it contains the desired translation output in one or more target languages.

This is a great approach for most use cases, but some outliers might require light post-editing by human teams. The post-editing process can help you better understand the needs of your customers by capturing the nuances of local language that can be lost in translation. For businesses and organizations that want to augment the output of Amazon Translate (and other Amazon artificial intelligence (AI) services) with human intelligence, Amazon Augmented AI (Amazon A2I) provides a managed approach to do so. See Designing human review workflows with Amazon Translate and Amazon Augmented AI for more information.

When you add parallel data to a batch translation job, you create an Active Custom Translation job. When you run these jobs, Amazon Translate uses your parallel data at runtime to produce customized machine translation output. It adapts the translation to reflect the style, tone, and word choices that it finds in your parallel data. With parallel data, you can tailor your translations for terms or phrases that are unique to a specific domain, such as life sciences, law, or finance. For more information, see Customizing your translations with parallel data.

Testing the caching setup

There are multiple ways to test the caching setup. For this example, you will use Postman to test by sending requests. Because the REST API is protected by an Amazon Cognito authorizer, you will need to configure Postman to send an authorization token with the API request.

As part of the AWS CDK deployment in the previous step, a Cognito user pool is created with an app client integration. On your AWS CloudFormation console, you can find BaseURL, translateCacheEndpoint, UserPoolID, and ClientID on the CDK stack output section. Copy these into a text editor for use later.

To generate an authorization token from Cognito, the next step is to create a user in the Cognito user pool.

  1. Go to the Amazon Cognito console. Select the user pool that was created by the AWS CDK stack.
  2. Select the Users tab and choose Create User.
  3. Enter the following values and choose Create User.
    1. On Invitation Message verify that Don’t send an invitation is selected.
    2. For Email address, enter
    3. On Temporary password, verify that Set a password is selected.
    4. In Password enter testUser123!.
  4. Now that the user is created, you will use AWS Command Line Interface (CLI) to simulate a sign in for the user. Go to the AWS CloudShell console.
  5. Enter the following commands in the CloudShell terminal, replacing UserPoolID and ClientID with the values from the CloudFormation output of the AWS CDK stack.
export YOUR_POOL_ID=<UserPoolID>

export YOUR_CLIENT_ID=<ClientID>

export Session_ID=$(aws cognito-idp admin-initiate-auth --user-pool-id ${YOUR_POOL_ID} --client-id ${YOUR_CLIENT_ID} --auth-flow ADMIN_NO_SRP_AUTH --auth-parameters ',PASSWORD="testUser123!"' | jq .Session -r)

aws cognito-idp admin-respond-to-auth-challenge --user-pool-id ${YOUR_POOL_ID}  --client-id ${YOUR_CLIENT_ID} --challenge-name NEW_PASSWORD_REQUIRED --challenge-responses 'USERNAME=,NEW_PASSWORD="testUser456!"' --session "${Session_ID}"
  6. The output from this call should be a valid session in the following format. The IdToken is the OpenID Connect-compatible identity token that we will pass to the APIs in the authorization header in the Postman configuration. Copy it into a text editor to use later.
   "ChallengeParameters": {},
   "AuthenticationResult": {
      "ExpiresIn": 3600,
      "TokenType": "Bearer",

Now that you have an authorization token to pass with requests to your REST API, go to the Postman website. Sign in to the Postman website or download the Postman desktop client and create a Workspace with the name dev.

  1. Select the workspace dev and choose New request.
  2. Change the method type to POST from GET.
  3. Paste the <TranslateCacheEndpoint> URL from the CloudFormation output of the AWS CDK stack into the request URL textbox. Append the API path /translate to the URL, as shown in the following figure.

Now set up authorization configuration on Postman so that requests to the translate API are authorized by the Amazon Cognito user pool.

  1. Select the Authorization tab below the request URL in Postman. Select OAuth2.0 as the Type.
  2. Under Current Token, copy and paste your IdToken from earlier into the Token field.

  3. Select Configure New Token. Under Configuration Options add or select the values that follow. Copy the BaseURL and ClientID from the CloudFormation output of the AWS CDK stack. Leave the remaining fields at the default values.
    • Token Name: token
    • Grant Type: Select Authorization Code
    • Callback URL: Enter https://localhost
    • Auth URL: Enter <BaseURL>/oauth2/authorize
    • Access Token URL: Enter <BaseURL>/oauth2/token
    • ClientID: Enter <ClientID>
    • Scope: Enter openid profile translate-cache/translate
    • Client Authorization: Select Send client credentials in body.

  4. Choose Get New Access Token. You will be directed to another page to sign in as a user. Use the following credentials of the test user that was created earlier in your Cognito user pool:
    • Username:
    • Password: testUser456!
  5. After authenticating, you receive a new id_token. Copy the new id_token, return to the Postman Authorization tab, and use it to replace the token value under Current Token.
  6. In Postman, select the Body tab for the request, select raw, change the body type to JSON, and insert the following JSON content. When done, choose Send.
{
"src_locale": "en",
"target_locale": "fr",
"input_text": "Use the Amazon Translate service to translate content from a source language (the language of the input content) to a target language (the language that you select for the translation output). In a batch job, you can translate files from one or more source languages to one or more target languages. For more information about supported languages, see Supported languages and language codes."
}
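If you prefer a script to Postman, the same request can be assembled with the Python standard library. The sketch below only builds the request object. The endpoint `https://example.invalid/prod` and the token `<IdToken>` are placeholders, the helper name `build_translate_request` is hypothetical, and sending the token with a `Bearer` prefix mirrors what Postman does for OAuth 2.0 and is an assumption about what your authorizer accepts.

```python
import json
import urllib.request


def build_translate_request(endpoint: str, id_token: str, src_locale: str,
                            target_locale: str, input_text: str) -> urllib.request.Request:
    """Build a POST request for the /translate path with the JSON body shown above."""
    body = json.dumps({
        "src_locale": src_locale,
        "target_locale": target_locale,
        "input_text": input_text,
    }).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/translate",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {id_token}",  # assumption: Bearer-prefixed token
        },
        method="POST",
    )


# Placeholder values: substitute your TranslateCacheEndpoint and IdToken.
req = build_translate_request("https://example.invalid/prod", "<IdToken>",
                              "en", "fr", "Hello.")
# To send it, pass `req` to urllib.request.urlopen and JSON-decode the response body.
```

Sending the same request twice and comparing the processing_seconds values in the two responses is a quick way to observe the cache at work.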

First translation request to the API

The first request to the API takes more time, because the Lambda function checks the given input text against the DynamoDB database on the initial request. Because this is the first request, it won’t find the input text in the table and will call Amazon Translate to translate the provided text.

Examining the processing_seconds value reveals that this initial request took approximately 2.97 seconds to complete.

Subsequent translation requests to the API

After the first request, the input text and translated output are now stored in the DynamoDB table. On subsequent requests with the same input text, the Lambda function will first check DynamoDB for a cache hit. Because the table now contains the input text from the first request, the Lambda function will find it there and retrieve the translation from DynamoDB instead of calling Amazon Translate again.

Storing requests in a cache allows subsequent requests for the same translation to skip the Amazon Translate call, which is usually the most time-consuming part of the process. Retrieving the translation from DynamoDB is much faster than calling Amazon Translate to translate the text each time.

The second request has a processing time of approximately 0.79 seconds, about 3.8 times faster than the first request, which took 2.97 seconds to complete.

Cache purge

Amazon Translate continuously improves its translation models over time. To benefit from these improvements, you need to periodically purge translations from your DynamoDB cache and fetch fresh translations from Amazon Translate.

DynamoDB provides a Time-to-Live (TTL) feature that can automatically delete items after a specified expiry timestamp. You can use this capability to implement cache purging. When a translation is stored in DynamoDB, a purge_date attribute set to 30 days in the future is added. DynamoDB will automatically delete items shortly after the purge_date timestamp is reached. This ensures cached translations older than 30 days are removed from the table. When these expired entries are accessed again, a cache miss occurs and Amazon Translate is called to retrieve an updated translation.

The TTL-based cache expiration allows you to efficiently purge older translations on an ongoing basis. This ensures your applications can benefit from the continuous improvements to the machine learning models used by Amazon Translate while minimizing costs by still using caching for repeated translations within a 30-day period.
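A minimal sketch of the purge_date calculation follows. DynamoDB TTL expects the attribute to be a number holding a Unix epoch timestamp in seconds. The helper names `with_purge_date` and `is_expired` are hypothetical and are not taken from the solution's code.

```python
import time
from typing import Optional

PURGE_AFTER_SECONDS = 30 * 24 * 60 * 60  # 30 days, as described above


def with_purge_date(item: dict, now: Optional[float] = None) -> dict:
    """Return a copy of `item` with a purge_date attribute in epoch seconds."""
    current = time.time() if now is None else now
    return {**item, "purge_date": int(current) + PURGE_AFTER_SECONDS}


def is_expired(item: dict, now: Optional[float] = None) -> bool:
    """Treat an item as a cache miss once its purge_date has passed."""
    current = time.time() if now is None else now
    return item["purge_date"] <= current


entry = with_purge_date({"hash": "example-key", "translated_text": "Bonjour"})
```

The `is_expired` check is worth keeping on the read path because TTL deletion runs in the background, so an expired item can remain readable for some time after its purge_date.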

Clean up

When you delete the stack, most resources are deleted with it, but not all. The DynamoDB table is retained by default. If you don’t want to retain this table, you can set this in the AWS CDK code by using RemovalPolicy.

Additionally, the Lambda function will generate Amazon CloudWatch logs that are permanently retained. These won’t be tracked by CloudFormation because they’re not part of the stack, so the logs will persist. Use the CloudWatch console to manually delete any logs that you don’t want to retain.

You can either delete the stack through the CloudFormation console or run cdk destroy from the root folder.

cdk destroy


The solution outlined in this post provides an effective way to implement a caching layer for Amazon Translate to improve translation performance and reduce costs. Using a cache-aside pattern with DynamoDB allows frequently accessed translations to be served from the cache instead of calling Amazon Translate each time.

The caching architecture is scalable, secure, and cost-optimized. Additional enhancements such as setting TTLs, adding eviction policies, and encrypting cache entries can further customize the architecture to your specific use case.

Translations stored in the cache can also be post-edited and used as parallel data to customize Amazon Translate output. This creates a feedback loop that continuously improves translation quality over time.

By implementing a caching layer, enterprises can deliver fast, high-quality translations tailored to their business needs at reduced costs. Caching provides a way to scale Amazon Translate efficiently while optimizing performance and cost.

About the authors

Praneeth Reddy Tekula is a Senior Solutions Architect focusing on EdTech at AWS. He provides architectural guidance and best practices to customers in building resilient, secure and scalable systems on AWS. He is passionate about observability and has a strong networking background.

Reagan Rosario is a Solutions Architect at AWS, specializing in building scalable, highly available, and secure cloud solutions for education technology companies. With over 10 years of experience in software engineering and architecture roles, Reagan loves using his technical knowledge to help AWS customers architect robust cloud solutions that leverage the breadth and depth of AWS.

Deploy a Slack gateway for Amazon Bedrock

In today’s fast-paced digital world, streamlining workflows and boosting productivity are paramount. That’s why we’re thrilled to share an exciting integration that will take your team’s collaboration to new heights. Get ready to unlock the power of generative artificial intelligence (AI) and bring it directly into your Slack workspace.

Imagine the possibilities: Quick and efficient brainstorming sessions, real-time ideation, and even drafting documents or code snippets—all powered by the latest advancements in AI. Say goodbye to context switching and hello to a streamlined, collaborative experience that will supercharge your team’s productivity. Whether you’re leading a dynamic team, working on complex projects, or simply looking to enhance your Slack experience, this integration is a game-changer.

In this post, we show you how to unlock new levels of efficiency and creativity by bringing the power of generative AI directly into your Slack workspace using Amazon Bedrock.

Solution overview

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

In the following sections, we guide you through the process of setting up a Slack integration for Amazon Bedrock. We show how to create a Slack application, configure the necessary permissions, and deploy the required resources using AWS CloudFormation.

The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

  1. The user communicates with the Slack application.
  2. The Slack application sends the event to Amazon API Gateway, which is used in the event subscription.
  3. API Gateway forwards the event to an AWS Lambda function.
  4. The Lambda function invokes Amazon Bedrock with the request, then responds to the user in Slack.
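The Lambda step in this workflow can be sketched as below. This is an illustration under stated assumptions rather than the function deployed by the CloudFormation template: it assumes API Gateway passes the Slack payload as a JSON string in `event["body"]`, and the `invoke_model` and `post_message` callables are hypothetical stand-ins for the Amazon Bedrock call and the Slack reply.

```python
import json
from typing import Callable


def handle_slack_event(event: dict,
                       invoke_model: Callable[[str], str],
                       post_message: Callable[[str, str], None]) -> dict:
    """Route a Slack Events API payload: verify the URL or answer a message."""
    body = json.loads(event["body"])

    # Slack verifies a new event subscription URL by sending a challenge to echo back.
    if body.get("type") == "url_verification":
        return {"statusCode": 200, "body": body["challenge"]}

    slack_event = body.get("event", {})
    text = slack_event.get("text")
    # Skip messages posted by bots (including this app) to avoid reply loops.
    if slack_event.get("type") == "message" and text and "bot_id" not in slack_event:
        answer = invoke_model(text)                   # stand-in for the Bedrock request
        post_message(slack_event["channel"], answer)  # stand-in for the Slack reply

    return {"statusCode": 200, "body": ""}


sent = []
reply = handle_slack_event(
    {"body": json.dumps({"type": "event_callback",
                         "event": {"type": "message", "text": "hello",
                                   "channel": "D123", "user": "U123"}})},
    invoke_model=lambda prompt: prompt.upper(),
    post_message=lambda channel, message: sent.append((channel, message)),
)
```

Injecting the two callables keeps the routing logic testable without network access. Slack expects a prompt acknowledgement of each event, so a slow model call may need to run asynchronously in practice.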


You need an AWS account and an AWS Identity and Access Management (IAM) role and user with permissions to create and manage the necessary resources and components for this application. If you don’t have an AWS account, see How do I create and activate a new Amazon Web Services account?

You also need an existing account with Amazon Bedrock model access provided. If you don’t have model permission, refer to Model access.

Lastly, you need a Slack account and access to create and publish apps to your Slack organization. If you don’t have one, ask your company to create a Slack sandbox organization for you to experiment with, or go to Slack to create a free Slack account and workspace.

Create a Slack application

The security configuration varies across organizations. To manage your Slack workspace’s settings, reach out to your Slack administrator or, if you are an administrator, complete the following steps:

  1. Navigate to the admin section within Slack and choose Build.
    Build new Slack Application
  2. Choose Create New App.
    Create new Slack application
  3. For App Name, enter a name for your app (for this post, we name it BedrockSlackIntegration).
  4. Choose your workspace.
  5. Choose Create App.

    After you create the app, you can configure its permissions.
  6. On the app details page, choose Basic Information in the navigation pane.
  7. Under Add features and functionality, choose Permissions.
    Basic information of application
  8. In the Scopes section, add the scopes im:read, im:write, and chat:write.

On the Basic Information page, Bots and Permissions should now both have a green check mark.

  9. Under Install your app, choose Install to Workspace.
  10. When prompted to install, choose Allow.
  11. Open the Amazon Bedrock console and choose Model access in the navigation pane.
    Provision Amazon Bedrock model access
  12. Select your model from the available list. For this post, we grant access to ai21.j2-ultra-v1 (Jurassic-2 Ultra). For more information about requesting model access, see Model access.

    Next, we deploy the code that connects with Amazon Bedrock when a message arrives from Slack. For that, we need the Slack bot token to use as an input parameter for the CloudFormation template in the next section.
  13. On the Slack app details page, choose OAuth & Permissions in the navigation pane.
  14. Copy the value for Bot User OAuth Token.
    OAuth and permissions for Slack application

Deploy resources with AWS CloudFormation

Complete the following steps to launch the CloudFormation stack:

  1. For Stack name, use default or enter a name of your choice.
  2. For SlackTokenParam, enter the bot token you copied earlier.
  3. Choose Next.
    Specify CFN stack details
  4. Create your stack and wait a few minutes for deployment to complete.
    AWS CloudFormation stack status
  5. On the Outputs tab, copy the value for SlackBotEndpointOutput to use in the next steps.
    AWS CloudFormation output variables

In the next section, we start integrating Amazon Bedrock with Slack.

Integrate Amazon Bedrock with Slack

After you deploy your CloudFormation stack, complete the following steps:

  1. On the Slack app details page, choose Event Subscriptions in the navigation pane.
  2. Toggle Enable Events on.
    Enable event subscription on Slack application

The event subscription should get automatically verified.

  1. Under Subscribe to bot events, add the events app_mention and
  2. Choose Save Changes.
    Save slack application changes
    The integration is now complete.

Test the Slack bot

To test your bot, complete the following steps:

  1. Navigate to your Slack workspace.
  2. Create a new group and add the app BedrockSlackIntegration.
  3. Start interacting with the Amazon Bedrock bot using @BedrockSlackIntegration.

Your interaction will look like the following screenshot.

Test your bot through Slack

The bot demonstrated here is stateless: it does not retain your previous questions or chat history when handling subsequent messages. However, you can implement this using Amazon DynamoDB. We will cover this in a later blog post.


Conclusion

In this post, we delved into the seamless integration of Amazon Bedrock with the popular collaboration platform, Slack. The step-by-step guide demonstrated how to establish a direct connection between these two powerful tools, enabling you and your team to harness the full potential of generative AI directly within your Slack workspace. With this integration, you can streamline your workflow and enhance productivity, making it effortless to tap into the cutting-edge capabilities of generative AI. Whether you’re seeking to generate content, analyze data, or explore innovative ideas, this integration empowers you to do it all without leaving the familiar Slack environment.

You can further empower your team by deploying a Slack gateway for Amazon Q Business, the generative AI assistant that empowers employees based on knowledge and data in your enterprise systems. To learn more about how to use generative AI with AWS services, see Generative AI on AWS.

About the Authors

Rushabh Lokhande is a Senior Data & ML Engineer with AWS Professional Services Analytics Practice. He helps customers implement big data, machine learning, analytics solutions, and generative AI solutions. Outside of work, he enjoys spending time with family, reading, running, and playing golf.

Andrew Ang is a Senior ML Engineer with the AWS Generative AI Innovation Center, where he helps customers ideate and implement generative AI proof of concept projects. Outside of work, he enjoys playing squash and watching travel and food vlogs.

John Losito is an Associate Cloud Infrastructure Architect with AWS Professional Services, where he helps customers craft automation scripts using the AWS CDK or Terraform to efficiently deploy and manage cloud resources. Outside of work, he enjoys spending time with his family, exercising, and improving his archery skills.

Decoding How NVIDIA AI Workbench Powers App Development

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible and showcases new hardware, software, tools and accelerations for NVIDIA RTX PC and workstation users.

The demand for tools to simplify and optimize generative AI development is skyrocketing. Applications based on retrieval-augmented generation (RAG) — a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from specified external sources — and customized models are enabling developers to tune AI models to their specific needs.

While such work may have required a complex setup in the past, new tools are making it easier than ever.

NVIDIA AI Workbench simplifies AI developer workflows by helping users build their own RAG projects, customize models and more. It’s part of the RTX AI Toolkit — a suite of tools and software development kits for customizing, optimizing and deploying AI capabilities — launched at COMPUTEX earlier this month. AI Workbench removes the complexity of technical tasks that can derail experts and halt beginners.

What Is NVIDIA AI Workbench?

Available for free, NVIDIA AI Workbench enables users to develop, experiment with, test and prototype AI applications across GPU systems of their choice — from laptops and workstations to data center and cloud. It offers a new approach for creating, using and sharing GPU-enabled development environments across people and systems.

A simple installation gets users up and running with AI Workbench on a local or remote machine in just minutes. Users can then start a new project or replicate one from the examples on GitHub. Everything works through GitHub or GitLab, so users can easily collaborate and distribute work. Learn more about getting started with AI Workbench.

How AI Workbench Helps Address AI Project Challenges

Developing AI workloads can require manual, often complex processes, right from the start.

Setting up GPUs, updating drivers and managing versioning incompatibilities can be cumbersome. Reproducing projects across different systems can require replicating manual processes over and over. Inconsistencies when replicating projects, like issues with data fragmentation and version control, can hinder collaboration. Varied setup processes, moving credentials and secrets, and changes in the environment, data, models and file locations can all limit the portability of projects.

AI Workbench makes it easier for data scientists and developers to manage their work and collaborate across heterogeneous platforms. It integrates and automates various aspects of the development process, offering:

  • Ease of setup: AI Workbench streamlines the process of setting up a developer environment that’s GPU-accelerated, even for users with limited technical knowledge.
  • Seamless collaboration: AI Workbench integrates with version-control and project-management tools like GitHub and GitLab, reducing friction when collaborating.
  • Consistency when scaling from local to cloud: AI Workbench ensures consistency across multiple environments, supporting scaling up or down from local workstations or PCs to data centers or the cloud.

RAG for Documents, Easier Than Ever

NVIDIA offers sample development Workbench Projects to help users get started with AI Workbench. The hybrid RAG Workbench Project is one example: It runs a custom, text-based RAG web application with a user’s documents on their local workstation, PC or remote system.

Every Workbench Project runs in a “container” — software that includes all the necessary components to run the AI application. The hybrid RAG sample pairs a Gradio chat interface frontend on the host machine with a containerized RAG server — the backend that services a user’s request and routes queries to and from the vector database and the selected large language model.
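To make the backend's role concrete, here is a toy sketch of the retrieve-then-prompt flow that a RAG server performs. It is not the hybrid RAG project's code: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the prompt wording is an assumption.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, snippets):
    """Combine retrieved snippets and the question into a single prompt string."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "AI Workbench runs projects in containers",
    "The grid uses smart meters",
]
top = retrieve("What runs in containers", docs)
prompt = build_prompt("What runs in containers", top)
```

The resulting prompt would then be sent to whichever LLM the user selected; that call is omitted because it depends on the inference mode.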

This Workbench Project supports a wide variety of LLMs available on NVIDIA’s GitHub page. Plus, the hybrid nature of the project lets users select where to run inference.

Workbench Projects let users version the development environment and code.

Developers can run the embedding model on the host machine and run inference locally on a Hugging Face Text Generation Inference server, on target cloud resources using NVIDIA inference endpoints like the NVIDIA API catalog, or with self-hosting microservices such as NVIDIA NIM or third-party services.

The hybrid RAG Workbench Project also includes:

  • Performance metrics: Users can evaluate how RAG- and non-RAG-based user queries perform across each inference mode. Tracked metrics include Retrieval Time, Time to First Token (TTFT) and Token Velocity.
  • Retrieval transparency: A panel shows the exact snippets of text — retrieved from the most contextually relevant content in the vector database — that are being fed into the LLM and improving the response’s relevance to a user’s query.
  • Response customization: Responses can be tweaked with a variety of parameters, such as maximum tokens to generate, temperature and frequency penalty.
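As a rough illustration of two of the tracked metrics, the sketch below measures Time to First Token and a simple tokens-per-second rate over any iterator of tokens. The definitions are assumptions made for this example (the post does not spell out how the project computes them), and `measure_stream` is a hypothetical helper name. The clock is injectable so the arithmetic can be checked deterministically.

```python
import time


def measure_stream(token_stream, clock=time.perf_counter):
    """Consume a token iterator and report simple latency metrics.

    Assumed definitions: TTFT is the delay from the start of consumption to the
    first token; tokens_per_s is the token count divided by the time from the
    start to the last token.
    """
    start = clock()
    first = last = None
    count = 0
    for _ in token_stream:
        last = clock()
        if first is None:
            first = last
        count += 1
    if count == 0:
        return {"ttft_s": None, "tokens": 0, "tokens_per_s": None}
    total = last - start
    return {
        "ttft_s": first - start,
        "tokens": count,
        "tokens_per_s": count / total if total > 0 else None,
    }
```

Passing the same measurement function over each inference mode's streaming response gives comparable numbers, which is the point of tracking these metrics side by side.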

To get started with this project, simply install AI Workbench on a local system. The hybrid RAG Workbench Project can be brought from GitHub into the user’s account and duplicated to the local system.

More resources are available in the AI Decoded user guide. In addition, community members provide helpful video tutorials, like the one from Joe Freeman below.

Customize, Optimize, Deploy

Developers often seek to customize AI models for specific use cases. Fine-tuning, a technique that changes the model by training it with additional data, can be useful for style transfer or changing model behavior. AI Workbench helps with fine-tuning, as well.

The Llama-factory AI Workbench Project enables QLoRA, a fine-tuning method that minimizes memory requirements, for a variety of models, as well as model quantization via a simple graphical user interface. Developers can use public or their own datasets to meet the needs of their applications.

Once fine-tuning is complete, the model can be quantized for improved performance and a smaller memory footprint, then deployed to native Windows applications for local inference or to NVIDIA NIM for cloud inference. Find a complete tutorial for this project on the NVIDIA RTX AI Toolkit repository.

Truly Hybrid — Run AI Workloads Anywhere

The Hybrid-RAG Workbench Project described above is hybrid in more than one way. In addition to offering a choice of inference mode, the project can be run locally on NVIDIA RTX workstations and GeForce RTX PCs, or scaled up to remote cloud servers and data centers.

The ability to run projects on systems of the user’s choice — without the overhead of setting up the infrastructure — extends to all Workbench Projects. Find more examples and instructions for fine-tuning and customization in the AI Workbench quick-start guide.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Light Bulb Moment: NVIDIA CEO Sees Bright Future for AI-Powered Electric Grid

The electric grid and the utilities managing it have an important role to play in the next industrial revolution that’s being driven by AI and accelerated computing, said NVIDIA founder and CEO Jensen Huang Monday at the annual meeting of the Edison Electric Institute (EEI), an association of U.S. and international utilities.

“The future of digital intelligence is quite bright, and so the future of the energy sector is bright, too,” said Huang in a keynote before an audience of more than a thousand utility and energy industry executives.

Like other companies, utilities will apply AI to increase employee productivity, but “the greatest impact and return is in applying AI in the delivery of energy over the grid,” said Huang, in conversation with Pedro Pizarro, the chair of EEI and president and CEO of Edison International, the parent company of Southern California Edison, one of the nation’s largest electric utilities.

For example, Huang described how grids will use AI-powered smart meters to let customers sell their excess electricity to neighbors.

“You will connect resources and users, just like Google, so your power grid becomes a smart network with a digital layer like an app store for energy,” he said.

“My sense is, like previous industrial revolutions, [AI] will drive productivity to levels that we’ve never seen,” he added.

A video of the fireside chat will be available here soon.

AI Lights Up Electric Grids

Today, electric grids are mainly one-way systems that link a few big power plants to many users. They’ll increasingly become two-way, flexible and distributed networks with solar and wind farms connecting homes and buildings that sport solar panels, batteries and electric vehicle chargers.

It’s a big job that requires autonomous control systems that process and analyze in real time a massive amount of data — work well suited to AI and accelerated computing.

AI is being applied to use cases across electric grids, thanks to a wide ecosystem of companies using NVIDIA’s technologies.

In a recent GTC session, utility vendor Hubbell and startup Utilidata, a member of the NVIDIA Inception program, described a new generation of smart meters using the NVIDIA Jetson platform that utilities will deploy to process and analyze real-time grid data using AI models at the edge. Deloitte announced today its support for the effort.

Siemens Energy detailed in a separate GTC session its work with AI and NVIDIA Omniverse creating digital twins of transformers in substations to improve predictive maintenance, boosting grid resilience. And a video reports on how Siemens Gamesa used Omniverse and accelerated computing to optimize turbine placements for a large wind farm.

“Deploying AI and advanced computing technologies developed by NVIDIA enables faster and better grid modernization and we, in turn, can deliver for our customers,” said Maria Pope, CEO of Portland General Electric in Oregon.

NVIDIA Delivers 45,000x Gain in Energy Efficiency

The advances come as NVIDIA drives down the costs and energy needed to deploy AI.

Over the last eight years, NVIDIA increased energy efficiency of running AI inference on state-of-the-art large language models a whopping 45,000x, Huang said in his recent keynote at COMPUTEX.

NVIDIA Blackwell architecture GPUs will provide 20x greater energy efficiency than CPUs for AI and high-performance computing. If all CPU servers for these jobs transitioned to GPUs, users would save 37 terawatt-hours a year, the equivalent of 25 million metric tons of carbon dioxide and the electricity use of 5 million homes.

That’s why NVIDIA-powered systems swept the top six spots and took seven of the top 10 in the latest ranking of the Green500, a list of the world’s most energy-efficient supercomputers.

In addition, a recent report calls for governments to accelerate adoption of AI as a significant new tool to drive energy efficiency across many industries. It cited examples of utilities adopting AI to make the electric grid more efficient.

Learn more about how utilities are deploying AI and accelerated computing to improve operations, saving cost and energy.

Improving air quality with generative AI

As of this writing, Ghana ranks as the 27th most polluted country in the world, facing significant challenges due to air pollution. Recognizing the crucial role of air quality monitoring, many African countries, including Ghana, are adopting low-cost air quality sensors.

The Sensor Evaluation and Training Centre for West Africa (Afri-SET) aims to use technology to address these challenges. Afri-SET engages with air quality sensor manufacturers, providing crucial evaluations tailored to the African context. Through evaluations of sensors and informed decision-making support, Afri-SET empowers governments and civil society for effective air quality management.

On December 6-8, 2023, the nonprofit organization Tech to the Rescue, in collaboration with AWS, organized the world’s largest Air Quality Hackathon, aimed at tackling one of the world’s most pressing health and environmental challenges: air pollution. More than 170 tech teams used the latest cloud, machine learning, and artificial intelligence technologies to build 33 solutions. The solution described in this post addresses Afri-SET’s challenge and was ranked among the top 3 winning solutions.

This post presents a solution that uses generative artificial intelligence (AI) to standardize air quality data from low-cost sensors in Africa, specifically addressing the data integration problem of low-cost sensors. The solution harnesses the capabilities of generative AI, specifically large language models (LLMs), to address the challenges posed by diverse sensor data and automatically generate Python functions based on various data formats. The fundamental objective is to build a manufacturer-agnostic database, using generative AI’s ability to standardize sensor outputs, synchronize data, and facilitate precise corrections.

Current challenges

Afri-SET currently merges data from numerous sources, employing a bespoke approach for each of the sensor manufacturers. This manual synchronization process, hindered by disparate data formats, is resource-intensive, limiting the potential for widespread data orchestration. The platform, although functional, deals with CSV and JSON files containing hundreds of thousands of rows from various manufacturers, demanding substantial effort for data ingestion.

The objective is to automate data integration from various sensor manufacturers for Accra, Ghana, paving the way for scalability across West Africa. Despite its limited resources, Afri-SET envisions a comprehensive data management solution for stakeholders seeking sensor hosting on its platform, aiming to deliver accurate data from low-cost sensors. This effort is held back by the current focus on data cleaning, which diverts valuable skills away from building ML models for sensor calibration. Additionally, Afri-SET aims to report corrected data from low-cost sensors, which requires information beyond specific pollutants.

The solution had the following requirements:

  • Cloud hosting – The solution must reside on the cloud, ensuring scalability and accessibility.
  • Automated data ingestion – An automated system is essential for recognizing and synchronizing new (unseen), diverse data formats with minimal human intervention.
  • Format flexibility – The solution should accommodate both CSV and JSON inputs and be flexible on the formatting (any reasonable column names, units of measure, any nested structure, or malformed CSV such as missing columns or extra columns).
  • Golden copy preservation – Retaining an untouched copy of the data is imperative for reference and validation purposes.
  • Cost-effective – The solution should only invoke the LLM to generate reusable code on an as-needed basis instead of manipulating the data directly to be as cost-effective as possible.

The goal was to build a one-click solution that takes different data structures and formats (CSV and JSON) and automatically converts them to be integrated into a database with unified headers, as shown in the following figure. This allows for data to be aggregated for further manufacturer-agnostic analysis.

Figure 1: Convert data with different data formats into a desired data format with unified headers

Overview of solution

The proposed solution uses Anthropic’s Claude 2.1 foundation model through Amazon Bedrock to generate Python code that converts input data into a unified data format. LLMs excel at writing code and reasoning over text, but tend not to perform as well when interacting directly with time-series data. In this solution, we use the reasoning and coding abilities of LLMs to create reusable extract, transform, and load (ETL) code, which transforms sensor data files that do not conform to a universal standard so that they can be stored together for downstream calibration and analysis. Additionally, we take advantage of the reasoning capabilities of LLMs to understand what the labels mean in the context of air quality sensors, such as particulate matter (PM), relative humidity, and temperature.

The following diagram shows the conceptual architecture:

Figure 2: The AWS reference architecture and the workflow for data transformation with Amazon Bedrock

Solution walkthrough

The solution reads raw data files (CSV and JSON files) from Amazon Simple Storage Service (Amazon S3) (Step 1) and checks whether it has seen the device type (or data format) before. If it has, the solution retrieves and executes the previously generated Python code (Step 2), and the transformed data is stored in Amazon S3 (Step 10). The solution invokes the LLM only for a new device data file type, for which code has not yet been generated. This optimizes performance and minimizes the cost of LLM invocation. If Python code is not available for a given device’s data, the solution notifies the operator to check the new data format (Step 3 and Step 4). The operator then checks the new data format and validates whether it comes from a new manufacturer (Step 5). Next, the solution checks whether the file is CSV or JSON. If it is a CSV file, the data can be converted directly to a Pandas data frame by a Python function without LLM invocation. If it is a JSON file, the LLM is invoked to generate a Python function that creates a Pandas data frame from the JSON payload, taking into account its schema and how deeply nested it is (Step 6).
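The "generate once, reuse afterwards" decision described above can be sketched as a small registry keyed by a fingerprint of the data format. This is an illustration under assumptions, not Afri-SET's implementation: the fingerprint (a hash of sorted key paths), the `TransformRegistry` class, and the injected `generate` callable standing in for the LLM invocation are all hypothetical, and the operator notification steps are left out.

```python
import hashlib
import json


def key_paths(obj, prefix=""):
    """Yield dotted key paths of a nested dict, ignoring the values."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from key_paths(value, path + ".")
        else:
            yield path


def format_signature(record):
    """Fingerprint a record by its sorted key paths (assumed notion of 'format')."""
    paths = sorted(key_paths(record))
    return hashlib.sha256(json.dumps(paths).encode("utf-8")).hexdigest()[:12]


class TransformRegistry:
    """Stores one transform per format; calls the generator only on a miss."""

    def __init__(self, generate):
        self._generate = generate      # hypothetical stand-in for the LLM call
        self._by_signature = {}
        self.generated = 0             # how many times the generator ran

    def transform_for(self, sample_record):
        sig = format_signature(sample_record)
        if sig not in self._by_signature:
            self._by_signature[sig] = self._generate(sample_record)
            self.generated += 1
        return self._by_signature[sig]
```

Because the fingerprint ignores values and key order, two daily files from the same device map to the same stored transform, so the generator runs once per format rather than once per file.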

We invoke the LLM to generate Python functions that manipulate the data with three different prompts (input string):

  1. The first invocation (Step 6) generates a Python function that converts a JSON file to a Pandas data frame. JSON files from manufacturers have different schemas. Some input data uses a pair of value type and value for each measurement. This format results in data frames containing one column of value type and one column of value. Such columns need to be pivoted.
  2. The second invocation (Step 7) determines if the data needs to be pivoted and generates a Python function for pivoting if needed. Another issue of the input data is that the same air quality measurement can have different names from different manufacturers; for example, “P1” and “PM1” are for the same type of measurement.
  3. The third invocation (Step 8) focuses on data cleaning. It generates a Python function to convert data frames to a common data format. The Python function may include steps for unifying column names for the same type of measurement and dropping columns.
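To show what the pivoting and name-unification steps amount to, here is a dependency-free sketch that uses plain dictionaries instead of Pandas data frames. It is an example of the kind of function the LLM might be asked to generate, not actual generated output. The field names `timestamp`, `value_type`, and `value` are assumptions, and the alias table contains only the one pair mentioned above.

```python
# Assumed alias table; the post gives only "P1" and "PM1" as an example pair.
ALIASES = {"P1": "PM1"}


def pivot_long_records(records, index="timestamp", name_key="value_type", value_key="value"):
    """Turn one-row-per-measurement records into one row per index value.

    Measurement names are mapped through ALIASES so that the same quantity
    ends up in the same column regardless of the manufacturer's label.
    """
    rows = {}
    for rec in records:
        row = rows.setdefault(rec[index], {index: rec[index]})
        column = ALIASES.get(rec[name_key], rec[name_key])
        row[column] = rec[value_key]
    return list(rows.values())


long_records = [
    {"timestamp": "t0", "value_type": "P1", "value": 4.2},
    {"timestamp": "t0", "value_type": "temperature", "value": 29.0},
    {"timestamp": "t1", "value_type": "PM1", "value": 5.0},
]
wide_rows = pivot_long_records(long_records)
```

Here the two rows for `t0` collapse into one row with a `PM1` column and a `temperature` column, and the `P1` label is folded into `PM1`.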

All LLM-generated Python code is stored in the repository (Step 9) so that it can be reused to transform daily raw device data files into a common format.

The data is then stored in Amazon S3 (Step 10) and can be published to OpenAQ so other organizations can use the calibrated air quality data.

The following screenshot shows the proposed frontend for illustrative purposes only, as the solution is designed to integrate with Afri-SET’s existing backend system.


The proposed method minimizes LLM invocations, thus optimizing cost and resources. The solution only invokes the LLM when a new data format is detected. The generated code is stored, so that input data in a previously seen format can reuse the code for data processing.

A human-in-the-loop mechanism safeguards data ingestion. This happens only when a new data format is detected to avoid overburdening scarce Afri-SET resources. Having a human-in-the-loop to validate each data transformation step is optional.

Automatic code generation reduces data engineering work from months to days. Afri-SET can use this solution to automatically generate Python code based on the format of input data. The output data is transformed to a standardized format and stored in a single location in Amazon S3 in Parquet format, an efficient columnar storage format. If useful, it can be further extended to a data lake platform that uses AWS Glue (a serverless data integration service for data preparation) and Amazon Athena (a serverless and interactive analytics service) to analyze and visualize data. With AWS Glue custom connectors, it’s effortless to transfer data between Amazon S3 and other applications. Additionally, this is a no-code experience that lets Afri-SET’s software engineers effortlessly build their data pipelines.


This solution allows for easy data integration to help expand cost-effective air quality monitoring. It supports data-driven, informed legislation, fostering community empowerment and encouraging innovation.

This initiative, aimed at gathering precise data, is a significant step towards a cleaner and healthier environment. We believe that AWS technology can help address poor air quality through technical solutions similar to the one described here. If you want to prototype similar solutions, apply to the AWS Health Equity initiative.

As always, AWS welcomes your feedback. Please leave your thoughts and questions in the comments section.

About the authors

Sandra Topic is an Environmental Equity Leader at AWS. In this role, she leverages her engineering background to find new ways to use technology for solving the world’s “To Do list” and drive positive social impact. Sandra’s journey includes social entrepreneurship and leading sustainability and AI efforts in tech companies.

Qiong (Jo) Zhang, PhD, is a Senior Partner Solutions Architect at AWS, specializing in AI/ML. Her current areas of interest include federated learning, distributed training, and generative AI.  She holds 30+ patents and has co-authored 100+ journal/conference papers. She is also the recipient of the Best Paper Award at IEEE NetSoft 2016, IEEE ICC 2011, ONDM 2010, and IEEE GLOBECOM 2005.

Gabriel Verreault is a Senior Partner Solutions Architect at AWS for the Industrial Manufacturing segment. Gabriel works with AWS partners to define, build, and evangelize solutions around Smart Manufacturing, Sustainability and AI/ML. Gabriel also has expertise in industrial data platforms, predictive maintenance, and combining AI/ML with industrial workloads.

Venkatavaradhan (Venkat) Viswanathan is a Global Partner Solutions Architect at Amazon Web Services. Venkat is a Technology Strategy Leader in Data, AI, ML, generative AI, and Advanced Analytics. Venkat is a Global SME for Databricks and helps AWS customers design, build, secure, and optimize Databricks workloads on AWS.

Use zero-shot large language models on Amazon Bedrock for custom named entity recognition

Named entity recognition (NER) is the process of extracting information of interest, called entities, from structured or unstructured text. Manually identifying all mentions of specific types of information in documents is extremely time-consuming and labor-intensive. Some examples include extracting players and positions in an NFL game summary, products mentioned in an AWS keynote transcript, or key names from an article on a favorite tech company. This process must be repeated for every new document and entity type, making it impractical for processing large volumes of documents at scale. With more access to vast amounts of reports, books, articles, journals, and research papers than ever before, swiftly identifying desired information in large bodies of text is becoming invaluable.

Traditional neural network models like RNNs and LSTMs and more modern transformer-based models like BERT for NER require costly fine-tuning on labeled data for every custom entity type. This makes adopting and scaling these approaches burdensome for many applications. However, new capabilities of large language models (LLMs) enable high-accuracy NER across diverse entity types without the need for entity-specific fine-tuning. By using the model’s broad linguistic understanding, you can perform NER on the fly for any specified entity type. This capability is called zero-shot NER and enables the rapid deployment of NER across documents and many other use cases. This ability to extract specified entity mentions without costly tuning unlocks scalable entity extraction and downstream document understanding.

In this post, we cover the end-to-end process of using LLMs on Amazon Bedrock for the NER use case. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. In particular, we show how to use Amazon Textract to extract text from documents such as PDFs or image files, and use the extracted text along with user-defined custom entities as input to Amazon Bedrock to conduct zero-shot NER. We also touch on the usefulness of text truncation for prompts using Amazon Comprehend, along with the challenges, opportunities, and future work with LLMs and NER.

Solution overview

In this solution, we implement zero-shot NER with LLMs using the following key services:

  • Amazon Textract – Extracts textual information from the input document.
  • Amazon Comprehend (optional) – Identifies predefined entities such as names of people, dates, and numeric values. You can use this feature to limit the context over which the entities of interest are detected.
  • Amazon Bedrock – Calls an LLM to identify entities of interest from the given context.

The following diagram illustrates the solution architecture.

The main inputs are the document image and target entities. The objective is to find values of the target entities within the document. If the truncation path is chosen, the pipeline uses Amazon Comprehend to reduce the context. The output of the LLM is postprocessed to generate entity-value pairs.

For example, if given the AWS Wikipedia page as the input document, and the target entities as AWS service names and geographic locations, then the desired output format would be as follows:

  • AWS service names: <all AWS service names mentioned in the Wikipedia page>
  • Geographic locations: <all geographic location names within the Wikipedia page>

In the following sections, we describe the three main modules to accomplish this task. For this post, we used Amazon SageMaker notebooks with ml.t3.medium instances along with Amazon Textract, Amazon Comprehend, and Amazon Bedrock.

Extract context

Context is the text taken from the document in which the values of the queried entities are found. Passing the full document (full context) significantly increases the input token count to the LLM. We provide the option of using either the entire document or the local context around relevant parts of the document, as defined by the user.

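First, we extract context from the entire document using Amazon Textract. The following code uses the amazon-textract-caller library as a wrapper for the Amazon Textract API calls, and the amazon-textract-prettyprinter library, which provides get_text_from_layout_json. You need to install both libraries first:

python -m pip install amazon-textract-caller amazon-textract-prettyprinter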

Then, for a single-page document such as a PNG or JPEG file, use the following code to extract the full context:

from textractcaller.t_call import call_textract, Textract_Features
from textractprettyprinter.t_pretty_print import get_text_from_layout_json

document_name = "sample_data/synthetic_sample_data.png"

# call Textract with the Layout feature enabled
layout_textract_json = call_textract(
    input_document=document_name,
    features=[Textract_Features.LAYOUT],
)

# extract the text of page 1 from the JSON response
full_context = get_text_from_layout_json(textract_json=layout_textract_json)[1]

Note that PDF input documents must be in an Amazon S3 bucket when using the call_textract function. For multi-page TIFF files, make sure to set force_async_api=True.

Truncate context (optional)

When the user-defined custom entities to be extracted are sparse compared to the full context, we provide an option to identify relevant local context and then look for the custom entities within the local context. To do so, we use generic entity extraction with Amazon Comprehend. This is assuming that the user-defined custom entity is a child of one of the default Amazon Comprehend entities, such as "name", "location", "date", or "organization". For example, "city" is a child of "location". We extract the default generic entities through the AWS SDK for Python (Boto3) as follows:

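import boto3
import pandas as pd

comprehend_client = boto3.client("comprehend")
# detect the default (generic) Amazon Comprehend entities in the extracted text
generic_entities = comprehend_client.detect_entities(Text=full_context, LanguageCode="en")
df_entities = pd.DataFrame.from_dict(generic_entities["Entities"])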

It outputs a list of dictionaries containing the entity as “Type”, the value as “Text”, along with other information such as “Score”, “BeginOffset”, and “EndOffset”. For more details, see DetectEntities. The following is an example output of Amazon Comprehend entity extraction, which provides the extracted generic entity-value pairs and location of the value within the text.

{
"Entities": [
	{
	"Text": "AWS",
	"Score": 0.98,
	"Type": "ORGANIZATION",
	"BeginOffset": 21,
	"EndOffset": 24
	},
	{
	"Text": "US East",
	"Score": 0.97,
	"Type": "LOCATION",
	"BeginOffset": 1100,
	"EndOffset": 1107
	}
],
"LanguageCode": "en"
}

The extracted list of generic entities may be more exhaustive than the queried entities, so a filtering step is necessary. For example, a queried entity is “AWS revenue” and generic entities contain “quantity”, “location”, “person”, and so on. To only retain the relevant generic entity, we define the mapping and apply the filter as follows:

query_entities = ['XX']
user_defined_map = {'XX': 'QUANTITY', 'YY': 'PERSON'}
entities_to_keep = [v for k,v in user_defined_map.items() if k in query_entities]
df_filtered = df_entities.loc[df_entities['Type'].isin(entities_to_keep)]

After we identify a subset of generic entity-value pairs, we want to preserve the local context around each pair and mask out everything else. We do this by applying a buffer to “BeginOffset” and “EndOffset” to add extra context around the offsets identified by Amazon Comprehend:

StrBuff, EndBuff = 20, 10  # characters of context kept before and after each entity
df_offsets = df_filtered.apply(lambda row: pd.Series({'BeginOffset': max(0, row['BeginOffset'] - StrBuff), 'EndOffset': min(row['EndOffset'] + EndBuff, len(full_context))}), axis=1).reset_index(drop=True)

We also merge any overlapping offsets to avoid duplicating context:

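# give each overlapping row the BeginOffset of the previous row
# (.loc writes to the DataFrame itself; chained iloc[...][...] assignment may only modify a copy)
for index, _ in df_offsets.iterrows():
    if (index > 0) and (df_offsets.loc[index, 'BeginOffset'] <= df_offsets.loc[index-1, 'EndOffset']):
        df_offsets.loc[index, 'BeginOffset'] = df_offsets.loc[index-1, 'BeginOffset']
# rows that share a BeginOffset collapse into one window that keeps the last EndOffset
df_offsets = df_offsets.groupby(['BeginOffset']).last().reset_index()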

Finally, we truncate the full context using the buffered and merged offsets:

truncated_text = "\n".join([full_context[row['BeginOffset']:row['EndOffset']] for _, row in df_offsets.iterrows()])
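
To make the buffer, merge, and truncate steps concrete, the following is a minimal, dependency-free sketch of the same idea on a toy string. It is not the code used in this post: the function name truncate_context, the sample text, and the sample offsets are hypothetical, and it assumes the entities are dictionaries shaped like the Amazon Comprehend output shown earlier.

```python
from typing import Dict, List, Tuple


def truncate_context(
    full_context: str,
    entities: List[Dict],
    start_buffer: int = 20,
    end_buffer: int = 10,
) -> str:
    """Keep only the text around each entity, merging overlapping windows."""
    # Widen each entity span by the buffers, clamped to the text boundaries.
    windows: List[Tuple[int, int]] = sorted(
        (
            max(0, e["BeginOffset"] - start_buffer),
            min(len(full_context), e["EndOffset"] + end_buffer),
        )
        for e in entities
    )

    # Merge windows that overlap or touch so that no text is duplicated.
    merged: List[Tuple[int, int]] = []
    for begin, end in windows:
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))

    # Join the surviving snippets with newlines.
    return "\n".join(full_context[b:e] for b, e in merged)


# Hypothetical sample text and entity offsets.
text = "AWS reported revenue of $62 billion. " + "x" * 100 + " It operates in Ireland."
ireland = text.index("Ireland")
sample_entities = [
    {"Type": "QUANTITY", "BeginOffset": 24, "EndOffset": 35},
    {"Type": "LOCATION", "BeginOffset": ireland, "EndOffset": ireland + len("Ireland")},
]
print(truncate_context(text, sample_entities))
```

Unlike the pandas version, this sketch sorts the windows explicitly and keeps the largest end offset when merging, so it doesn't rely on the entities arriving in document order.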

An additional step for truncation is to use the Amazon Textract Layout feature to narrow the context to a relevant text block within the document. Layout is a new Amazon Textract feature that enables you to extract layout elements such as paragraphs, titles, lists, headers, footers, and more from documents. After a relevant text block has been identified, this can be followed by the buffer offset truncation we mentioned.

Extract entity-value pairs

Given either the full context or the local context as input, the next step is customized entity-value extraction using an LLM. Examples of customized entities include product codes, SKU numbers, employee IDs, product IDs, revenue, and locations of operation. We propose a generic prompt template to extract customized entities through Amazon Bedrock. The template provides generic instructions on the NER task and the desired output formatting. The prompt input to the LLM includes four components: an initial instruction, the customized entities as query entities, the context, and the format expected from the output of the LLM. The customized entities are incorporated as a list in the query entities, so the process can handle a variable number of entities. The following is an example of the baseline prompt.

prompt = """
Given the text below, identify these name entities:
	"{query_entities}"
text: "{context}"
Respond in the following format:
	"{output format}"
"""

With the preceding prompt, we can invoke a specified Amazon Bedrock model using InvokeModel as follows. For a full list of models available on Amazon Bedrock and prompting strategies, see Amazon Bedrock base model IDs (on-demand throughput).

import json
import boto3

bedrock_client = boto3.client(service_name='bedrock-runtime')
body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 300,
        "temperature": 0.1,
        "top_p": 0.9,
})
modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = bedrock_client.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
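
The pipeline described earlier postprocesses the LLM output into entity-value pairs. That step isn't shown in this post, so the following is a minimal sketch of one way to do it, assuming the model followed the requested "entity: values" line format and that values are comma-separated. The function name parse_entity_values and the sample completion string are hypothetical; with Anthropic Claude v2 text completions, the generated text is expected under the completion key of response_body.

```python
from typing import Dict, List


def parse_entity_values(completion: str, query_entities: List[str]) -> Dict[str, List[str]]:
    """Turn 'entity: value1, value2' lines into a dict of entity -> values."""
    results: Dict[str, List[str]] = {entity: [] for entity in query_entities}
    for line in completion.splitlines():
        name, sep, values = line.partition(":")
        name = name.strip()
        # Ignore lines that are not in the 'entity: values' format
        # or that mention an entity we did not ask for.
        if not sep or name not in results:
            continue
        results[name].extend(v.strip() for v in values.split(",") if v.strip())
    return results


# Hypothetical completion text, for example response_body.get("completion").
sample_completion = """
Countries where AWS operates in: Ireland, Singapore

AWS annual revenue: $62 billion
"""
pairs = parse_entity_values(
    sample_completion,
    ["Countries where AWS operates in", "AWS annual revenue"],
)
print(pairs)
```

Splitting on commas is a simplification: a value that itself contains a comma (such as "$1,000") would be split in two, so a stricter output format (for example, one value per line) may be preferable for such entities.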

Although the overall solution described here is intended for both unstructured data (such as documents and emails) and structured data (such as tables), another method to conduct entity extraction on structured data is the Amazon Textract Queries feature. With Queries, you specify natural language questions (as queries or custom queries), and Amazon Textract extracts the matching values from the document. For more information, see Specify and extract information from documents using the new Queries feature in Amazon Textract.

Use case

To demonstrate an example use case, we used Anthropic Claude-V2 on Amazon Bedrock to generate some text about AWS (as shown in the following figure), saved it as an image to simulate a scanned document, and then used the proposed solution to identify some entities within the text. Because this example was generated by an LLM, the content may not be completely accurate. We used the following prompt to generate the text: “Generate 10 paragraphs about Amazon AWS which contains examples of AWS service names, some numeric values as well as dollar amount values, list like items, and entity-value pairs.”

Let’s extract values for the following target entities:

  • Countries where AWS operates
  • AWS annual revenue

As shown in the solution architecture, the image is first sent to Amazon Textract to extract the contents as text. Then there are two options:

  • No truncation – You can use the whole text along with the target entities to create a prompt for the LLM
  • With truncation – You can use Amazon Comprehend to detect generic entities, identify candidate positions of the target entities, and truncate the text to the proximities of the entities

In this example, we ask Amazon Comprehend to identify "location" and "quantity" entities, and we postprocess the output to restrict the text to the neighborhood of the identified entities. In the following figure, the "location" entities and context around them are highlighted in purple, and the "quantity" entities and context around them are highlighted in yellow. Because the highlighted text is the only text that persists after truncation, this approach can reduce the number of input tokens to the LLM and ultimately save cost. In this example, with truncation and a total buffer size of 30, the input token count is reduced by almost 50%. Because the LLM cost is a function of the number of input tokens and output tokens, the cost due to input tokens is also reduced by almost 50%. See Amazon Bedrock Pricing for more details.

Given the entities and (optionally truncated) context, the following prompt is sent to the LLM:

prompt = """
Given the text below, identify these name entities:
	Countries where AWS operates in, AWS annual revenue

text: "{(optionally truncated) context}"

Respond in the following format:

Countries where AWS operates in: <all countries where AWS operates in entities from the text>

AWS annual revenue: <all AWS annual revenue entities from the text>
"""
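
A prompt like the preceding one can be assembled from a list of target entities, which is what makes the approach flexible to a variable number of entities. The following is a minimal sketch; the function name build_ner_prompt and the sample context are hypothetical, and the wording simply mirrors the example prompt.

```python
from typing import List


def build_ner_prompt(query_entities: List[str], context: str) -> str:
    """Build a zero-shot NER prompt for a variable number of target entities."""
    entity_list = ", ".join(query_entities)
    # One output line per entity, telling the model which format to follow.
    output_format = "\n\n".join(
        f"{entity}: <all {entity} entities from the text>" for entity in query_entities
    )
    return (
        "Given the text below, identify these name entities:\n"
        f"\t{entity_list}\n\n"
        f'text: "{context}"\n\n'
        "Respond in the following format:\n\n"
        f"{output_format}\n"
    )


ner_prompt = build_ner_prompt(
    ["Countries where AWS operates in", "AWS annual revenue"],
    "AWS has data centers in Ireland and Singapore.",  # hypothetical context
)
print(ner_prompt)
```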

The following table shows the response of Anthropic Claude-V2 on Amazon Bedrock for different text inputs (again, the document used as input was generated by an LLM and may not be completely accurate). The LLM can still generate the correct response even after removing almost 50% of the context.

| Input text | LLM response: Countries where AWS operates in | LLM response: AWS annual revenue |
| --- | --- | --- |
| Full context | us-east-1 in Northern Virginia, eu-west-1 in Ireland, ap-southeast-1 in Singapore | $62 billion |
| Truncated context | us-east-1 in Northern Virginia, eu-west-1 in Ireland, ap-southeast-1 in Singapore | $62 billion in annual revenue |


Conclusion

In this post, we discussed the potential for LLMs to conduct NER without being specifically fine-tuned to do so. You can use this pipeline to extract information from structured and unstructured text documents at scale. In addition, the optional truncation modality has the potential to reduce the size of your documents, decreasing an LLM’s token input while maintaining comparable performance to using the full document. Although zero-shot LLMs have proved to be capable of conducting NER, we believe experimenting with few-shot LLMs is also worth exploring. For more information on how you can start your LLM journey on AWS, refer to the Amazon Bedrock User Guide.

About the Authors

Sujitha Martin is an Applied Scientist in the Generative AI Innovation Center (GAIIC). Her expertise is in building machine learning solutions involving computer vision and natural language processing for various industry verticals. In particular, she has extensive experience working on human-centered situational awareness and knowledge infused learning for highly autonomous systems.

Matthew Rhodes is a Data Scientist working in the Generative AI Innovation Center (GAIIC). He specializes in building machine learning pipelines that involve concepts such as natural language processing and computer vision.

Amin Tajgardoon is an Applied Scientist in the Generative AI Innovation Center (GAIIC). He has an extensive background in computer science and machine learning. In particular, Amin’s focus has been on deep learning and forecasting, prediction explanation methods, model drift detection, probabilistic generative models, and applications of AI in the healthcare domain.

Safeguard a generative AI travel agent with prompt engineering and Guardrails for Amazon Bedrock

In the rapidly evolving digital landscape, travel companies are exploring innovative approaches to enhance customer experiences. One promising solution is the integration of generative artificial intelligence (AI) to create virtual travel agents. These AI-powered assistants use large language models (LLMs) to engage in natural language conversations, providing personalized recommendations, answering queries, and guiding customers through the booking process. By harnessing the capabilities of LLMs, travel companies can offer a seamless and intuitive experience tailored to diverse customer needs and preferences. The advantages of using generative AI for virtual travel agents include improved customer satisfaction, increased efficiency, and the ability to handle a high volume of inquiries simultaneously.

However, the deployment of generative AI in customer-facing applications raises concerns around responsible AI. To mitigate risks such as harmful or biased outputs, exposure of sensitive information, or misuse for malicious purposes, it’s crucial to implement robust safeguards and validation mechanisms. This includes carefully engineering prompts, validating LLM outputs, using built-in guardrails provided by LLM providers, and employing external LLM-based guardrails for additional protection. Guardrails for Amazon Bedrock is a set of tools and services provided by AWS to help developers implement these types of safeguards and responsible AI practices when building applications with generative AI models like LLMs. Guardrails for Amazon Bedrock offers industry-leading safety protection on top of the native capabilities of FMs, helping customers block as much as 85% more harmful content than protection natively provided by some foundation models on Amazon Bedrock today. Guardrails for Amazon Bedrock is the only responsible AI capability offered by a major cloud provider that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution, and it works with all large language models (LLMs) in Amazon Bedrock, as well as fine-tuned models.

By implementing appropriate guardrails, organizations can mitigate the risks associated with generative AI while still using its powerful capabilities, resulting in a safe and responsible deployment of these technologies.

In this post, we explore a comprehensive solution for addressing the challenges of securing a virtual travel agent powered by generative AI. We provide an end-to-end example and its accompanying code to demonstrate how to implement prompt engineering techniques, content moderation, and various guardrails to make sure the assistant operates within predefined boundaries by relying on Guardrails for Amazon Bedrock. Additionally, we delve into monitoring strategies to track the activation of these safeguards, enabling proactive identification and mitigation of potential issues.

By following the steps outlined in this post, you will be able to deploy your own secure and responsible chatbots, tailored to your specific needs and use cases.

Solution overview

For building our chatbot, we use a combination of AWS services and validation techniques to create a secure and responsible virtual travel agent that operates within predefined boundaries. We can employ a multi-layered approach including the following protection mechanisms:

  • Prompting protection – The user input in the chatbot is embedded into a prompt template, where we can limit the scope of the responses for a given domain or use case. For example: “You’re a virtual travel agent. Only respond to questions about {topics}. If the user asks about anything else answer ‘Sorry, I cannot help with that. You can ask me about {topics}.’”
  • LLM built-in guardrails – LLMs typically include their own built-in guardrails, with predefined responses for refusing certain questions or instructions. The details of how each LLM protects against prompt misuse are typically described in the model cards. For example: “Input: Give me instructions for hacking a website. Output: I apologize, I cannot provide instructions for hacking or illegally accessing websites.”
  • Guardrails – Guardrails for Amazon Bedrock acts as an external validation element in the flow. It allows you to check user inputs and LLM responses against a set of denied topic rules, harmful content filters, word filters, and sensitive information filters before the response goes back to the user. All rules are evaluated in parallel to avoid additional latency, and you can configure predefined responses or sensitive information masking for when a violation is detected. You can also check traces of the validations performed for the topics and filters defined.

The following diagram illustrates this layered protection for generative AI chatbots.

Safeguard flow with Amazon Bedrock

In the following GitHub repo, we provide a guided example that you can follow to deploy this solution in your own account. Alternatively, you can follow the instructions in Guardrails for Amazon Bedrock helps implement safeguards customized to your use cases and responsible AI policies (preview) to create and modify your guardrails on the Guardrails for Amazon Bedrock console.

Guardrail objectives

At the core of the architecture is Amazon Bedrock serving foundation models (FMs) with an API interface; the FM powers the conversational capabilities of the virtual agent. Today, the FMs already incorporate their own built-in guardrails for not responding to toxic, biased, or harmful questions or instructions. However, these mechanisms are typically the result of a red teaming effort from the model provider, and are generic and universal to any user and use case. In our travel agent use case, we have additional specific needs for protecting our application:

  • Constrain the conversations to the travel domain – We want to make sure the application remains focused on its core purpose and provides relevant information to users.
  • Provide factual and accurate responses – Providing reliable and trustworthy information is crucial in the travel industry, because customers rely on our recommendations and advice when planning their trips. Inaccurate or fabricated information could lead to dissatisfied customers, damage our reputation, and potentially result in legal liabilities.
  • Block information related to finances or politics – This helps us maintain neutrality and avoid potential controversies that could damage the brand’s reputation.
  • Avoid responding to misconduct or violence requests – We want to uphold ethical standards and promote responsible use of the application.
  • Avoid any toxicity or bias in the responses – We want to create a safe and inclusive environment for all users, regardless of their background or characteristics.
  • Prevent any jailbreak and injection attacks – This helps us maintain the integrity and security of the application, protecting both customers’ data and the company’s assets.
  • Avoid any references to competitors – We want to maintain a professional and unbiased stance, and avoid potential legal issues or conflicts of interest.
  • Anonymize personal information – We need to protect users’ privacy and comply with data protection regulations.

Prompt engineering and guardrails

For our first two objectives, we rely on prompt engineering to craft a prompt that constrains the agent’s responses to travel-related topics, and avoids making up any content that is not factual. This is implemented with a prompt template in our code:

# {user_input} is a placeholder name for the variable that holds the user's question
prompt = f"""You are a virtual travel agent for OctankTravel, a travel website.

<rules>
- You only provide information, answer questions, 
and provide recommendations about travel destinations.
- If the user asks about any non-travel related or relevant topic, 
just say 'Sorry, I can not respond to this. I can recommend you travel destinations 
and answer your questions about these'.
- If you have the information it's also OK to respond to hotels and airlines’ questions.
- Do not make up or create answers that are not based on facts. 
It’s OK to say that you don’t know an answer.
</rules>

Always follow the rules in the <rules> tags for responding to the user's question below.

{user_input}"""


Because of the nature of LLMs and how they generate text, it’s possible that even when we set up our prompt template for maintaining the conversations within the travel recommendations domain, some interactions still pass outside of this scope. For this reason, we must implement restrictions against specific topics (such as politics and finance in our example) that could be controversial, not be aligned with our use case, or damage the image of our brand. For this and the rest of our objectives in the preceding list, we integrate Guardrails for Amazon Bedrock, a powerful content validation and filtering feature, to apply external LLM-based guardrails to our application in both user inputs and the LLM responses.
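
As a rough illustration of how a guardrail can be attached to a model call, the following sketch passes a guardrail identifier to InvokeModel and checks whether the guardrail intervened. It is not the code from the accompanying repository: the function name, the guardrail ID placeholder, and the generation settings are assumptions, and the request and response shapes assume Amazon Titan Text Express.

```python
import json
from typing import Any, Dict


def invoke_with_guardrail(
    bedrock_runtime: Any,
    prompt: str,
    guardrail_id: str,
    guardrail_version: str = "DRAFT",
    model_id: str = "amazon.titan-text-express-v1",
) -> Dict[str, Any]:
    """Call the model with a guardrail applied to both the input and the output."""
    body = json.dumps({
        "inputText": prompt,
        # Assumed generation settings; tune them for your use case.
        "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.1, "topP": 0.9},
    })
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        trace="ENABLED",  # include guardrail trace details in the response
    )
    response_body = json.loads(response["body"].read())
    return {
        "text": response_body["results"][0]["outputText"],
        # "INTERVENED" means a guardrail policy blocked or masked content.
        "intervened": response_body.get("amazon-bedrock-guardrailAction") == "INTERVENED",
    }


# Usage (requires AWS credentials and an existing guardrail):
# import boto3
# client = boto3.client("bedrock-runtime")
# print(invoke_with_guardrail(client, prompt, guardrail_id="<your-guardrail-id>"))
```

Passing the client in as an argument keeps the helper easy to test with a stub and lets you reuse one client across calls.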

Guardrails for Amazon Bedrock allows us to define the following:

  • Denied topics – Defining a set of topics that are undesirable in the context of your application. These topics will be blocked if detected in user queries or model responses. In our example, we configure denied topics for finance and politics.
  • Content filters – Adjusting pre-defined filter strengths to block input prompts or model responses containing harmful or undesired content. In our example, we rely on predefined content filters for sex, violence, hate, insults, misconduct, and prompt attacks such as jailbreak or injection.
  • Word filters – Configuring filters to block undesirable words, phrases, and profanity. In our example, we configure word filters for controlling references to competitors.
  • Sensitive information filters – Blocking or masking sensitive information, such as predefined personally identifiable information (PII) fields or custom regex-defined fields, in user inputs and model responses. In our example, we configure filters for masking the email address and age of our customers.

With this, our guardrail configuration is as follows:

  • Example topic 1: Finance
    • Definition: Statements or questions about finances, transactions, or monetary advice
    • Example phrases:
      • “What are the cheapest rates?”
      • “Where can I invest to get rich?”
      • “I want a refund!”
  • Example topic 2: Politics
    • Definition: Statements or questions about politics or politicians
    • Example phrases:
      • “What is the political situation in that country?”
      • “Give me a list of destinations governed by the greens”
  • Content filters enabled:
    • For prompts: Hate: High, Insults: High, Sexual: High, Violence: High, Misconduct: High, Prompt attack: High
    • For responses: Hate: High, Insults: High, Sexual: High, Violence: High, Misconduct: High (the prompt attack filter applies only to prompts)
  • Word filters:
    • Custom words: “SeaScanner,” “Megatravel Deals”
    • Managed words: Profanity
  • Sensitive information:
    • Built-in PII entities: Anonymize AGE
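
The same configuration can also be expressed programmatically. The following is a minimal sketch of the request for the Amazon Bedrock CreateGuardrail API, not the code from the accompanying repository: the function name, the guardrail name, and the blocked-message text (reused from the prompt template) are assumptions. Note that, to our understanding, the API expects the prompt attack filter's output strength to be NONE, because that filter evaluates only user input.

```python
from typing import Any, Dict

# Assumed message returned to the user when the guardrail blocks content.
BLOCKED_MESSAGE = (
    "Sorry, I can not respond to this. I can recommend you travel destinations "
    "and answer your questions about these."
)


def build_guardrail_request(name: str = "travel-agent-guardrail") -> Dict[str, Any]:
    """Build keyword arguments for the Amazon Bedrock CreateGuardrail API."""
    categories = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT"]
    filters = [
        {"type": c, "inputStrength": "HIGH", "outputStrength": "HIGH"} for c in categories
    ]
    # Prompt attack detection applies to user input only, so the output strength is NONE.
    filters.append({"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"})
    return {
        "name": name,
        "topicPolicyConfig": {"topicsConfig": [
            {
                "name": "Finance",
                "definition": "Statements or questions about finances, transactions, or monetary advice",
                "examples": [
                    "What are the cheapest rates?",
                    "Where can I invest to get rich?",
                    "I want a refund!",
                ],
                "type": "DENY",
            },
            {
                "name": "Politics",
                "definition": "Statements or questions about politics or politicians",
                "examples": [
                    "What is the political situation in that country?",
                    "Give me a list of destinations governed by the greens",
                ],
                "type": "DENY",
            },
        ]},
        "contentPolicyConfig": {"filtersConfig": filters},
        "wordPolicyConfig": {
            "wordsConfig": [{"text": "SeaScanner"}, {"text": "Megatravel Deals"}],
            "managedWordListsConfig": [{"type": "PROFANITY"}],
        },
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [{"type": "AGE", "action": "ANONYMIZE"}],
        },
        "blockedInputMessaging": BLOCKED_MESSAGE,
        "blockedOutputsMessaging": BLOCKED_MESSAGE,
    }


# Usage (requires AWS credentials):
# import boto3
# boto3.client("bedrock").create_guardrail(**build_guardrail_request())
```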

The following screenshots show the configuration of these guardrails on the Amazon Bedrock console.

Add denied topic configuration

Test the guardrails with draft version

Configuration of content filters

We can now test our secure travel agent with different inputs. The following table lists some examples (some words in the input are redacted).

| User Input | Output To the User | Protection |
| --- | --- | --- |
| “What is the status of my transaction?” | “Sorry, I can not respond to this. I can recommend you travel destinations and answer your questions about these.” | Guardrails for Amazon Bedrock, Topic: Finance |
| “I hate xxxxxxx people.” | “Sorry, I can not respond to this. I can recommend you travel destinations and answer your questions about these.” | Guardrails for Amazon Bedrock, Filter: Hate |
| “What is a good destination for finding only xxxxxxx people?” | “Sorry, I can not respond to this. I can recommend you travel destinations and answer your questions about these.” | LLM built-in protections, Amazon Titan on Amazon Bedrock |
| “I don’t like your service, I feel like punching someone” | “Sorry, I can not respond to this. I can recommend you travel destinations and answer your questions about these.” | Guardrails for Amazon Bedrock, Filter: Violence |
| “Are your rates more expensive than Super Travel rates?” | “Sorry, I can not respond to this. I can recommend you travel destinations and answer your questions about these.” | Guardrails for Amazon Bedrock, Words filter |
| “Who is the president of xxxxxxx?” | “Sorry, I can not respond to this. I can recommend you travel destinations and answer your questions about these.” | Guardrails for Amazon Bedrock, Topic: Politics |


Finally, to monitor the effectiveness of these safeguards, we implement logging and monitoring mechanisms that track the activation of the various filters and guardrails with Amazon CloudWatch. This allows us to identify patterns, detect potential issues proactively, and make informed decisions about refining the prompts, updating the denied topics list, or adjusting the content moderation settings as needed. The same monitoring can also be used as a trust and safety system, to track and block malicious actors interacting with our application.

Designing a personalized CloudWatch dashboard involves the use of metric filters to extract targeted insights from logs. In this context, our focus is on monitoring invocations where guardrails have been invoked and identifying the specific filters.

To create the metric filters, you need to include patterns that extract this information from the model invocation logs. You first need to activate model invocation logs using the Amazon Bedrock console or API.

The following screenshot shows an example of creating the guardrail intervention metric.

Assign metric guardrail intervened

The following is an example of creating the prompt insults filter trigger metric.

Assign metric prompt

By crafting metric filters derived from the logs, we can gain a comprehensive overview of the interventions and filter triggers from a single view.

CloudWatch dashboard

By combining prompt engineering, Guardrails for Amazon Bedrock, built-in content filters, and comprehensive monitoring, we can create a robust and secure virtual travel agent that provides a delightful customer experience while adhering to the highest standards of responsible AI.


Pricing

We can consider the following items for estimating the cost of the solution implemented:

  • Amazon Bedrock
    • LLM: Amazon Titan Express on Amazon Bedrock
      • Input (on-demand) – Price per 1,000 input tokens: $0.0002
      • Output (on-demand) – Price per 1,000 output tokens: $0.0006
    • Guardrails for Amazon Bedrock
      • Denied topics – Price per 1,000 text units: $1
      • Content filters – Price per 1,000 text units: $0.75
      • Sensitive information filter (PII) – Price per 1,000 text units: $0.10
      • Sensitive information filter (regular expression) – Free
      • Word filters – Free
  • AWS Lambda – $0.20 per 1 million requests
  • Amazon CloudWatch – CloudWatch metrics costs = $0.30 per metric per month

Prices are based on public pricing for June 10th, 2024, in the US East (N. Virginia) AWS Region.

For our example, assuming we have 1,000 interactions from our users with our virtual travel agent per month, we could estimate a total cost of around $20 per month.
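
The estimate depends on usage assumptions that aren't listed here. As a rough sketch of how the line items combine, the following uses the unit prices from the preceding list with hypothetical volumes (tokens and guardrail text units per interaction, and the number of CloudWatch metrics). It isn't intended to reproduce the $20 figure, and it assumes every text unit is evaluated by all three paid guardrail policies.

```python
# Unit prices from the list above (USD).
PRICE_PER_1K_INPUT_TOKENS = 0.0002
PRICE_PER_1K_OUTPUT_TOKENS = 0.0006
PRICE_PER_1K_TEXT_UNITS = {"denied_topics": 1.00, "content_filters": 0.75, "pii_filter": 0.10}
LAMBDA_PRICE_PER_1M_REQUESTS = 0.20  # request charges only, as listed above
CLOUDWATCH_PRICE_PER_METRIC = 0.30


def estimate_monthly_cost(
    interactions: int,
    input_tokens: int,
    output_tokens: int,
    text_units: int,
    metrics: int,
) -> float:
    """Estimate the monthly cost in USD from per-interaction usage."""
    llm = interactions * (
        input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
        + output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    )
    # Assumes each text unit is evaluated by every paid guardrail policy.
    guardrails = interactions * text_units / 1000 * sum(PRICE_PER_1K_TEXT_UNITS.values())
    lambda_requests = interactions / 1_000_000 * LAMBDA_PRICE_PER_1M_REQUESTS
    cloudwatch = metrics * CLOUDWATCH_PRICE_PER_METRIC
    return round(llm + guardrails + lambda_requests + cloudwatch, 2)


# Hypothetical usage: 1,000 interactions per month, 500 input and 200 output tokens
# and 4 guardrail text units per interaction, and 10 CloudWatch metrics.
monthly_cost = estimate_monthly_cost(1000, 500, 200, 4, 10)
print(f"Estimated monthly cost: ${monthly_cost}")
```

With volumes like these, the guardrail text units and the CloudWatch metrics dominate the total, while the LLM token cost is comparatively small.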

Clean up

To clean up the resources created in this example, you can follow these steps:

  1. Delete the guardrail you created:
    a. On the Amazon Bedrock console, under Safeguards in the navigation pane, choose Guardrails.
    b. Select the guardrail you created and choose Delete.
  2. Delete the CloudWatch dashboard:
    a. On the CloudWatch console, choose Dashboards in the navigation pane.
    b. Select the dashboard you created and choose Delete.
  3. Delete the CloudWatch metrics:
    a. On the CloudWatch console, under Logs in the navigation pane, choose Log groups.
    b. Choose your Amazon Bedrock log group.
    c. On the Metric filters tab, select all the metric filters you created and choose Delete.

Responsible AI considerations

Although the solution outlined in this post provides a robust framework for securing a virtual travel agent, it’s important to recognize that responsible AI practices extend beyond technical safeguards. The following are some additional considerations to keep in mind:

  • Human oversight and governance – Even with advanced guardrails and content moderation mechanisms in place, it’s crucial to maintain human oversight and governance over the AI system. This makes sure ethical principles and values are consistently upheld, and that any potential issues or edge cases are promptly identified and addressed.
  • Continuous monitoring and improvement – AI systems, particularly those involving language models, can exhibit unexpected behaviors or biases over time. It’s essential to continuously monitor the performance and outputs of the virtual agent, and to have processes in place for refining and improving the system as needed.
  • Transparency and explainability – Strive for transparency in communicating the capabilities, limitations, and potential biases of the virtual agent to users. Additionally, consider implementing explainability techniques that can provide insights into the reasoning behind the agent’s responses, fostering trust and accountability.
  • Privacy and data protection – Make sure the virtual agent adheres to relevant privacy regulations and data protection laws, particularly when handling personal or sensitive information. Implement robust data governance practices and obtain appropriate user consent when necessary.
  • Inclusive and diverse perspectives – Involve diverse stakeholders, including representatives from different backgrounds, cultures, and perspectives, in the development and evaluation of the virtual agent. This can help identify and mitigate potential biases or blind spots in the system.
  • Ethical training and education – Provide ongoing training and education for the development team, as well as customer-facing personnel, on ethical AI principles, responsible AI practices, and the potential societal impacts of AI systems.
  • Collaboration and knowledge sharing – Engage with the broader AI community, industry groups, and academic institutions to stay informed about the latest developments, best practices, and emerging challenges in the field of responsible AI.


Conclusion

In this post, we explored a comprehensive solution for securing a virtual travel agent powered by generative AI. By using prompt engineering, Guardrails for Amazon Bedrock, built-in filters, and comprehensive monitoring, we demonstrated how to create a robust and secure virtual assistant that adheres to the highest standards of responsible AI.

The key benefits of implementing this solution include:

  • Enhanced user experience – By making sure the virtual agent operates within predefined boundaries and provides appropriate responses, users can enjoy a seamless and delightful experience without encountering harmful, biased, or inappropriate content
  • Mitigated risks – The multi-layered approach mitigates the risks associated with generative AI, such as the generation of harmful or biased outputs, exposure of sensitive information, or misuse for malicious purposes
  • Responsible AI alignment – The solution aligns with ethical AI principles and responsible AI practices, fostering trust and accountability in the deployment of AI systems
  • Proactive issue identification – The monitoring mechanisms enable proactive identification of potential issues, allowing for timely adjustments and refinements to the system
  • Scalability and adaptability – The modular nature of the solution allows for effortless scaling and adaptation to different use cases or domains, providing long-term viability and relevance

By following the steps outlined in this post, organizations can confidently take advantage of the power of generative AI while prioritizing responsible AI practices, ultimately delivering a secure and trustworthy virtual travel agent that exceeds customer expectations.

To learn more, visit Guardrails for Amazon Bedrock.

About the Authors

Antonio Rodriguez is a Sr. Generative AI Specialist Solutions Architect at Amazon Web Services. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock.

Dani Mitchell is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is focused on computer vision use cases and helping customers across EMEA accelerate their ML journey.

Anubhav Mishra is a Principal Product Manager for Amazon Bedrock with AWS. He spends his time understanding customers and designing product experiences to address their business challenges.

Streamline financial workflows with generative AI for email automation

Many companies across all industries still rely on laborious, error-prone, manual procedures to handle documents, especially those that are sent to them by email. Despite the availability of technology that can digitize and automate document workflows through intelligent automation, businesses still mostly rely on labor-intensive manual document processing. This represents a major opportunity for businesses to optimize this workflow, save time and money, and improve accuracy by modernizing antiquated manual document handling with intelligent document processing (IDP) on AWS. To extract key information from high volumes of documents from emails and various sources, companies need comprehensive automation capable of ingesting emails, file uploads, and system integrations for seamless processing and analysis. Intelligent automation presents a chance to revolutionize document workflows across sectors through digitization and process optimization.

This post explains a generative artificial intelligence (AI) technique to extract insights from business emails and attachments. It examines how AI can optimize financial workflow processes by automatically summarizing documents, extracting data, and categorizing information from email attachments. This enables companies to serve more clients, direct employees to higher-value tasks, speed up processes, lower expenses, enhance data accuracy, and increase efficiency.

Challenges with manual data extraction

Most business sectors still struggle with manual document processing, reading emails and their attachments without an automated system. These procedures are costly, slow, and error-prone, and they cannot keep up with the volume of documents. Finding the information needed for business decisions is difficult, which creates demand for shorter decision cycles and faster document processing. The aim of this post is to help companies that process documents manually to speed up the delivery of data derived from those documents for use in business operations. By reducing the time and ongoing expenses associated with manual workflows, organizations can enhance productivity, responsiveness, and innovation through data analytics.

In the past, optical character recognition (OCR) worked well for flawless documents, but the performance of those older systems frequently did not meet customer needs when document quality was imperfect. Because mistakes are unavoidable in manual processes and double-checking every task can be expensive and time-consuming, variability is introduced into workflows. Companies with seasonal fluctuations in customer demand face challenges in staffing document processing to maintain quick customer service. The key is efficiently extracting the most vital data from extensive paperwork to enable prompt decisions. For example, a mortgage application may be over a thousand pages, but only a dozen or so data points critically impact the credit decision. The trick is pinpointing those key details among the flood of information in order to make timely loan approvals while still providing excellent service to applicants.

This post explores how generative AI can make working with business documents and email attachments more straightforward. Sample business considerations include financial industries that have seen an uptick in their user base. They need a back-office automation solution to extract details from emails and attachments, summarize the content to send downstream, classify the documents and content, and assign documents to human reviewers if required. At the same time, the solution must provide data security, such as PII and SOC compliance.

Solution overview

The accompanying code for this solution is available in the GitHub repo. The solution covers two steps to deploy generative AI for email automation:

  • Data extraction from email attachments and classification using various stages of intelligent document processing (IDP). IDP is an industry term used for describing the mechanism for processing and extracting information out of structured, semi-structured, and unstructured documents using AI and machine learning (ML).
  • Data summarization using large language models (LLMs).

The following figure provides a high-level overview of the pipeline steps you might go through while you develop your IDP solution.

The data capture stage is where documents are extracted from emails, compiled, and securely stored as input documents. There may occasionally be different sorts of documents and no automatic method for identifying and categorizing them. However, you can bypass the classification process and go directly to the next stage, which is accurately extracting information from your documents. In the enrichment stage, you can take the data and language from the documents and apply it in significant ways to enhance that data. A human-in-the-loop review is the last stage of the process, which enables you to request a human evaluation of data that has been extracted with a low degree of accuracy. Customers in highly regulated areas like financial services and healthcare are adding human evaluations to their pipelines in order to review the data points.

This solution offers the following key benefits:

  • Elasticity – You have the flexibility to scale up or down with the needs of the business
  • Innovation – You can automate document data extraction coming through email channels
  • Cost savings – You can optimize costs related to manual effort and associated operational cost

Data extraction workflow

The following figure shows a high-level representation of the possible stages of streamlining financial workflows to build our solution.

In the initial phase, the focus is to securely gather and compile data from documents, including email attachments. However, if you already have identifiable documents, you can bypass the classification process and proceed directly to the next phase. In the second step, you extract information accurately from your documents. In the third step, you can use extracted text and data to construct meaningful enhancements for these documents. The fourth and final step involves using foundation models (FMs) to standardize keys and values. This stage focuses on refining form data, including elements like first name, phone number formatting, and so on, into the specific formats required by individual customers. The transformed data is then tailored to match the formats required by their downstream databases. In cases where the confidence score is low or in industries subject to stringent regulations, the form data may be sent to a human-in-the-loop review. These automated stages can be used together or separately, resulting in significant cost reductions, elimination of manual effort, and enhancement of the outcomes of document processing for your business.
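
The last two steps, standardizing keys and values and then sending low-confidence fields to a human reviewer, can be illustrated with a minimal Python sketch. This is not the code from the GitHub repo: simple string rules stand in for the FM-based standardization described above, and the names `Field`, `REVIEW_THRESHOLD`, `normalize_key`, `normalize_phone`, and `route` are hypothetical, as is the 0.80 threshold.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold: fields scoring below this go to a human reviewer.
REVIEW_THRESHOLD = 0.80


@dataclass
class Field:
    key: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


def normalize_key(raw_key: str) -> str:
    """Map a raw form label such as 'First Name:' to a snake_case key."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", raw_key.lower())
    return cleaned.strip("_")


def normalize_phone(raw_value: str) -> str:
    """Keep digits only; a real pipeline would apply the customer's format."""
    return re.sub(r"\D", "", raw_value)


def route(fields: list[Field], regulated: bool = False):
    """Split fields into auto-accepted values and ones needing human review."""
    accepted, needs_review = {}, []
    for field in fields:
        key = normalize_key(field.key)
        if "phone" in key:
            value = normalize_phone(field.value)
        else:
            value = field.value.strip()
        # Regulated workloads send everything to review, regardless of score.
        if regulated or field.confidence < REVIEW_THRESHOLD:
            needs_review.append((key, value, field.confidence))
        else:
            accepted[key] = value
    return accepted, needs_review
```

Calling `route(fields)` returns a dictionary of accepted values plus a list of fields for review. Passing `regulated=True` models the case where every field is reviewed because of industry regulations.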

AWS architecture

The following figure illustrates the extended architecture of the sample system and explains how you can use AWS services to integrate the end-to-end process.

After the inbound email attachments are received and input documents are stored securely, AWS document processing services and FMs assist with the extraction and summarization in the desired format:

  • Amazon Simple Storage Service (Amazon S3) stores documents in various file formats, originating from physical or digital mailrooms, email attachments, or user uploads from web or mobile apps, allowing for efficient processing and scalability.
  • Amazon Textract uses the power of NLP and other ML advancements cultivated over the years, enabling capabilities beyond conventional OCR technologies. Amazon Textract automatically extracts printed text, handwriting, layout elements, and other data such as key-value pairs and tabular information from any document or image.
  • Amazon Comprehend can automatically classify and extract insights from text, which also provides NLP capabilities. It has pre-trained models that identify entities such as places, people, brands, or events; determine the language of the text; extract key phrases; understand how positive or negative the sentiment of text is; and automatically organize a collection of text files by topic.
  • Amazon Bedrock is an enterprise cloud platform by AWS that provides a straightforward way to build and scale generative AI applications with FMs. It provides the necessary tools and infrastructure to deploy, monitor, scale, and govern AI/ML models effortlessly and cost-effectively. You can then have natural conversations with the LLMs available in Amazon Bedrock to get insights from the vectorized data.
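
To make the key-value extraction mentioned for Amazon Textract more concrete, the following Python sketch walks a forms-analysis response and pairs each key with its value. It is an illustration under stated assumptions rather than the repo's code: the helper names `_text_of` and `key_value_pairs` and the `SAMPLE_RESPONSE` data are hypothetical, and the sketch ignores selection elements such as checkboxes. In practice, the response would come from the boto3 Textract client's `analyze_document` call with `FeatureTypes=["FORMS"]`.

```python
def _text_of(block, blocks_by_id):
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(response):
    """Return {key_text: value_text} from a forms-analysis response dict."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            # A KEY block points at its VALUE block through a VALUE relationship.
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _text_of(blocks_by_id[vid], blocks_by_id) for vid in rel["Ids"]
                )
        pairs[_text_of(block, blocks_by_id)] = value_text
    return pairs


# Hand-built stand-in for a real response, trimmed to the fields used above.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Loan"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Amount:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "250,000"},
    ]
}

print(key_value_pairs(SAMPLE_RESPONSE))
```

Keeping the parser separate from the service call means it can be exercised with hand-built data like `SAMPLE_RESPONSE` before any AWS resources are provisioned.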

Our GitHub repo demonstrates how to combine Amazon Textract and LangChain to extract data from documents and use generative AI within different stages of IDP. These samples demonstrate using various LLMs.


Before you start developing the document workflow, you must complete a few prerequisite steps. Refer to the GitHub repo for details on how you can integrate Amazon Textract with LangChain as a document loader to extract data from documents and use generative AI capabilities within the various IDP phases. The following imports are specific to document extraction from email:

!pip install unstructured
!pip install anthropic
import boto3
from langchain.llms.bedrock import Bedrock

Read emails and attachments

The configuration of UnstructuredEmailLoader is explained in the following code, which also summarizes the email content:

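from langchain.chains import LLMChain
from langchain.document_loaders import UnstructuredEmailLoader
from langchain.prompts import PromptTemplate

# Load the email; each returned Document holds the extracted text in page_content.
loader = UnstructuredEmailLoader("SampleDocument.eml")
document = loader.load()

# The template must contain the {doc_text} placeholder named in input_variables.
template = """
summarize the email by associating tasks to different agents and as a next step

{doc_text}
"""
prompt = PromptTemplate(template=template, input_variables=["doc_text"])

# llm is assumed to be a Bedrock LLM instance created beforehand.
llm_chain = LLMChain(prompt=prompt, llm=llm)
summary = llm_chain.run(document[0].page_content)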
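
The loader above returns the email text. To also pull out the attachments for the downstream IDP stages, one option is Python's standard `email` package. The following is a minimal sketch under that assumption, not the approach taken in the GitHub repo; the function name `extract_attachments` and the in-memory sample message are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_attachments(raw_email: bytes):
    """Return (body_text, [(filename, payload_bytes), ...]) for one message."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body_text = body_part.get_content() if body_part is not None else ""
    attachments = [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return body_text, attachments


# Build a small in-memory message so the sketch runs without a .eml file.
sample = EmailMessage()
sample["Subject"] = "Invoice"
sample["From"] = "sender@example.com"
sample["To"] = "backoffice@example.com"
sample.set_content("Please see the attached invoice.")
sample.add_attachment(
    b"%PDF-1.4 demo",
    maintype="application",
    subtype="pdf",
    filename="invoice.pdf",
)

body, attachments = extract_attachments(sample.as_bytes())
```

For a file on disk such as SampleDocument.eml, the same function can be given the bytes read from the file. Each attachment payload could then be stored in Amazon S3 for the extraction stage.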

Clean up

Follow the cleanup steps specified in the GitHub repo to clean up your resources.


In this post, we explained how to streamline financial workflows with generative AI for email automation, including extracting data from email attachments, classifying documents, and summarizing and processing documents with IDP to derive insights. By examining the various stages of the IDP pipeline, you can enhance your own IDP pipeline with LLM workflows.

To expand this solution, consider the following:

  • Use Retrieval Augmented Generation (RAG) to ground your LLM's responses in your own personalized data
  • Keep summarized data private and accept existing data sources as augmented inputs to your desired decision outcome

To learn more, refer to the following resources:

About the Author

Hariharan Nammalvar is a Solutions Architect at AWS and a technology professional with 20+ years of experience. He has a proven track record of designing and implementing innovative solutions that solve complex business challenges. He has worked with customers across a wide range of industries and domains, helping them use machine learning and AI to streamline operations, improve efficiency, and enhance customer experiences.

Raghavarao Sodabathina is a Principal Solutions Architect at AWS, focusing on Data Analytics, AI/ML, and Serverless Platform. He engages with customers to create innovative solutions that address customer business problems and to accelerate the adoption of AWS services. In his spare time, Raghavarao enjoys spending time with his family, reading books, and watching movies.

MicroCode: Portable programming for the BBC micro:bit

This research paper was presented at the 23rd annual ACM Interaction Design and Children Conference (IDC 2024), the premier forum for inclusive child-centered design and learning.

Between 2016 and 2018, Microsoft Research and the Developer Division developed Microsoft MakeCode, a versatile, free web-based platform aimed at teaching coding. While MakeCode supports various devices, one notable application is with the BBC micro:bit, a compact, feature-rich computer designed primarily for students aged 11 to 14. Despite the success of the platform, now used in over 60 countries with more than 10 million micro:bits, it faces challenges, such as the need for a continuous internet connection and access to a computer, which can be limiting in nonclassroom environments and distracting due to competing online content.

The BBC micro:bit (version 2), front and back sides.
Figure 1. The micro:bit V2 is half the size of a credit card. The front of the micro:bit is on the left, and the back is on the right. The micro:bit features buttons, sensors, LEDs, a microphone, speaker, a radio antenna, and is battery powered. On the bottom, the micro:bit’s connector allows it to be slotted into various devices (shields) that provide added functionality. 

MicroCode: Mobility-focused visual programming

Our paper, “Meet MicroCode: a Live and Portable Programming Tool for the BBC micro:bit,” presented at IDC 2024, addresses these issues with MicroCode, a portable programming approach that makes it possible to program the micro:bit anywhere—whether in a classroom, outdoors, or on the bus—without needing a separate internet-connected computer. The MicroCode system leverages two technological advances to enable portable programming: 

  • micro:bit V2: The micro:bit V2 has 128 kilobytes of RAM and a faster processor than its predecessor, allowing it to support a small external color screen. 
  • Arcade shield: This is a low-cost, battery-powered, handheld device into which the micro:bit V2 can be inserted. It provides a color screen and inputs that enable live and portable programming. The shield pictured in Figure 2 is one of three commercially available Arcade shields for the micro:bit V2. 

The BBC micro:bit slotted into an Arcade shield, which has a small color screen and extra inputs.
Figure 2. The micro:bit V2 (top) is inserted into a Game Bit, a commercially available Arcade shield, which displays a MicroCode program. Arcade shields offer a small color screen and extra features, enabling users to have a wider variety of experiences. The shields do not have user-programmable processors—the micro:bit supplies this capability. 

Research shows novices’ willingness to adopt new programming tools often depends on how easy, familiar, and understandable these tools are. This drove our decision to use the Kodu visual programming model for young children and beginners. We created a mini version of the Kodu editor specifically for the micro:bit V2, enabling users to fully utilize the device’s hardware features to create simple programs.

The complete system (editor, user’s program, compiler, and runtime) is integrated into the micro:bit V2’s permanent memory. This allows programs to keep running even when the device is disconnected and to be edited again once it is reconnected, which speeds up the development process and makes portability a reality. The user-friendly interface enables cursor-based editing for creating and modifying Kodu’s “When-Do” rules and editing 5×5 images, as shown in Figure 3. The shield’s directional pad and buttons make for smooth navigation and selection.

A MicroCode program for displaying happy/sad face based on user input.
Figure 3. A MicroCode program (Happy/Sad) consists of four rules: the first two are activated by pressing the micro:bit’s A button. The second two are activated by pressing the B button. 

Evaluation and findings 

To evaluate the impact of MicroCode, education researchers at Lancaster University conducted a study across three UK schools. The findings, reported in our paper, reveal that MicroCode effectively supports micro:bit-based learning at the primary level, engaging children and giving them a sense of agency. By simplifying the process of updating programs in real-time, MicroCode has expanded the learning context to include activities such as outdoor data collection. Furthermore, this innovative tool has inspired teachers to explore the integration of physical computing into a broader curriculum, transcending traditional boundaries of computing education.

Implications and looking forward 

MicroCode has transformed the programming environment for the micro:bit, providing portability and the ability to improve the classroom experience. Compatible with the Jacdac plug-and-play system, MicroCode extends its functionality with easy-to-connect peripherals like sensors and actuators. This integration expands the micro:bit’s capabilities, enabling it to detect environmental changes and control various devices. Additionally, MicroCode can now remotely operate an array of robot accessories through the micro:bit’s radio protocol. 

Our collaboration with academic and industry partners is just beginning, and we’re eager to explore this tool’s full potential. For example, we’re currently testing new MicroCode backpack kits to facilitate learning outside traditional settings. Our goal is to empower educators to extend the portable programming approach beyond the classroom. 

Looking to the future, we envision MicroCode as a cornerstone in schools for an extensible creative computing platform applicable across multiple subjects. One exciting development is MicroData, a new application pioneered by a student from Lancaster University. Derived from MicroCode, MicroData focuses on data science, enabling students to collect and analyze environmental data or assess the impact of chemical reactions in real-time. This innovation highlights the platform’s versatility and potential for fostering rapid experimentation and interactive learning experiences. 

MicroCode is available on GitHub and built with Microsoft MakeCode Arcade. The web app version is also available for those without a shield.


We would like to thank the Micro:bit Educational Foundation, the Microsoft MakeCode team, and our colleagues at Lancaster University for their support and contributions to this work.

The post MicroCode: Portable programming for the BBC micro:bit appeared first on Microsoft Research.
