Revolutionize trip planning with Amazon Bedrock and Amazon Location Service

Have you ever stumbled upon a breathtaking travel photo and instantly wondered where it was taken and how to get there? With 1.3 billion international arrivals in 2023, international travel is poised to exceed pre-pandemic levels and break tourism records in the coming years. Each of these millions of travelers needs to plan where they’ll stay, what they’ll see, and how they’ll get from place to place. This is where AWS and generative AI can revolutionize the way we plan and prepare for our next adventure. With the significant developments in the field of generative AI, intelligent applications powered by foundation models (FMs) can help users map out an itinerary through an intuitive natural conversation interface. It’s like having your own personal travel agent whenever you need it.

Amazon Bedrock is the place to start when building applications that will amaze and inspire your users. Amazon Bedrock is a fully managed service that gives developers a straightforward way to build and scale generative AI applications. It offers a choice of high-performing FMs from leading companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. It enables you to privately customize the FM of your choice with your data using techniques such as fine-tuning, prompt engineering, and Retrieval Augmented Generation (RAG), and to build agents that run tasks using your enterprise systems and data sources while adhering to security and privacy requirements.

In this post, we show you how to build a generative AI-powered trip-planning service that revolutionizes the way travelers discover and explore destinations. By using advanced AI technology and Amazon Location Service, the trip planner lets users translate inspiration into personalized travel itineraries. This innovative service goes beyond traditional trip planning methods, offering real-time interaction through a chat-based interface and maintaining scalability, reliability, and data security through AWS native services.

Architecture

The following figure shows the architecture of the solution.

The solution workflow consists of the following steps:

  1. A user interacts with an AWS Amplify frontend to initiate a trip planning request, either through text or by uploading an image. The user can access and interact with the generated trip itinerary through the frontend application, which includes visualizations on maps powered by Amazon Location Service and Amplify.
  2. If an image is uploaded, it is stored in Amazon Simple Storage Service (Amazon S3), and a custom AWS Lambda function uses a machine learning model deployed on Amazon SageMaker to analyze the image, extract candidate place names with similarity scores, and return the place name with the highest score. The user’s request is then sent to Amazon API Gateway, which triggers a Lambda function that interacts with Amazon Bedrock using Anthropic’s Claude Instant V1 FM to process the request and generate a natural language response describing the place’s location.
  3. If the user interacts using text, the request triggers the Amazon Bedrock FM directly, which provides the natural language response describing the place’s location.
  4. Amazon Location Service provides precise location coordinates based on the place name. If the user prompt asks for suggestions, such as searching for points of interest (POIs), the service also pinpoints these POIs on the map within the chat interface.
  5. A Lambda function combines the generative AI response from Amazon Bedrock with the location data from Amazon Location Service to create a personalized and context-aware trip itinerary.
  6. The conversation history of the user is stored in Amazon DynamoDB, as sketched in the example following this list.
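
The following is a minimal sketch of step 6, assuming a hypothetical DynamoDB table named TripPlannerConversations with session_id as the partition key and a numeric timestamp as the sort key; the actual table design in your deployment may differ.

```python
import time

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("TripPlannerConversations")  # hypothetical table name


def save_turn(session_id: str, user_message: str, assistant_message: str) -> None:
    """Store one user/assistant exchange keyed by session and timestamp."""
    table.put_item(
        Item={
            "session_id": session_id,               # partition key
            "timestamp": int(time.time() * 1000),   # sort key (milliseconds)
            "user_message": user_message,
            "assistant_message": assistant_message,
        }
    )
```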

Core benefits of Amazon Bedrock and Amazon Location Service

Amazon Bedrock provides capabilities to build generative AI applications with security, privacy, and responsible AI practices. Being serverless, it allows secure integration and deployment of generative AI capabilities without managing infrastructure.

Amazon Location Service offers cost-effective, high-quality location-based services. It provides geospatial data based on coordinates, enabling accurate mapping, geofencing, and tracking capabilities for various applications. With a single API across multiple providers, it offers seamless integration, flexibility, and efficient application development with built-in health monitoring and AWS service integration.

By integrating Amazon Bedrock with Amazon Location Service, the virtual trip planning application uses the strengths of both services. Amazon Bedrock enables the use of top FMs for specific use cases and customization for generating contextual responses, while Amazon Location Service provides location data and mapping capabilities. This integration offers tailored trip recommendations through engaging responses powered by generative AI and intuitive visualization on maps.

Key features

Currently available search engines often require multiple customer touch points and actions to gather information; this virtual trip planner streamlines the process into a seamless, intuitive experience. With a few clicks, users can access location coordinates, personalized itineraries, and real-time assistance, eliminating the need for cumbersome navigation across various sites and browser tabs. These features are presented in a web UI designed as a one-stop solution for users. The following figure shows the start of a trip-planning chat.

Within this innovative generative AI solution, there’s a key feature of chat-based natural language interaction that enhances the solution by introducing a user-friendly conversational interface. This capability enables users to engage in dynamic conversations, articulating their preferences, interests, and constraints in a conversational manner. Notably, this functionality eliminates the need for navigating through complex tasks solely to plan a trip, fostering a more personalized and human-like interaction. The following figure shows the first question and response in the solution.

This application uses Anthropic’s Claude Instant V1 on Amazon Bedrock, where it’s designed to respond with context-specific insights, dynamically adapting to the ongoing conversation. Through natural language processing algorithms and machine learning techniques, the large language model (LLM) analyzes the user’s queries in real time, extracting relevant context and intent to deliver tailored responses. Whether the user is seeking recommendations for accommodations, exploring POIs, or inquiring about transportation options, the model will use contextual understanding to provide accurate and personalized assistance to the user. This responsiveness makes sure that each interaction feels intuitive and fluid, mimicking the experience of conversing with a knowledgeable travel expert who anticipates and addresses the user’s needs seamlessly throughout the conversation. The following figure shows the continuation of the interaction and depicts the user’s question, the response, and a map that reflects the information in the response.

This innovative feature harmonizes with the evolving needs of users, providing a comprehensive solution that significantly enhances the overall travel experience. The user-centric approach of the solution reflects a commitment to simplifying the trip planning process, allowing travelers to seamlessly translate inspiration into personalized and enjoyable travel itineraries.

Project conceptual walkthrough: Virtual trip planner

LangChain is a framework for developing applications powered by LLMs. It can be used to build context-aware applications by connecting an LLM to sources of context (prompt instructions, few-shot examples, and other content that grounds its responses to users). The application also relies on the LLM to reason: to answer based on the provided context and to decide which actions to take.

LangChain enables you to create your own customized agent. The core idea of agents is to use a language model to choose a sequence of actions to take. These actions can invoke functions that range from a simple calculation to a complex internet search or API call. You write a prompt template that provides a list of tool names the agent can use, and ask the agent to make a decision based on certain inputs. The agent uses the power of an LLM to determine which function to execute and outputs the result based on the prompt guide. Here is an example based on LangChain.

The following code snippet imports the Bedrock LLM wrapper from LangChain and initializes the Amazon Bedrock client with the necessary parameters: the model_id, the region_name, and the model keyword arguments.

from langchain.llms.bedrock import Bedrock

# REGION is assumed to be defined elsewhere in the application (for example, "us-east-1")
llm = Bedrock(
    model_id="anthropic.claude-instant-v1",
    # model_id="anthropic.claude-v2:1",
    model_kwargs={
        "max_tokens_to_sample": 20000,  # maximum number of tokens to generate
        "temperature": 0,               # deterministic, low-creativity responses
    },
    region_name=REGION,
    verbose=True
)
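
To illustrate the agent idea before the customized agent shown later in this post, the following is a minimal, hedged sketch that builds a LangChain agent on the llm object initialized above; the WordLength tool is a toy example and not part of the trip planner.

```python
from langchain.agents import AgentType, Tool, initialize_agent


def get_word_length(word: str) -> str:
    """Toy tool: return the number of characters in a word."""
    return str(len(word.strip()))


tools = [
    Tool(
        name="WordLength",
        func=get_word_length,
        description="Use this tool when you need to count the characters in a single word.",
    )
]

# The agent lets the LLM decide whether (and when) to call the tool
agent = initialize_agent(
    tools=tools,
    llm=llm,  # the Amazon Bedrock LLM defined in the previous snippet
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
agent.run("How many letters are in the word 'Singapore'?")
```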

Amazon Location Service offers cost-effective location-based services (LBS) with high-quality data from trusted providers like Esri, HERE, and GrabMaps. This enables developers to build advanced location-enabled applications with location data and functionality such as maps, POIs, geocoding, routing, geofences, tracking, and health monitoring metrics. This virtual trip planner highlights the integration of Amazon Location Service with Amazon Bedrock to build location-enabled applications.

The tool SearchPlaceIndexForText from Amazon Location Service enables users to geocode free-form text, such as addresses, names, cities, or regions, facilitating the search for places or POIs. By using optional parameters, such as bounding box or country filters, and biasing searches towards specific positions globally, users can refine their search results. Notably, the tool allows users to search for places near a given position using BiasPosition or filter results within a bounding box using FilterBBox. The search results are presented in descending order of relevance, providing users with a list of POIs along with their coordinates for visualization on maps in the user interface.

To use this functionality, the user input needs to be translated into the appropriate Action and parameters required by Amazon Location Service. For instance, if a user enters “Find coffee shops near Central Park, New York City,” the application would parse this input and convert it into the corresponding Action and parameters for the SearchPlaceIndexForText tool. This could involve setting the search text to “coffee shops,” the BiasPosition to the coordinates of Central Park, and potentially applying filters or bounding boxes to narrow down the search area.

After the user input is translated into the required Action and parameters, Amazon Location Service processes the request and provides the relevant location coordinates and names of coffee shops near Central Park. This information is then passed to the generative AI component of the application, which uses it to generate human-friendly responses or visualizations for the user interface.
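
The following is a minimal sketch of that translated request using the boto3 client directly; the place index name and the exact coordinates are assumptions for illustration.

```python
import boto3

location = boto3.client("location", region_name="us-east-1")

# "Find coffee shops near Central Park, New York City" translated into
# SearchPlaceIndexForText parameters
response = location.search_place_index_for_text(
    IndexName="MyPlaceIndex",            # hypothetical place index name
    Text="coffee shops",                 # search text parsed from the user request
    BiasPosition=[-73.9654, 40.7829],    # [longitude, latitude] near Central Park
    FilterCountries=["USA"],
    MaxResults=10,
)

for result in response["Results"]:
    place = result["Place"]
    print(place["Label"], place["Geometry"]["Point"])
```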

By seamlessly integrating Amazon Location Service with generative AI, the application delivers a natural and intuitive experience for users, allowing them to search for places using conversational language while using its powerful geocoding capabilities.

USER'S INPUT
--------------------
Here is the user's input (remember to respond with a markdown code snippet of a json blob with a single action, and NOTHING else):
Recommend me places in Marina Bay Sands
AI:  ```json
{
  "action": "FindPlaceRecommendations",
  "action_input": "Marina Bay Sands, Singapore"
}
```
Human: TOOL RESPONSE:
---------------------
[{'Place': {'AddressNumber': '10', 'Categories': ['PointOfInterestType', 'Hotel'], 'Country': 'SGP', 'Geometry': {'Point': [103.8585399, 1.2821027]}, 'Interpolated': False, 'Label': 'Marina Bay Sands, 10 Bayfront Avenue, Singapore, SGP', 'Municipality': 'Singapore', 'Neighborhood': 'Marina', 'Region': 'Singapore', 'Street': 'Bayfront Avenue'}, 'Relevance': 1}, {'Place': {'AddressNumber': '1', 'Categories': ['PointOfInterestType'], 'Country': 'SGP', 'Geometry': {'Point': [103.8610554, 1.2849601]}, 'Interpolated': False, 'Label': 'Marina Bay Sands, 1 Bayfront Avenue, Singapore, SGP', 'Municipality': 'Singapore', 'Neighborhood': 'Marina', 'Region': 'Singapore', 'Street': 'Bayfront Avenue', 'SupplementalCategories': ['EV Charging Station']}, 'Relevance': 1}, {'Place': {'AddressNumber': '1', 'Categories': ['PointOfInterestType'], 'Country': 'SGP', 'Geometry': {'Point': [103.8601178, 1.2825414]}, 'Interpolated': False, 'Label': 'Marina Bay Sands, 1 Bayfront Avenue, 018971, Singapore, SGP', 'Municipality': 'Singapore', 'Neighborhood': 'Marina', 'PostalCode': '018971', 'Region': 'Singapore', 'Street': 'Bayfront Avenue', 'SupplementalCategories': ['Building']}, 'Relevance': 1}, {'Place': {'Categories': ['PointOfInterestType', 'Hotel'], 'Country': 'SGP', 'Geometry': {'Point': [103.85976, 1.28411]}, 'Interpolated': False, 'Label': 'Marina Bay Sands, SGP'}, 'Relevance': 1}, {'Place': {'AddressNumber': '8', 'Categories': ['PointOfInterestType'], 'Country': 'SGP', 'Geometry': {'Point': [103.8592831, 1.2832098]}, 'Interpolated': False, 'Label': 'Marina Bay Sands Casino, 8 Bayfront Avenue, 018956, Singapore, SGP', 'Municipality': 'Singapore', 'Neighborhood': 'Marina', 'PostalCode': '018956', 'Region': 'Singapore', 'Street': 'Bayfront Avenue', 'SupplementalCategories': ['Casino']}, 'Relevance': 0.9997}, {'Place': {'AddressNumber': '1', 'Categories': ['PointOfInterestType', 'Hotel'], 'Country': 'SGP', 'Geometry': {'Point': [103.8601305, 1.2825665]}, 'Interpolated': False, 'Label': 'Marina Bay Sands Hotel, 1 Bayfront Avenue, 018971, Singapore, SGP', 'Municipality': 'Singapore', 'Neighborhood': 'Marina', 'PostalCode': '018971', 'Region': 'Singapore', 'Street': 'Bayfront Avenue'}, 'Relevance': 0.9997}, {'Place': {'Categories': ['PointOfInterestType'], 'Country': 'SGP', 'Geometry': {'Point': [103.8600853, 1.2831384]}, 'Interpolated': False, 'Label': 'MARINA BAY SANDS HOTEL, Singapore, SGP', 'Municipality': 'Singapore', 'Neighborhood': 'Marina', 'Region': 'Singapore', 'SupplementalCategories': ['Bus Stop']}, 'Relevance': 0.9997}, {'Place': {'AddressNumber': '2', 'Categories': ['PointOfInterestType'], 'Country': 'SGP', 'Geometry': {'Point': [103.8585042, 1.2828475]}, 'Interpolated': False, 'Label': 'Marina Bay Sands Skating Rink, 2 Bayfront Avenue, 018970, Singapore, SGP', 'Municipality': 'Singapore', 'Neighborhood': 'Marina', 'PostalCode': '018970', 'Region': 'Singapore', 'Street': 'Bayfront Avenue', 'SupplementalCategories': ['Ice Skating Rink']}, 'Relevance': 0.9995}, {'Place': {'AddressNumber': '1', 'Categories': ['PointOfInterestType', 'Pharmacy'], 'Country': 'SGP', 'Geometry': {'Point': [103.8601178, 1.2825414]}, 'Interpolated': False, 'Label': "Nature's Farm Marina Bay Sands, 1 Bayfront Avenue, 018971, Singapore, SGP", 'Municipality': 'Singapore', 'Neighborhood': 'Marina', 'PostalCode': '018971', 'Region': 'Singapore', 'Street': 'Bayfront Avenue'}, 'Relevance': 0.9704999999999999}, {'Place': {'AddressNumber': '10', 'Categories': ['PointOfInterestType', 'Tourist Attraction'], 'Country': 'SGP', 'Geometry': 
{'Point': [103.861031, 1.285204]}, 'Interpolated': False, 'Label': 'Marina Bay Sands Skypark, 10 Bayfront Avenue, Singapore, SGP', 'Municipality': 'Singapore', 'Neighborhood': 'Marina', 'Region': 'Singapore', 'Street': 'Bayfront Avenue'}, 'Relevance': 0.9653}]

The following code sample is a function that queries locations using Amazon Location Service and takes parameters such as the index name, text, country, maximum results, categories, and region. The function initializes the Amazon Location Service client, sets search parameters based on the input, performs the search using search_place_index_for_text, and returns the results. In case of a ResourceNotFoundException, it prints an error message and returns an empty list.

from typing import List

import boto3

def query_locations(
    index_name: str,
    text: str = None,
    country: str = None,
    max_results: int = 10,
    categories: List[str] = [],
    region: str = 'ap-northeast-1'
):
    # Initialize the Amazon Location Service client
    client = boto3.client('location', region_name=region)
    # Specify the parameters for the search
    parameters = {
        'IndexName': index_name,
        'MaxResults': max_results
    }
    if text is not None:
        parameters['Text'] = text
    if len(categories) > 0:
        parameters['FilterCategories'] = categories
    if country is not None:
        parameters['FilterCountries'] = [country]
    try:
        # Perform the search
        response = client.search_place_index_for_text(**parameters)
        # Extract and return the results
        locations = response['Results']
        return locations
    except client.exceptions.ResourceNotFoundException as e:
        print(f"Error: {e}")
        return []
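
For example, the function could be called as follows; the index name and region are placeholders.

```python
results = query_locations(
    index_name="MyPlaceIndex",
    text="Marina Bay Sands",
    country="SGP",
    max_results=5,
    region="ap-southeast-1",
)
for result in results:
    print(result["Place"]["Label"], result["Place"]["Geometry"]["Point"])
```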

In a LangChain agent, an LLM is used as a reasoning engine to determine which actions to take and in which order. You can also customize the prompt template (known as prompt engineering) to make the model generate the desired contents. Building an agent requires you to customize the agent methods to form an appropriate prompt. LangChain provides some prebuilt classes, such as ConversationalChatAgent. In the following code snippet, a ConversationalChatAgent class that inherits the Agent class from LangChain is defined. The class definition is similar to the LangChain ConversationalChatAgent class.

from typing import Any, List, Optional, Sequence

from langchain.agents.agent import Agent, AgentOutputParser
from langchain.agents.conversational_chat.output_parser import ConvoOutputParser
from langchain.agents.conversational_chat.prompt import PREFIX, SUFFIX, TEMPLATE_TOOL_RESPONSE
from langchain.agents.utils import validate_tools_single_input
from langchain.callbacks.base import BaseCallbackManager
from langchain.chains import LLMChain
from langchain.pydantic_v1 import Field
from langchain.schema.language_model import BaseLanguageModel
from langchain.tools.base import BaseTool

class ConversationalChatAgent(Agent):
    """An agent designed to hold a conversation in addition to using tools."""
    output_parser: AgentOutputParser = Field(default_factory=ConvoOutputParser)
    template_tool_response: str = TEMPLATE_TOOL_RESPONSE
    @classmethod
    def _get_default_output_parser(cls, **kwargs: Any) -> AgentOutputParser:
        return ConvoOutputParser()
    @property
    def _agent_type(self) -> str:
        raise NotImplementedError
    @property
    def observation_prefix(self) -> str:
        """Prefix to append the observation with."""
        return "Observation: "
    @property
    def llm_prefix(self) -> str:
        """Prefix to append the llm call with."""
        return "Thought:"
    @classmethod
    def _validate_tools(cls, tools: Sequence[BaseTool]) -> None:
        super()._validate_tools(tools)
        validate_tools_single_input(cls.__name__, tools)
     
    # ... other methods

Note the from_llm_and_tools method. This method creates a prompt template and initializes an LLM chain that uses the prompt and the provided functionalities (tools) to generate content. This is where you can customize the prompt template.

    @classmethod
    def from_llm_and_tools(
        cls,
        llm: BaseLanguageModel,
        tools: Sequence[BaseTool],
        callback_manager: Optional[BaseCallbackManager] = None,
        output_parser: Optional[AgentOutputParser] = None,
        system_message: str = PREFIX,
        human_message: str = SUFFIX,
        input_variables: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> Agent:
        """Construct an agent from an LLM and tools."""
        cls._validate_tools(tools)
        _output_parser = output_parser or cls._get_default_output_parser()
        prompt = cls.create_prompt(
            tools,
            system_message=system_message,
            human_message=human_message,
            input_variables=input_variables,
            output_parser=_output_parser,
        )
        llm_chain = LLMChain(
            llm=llm,
            prompt=prompt,
            callback_manager=callback_manager,
        )
        tool_names = [tool.name for tool in tools]
        return cls(
            llm_chain=llm_chain,
            allowed_tools=tool_names,
            output_parser=_output_parser,
            **kwargs,
        )

cls.create_prompt creates a BasePromptTemplate object, which LangChain uses to build the actual prompt for the LLM. Instead of supplying your own BasePromptTemplate object, you can modify the human_message and system_message values passed to the from_llm_and_tools method, as shown in the preceding example.
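
For example, a hedged sketch of overriding the default system message might look like the following; the prompt text itself is illustrative, not the exact prompt used by the solution.

```python
CUSTOM_SYSTEM_MESSAGE = """You are a helpful virtual trip planner.
Use the available tools to look up places and points of interest,
and always respond with a single JSON action blob."""

agent = ConversationalChatAgent.from_llm_and_tools(
    llm=llm,                               # the Amazon Bedrock LLM defined earlier
    tools=tools,                           # the list of tools registered with the agent
    system_message=CUSTOM_SYSTEM_MESSAGE,  # replaces the default PREFIX
    # human_message can be overridden the same way to customize the SUFFIX
)
```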

As you progress through the code, it’s important to understand how the different components work together to create an agent capable of finding POIs based on user queries.

The following code defines a class named FindPOIsByCountry, which is a subclass of BaseTool. This class is designed to find POIs in a specific country and includes a description of when to use the tool and examples of queries that it can handle. The _run method within this class takes a query and attempts to identify the country mentioned in the query using the pycountry library. It then calls the query_locations function, passing parameters such as the Amazon Location Service index name, text (query), maximum results, categories of interest (for example, amusement park or museum), identified country code, and region.

import pycountry
from langchain.callbacks.manager import CallbackManagerForToolRun

# AMAZON_LOCATION_INDEX_NAME and REGION are assumed to be defined as constants elsewhere
class FindPOIsByCountry(BaseTool):
    name = "FindPOIsByCountry"
    description = "Only use this tool when you need to find a list of points of interests like mountains or scenic locations in a certain country. Never use this tool until users explicitly ask about finding points of interests like mountains or scenic locations. A 'landmark' is also a point of interest. An example of a sentence that uses landmark is: 'I want to see the Eiffel Tower'. An example of a question that uses landmarks or points of interests is: 'What cool places are there in Japan?'"

    def _run(
        self, query: str, run_manager: Optional[CallbackManagerForToolRun] = None
    ) -> str:
        """Use the tool."""
        print("FindPOIsByCountry query", query)
        country_code = None
        try:
            countries = pycountry.countries.search_fuzzy(query)
            country_code = countries[0].alpha_3
        except Exception as e:
            print(e)
            print("Setting country to None")
        return query_locations(
            index_name=AMAZON_LOCATION_INDEX_NAME,
            text=query,
            max_results=10,
            categories=["Amusement Park", "Aquarium", "Museum", "Shopping Mall", "Tourist Attraction"],
            country=country_code,
            region=REGION
        )
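
The tool can then be wired into the conversational agent defined earlier. The following sketch uses the legacy LangChain AgentExecutor and a conversation buffer memory; the memory key is an assumption that must match the chat_history variable expected by the agent prompt.

```python
from langchain.agents import AgentExecutor
from langchain.memory import ConversationBufferMemory

tools = [FindPOIsByCountry()]
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

agent = ConversationalChatAgent.from_llm_and_tools(llm=llm, tools=tools)
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True,
)
agent_executor.run("What cool places are there in Japan?")
```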

We take this further by implementing a query_nearby_locations function that makes use of a bias position and FilterBBox. BiasPosition is an optional parameter from Amazon Location Service that indicates a preference for places that are closer to a specified position. FilterBBox is an optional parameter that limits the search results by returning only places that are within the provided bounding box; the two parameters can’t be used in the same request. By constructing a bounding box that approximates a 10 km radius around the bias position, the function narrows the search to locations within this range. The key difference from query_locations lies in how locations are filtered based on proximity.

def query_nearby_locations(
        index_name: str,
        bias_position: List[float],
        text: str = None,
        filter_country: str = None,
        max_results: int = 10,
        categories: List = [],
        region='ap-northeast-1'
):
    # Initialize the Amazon Location Service client
    client = boto3.client('location', region_name=region)
    # Approximate a 10 km radius around the bias position with a bounding box
    # (0.09 degrees is roughly 10 km); FilterBBox and BiasPosition can't be used
    # together in the same request, so only FilterBBox is passed
    delta = 0.09
    filterbbox = [
        bias_position[0] - delta, bias_position[1] - delta,
        bias_position[0] + delta, bias_position[1] + delta
    ]
    # Specify the parameters for the search
    parameters = {
        'IndexName': index_name,
        'MaxResults': max_results,
        'FilterBBox': filterbbox
    }
    print(text, categories)
    if text is not None:
        parameters['Text'] = text
    if len(categories) > 0:
        parameters['FilterCategories'] = categories
    if filter_country is not None:
        parameters['FilterCountries'] = [filter_country]
    try:
        # Perform the search
        response = client.search_place_index_for_text(**parameters)
        print(response)
        # Extract and return the results
        locations = response['Results']
        return locations
    except client.exceptions.ResourceNotFoundException as e:
        print(f"Error: {e}")
        return []
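
For example, the function could be invoked with the Marina Bay Sands coordinates returned by the earlier tool response as the bias position; the index name and category filter are placeholders.

```python
nearby = query_nearby_locations(
    index_name="MyPlaceIndex",
    bias_position=[103.8585399, 1.2821027],  # [longitude, latitude] of Marina Bay Sands
    text="restaurants",
    max_results=10,
    categories=["Restaurant"],
    region="ap-southeast-1",
)
for result in nearby:
    print(result["Place"]["Label"], result["Place"]["Geometry"]["Point"])
```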

The functions discussed in this section can be integrated into the backend of your project, tightly coupled with the generative AI component. With the implementation details and guidance provided, you can use the power of Amazon Location Service and the generative AI capabilities of Amazon Bedrock, orchestrated with LangChain, to build a conversational application that lets users search for nearby points of interest using natural language queries. By integrating the query_nearby_locations function, parsing user input, customizing the LangChain agent’s prompt template, and developing a user-friendly interface, you can create an intuitive experience where users discover relevant locations within specified proximities or bounding boxes. As you build your application, focus on implementing robust error handling, considering edge cases, and thoroughly testing the application before deploying it to a production environment. With this foundation, you can create innovative location-based applications that seamlessly blend the power of Amazon Location Service and Amazon Bedrock using Anthropic’s Claude Instant V1.

Conclusion

Harnessing the power of generative AI enables this web solution to interpret user queries and dynamically generate personalized travel itineraries. This application offers a user-friendly experience, where users can interact with the system through a chat-based interface providing relevant responses based on context. This application serves as a transformative tool that seamlessly guides users to discover more information about locations and explore additional points of interest. To get started on building your own innovative solutions, explore Amazon Bedrock now and start your journey today.


About the Authors

Yao Cong (YC) Yeo is a Solutions Architect at Amazon Web Services, empowering Singapore’s ISVs and SMBs in their cloud transformation journeys and guiding customers to optimize workloads and maximize their AWS Cloud potential. YC specializes in the Application Security domain within Cloud Security, ensuring robust and secure cloud implementations. In the generative AI space, YC delivers thought leadership content to bridge the gap between technical possibilities and business objectives in the evolving digital landscape.

Loke Jun Kai is an AI/ML Specialist Solutions Architect at AWS. He works on go-to-market motions and strategic opportunities in the ASEAN Region. Jun Kai has provided technical and visionary guidance for customers across industries and segments, from large enterprises to startups. Outside of work, he enjoys exploring all things related to venture capital and playing tennis.

Abhi Fabhian is a Solutions Architect at Amazon Web Services based in Indonesia, providing expert technical guidance on cloud technologies to clients across various sectors in Indonesia, helping them optimize their cloud experience. Outside of work he enjoys sports, cars, music and playing games.

Tung Cao is a Solutions Architect at Amazon Web Services based in Vietnam, covering Vietnam’s SMBs and ISVs on their journey to the cloud and helping them optimize and innovate their business processes. Tung specializes in AI/ML, providing cutting-edge solutions that enhance customer experiences, streamline operations, and drive data-driven decision-making, enabling businesses to leverage advanced technologies like machine learning and deep learning to gain competitive advantages.

Siraphop (Fufu) Thaisangsa-nga is a Solutions Architect at Amazon Web Services based in Thailand, dedicated to guiding local businesses through their cloud transformation journeys. With a deep understanding of the Thai market, Fufu helps companies leverage AWS services to innovate, scale, and improve their operational efficiency, excelling in tailoring cloud solutions to meet the unique needs of Thai businesses across various industries.

Read More

Simplify automotive damage processing with Amazon Bedrock and vector databases

In the automotive industry, the ability to efficiently assess and address vehicle damage is crucial for efficient operations, customer satisfaction, and cost management. However, manual inspection and damage detection can be a time-consuming and error-prone process, especially when dealing with large volumes of vehicle data, the complexity of assessing vehicle damage, and the potential for human error in the assessment.

This post explores a solution that uses the power of AWS generative AI capabilities like Amazon Bedrock and OpenSearch vector search to perform damage appraisals for insurers, repair shops, and fleet managers.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon OpenSearch Service is a powerful, highly flexible search engine that allows you to retrieve data based on a variety of lexical and semantic retrieval approaches.

By combining these powerful tools, we have developed a comprehensive solution that streamlines the process of identifying and categorizing automotive damage. This approach not only enhances efficiency, but also provides valuable insights that can help automotive businesses make more informed decisions.

The traditional way to solve these problems is to use computer vision machine learning (ML) models to classify the damage and its severity and complement with regression models that predict numerical outcomes based on input features like the make and model of the car, damage severity, damaged part, and more.

This approach creates challenges in maintaining multiple models for classifying damage severity and creating estimates. Although these models can provide precise estimates based on historical data, they can’t be generalized to quickly provide a range of estimates, or to accommodate changes to the damage dataset (such as updated makes and models) or varying repair estimates based on parts, labor, and facility. Any attempt to generalize traditional models to provide such estimates leads to feature engineering complexity.

This is where large language models (LLMs) come into play to look at the features both visually and based on text descriptions and find the closest match semantically.

Solution overview

Automotive companies have large datasets that include damages that have happened to their vehicle assets, which include images of the vehicles, the damage, and detailed information about that damage. This metadata includes details such as make, model, year, area of the damage, severity of the damage, parts replacement cost, and labor required to repair.

The information contained in these datasets—the images and the corresponding metadata—is converted to numerical vectors using a process called multimodal embedding. These embedding vectors contain the necessary information of the image and the text metadata encoded in numerical representation. We query against these embedding vectors to find the closest match to the incoming damaged vehicle image. This technique is called semantic search. In this solution, we use OpenSearch Service, a powerful, highly flexible search engine that allows you to retrieve data based on a variety of lexical and semantic retrieval approaches, including vector search. We generate the embeddings using the Amazon Titan Multimodal Embeddings model, available on Amazon Bedrock.
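
As a minimal sketch, generating one of these multimodal embeddings with the Amazon Titan Multimodal Embeddings model might look like the following; the request fields reflect the model interface as documented at the time of writing, and the file path and metadata values are taken from the sample dataset for illustration only.

```python
import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Encode the damage image as base64
with open("repair-data/203.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Combine the image with its text metadata in a single embedding request
request_body = {
    "inputText": json.dumps({"make": "Make_1", "damage": "Right Front", "repair_cost": 750}),
    "inputImage": image_b64,
    "embeddingConfig": {"outputEmbeddingLength": 1024},
}

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-image-v1",
    body=json.dumps(request_body),
)
embedding = json.loads(response["body"].read())["embedding"]  # 1,024-dimension vector
```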

This solution is available in our GitHub repo, including detailed instructions about its deployment and testing.

The following architecture diagram illustrates the proposed solution. It contains two flows:

  • Data ingestion – The data ingestion flow converts the damage datasets (images and metadata) into vector embeddings and stores them in the OpenSearch vector store. We need to initially invoke this flow to load all the historic data into OpenSearch. We can also schedule it to load the updated dataset on a regular basis, or invoke it in near real time whenever new data flows in.
  • Damage assessment inference – The inference processing flow runs every time there is a new damage image to find the closest match from the current dataset stored in OpenSearch.

The data ingestion flow consists of the following steps:

  1. The ingestion process starts with the ingestion processor taking each damaged image from the existing damage repair cost dataset and passing it to Anthropic’s Claude 3 on Amazon Bedrock. The invoice details of the repair costs could be in various formats, like PDF, images, tables, and so on. These images are passed to Anthropic’s Claude 3 Haiku to be analyzed and output into a standardized JSON format. The step of creating the metadata during the ingestion process is optional if the repair invoices are already present in a standardized format.

In this solution, Anthropic’s Claude 3 creates the JSON metadata for each image. The dataset provided in this example only contains images. In a production scenario, the metadata would ideally contain relevant data from existing invoices, where Amazon Bedrock could be used to extract the relevant information and create the standardized metadata, if it doesn’t exist yet.

The following is an example image.

The following code shows an example of the ingested metadata:

{
  "make": "Make_1",
  "model": "Model_1",
  "year": 2015,
  "state": "FL",
  "damage": "Right Front",
  "repair_cost": 750,
  "damage_severity": "moderate",
  "damage_description": "Dent and scratches on the right fender",
  "parts_for_repair": [
    "Right fender",
    "Paint"
  ],
  "labor_hours": 4,
  "parts_cost": 400,
  "labor_cost": 350,
  "s3_location": "repair-data/203.jpeg"
}
  2. The JSON output from the previous step along with the actual damage image are sent to the Amazon Titan Multimodal Embeddings model to generate embedding vectors. Each vector has 1,024 dimensions and encodes both the image and the repair cost JSON data.
  3. The outputs generated in the previous steps (the text representation and vector embeddings of the damage data) are stored in an Amazon OpenSearch Serverless vector search collection, as sketched in the example following this list. By storing both the text representation and vector embeddings, you can use the power of hybrid search (text search and semantic search) to optimize the search results.
  4. Finally, the ingestion processor stores the raw images in Amazon Simple Storage Service (Amazon S3), which we use later in the inference flow to show the closest matches to the user.
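
The following is a hedged sketch of step 3 using the opensearch-py client; the collection endpoint, index name, and field names are assumptions, and metadata_json and embedding stand for the outputs of the previous steps.

```python
import boto3
from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

region = "us-east-1"
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, "aoss")  # "aoss" = OpenSearch Serverless

client = OpenSearch(
    hosts=[{"host": "my-collection-id.us-east-1.aoss.amazonaws.com", "port": 443}],  # hypothetical endpoint
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)

# Store the text representation and the embedding vector side by side
document = {
    "damage_metadata": metadata_json,  # JSON metadata produced in step 1
    "damage_vector": embedding,        # 1,024-dimension vector from step 2
    "s3_location": "repair-data/203.jpeg",
}
client.index(index="damage-repairs", body=document)
```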

The user performing the damage assessment interacts with the UI by providing the image of the damaged vehicle and some basic information needed for the assessment. The inference processing flow includes the following steps:

  1. The inference processor takes each damaged image provided by the user and passes it to Anthropic’s Claude 3 to be analyzed and output into a standardized JSON format.
  2. The JSON output from the previous step along with the damage image are sent to the Amazon Titan Multimodal Embeddings model to generate embedding vectors.
  3. The embeddings are queried against all the embeddings of the existing damage data inside the OpenSearch Serverless collection to find the closest matches, as sketched in the example following this list. For the top k (k=3 in our sample application) closest matches, it returns the JSON data that contains the repair costs and other damage expenses. With that information, several statistics, such as the median expenses and the upper and lower bounds of repair costs, are calculated.
  4. In our scenario, the solution takes the metadata from each of the matches and sends that metadata to Anthropic’s Claude 3 Haiku hosted on Amazon Bedrock. The prompt is engineered to get the LLM to consider the total repair cost of each match and calculate an average. Production implementations of this solution could vary in how this final step is done. The repair costs could be calculated in different ways: in this case using generative AI, or by retrieving further information from other datasets, such as current parts and labor costs, to calculate a new repair cost average.
  5. The UI displays the repair expense estimates along with the accuracy. The front end also pulls the images from Amazon S3 that are the closest matches to the queried image.
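
A hedged sketch of the k-NN query in step 3 is shown below; the index and vector field names match the hypothetical ingestion sketch above, client is the same OpenSearch client, and query_embedding stands for the vector generated for the new damage image.

```python
knn_query = {
    "size": 3,  # top k closest matches
    "query": {
        "knn": {
            "damage_vector": {
                "vector": query_embedding,  # embedding of the user's damage image
                "k": 3,
            }
        }
    },
}

response = client.search(index="damage-repairs", body=knn_query)
matches = [hit["_source"] for hit in response["hits"]["hits"]]
# Repair cost statistics (median, upper and lower bounds) are then derived
# from the metadata of these matches.
```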

Prompts and datasets

Our solution consists of automotive damage images, which are provided as part of our repository, and the code provided handles the ingestion of images and the UI that users can interact with. Our sample dataset contains images from different vehicles (for this post, we use three fictitious car brands and models). We use the following prompt to create the JSON metadata that is ingested with the image:

'Instruction: You are a damage repair cost estimator and based on the image you need 
to create a json output as close as possible to the <model>, 
you need to estimate the repair cost to populate within the output and you need to 
provide the damage severity according to the <criteria>, 
you also need to provide a damage description which is short and less than 10 words. 
Just provide the json output in the response, do not explain the reasoning. 
For testing purposes assume the image is from a fictitious car brand "Make_1" and a 
fictitious model "Model_1" in the state of Florida.'

This prompt instructs the model to create the metadata as JSON output, and an example of that JSON metadata is provided within the <model> tag. The prompt also adds instructions for the model to assess the damage and estimate the cost following the <criteria> tag. The model and criteria are parameters that are created within the code and passed to the model. They are defined in the code from lines 85–106.
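
As a reference point, sending an image together with this instruction prompt to Anthropic's Claude 3 Haiku on Amazon Bedrock might look like the following sketch, which uses the Anthropic Messages API request format; the file path is from the sample dataset, and ingestion_prompt stands for the prompt text shown above.

```python
import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("repair-data/203.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64},
                },
                {"type": "text", "text": ingestion_prompt},  # the instruction prompt shown above
            ],
        }
    ],
}

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(request_body),
)
metadata_json = json.loads(response["body"].read())["content"][0]["text"]  # standardized JSON metadata
```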

For each fictitious vehicle make and model, we have a dataset with 200 images. These images are stored within the /containers/ingestion/data_set path of the repository.

During the inference flow, the first steps that are run by the UI are capturing the image from the user and creating new metadata based on this new image and some basic information that the user provides. The following prompt is part of the inference code, which is used to create the initial metadata:

Instruction: You are a car damage assessor that needs to create a short description 
for the damage in the image. Analyze the image and populate the json output adding an 
extra field called damage description, this description has to be short and less than 
10 words, provide ONLY the json as a response and no other data, the xml tags also 
must not be in the response.

These prompts are examples provided with the solution to create basic metadata, which is then used to increase the accuracy of the vector search. There might be different use cases where more detailed prompts are required, and for that, this solution can serve as a base.

Prerequisites

To deploy the proposed sample solution, some prerequisites are needed; refer to the detailed instructions in our GitHub repo for the full list of prerequisites.

Deploy the solution

Complete the following steps to deploy this solution:

  1. Run the provided CloudFormation template.
  2. Download the dataset from the public dataset repository. Specific instructions can be found on the AWS Samples repository.
  3. Upload the dataset to the S3 source bucket. Specific instructions can be found on the AWS Samples repository.
  4. Run the ECS task, which runs the image ingestion process following the steps mentioned on the GitHub repo.
  5. To access the inference code, open the AWS CloudFormation console, navigate to the stack’s Outputs tab, and choose the CloudFront distribution link for the InferenceUIURL key to go to the inference UI.

  6. Test the solution by following the testing procedures in our GitHub repo.

Clean up

To clean up the resources you created, complete the following steps:

  1. On the AWS CloudFormation console, navigate to the Outputs tab of the stack you deployed.
  2. Note the name of your ECR repository and S3 bucket.
  3. On the Amazon S3 console, delete the contents of the bucket.
  4. On the Amazon ECR console, delete the images in the repository.
  5. On the AWS CloudFormation console, delete the stack.

Deleting the stack removes all other related resources from your AWS account. The bucket and repository must be empty in order to delete them.

Conclusion

The integration of Amazon Bedrock and vector databases like OpenSearch presents a powerful solution for simplifying automotive damage processing. This innovative approach offers several key benefits:

  • Efficiency – By using generative AI and semantic search capabilities, the system can quickly process and analyze damage reports, significantly reducing the time required for assessments
  • Accuracy – The use of multimodal embeddings and vector search makes sure damage assessments are based on comprehensive data, including both visual and textual information, leading to more accurate results
  • Scalability – As the dataset grows, the system’s performance improves, allowing it to handle increasing volumes of data without compromising speed or accuracy
  • Adaptability – The system can be updated with new data, so it remains current with the latest repair costs and damage types without the need to fully train using a traditional ML model

As the automotive industry continues to evolve, solutions like this will play a crucial role in streamlining operations, improving customer satisfaction, and optimizing resource allocation. By embracing AI-driven technologies, automotive businesses can stay ahead of the curve and deliver more efficient, accurate, and cost-effective damage assessment services. The combination of powerful AI models available in Amazon Bedrock and vector search capabilities of OpenSearch Service demonstrates the potential for transformative solutions in the automotive industry. As these technologies continue to advance, we can expect even more innovative applications that will reshape how we approach vehicle damage assessment and repair.

For detailed instructions and deployment steps, refer to our GitHub repo. Let us know in the comments section your thoughts about this solution and potential improvements we can add.


About the Authors

Vinicius Pedroni is a Senior Solutions Architect at AWS for the Travel and Hospitality Industry, with focus on Edge Services and Generative AI. Vinicius is also passionate about assisting customers on their Cloud Journey, allowing them to adopt the right strategies at the right moment.

Manikanth Pasumarti is a Solutions Architect based out of New York City. He works with enterprise customers to architect and design solutions for their business needs. He is passionate about math and loves to teach kids in his free time.

Read More

Abstracts: November 14, 2024

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Microsoft Senior Researcher Tong Wang joins guest host Bonnie Kruft, partner and deputy director of Microsoft Research AI for Science, to discuss “Ab initio characterization of protein molecular dynamics with AI2BMD.” In the paper, which was published by the scientific journal Nature, Wang and his coauthors detail a system that leverages AI to advance the state of the art in simulating the behavior of large biomolecules. AI2BMD, which is generalizable across a wide range of proteins, has the potential to advance solutions to scientific problems and enhance biomedical research in drug discovery, protein design, and enzyme engineering.

Transcript

[MUSIC]

BONNIE KRUFT: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES] 

I’m Bonnie Kruft, partner and deputy director of Microsoft Research AI for Science and your host for today. Joining me is Tong Wang, a senior researcher at Microsoft. Tong is the lead author of a paper called “Ab initio characterization of protein molecular dynamics with AI2BMD,” which has just been published by the top scientific journal Nature. Tong, thanks so much for joining us today on Abstracts!


TONG WANG: Thank you, Bonnie.

KRUFT: Microsoft Research is one of the earliest institutions to apply AI in biomolecular simulation research. Why did the AI for Science team choose this direction, and—with this work specifically, AI2BMD—what problem are you and your coauthors addressing, and why should people know about it?

WANG: So as Richard Feynman famously said, “Everything that living things do can be understood in terms of the jigglings and the wigglings of atoms.” To study the mechanisms behind the biological processes and to develop biomaterials and drugs requires a computational approach that can accurately characterize the dynamic motions of biomolecules. When we review the computational research for biomolecular structure, we can get two key messages. First, in recent years, predicting the crystal, or static, protein structures with methods powered by AI has achieved great success and just won the Nobel Prize in Chemistry in the last month. However, characterizing the dynamic structures of proteins is more meaningful for biology, drug, and medicine fields but is much more challenging. Second, molecular dynamics simulation, or MD, is one of the most widely used approaches to study protein dynamics, which can be roughly divided into classical molecular dynamics simulation and quantum molecular dynamics simulation. Both approaches have been developed for more than a half century and won Nobel Prize. Classical MD is fast but less accurate, while quantum MD is very accurate but computationally prohibitive for the protein study. However, we need both the accuracy and the efficiency to detect the biomechanisms. Thus, applying AI in biomolecular simulation can become the third way to achieve both ab initio—or first principles—accuracy and high efficiency. In the winter of 2020, we have foreseen the trend that AI can make a difference in biomolecular simulations. Thus, we chose this direction.

KRUFT: It took four years from the idea to the launch of AI2BMD, and there were many important milestones along the way. First, talk about how your work builds on and/or differs from what’s been done previously in this field, and then give our audience a sense of the key moments and challenges along the AI2BMD research journey.

WANG: First, I’d like to say applying AI in biomolecular simulation is a novel research field. For AI-powered MD simulation for large biomolecules, there is no existing dataset, no well-designed machine learning model for the interactions between the atoms and the molecules, no clear technical roadmap, no mature AI-based simulation system. So we face various new challenges every day. Second, there are some other works exploring this area at the same time. I think a significant difference between AI2BMD and other works is that other works require to generate new data and train the deep learning models for any new proteins. So it takes a protein-specific solution. As a contrast, AI2BMD proposes a generalizable solution for a wide range of proteins. To achieve it, as you mentioned, there are some key milestones during the four-year journey. The first one is we proposed the generalizable protein fragmentation approach that divides proteins into the commonly used 20 kinds of dipeptides. Thus, we don’t need to generate data for various proteins. Instead, we only need to sample the conformational space of such dipeptides. So we built the protein unit dataset that contains about 20 million samples with ab initio accuracy. Then we proposed ViSNet, the graph neural network for molecular geometry modeling as the machine learning potential for AI2BMD. Furthermore, we designed AI2BMD simulation system by efficiently leveraging CPUs and GPUs at the same time, achieving hundreds of times simulation speed acceleration than one year before and accelerating the AI-driven simulation with only ten to a hundred millisecond per simulation step. Finally, we examined AI2BMD on energy, force, free energy, J coupling, and many kinds of property calculations for tens of proteins and also applied AI2BMD in the drug development competition. All things are done by the great team with science and engineering expertise and the great leadership and support from AI for Science lab.

KRUFT: Tell us about how you conducted this research. What was your methodology?

WANG: As exploring an interdisciplinary research topic, our team consists of experts and students with biology, chemistry, physics, math, computer science, and engineering backgrounds. The teamwork with different expertise is key to AI2BMD research. Furthermore, we collaborated and consulted with many senior experts in the molecular dynamics simulation field, and they provided very insightful and constructive suggestions to our research. Another aspect of the methodology I’d like to emphasize is learning from negative results. Negative results happened most of the time during the study. What we do is to constantly analyze the negative results and adjust our algorithm and model accordingly. There’s no perfect solution for a research topic, and we are always on the way.

KRUFT: AI2BMD got some upgrades this year, and as we mentioned at the top of the episode, the work around the latest system was published in the scientific journal Nature. So tell us, Tong—what is new about the latest AI2BMD system? 

WANG: Good question. We posted a preliminary version of AI2BMD manuscript on bioRxiv last summer. I’d like to share three important upgrades through the past one and a half year. The first is hundreds of times of simulation speed acceleration for AI2BMD, which becomes one of the fastest AI-driven MD simulation system and leads to perform much longer simulations than before. The second aspect is AI2BMD was applied for many protein property calculations, such as enthalpy, heat capacity, folding free energy, pKa, and so on. Furthermore, we have been closely collaborating with the Global Health Drug Discovery Institute, GHDDI, a nonprofit research institute founded and supported by the Gates Foundation, to leverage AI2BMD and other AI capabilities to accelerate the drug discovery processes.

KRUFT: What significance does AI2BMD hold for research in both biology and AI? And also, what impact does it have outside of the lab, in terms of societal and individual benefits?

WANG: Good question. For biology, AI2BMD provides a much more accurate approach than those used in the past several decades to simulate the protein dynamic motions and study the bioactivity. For AI, AI2BMD proves AI can make a big difference to the dynamic protein structure study beyond AI for the protein static structure prediction. Raised by AI2BMD and other works, I can foresee there is a coming age of AI-driven biomolecular simulation, providing binding free-energy calculation with quantum simulation accuracy for the complex of drug and the target protein for drug discovery, detecting more flexible biomolecular conformational changes that molecular mechanics cannot do, and opening more opportunities for enzyme engineering and vaccine and antibody design.

KRUFT: AI is having a profound influence on the speed and breadth of scientific discovery, and we’re excited to see more and more talented people joining us in this space. What do you want our audience to take away from this work, particularly those already working in the AI for Science space or looking to enter it?

WANG: Good question. I’d like to share three points from my research experience. First is aim high. Exploring a disruptive research topic is better than doing 10 incremental works. In the years of research, our organization always encourages us to do the big things. Second is persistence. I remembered a computer scientist previously said about 90% of the time during research is failure and frustration. The rate is even higher when exploring a new research direction. In AI2BMD study, when we suffered from research bottlenecks that cannot be tackled for several months, when we received critical comments from reviewers, when some team members wanted to give up and leave, I always encourage everyone to persist, and we will make it. More importantly, the foundation of persistence is to ensure your research direction is meaningful and constantly adjust your methodology from failures and critical feedback. The third one is real-world applications. Our aim is to leverage AI for advancing science. Proposing scientific problems is a first step, then developing AI tools and evaluating on benchmarks and, more importantly, examining its usefulness in the real-world applications and further developing your AI algorithms. In this way, you can close the loop of AI for Science research.

KRUFT: And, finally, Tong, what unanswered questions or unsolved problems remain in this area, and what’s next on the agenda for the AI2BMD team?

WANG: Well, I think AI2BMD is a starting point for the coming age of AI-driven MD for biomolecules. There are lots of new scientific questions and challenges coming out in this new field. For example, how to expand the simulated molecules from proteins to other kinds of biomolecules; how to describe the biochemical reactions during the simulations; how to further improve the simulation efficiency and robustness; and how to apply it for more real-world scenarios. We warmly welcome any people from both academic and industrial fields to work together with us to make the joint efforts to push the frontier of this new field moving forward.

[MUSIC]

KRUFT: Well, Tong, thank you for joining us today, and to our listeners, thanks for tuning in. If you want to read the full paper on AI2BMD, you can find a link at aka.ms/abstracts, or you can read it on the Nature website. See you next time on Abstracts!

[MUSIC FADES]

Read More

Understanding prompt engineering: Unlock the creative potential of Stability AI models on AWS

In the rapidly evolving world of generative AI image modeling, prompt engineering has become a crucial skill for developers, designers, and content creators. By crafting effective prompts, you can harness the full potential of advanced diffusion transformer text-to-image models, enabling you to produce high-quality images that align closely with your creative vision. Amazon Bedrock offers access to powerful models such as Stable Image Ultra and Stable Diffusion 3 Large, which are designed to transform text descriptions into stunning visual outputs. Stability AI’s newest launch of Stable Diffusion 3.5 Large (SD3.5L) on Amazon SageMaker JumpStart enhances image generation, human anatomy rendering, and typography by producing more diverse outputs and adhering closely to user prompts, making it a significant upgrade over its predecessor.

In this post, we explore advanced prompt engineering techniques that can enhance the performance of these models and facilitate the creation of compelling imagery through text-to-image transformations.

Understanding the prompt structure

Prompt engineering is a valuable technique for effectively using generative AI image models. The structure of a prompt directly affects the generated images’ quality, creativity, and accuracy. Stability AI’s latest models enhance productivity by helping users achieve quality results. This guide offers practical prompting tips for the Stable Diffusion 3 family of models, allowing you to refine image concepts quickly and precisely. A well-structured Stable Diffusion prompt typically consists of the following key components:

  1. Subject – This is the main focus of your image. You can provide extensive details, such as the gender of a character, their clothing, and the setting. For example, “A corgi dog sitting on the front porch.”
(Example images generated by SD3 Large, SD Ultra, and SD3.5 Large.)
  2. Medium – This refers to the material or technique used in creating the artwork. Examples include “oil paint,” “digital art,” “voxel art,” or “watercolor.” A complete prompt might read: “3D Voxel Art; wide angle shot of a bright and colorful world.”
(Example images generated by SD3 Large, SD Ultra, and SD3.5 Large.)
  3. Style – You can specify an art style (such as impressionism, realism, or surrealism). A more detailed prompt could be: “Impressionist painting of a lady in a sun hat in a blooming garden.”
(Example images generated by SD3 Large, SD Ultra, and SD3.5 Large.)
  4. Composition and framing – You can describe the desired composition and framing of the image. This could include specifying close-up shots, wide-angle views, or particular compositional techniques. Consider the images generated by the following prompt: “Wide-shot of two friends lying on a hilltop, stargazing against an open sky filled with stars.”
(Example images generated by SD3 Large, SD Ultra, and SD3.5 Large.)
  5. Lighting and color – You can describe the lighting or shadows in the scene. Terms like “backlight,” “hard rim light,” and “dynamic shadows” can enhance the feel of the image. Consider the following prompt and images generated with it: “A yellow umbrella left open on a rainy street, surrounded by neon reflections, with hard rim light outlining its shape against the wet pavement, adding a moody glow.”
(Example images generated by SD3 Large, SD Ultra, and SD3.5 Large.)
  6. Resolution – Specifying resolution helps control image sharpness. For example: “A winding river through a snowy forest in 4K, illuminated by soft winter sunlight, with tree shadows across the snow and icy reflections.”
(Example images generated by SD3 Large, SD Ultra, and SD3.5 Large.)

Treat the SD3 generation of models as a creative partner. By expressing your ideas clearly in natural language, you give the model the best opportunity to generate an image that aligns with your vision.

Prompting techniques

The following are key prompting techniques to employ:

  • Descriptive language – Unlike previous models that required concise prompts, SD3.5 allows for detailed descriptions. For instance, instead of simply stating “a man and woman,” you can specify intricate details such as clothing styles and background settings. This clarity helps in achieving better adherence to the desired output.
  • Negative prompts – Negative prompting offers enhanced control over colors and content by removing unwanted elements, textures, or hues from the image. Whereas the main prompt establishes the image’s broad composition, negative prompts allow for honing in on specific elements, yielding a cleaner, more polished result. This added refinement helps keep distractions to a minimum, aligning the final output closely with your intended vision.
  • Using multiple text encoders – The SD3 generation of models features three text encoders that can accept varied prompts. This allows you to experiment with assigning general themes or styles to one encoder while detailing specific subjects in another.
  • Tokenization – Perfecting the art of prompt engineering for the Stable Diffusion 3 model family requires a deep understanding of several key concepts and techniques. At the core of effective prompting lies the process of tokenization and token analysis. It’s crucial to comprehend how the SD3 family breaks down your prompt text into individual tokens, because this directly impacts the model’s interpretation and subsequent image generation. By analyzing these tokens, you can identify potential issues such as out-of-vocabulary words that might split into sub-word tokens, multi-word phrases that don’t tokenize together as expected, or ambiguous tokens like “3D” that could be interpreted in multiple ways. For instance, in the prompt “A realistic 3D render of a red apple,” the clarity of tokenization can significantly affect the quality of the output image. (See the tokenizer sketch after this list.)
(Example images generated by SD3 Large, SD Ultra, and SD3.5 Large.)
  • Prompt weighting – Prompt weighting and emphasis techniques allow you to fine-tune the importance of specific elements within your prompt. By using syntax like “A photo of a (red:1.2) apple,” you can increase the significance of the color “red” in the generated image. Similarly, emphasizing multiple aspects, as in “A (photorealistic:1.4) (3D render:1.2) of a red apple,” can help achieve a more nuanced result that balances photorealism with 3D rendering qualities. “(photorealistic:1.4)” indicates that the image should be photorealistic, with a weight of 1.4. The higher weight (>1.0) emphasizes that the photorealistic quality is more important than usual. Although you can technically set weights higher than 5.0, it’s advisable to stay within the range of 1.5–2.0 for effective results. This level of control enables you to guide the model’s focus more precisely, resulting in outputs that more closely align with your creative vision.
(Example images for “A photo of a (red:1.2) apple” and “A (photorealistic:1.4) (3D render:1.2) of a red apple.”)
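To see how a prompt splits into tokens before you send it, you can inspect it with an off-the-shelf CLIP tokenizer. The following is a minimal sketch using the Hugging Face Transformers library; the checkpoint name is an illustrative assumption, and the tokenizers used by the SD3 family may differ slightly.

# A minimal token-analysis sketch; "openai/clip-vit-large-patch14" is an assumed,
# publicly available CLIP checkpoint used here only for illustration.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "A realistic 3D render of a red apple"
print(tokenizer.tokenize(prompt))
# Inspecting the output shows whether a word such as "3D" stays whole or splits into
# sub-word tokens, which can change how the model interprets the prompt.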

Practical settings for optimal results

To optimize the performance for these models, several key settings should be adjusted based on user preferences and hardware capabilities. Start with 28 denoising steps to balance image quality and generation time. For the Guidance Scale (CFG), set it between 3.5–4.5 to maintain fidelity to the prompt without creating overly contrasted images. ComfyUI is an open source, node-based application that empowers users to generate images, videos, and audio using advanced AI models, offering a highly customizable workflow for creative projects. In ComfyUI, using the dpmpp_2m sampler along with the sgm_uniform scheduler yields effective results. Additionally, aim for a resolution of approximately 1 megapixel (for example, 1024×1024 for square images) while making sure that dimensions are divisible by 64 for optimal output quality. These settings provide a solid foundation for generating high-quality images while efficiently utilizing your hardware resources, allowing for further adjustments based on specific requirements.
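As a quick reference, these starting values can be captured in a small settings dictionary. This is only a sketch; the exact parameter names depend on how you host the model (for example, a ComfyUI workflow or a SageMaker endpoint), so treat the keys below as illustrative assumptions.

# Illustrative starting settings for SD3.5 Large; key names are assumptions to be mapped
# onto whatever interface serves the model.
sd35_settings = {
    "steps": 28,             # denoising steps: balances quality and generation time
    "cfg_scale": 4.0,        # guidance scale of 3.5-4.5 keeps prompt fidelity without over-contrast
    "sampler": "dpmpp_2m",   # sampler reported to work well in ComfyUI
    "scheduler": "sgm_uniform",
    "width": 1024,           # about 1 megapixel total, with dimensions divisible by 64
    "height": 1024,
}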

Prompt programming

Treating prompts as a form of programming language can also yield powerful results. By structuring your prompts with components like subjects, styles, and scenes, you create a modular system that’s simple to adjust and extend. For example, using syntax like “A red apple [SUBJ], photorealistic [STYLE], on a wooden table [SCENE]” allows for systematic modifications and experimentation with different elements of the prompt.
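As an illustration of this modular approach, you can compose prompts from labeled components in plain Python. This is a sketch of the idea rather than part of any Stability AI SDK; the bracketed tags from the example are shown only as comments, and the model receives the joined natural-language string.

def build_prompt(subject: str, style: str, scene: str) -> str:
    """Compose a prompt from modular components so each piece can be swapped independently."""
    return f"{subject}, {style}, {scene}"

prompt = build_prompt(
    subject="A red apple",       # [SUBJ]
    style="photorealistic",      # [STYLE]
    scene="on a wooden table",   # [SCENE]
)
print(prompt)  # "A red apple, photorealistic, on a wooden table"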

Prompt augmentation and tuning

Lastly, prompt augmentation and tuning can significantly enhance the effectiveness of your prompts. This might involve incorporating additional data such as reference images or rough sketches as conditioning inputs alongside your text prompts. Furthermore, fine-tuning models on carefully curated datasets of prompt-image pairs can improve the associations between textual descriptions and visual outputs, leading to more accurate and refined results. With these advanced techniques, you can push the boundaries of what’s possible with SD3.5, creating increasingly sophisticated and tailored images that truly bring your ideas to life.

Responsible and ethical AI with Amazon Bedrock

When working with Stable Diffusion models through Amazon Bedrock, Amazon Bedrock Guardrails can intercept and evaluate user prompts before they reach the image generation pipeline. This allows for filtering and moderation of input text to prevent the creation of harmful, offensive, or inappropriate images. The system offers configurable content filters that can be adjusted to different strength levels, giving fine-tuned control over what types of image content are permitted to be generated. Organizations can define denied topics specific to image generation, such as blocking requests for violent imagery or explicit content. Word filters can be set up to detect and block specific phrases or terms that may lead to undesirable image outputs. Additionally, sensitive information filters can be applied to protect personally identifiable information (PII) from being incorporated into generated images. This multi-layered approach helps prevent misuse of Stable Diffusion models, maintain compliance with regulations around AI-generated imagery, and provide a consistently safe user experience when using these powerful image generation capabilities. By implementing Amazon Bedrock Guardrails, organizations can confidently deploy Stable Diffusion models while mitigating risks and adhering to ethical AI principles.
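For example, a guardrail can be attached directly to an image generation request made through the Amazon Bedrock runtime. The following is a minimal boto3 sketch; the model ID, guardrail ID, and guardrail version are placeholders, and the exact response shape when a guardrail intervenes may differ, so treat the parsing as illustrative.

import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Placeholder identifiers -- substitute a Stability AI model ID enabled in your account
# and the ID/version of a guardrail you have configured in Amazon Bedrock.
response = bedrock_runtime.invoke_model(
    modelId="stability.stable-image-ultra-v1:0",
    guardrailIdentifier="your-guardrail-id",
    guardrailVersion="1",
    body=json.dumps({"prompt": "A serene mountain lake at sunrise, watercolor style"}),
)

result = json.loads(response["body"].read())
if "images" in result:
    with open("output.png", "wb") as f:
        f.write(base64.b64decode(result["images"][0]))
else:
    # When the guardrail intervenes, no image is returned; inspect the response instead.
    print(result)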

Conclusion

In the dynamic realm of generative AI image modeling, understanding prompt engineering is essential for developers, designers, and content creators looking to unlock the full potential of models like Stable Diffusion 3.5 Large. This advanced model, available on Amazon Bedrock, enhances image generation by producing diverse outputs that closely align with user prompts. Effective prompting involves understanding the structure of prompts, which typically includes key components such as the subject, medium, style, and resolution. By clearly defining these elements and employing techniques like prompt weighting and chaining, you can refine your creative vision and achieve high-quality results.

Additionally, the process of tokenization plays a crucial role in how prompts are interpreted by the model. Analyzing tokens can help identify potential issues that may affect output quality. You can also enhance your prompts through modular programming approaches and by incorporating additional data like reference images. By fine-tuning models on datasets of prompt-image pairs, creators can improve the associations between text and visuals, leading to more accurate results.

This post provided practical tips and techniques to optimize performance and elevate the creative possibilities within Stable Diffusion 3.5 Large, empowering you to produce compelling imagery that resonates with your artistic intent. To get started, see Stability AI in Amazon Bedrock. To explore what’s available on SageMaker JumpStart, see Stability AI builds foundation models on Amazon SageMaker.


About the Authors

Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area who works with generative AI model providers and helps customers optimize their generative AI workloads on AWS. She helps enterprise customers grow by understanding their goals and challenges, and guides them on how to architect their applications in a cloud-native manner while ensuring resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.

Sanwal Yousaf is a Solutions Engineer at Stability AI. His work at Stability AI focuses on working with enterprises to architect solutions using Stability AI’s Generative models to solve pressing business problems. He is passionate about creating accessible resources for people to learn and develop proficiency with AI.

Read More

Introducing Stable Diffusion 3.5 Large in Amazon SageMaker JumpStart

Introducing Stable Diffusion 3.5 Large in Amazon SageMaker JumpStart

We are excited to announce the availability of Stability AI’s latest and most advanced text-to-image model, Stable Diffusion 3.5 Large, in Amazon SageMaker JumpStart. This new cutting-edge image generation model, which was trained on Amazon SageMaker HyperPod, empowers AWS customers to generate high-quality images from text descriptions with unprecedented ease, flexibility, and creative potential. By adding Stable Diffusion 3.5 Large to SageMaker JumpStart, we’re taking another significant step towards democratizing access to advanced AI technologies and enabling businesses of all sizes to harness the power of generative AI.

In this post, we provide an implementation guide for subscribing to Stable Diffusion 3.5 Large in SageMaker JumpStart, deploying the model in Amazon SageMaker Studio, and generating images using text-to-image prompts.

Stable Diffusion 3.5 Large capabilities and use cases

At 8.1 billion parameters, with superior quality and prompt adherence, Stable Diffusion 3.5 Large is the most powerful model in the Stable Diffusion family. The model excels at creating diverse, high-quality images across a wide range of styles, making it an excellent tool for media, gaming, advertising, ecommerce, corporate training, retail, and education. For ideation, Stable Diffusion 3.5 Large can accelerate storyboarding, concept art creation, and rapid prototyping of visual effects. For production, you can quickly generate high-quality 1-megapixel images for campaigns, social media posts, and advertisements, saving time and resources while maintaining creative control.

Stable Diffusion 3.5 Large offers users nearly endless creative possibilities, including:

  • Enhanced creativity and photorealism – You can generate exceptional visuals with highly detailed 3D imagery that include fine details like lighting and textures.
  • Exceptional multi-subject proficiency – It offers unrivaled capabilities in generating images with multiple subjects, which is ideal for creating complex scenes.
  • Increased efficiency – Fast, accurate, and quality content production streamlines operations, saving time and money. Despite its power and complexity, Stable Diffusion 3.5 Large is optimized for efficiency, providing accessibility and ease of use across a broad audience.

Solution overview

With SageMaker JumpStart, you can choose from a broad selection of publicly available foundation models (FMs). ML practitioners can deploy FMs to dedicated SageMaker instances from a network-isolated environment and customize models using Amazon SageMaker for model training and deployment. You can now discover and deploy the Stable Diffusion 3.5 Large model with a few clicks in SageMaker Studio or programmatically through the SageMaker Python SDK, so you can manage model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping provide data security.

The Stable Diffusion 3.5 Large model is available today in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Osaka, Hong Kong), China (Beijing), Middle East (Bahrain), Africa (Cape Town), and Europe (Milan, Stockholm).

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

Prerequisites

Make sure that your AWS Identity and Access Management (IAM) role has AmazonSageMakerFullAccess. To successfully deploy the model, confirm that your IAM role has the following three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used:

  • aws-marketplace:ViewSubscriptions
  • aws-marketplace:Unsubscribe
  • aws-marketplace:Subscribe

Subscribe to the Stable Diffusion 3.5 Large model package

You can access SageMaker JumpStart through the SageMaker Studio Home page by selecting JumpStart in the Prebuilt and automated solutions section. The JumpStart landing page allows you to explore various resources, including solutions, models, and notebooks. You can also search for a particular provider. In the following screenshot, we are looking at all the models from Stability AI in SageMaker JumpStart.

Each model is presented with a model card containing key information such as the model name, fine-tuning capability, provider, and a brief description. To find the Stable Diffusion 3.5L model, you can either browse the Foundation Model: Image Generation carousel or use the search function. Select Stable Diffusion 3.5 Large.

Next, subscribe to Stable Diffusion 3.5 Large by following these steps:

  1. Open the model listing page in AWS Marketplace using the link available from the example notebook in SageMaker JumpStart.
  2. On the listing, choose Continue to subscribe.
  3. On the Subscribe to this software page, review and choose Accept Offer if you and your organization accept the EULA, pricing, and support terms.
  4. Choose Continue to configuration to start configuring your model.
  5. Choose a supported Region, and you will see the model package Amazon Resource Name (ARN) that you need to specify when creating an endpoint.

Note: If you don’t have the necessary permissions to view or subscribe to the model, reach out to your AWS administrator or procurement point of contact. Many enterprises may limit AWS Marketplace permissions to control the actions that someone can take in the AWS Marketplace Management Portal.

Deploy the model in SageMaker Studio

Now you’re prepared to follow the notebook example from Stability AI’s GitHub repository to create a deployable ModelPackage (using the model package ARN from AWS Marketplace) and deploy it to an endpoint.

For Stable Diffusion 3.5 Large, you’ll need to deploy on an Amazon Elastic Compute Cloud (Amazon EC2) ml.p5.48xlarge instance.
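The following is a minimal sketch of that flow with the SageMaker Python SDK, assuming you have the model package ARN from your AWS Marketplace subscription and an execution role with SageMaker permissions; the exact code in Stability AI’s notebook may differ.

import sagemaker
from sagemaker import ModelPackage, get_execution_role

session = sagemaker.Session()
role = get_execution_role()

# Placeholder -- use the model package ARN shown for your Region after subscribing
# in AWS Marketplace.
model_package_arn = "arn:aws:sagemaker:<region>:<account>:model-package/<sd3-5-large-package>"

model = ModelPackage(
    role=role,
    model_package_arn=model_package_arn,
    sagemaker_session=session,
)

# Stable Diffusion 3.5 Large requires an ml.p5.48xlarge instance.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p5.48xlarge",
    endpoint_name="sd3-5-large",
)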

Generate images with a text prompt

Refer to the Stable Diffusion 3.5 Large documentation for more details. From the example notebook, the code to generate an image is as follows:

import base64
import io
import json

import boto3
from IPython.display import display
from PIL import Image

# The SageMaker runtime client is used to invoke the deployed endpoint.
sm_runtime = boto3.client("sagemaker-runtime")

params = {
    "prompt": "Photography, pink rose flowers in the twilight, glowing, tile houses in the background.",
    "seed": 101,
    "aspect_ratio": "21:9",
    "output_format": "jpeg",
}

payload = json.dumps(params).encode("utf-8")

# endpoint_name is the name of the Stable Diffusion 3.5 Large endpoint created earlier.
response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=payload,
)

out = json.loads(response["Body"].read().decode("utf-8"))
try:
    # The endpoint returns the generated image as a base64-encoded string.
    base64_string = out["body"]["images"][0]
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    display(image)

except Exception:
    # If the response doesn't contain an image, print it to inspect the error.
    print(out)

The following are examples of images generated from different prompts.

Prompt:

Photography, pink rose flowers in the twilight, glowing, tile houses in the background.

Prompt:

The word “AWS x Stability” in a thick, blocky script surrounded by roots and vines against a solid white background. The scene is lit by flat light, creating a reflective scene with a minimal color palette. Quilling style.

Prompt:

Expressionist painting, side profile of a silhouette of a student seated at a desk, absorbed in reading a book. Her thoughts artistically connect to the stars and the vast universe, symbolizing the expansion of knowledge and a boundless mind.

Prompt:

High-energy street scene in a neon-lit Tokyo alley at night, where steam rises from food carts, and colorful neon signs illuminate the rain-slicked pavement.

Prompt:

3D animation scene of an adventurer traveling the world with his pet dog.

Clean up

When you’ve finished working, you can delete the endpoint to release the EC2 instances associated with it and stop billing.

Get your list of SageMaker endpoints using the AWS Command Line Interface (AWS CLI) as follows:

!aws sagemaker list-endpoints

Then delete the endpoints:

deployed_model.sagemaker_session.delete_endpoint(endpoint_name)
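If you prefer the AWS SDK for Python (Boto3), an equivalent cleanup sketch follows; it assumes the endpoint configuration shares the endpoint’s name (the SageMaker SDK default), so verify the names in your account before deleting.

import boto3

sm_client = boto3.client("sagemaker")

# Delete the endpoint first, then its endpoint configuration.
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_name)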

Conclusion

In this post, we walked through subscribing to Stable Diffusion 3.5 Large in SageMaker JumpStart, deploying the model in SageMaker Studio, and generating a variety of images with Stability AI’s latest text-to-image model.

Start creating amazing images today with Stable Diffusion 3.5 Large on SageMaker JumpStart. To learn more about SageMaker JumpStart, see SageMaker JumpStart pretrained models, Amazon SageMaker JumpStart Foundation Models, and Getting started with Amazon SageMaker JumpStart.

If you’d like to explore advanced prompt engineering techniques that can enhance the performance of text-to-image models from Stability AI and facilitate the creation of compelling imagery, see Understanding prompt engineering: Unlock the creative potential of Stability AI models on AWS.


About the Authors

Tom Yemington is a Senior GenAI Models Specialist focused on helping model providers and customers scale generative AI solutions in AWS. Tom is a Certified Information Systems Security Professional (CISSP). Outside of work, you can find Tom racing vintage cars or teaching people how to race as an instructor at track-day events.

Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area who works with generative AI model providers and helps customers optimize their generative AI workloads on AWS. She helps enterprise customers grow by understanding their goals and challenges, and guides them on how to architect their applications in a cloud-native manner while ensuring resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.

Boshi Huang is a Senior Applied Scientist in Generative AI at Amazon Web Services, where he collaborates with customers to develop and implement generative AI solutions. Boshi’s research focuses on advancing the field of generative AI through automatic prompt engineering, adversarial attack and defense mechanisms, inference acceleration, and developing methods for responsible and reliable visual content generation.

Read More

From Seed to Stream: ‘Farming Simulator 25’ Sprouts on GeForce NOW

From Seed to Stream: ‘Farming Simulator 25’ Sprouts on GeForce NOW

Grab a pitchfork and fire up the tractor — the fields of GeForce NOW are about to get a whole lot greener with Farming Simulator 25.

Whether looking for a time-traveling adventure, cozy games or epic action, GeForce NOW has something for everyone with over 2,000 games in its cloud library. Nine titles arrive this week, including the new 4X historical grand strategy game Ara: History Untold from Oxide Games and Xbox Game Studios.

And in this season of giving, GeForce NOW will offer members new rewards and more this month. This week, GeForce NOW is spreading cheer with a new reward for members that’s sure to delight Throne and Liberty fans. Get ready to add a dash of mischief and a sprinkle of wealth to the epic adventures in the sprawling world of this massively multiplayer online role-playing game.

Plus, the NVIDIA app is officially released for download this week. GeForce users can use it to access GeForce NOW to play their games with RTX performance when they’re away from their gaming rigs or don’t want to wait around for their games to update and patch.

A Cloud Gaming Bounty

Get ready to plow the fields and tend to crops anywhere with GeForce NOW.

Farming Simulator 25 on GeForce NOW

Farming Simulator 25 from Giants Software launched in the cloud for members to stream, bringing a host of new features and improvements — including the introduction of rice as a crop type, complete with specialized machinery and techniques for planting, flooding fields and harvesting.

This expansion into rice farming is accompanied by a new Asian-themed map that offers players a lush landscape filled with picturesque rice paddies to cultivate. The game will also include two other diverse environments: a spacious North American setting and a scenic Central European location, allowing farmers to build their agricultural empires in varied terrains. Don’t forget about the addition of water buffaloes and goats, as well as the introduction of animal offspring for a new layer of depth to farm management.

Be the cream of the crop streaming with a Performance or Ultimate membership. Performance members get up to 1440p 60 frames per second and Ultimate streams at up to 4K and 120 fps for the most incredible levels of realism and variety. Whether tackling agriculture, forestry and animal husbandry single-handedly or together with friends in cooperative multiplayer mode, experience farming life like never before with GeForce NOW.

Mischief Managed

Whether new to the game or a seasoned adventurer, GeForce NOW members can claim a special PC-exclusive reward to use in Amazon Games’ hit title Throne and Liberty. The reward includes 200 Ornate Coins and a PC-exclusive mischievous youngster named Gneiss Amitoi that will enhance the Throne and Liberty journey as members forge alliances, wage epic battles and uncover hidden treasures.

Throne and Liberty on GeForce NOW

Ornate Coins allow players to acquire morphs for animal shapeshifting, autonomous pets named Amitois, exclusive cosmetic items, experience boosters and inventory expansions. Gneiss Youngster Amitoi is a toddler-aged prankster that randomly targets players and non-playable characters with its tricks. While some of its mischief can be mean-spirited, it just wants attention, and will pout and roll back to its adventurer’s side if ignored, adding an entertaining dynamic to the journey through the world of Throne and Liberty.

Members who’ve opted in to GeForce NOW’s Rewards program can check their email for instructions on how to redeem the reward. Ultimate and Performance members can start redeeming the reward today, while free members will be able to claim it starting tomorrow, Nov. 15. It’s available through Tuesday, Dec. 10, first come, first served.

Rewriting History

Ara History Untold on GeForce NOW

Explore, build, lead and conquer a nation in Ara: History Untold, where every choice will shape the world and define a player’s legacy. It’s now available for GeForce NOW members to stream.

Ara: History Untold offers a fresh take on 4X historical grand strategy games. Players will prove their worth by guiding their citizens through history to the pinnacles of human achievement. Explore new lands, develop arts and culture, and engage in diplomacy — or combat — with other nations, before ultimately claiming the mantle of the greatest nation of all time.

Members can craft their own unique story of triumph and achievement by streaming the game across devices from the cloud. GeForce NOW Performance and Ultimate members can enjoy longer gaming sessions and faster access to servers than free users, perfect for crafting sprawling empires and engaging in complex diplomacy without worrying about local hardware limitations.

New Games Are Knocking

GeForce NOW brings the new Wuthering Waves update “When the Night Knocks” for members this week. Version 1.4 brings a wealth of new content, including two new Resonators, Camellya and Lumi, along with powerful new weapons, including the five-star Red Spring and the four-star event weapon Somnoire Anchor. Dive into the Somnoire Adventure Event, Somnium Labyrinth, and enjoy a variety of log-in rewards, combat challenges and exploration activities. The update also includes Camellya’s companion story, a new Phantom Echo and introduces the exciting Weapon Projection feature.

Members can look for the following games available to stream in the cloud this week:

  • Farming Simulator 25 (New release on Steam, Nov. 12)
  • Sea Power: Naval Combat in the Missile Age (New release on Steam, Nov. 12)
  • Industry Giant 4.0 (New release on Steam, Nov. 15)
  • Ara: History Untold (Steam and Xbox, available on PC Game Pass)
  • Call of Duty: Black Ops Cold War (Steam and Battle.net)
  • Call of Duty: Vanguard (Steam and Battle.net)
  • Magicraft (Steam)
  • Crash Bandicoot N. Sane Trilogy (Steam and Xbox, available on PC Game Pass)
  • Spyro Reignited Trilogy (Steam and Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Keeping an AI on Diabetes Risk: Gen AI Model Predicts Blood Sugar Levels Four Years Out

Keeping an AI on Diabetes Risk: Gen AI Model Predicts Blood Sugar Levels Four Years Out

Diabetics — or others monitoring their sugar intake — may look at a cookie and wonder, “How will eating this affect my glucose levels?” A generative AI model can now predict the answer.

Researchers from the Weizmann Institute of Science, Tel Aviv-based startup Pheno.AI and NVIDIA led the development of GluFormer, an AI model that can predict an individual’s future glucose levels and other health metrics based on past glucose monitoring data.

Data from continuous glucose monitoring could help more quickly diagnose patients with prediabetes or diabetes, according to Harvard Health Publishing and NYU Langone Health. GluFormer’s AI capabilities can further enhance the value of this data, helping clinicians and patients spot anomalies, predict clinical trial outcomes and forecast health outcomes up to four years in advance.

The researchers showed that, after adding dietary intake data into the model, GluFormer can also predict how a person’s glucose levels will respond to specific foods and dietary changes, enabling precision nutrition.

Accurate predictions of glucose levels for those at high risk of developing diabetes could enable doctors and patients to adopt preventative care strategies sooner, improving patient outcomes and reducing the economic impact of diabetes, which could reach $2.5 trillion globally by 2030.

AI tools like GluFormer have the potential to help the hundreds of millions of adults with diabetes. The condition currently affects around 10% of the world’s adults — a figure that could potentially double by 2050 to impact over 1.3 billion people. It’s one of the 10 leading causes of death globally, with side effects including kidney damage, vision loss and heart problems.

GluFormer is a transformer model, a kind of neural network architecture that tracks relationships in sequential data. It’s the same architecture as OpenAI’s GPT models — in this case generating glucose levels instead of text.

“Medical data, and continuous glucose monitoring in particular, can be viewed as sequences of diagnostic tests that trace biological processes throughout life,” said Gal Chechik, senior director of AI research at NVIDIA. “We found that the transformer architecture, developed for long text sequences, can take a sequence of medical tests and predict the results of the next test. In doing so, it learns something about how the diagnostic measurements develop over time.”
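To make the idea concrete, the following is a generic PyTorch sketch of next-value prediction over a glucose sequence with a transformer encoder. It is purely illustrative and does not reflect GluFormer’s actual architecture, tokenization, or training setup.

# A generic, illustrative next-value predictor over glucose readings -- not GluFormer itself.
import torch
import torch.nn as nn


class GlucoseTransformer(nn.Module):
    def __init__(self, d_model: int = 64, nhead: int = 4, num_layers: int = 2, max_len: int = 1344):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)        # embed each scalar reading
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)              # predict the next glucose value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, 1): past glucose readings
        seq_len = x.size(1)
        positions = torch.arange(seq_len, device=x.device)
        h = self.input_proj(x) + self.pos_emb(positions)
        # Causal mask so each position attends only to earlier readings, GPT-style.
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=x.device), diagonal=1
        )
        h = self.encoder(h, mask=causal_mask)
        return self.head(h[:, -1])                     # forecast for the next time step


model = GlucoseTransformer()
readings = torch.randn(8, 96, 1)  # for example, 8 subjects x 24 hours of 15-minute readings
next_reading = model(readings)    # shape: (8, 1)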

The model was trained on 14 days of glucose monitoring data from over 10,000 non-diabetic study participants, with data collected every 15 minutes through a wearable monitoring device. The data was collected as part of the Human Phenotype Project, an initiative by Pheno.AI, a startup that aims to improve human health through data collection and analysis.

“Two important factors converged at the same time to enable this research: the maturing of generative AI technology powered by NVIDIA and the collection of large-scale health data by the Weizmann Institute,” said the paper’s lead author, Guy Lutsker, an NVIDIA researcher and Ph.D. student at the Weizmann Institute of Science. “It put us in the unique position to extract interesting medical insights from the data.”

The research team validated GluFormer across 15 other datasets and found it generalizes well to predict health outcomes for other groups, including those with prediabetes, type 1 and type 2 diabetes, gestational diabetes and obesity.

They used a cluster of NVIDIA Tensor Core GPUs to accelerate model training and inference.

Beyond glucose levels, GluFormer can predict medical values including visceral adipose tissue, a measure of the amount of body fat around organs like the liver and pancreas; systolic blood pressure, which is associated with diabetes risk; and apnea-hypopnea index, a measurement for sleep apnea, which is linked to type 2 diabetes.

Read the GluFormer research paper on arXiv.

Read More

NVIDIA Ranks No. 1 as Forbes Debuts List of America’s Best Companies 2025

NVIDIA Ranks No. 1 as Forbes Debuts List of America’s Best Companies 2025

NVIDIA ranked No. 1 on Forbes magazine’s new list — America’s Best Companies — based on more than 60 measures in nearly a dozen categories that cover financial performance, customer and employee satisfaction, sustainability, remote work policies and more.

Forbes stated that the company thrived in numerous areas, “particularly employee satisfaction, earning high ratings in career opportunities, company benefits and culture,” as well as financial strength.

About 2,000 of the largest public companies in the U.S. were eligible, with 300 making the list.

Beau Davidson, vice president of employee experience at NVIDIA, told Forbes that the company has created systemic opportunities to listen to its staff (such as quarterly surveys, CEO Q&As and a virtual suggestion box) and then takes action on concerns ranging from benefits to cafe snacks.

NVIDIA has also championed Free Days — two days each quarter where the entire company closes. “It allows us to take a break as a company,” Davidson told Forbes. NVIDIA also provides onsite counselors and hosts a careers week with programs and training for workers to pursue internal job opportunities.

NVIDIA enjoys a low rate of employee turnover — widely viewed as a sign of employee happiness, according to People Data Labs, Forbes’ data provider on workforce stability.

For a full list of rankings, view Forbes’ America’s Best Companies 2025 list.

Check out the NVIDIA Careers page and learn more about NVIDIA Life.

Read More

Indonesia Tech Leaders Team With NVIDIA and Partners to Launch Nation’s AI

Indonesia Tech Leaders Team With NVIDIA and Partners to Launch Nation’s AI

Working with NVIDIA and its partners, Indonesia’s technology leaders have launched an initiative to bring sovereign AI to the nation’s more than 277 million Indonesian speakers.

The collaboration is grounded in a broad public-private partnership that reflects the nation’s concept of “gotong royong,” a term describing a spirit of mutual assistance and community collaboration.

NVIDIA founder and CEO Jensen Huang joined Indonesia Minister of State-Owned Enterprises Erick Thohir, Indosat Ooredoo Hutchison (IOH) President Director and CEO Vikram Sinha, GoTo CEO Patrick Walujo and other leaders in Jakarta to celebrate the launch of Sahabat-AI.

Sahabat-AI is a collection of open-source Indonesian large language models (LLMs) that local industries, government agencies, universities and research centers can use to create generative AI applications. Built with NVIDIA NeMo and NVIDIA NIM microservices, the models were launched today at Indonesia AI Day, a conference focused on enabling AI sovereignty and driving AI-driven digital independence in the country.

Built by Indonesians, for Indonesians, Sahabat-AI models understand local contexts and enable people to build generative AI services and applications in Bahasa Indonesian and various local languages. The models form the foundation of a collaborative effort to empower Indonesia through a locally developed, open-source LLM ecosystem.

“Artificial intelligence will democratize technology. It is the great equalizer,” said Huang. “The technology is complicated but the benefit is not.”

“Sahabat-AI is not just a technological achievement, it embodies Indonesia’s vision for a future where digital sovereignty and inclusivity go hand in hand,” Sinha said. “By creating an AI model that speaks our language and reflects our culture, we’re empowering every Indonesian to harness advanced technology’s potential. This initiative is a crucial step toward democratizing AI as a tool for growth, innovation and empowerment across our diverse society.”

To accelerate this initiative, IOH — one of Indonesia’s largest telecom and internet companies — earlier this year launched “GPU Merdeka by Lintasarta,” an NVIDIA-accelerated sovereign AI cloud. The GPU Merdeka cloud service operates at a BDx Indonesia AI data center powered by renewable energy.

Bolstered by the NVIDIA Cloud Partner program, IOH subsidiary Lintasarta built the high-performance AI cloud in less than three months, a feat that would’ve taken much longer without NVIDIA’s technology infrastructure. The AI cloud is now driving transformation across energy, financial services, healthcare and other industries.

The NVIDIA Cloud Partner (NCP) program provides Lintasarta with access to NVIDIA reference architectures — blueprints for building high-performance, scalable and secure data centers.

The program also offers technological and go-to-market support, access to the latest NVIDIA AI software and accelerated computing platforms, and opportunities to collaborate with NVIDIA’s extensive ecosystem of industry partners. These partners include global systems integrators like Accenture and Tech Mahindra and software companies like GoTo and Hippocratic AI, each of which is working alongside IOH to boost the telco’s sovereign AI initiatives.

Developing Industry-Specific Applications With Accenture

Partnering with leading professional services company Accenture, IOH is developing applications for industry-specific use cases based on its new AI cloud, Sahabat-AI and the NVIDIA AI Enterprise software platform.

NVIDIA CEO Huang joined Accenture CEO Julie Sweet in a fireside chat during Indonesia AI Day to discuss how the companies are supporting enterprise and industrial AI in Indonesia.

The collaboration taps into the Accenture AI Refinery platform to help Indonesian enterprises build AI solutions tailored for financial services, energy and other industries, while delivering sovereign data governance.

Initially focused on financial services, IOH’s work with Accenture and NVIDIA technologies is delivering pre-built enterprise solutions that can help Indonesian banks more quickly harness AI.

With a modular architecture, these solutions can meet clients’ needs wherever they are in their AI journeys, helping increase profitability, operational efficiency and sustainable growth.

Building the Bahasa LLM and Chatbot Services With Tech Mahindra

Built with India-based global systems integrator Tech Mahindra, the Sahabat-AI LLMs power various AI services in Indonesia.

For example, Sahabat-AI enables IOH’s AI chatbot to answer queries in the Indonesian language for various citizen and resident services. A person could ask about processes for updating their national identification card, as well as about tax rates, payment procedures, deductions and more.

The chatbot integrates with a broader citizen services platform Tech Mahindra and IOH are developing as part of the Indonesian government’s sovereign AI initiative.

Indosat developed Sahabat-AI using the NVIDIA NeMo platform for developing customized LLMs. The team fine-tuned a version of the Llama 3 8B model, customizing it for the Bahasa language using a diverse dataset tailored for effective communication with users.

To further optimize performance, Sahabat-AI uses NVIDIA NIM microservices, which have demonstrated up to 2.5x greater throughput compared with standard implementations. This improvement in processing efficiency allows for faster responses and more satisfying user experiences.

In addition, NVIDIA NeMo Guardrails open-source software orchestrates dialog management and helps ensure accuracy, appropriateness and security of the LLM-based chatbot.
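As a rough illustration of how NeMo Guardrails wraps an LLM-backed chatbot, the following minimal Python sketch loads a guardrails configuration and routes a user message through it; the configuration path and contents are assumptions and not part of the Sahabat-AI deployment.

# Minimal NeMo Guardrails sketch; "./guardrails_config" is a placeholder directory that
# would hold the config.yml (model settings) and rails definitions for the chatbot.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./guardrails_config")
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "How do I renew my national identification card?"}
])
print(response["content"])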

Many other service capabilities tapping Sahabat-AI are also planned for development, including AI-powered healthcare services and other local applications.

Improving Indonesian Healthcare With Hippocratic AI

Among the first to tap into Sahabat-AI is healthcare AI company Hippocratic AI, which is using the models, the NVIDIA AI platform and IOH’s sovereign AI cloud to develop digital agents that can have humanlike conversations, exhibit empathic qualities, and build rapport and trust with patients across Indonesia.

Hippocratic AI employs a novel trillion-parameter constellation architecture that brings together specialized healthcare LLM agents to deliver safe, accurate digital agent implementations.

Digital AI agents can significantly increase staff productivity by offloading time-consuming tasks, allowing human nurses and medical professionals to focus on critical duties to increase healthcare accessibility and quality of service.

IOH’s sovereign AI cloud lets Hippocratic AI keep patient data local and secure, and enables extremely low-latency AI inference for its LLMs.

Enhancing Simplicity, Accessibility for On-Demand and Financial Services With GoTo

GoTo offers technology infrastructure and solutions that help users thrive in the digital economy, including through applications spanning on-demand services for transport, food, grocery and logistics delivery, financial services and e-commerce.

The company — which operates one of Indonesia’s leading on-demand transport services, as well as a leading payment application in the country — is adopting and enhancing the new Sahabat-AI models to integrate with its AI voice assistant, called Dira.

Dira is a speech and generative AI-powered digital assistant that helps customers book rides, order food deliveries, transfer money, pay bills and more.

Tapping into Sahabat-AI, Dira is poised to deliver more localized and culturally relevant interactions with application users.

Advancing Sustainability Within Lintasarta as IOH’s AI Factory

Fundamentally, Lintasarta’s AI cloud is an AI factory — a next-generation data center that hosts advanced, full-stack accelerated computing platforms for the most computationally intensive tasks. It’ll enable regional governments, businesses and startups to build, customize and deploy generative AI applications aligned with local language and customs.

Looking forward, Lintasarta plans to expand its AI factory with the most advanced NVIDIA technologies. The infrastructure already boasts a “green” design, powered by renewable energy and sustainable technologies. Lintasarta is committed to adding value to Indonesia’s digital ecosystem with integrated, secure and sustainable technology, in line with the Golden Indonesia 2045 vision.

Beyond Indonesia, NVIDIA NIM microservices are bolstering sovereign AI models that support local languages in India, Japan, Taiwan and many other countries and regions.

NVIDIA NIM microservices, NeMo and NeMo Guardrails are available as part of the NVIDIA AI Enterprise software platform.

Learn more about NVIDIA-powered sovereign AI factories for telecommunications.

See notice regarding software product information.

Read More

Improve governance of models with Amazon SageMaker unified Model Cards and Model Registry

Improve governance of models with Amazon SageMaker unified Model Cards and Model Registry

You can now register machine learning (ML) models in Amazon SageMaker Model Registry with Amazon SageMaker Model Cards, making it straightforward to manage governance information for specific model versions directly in SageMaker Model Registry in just a few clicks.

Model cards are an essential component for registered ML models, providing a standardized way to document and communicate key model metadata, including intended use, performance, risks, and business information. This transparency is particularly important for registered models, which are often deployed in high-stakes or regulated industries, such as financial services and healthcare. By including detailed model cards, organizations can establish the responsible development of their ML systems, enabling better-informed decisions by the governance team.

When solving a business problem with an ML model, customers want to refine their approach and register multiple versions of the model in SageMaker Model Registry to find the best candidate model. To effectively operationalize and govern these various model versions, customers want the ability to clearly associate model cards with a particular model version. Previously, the lack of a unified user experience posed challenges for customers, who needed a more streamlined way to register and govern their models.

Because SageMaker Model Cards and SageMaker Model Registry were built on separate APIs, it was challenging to associate the model information and gain a comprehensive view of the model development lifecycle. Integrating model information and then sharing it across different stages became increasingly difficult. This required custom integration efforts, along with complex AWS Identity and Access Management (IAM) policy management, further complicating the model governance process.

With the unification of SageMaker Model Cards and SageMaker Model Registry, architects, data scientists, ML engineers, or platform engineers (depending on the organization’s hierarchy) can now seamlessly register ML model versions early in the development lifecycle, including essential business details and technical metadata. This unification allows you to review and govern models across your lifecycle from a single place in SageMaker Model Registry. By consolidating model governance workflows in SageMaker Model Registry, you can improve transparency and streamline the deployment of models to production environments upon governance officers’ approval.

In this post, we discuss a new feature that supports the integration of model cards with the model registry. We discuss the solution architecture and best practices for managing model cards with a registered model version, and walk through how to set up, operationalize, and govern your models using the integration in the model registry.

Solution overview

In this section, we discuss the solution to address the aforementioned challenges with model governance. First, we introduce the unified model governance solution architecture for addressing the model governance challenges for an end-to-end ML lifecycle in a scalable, well-architected environment. Then we dive deep into the details of the unified model registry and discuss how it helps with governance and deployment workflows.

Unified model governance architecture

ML governance enforces the ethical, legal, and efficient use of ML systems by addressing concerns like bias, transparency, explainability, and accountability. It helps organizations comply with regulations, manage risks, and maintain operational efficiency through robust model lifecycles and data quality management. Ultimately, ML governance builds stakeholder trust and aligns ML initiatives with strategic business goals, maximizing their value and impact. ML governance starts when you want to solve a business use case or problem with ML and is part of every step of your ML lifecycle, from use case inception, model building, training, evaluation, deployment, and monitoring of your production ML system.

Let’s delve into the architecture details of how you can use a unified model registry along with other AWS services to govern your ML use case and models throughout the entire ML lifecycle.

SageMaker Model Registry catalogs your models along with their versions and associated metadata and metrics for training and evaluation. It also maintains audit and inference metadata to help drive governance and deployment workflows.

The following are key concepts used in the model registry:

  • Model package group – A model package group or model group solves a business problem with an ML model (for this example, we use the model CustomerChurn). This model group contains all the model versions associated with that ML model.
  • Model package version – A model package version or model version is a registered model version that includes the model artifacts and inference code for the model.
  • Registered model – This is the model group that is registered in SageMaker Model Registry.
  • Deployable model – This is the model version that is deployable to an inference endpoint.

Additionally, this solution uses Amazon DataZone. The integration of SageMaker and Amazon DataZone enables collaboration between ML builders and data engineers for building ML use cases. ML builders can request access to data published by data engineers. Upon receiving approval, ML builders can then consume the accessed data to engineer features, create models, and publish features and models to the Amazon DataZone catalog for sharing across the enterprise. As part of the SageMaker Model Cards and SageMaker Model Registry unification, ML builders can now share technical and business information about their models, including training and evaluation details, as well as business metadata such as model risk, for ML use cases.

The following diagram depicts the architecture for unified governance across your ML lifecycle.

There are several steps for implementing secure and scalable end-to-end governance for your ML lifecycle:

  1. Define your ML use case metadata (name, description, risk, and so on) for the business problem you’re trying to solve (for example, automate a loan application process).
  2. Set up and invoke your use case approval workflow for building the ML model (for example, fraud detection) for the use case.
  3. Create an ML project to create a model for the ML use case.
  4. Create a SageMaker model package group to start building the model. Associate the model to the ML project and record qualitative information about the model, such as purpose, assumptions, and owner.
  5. Prepare the data to build your model training pipeline.
  6. Evaluate your training data for data quality, including feature importance and bias, and update the model package version with relevant evaluation metrics.
  7. Train your ML model with the prepared data and register the candidate model package version with training metrics.
  8. Evaluate your trained model for model bias and model drift, and update the model package version with relevant evaluation metrics.
  9. Validate that the candidate model experimentation results meet your model governance criteria based on your use case risk profile and compliance requirements.
  10. After you receive the governance team’s approval on the candidate model, record the approval on the model package version (see the sketch after this list) and invoke an automated test deployment pipeline to deploy the model to a test environment.
  11. Run model validation tests in a test environment and make sure the model integrates and works with upstream and downstream dependencies similar to a production environment.
  12. After you validate the model in the test environment and make sure the model complies with use case requirements, approve the model for production deployment.
  13. After you deploy the model to the production environment, continuously monitor model performance metrics (such as quality and bias) to make sure the model stays in compliance and meets your business use case key performance indicators (KPIs).
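Step 10 records the governance approval on the model package version. A minimal boto3 sketch of that update might look like the following; the model package ARN is a placeholder.

import boto3

sm_client = boto3.client("sagemaker")

# Placeholder ARN for the approved candidate model package version.
sm_client.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:<region>:<account>:model-package/CustomerChurn/1",
    ModelApprovalStatus="Approved",
    ApprovalDescription="Approved by the governance team after evaluation review.",
)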

Architecture tools, components, and environments

You need to set up several components and environments for orchestrating the solution workflow:

  • AI governance tooling – This tooling should be hosted in an isolated environment (a separate AWS account) where your key AI/ML governance stakeholders can set up and operate approval workflows for governing AI/ML use cases across your organization, lines of business, and teams.
  • Data governance – This tooling should be hosted in an isolated environment to centralize data governance functions such as setting up data access policies and governing data access for AI/ML use cases across your organization, lines of business, and teams.
  • ML shared services – ML shared services components should be hosted in an isolated environment to centralize model governance functions such as accountability through workflows and approvals, transparency through centralized model metadata, and reproducibility through centralized model lineage for AI/ML use cases across your organization, lines of business, and teams.
  • ML development – This phase of the ML lifecycle should be hosted in an isolated environment for model experimentation and building the candidate model. Several activities are performed in this phase, such as creating the model, data preparation, model training, evaluation, and model registration.
  • ML pre-production – This phase of the ML lifecycle should be hosted in an isolated environment for integrating and testing the candidate model with the ML system and validating that the results comply with the model and use case requirements. The candidate model that was built in the ML development phase is deployed to an endpoint for integration testing and validation.
  • ML production – This phase of the ML lifecycle should be hosted in an isolated environment for deploying the model to a production endpoint for shadow testing and A/B testing, and for gradually rolling out the model for operations in a production environment.

Integrate a model version in the model registry with model cards

In this section, we provide API implementation details for testing this in your own environment. We walk through an example notebook to demonstrate how you can use this unification during the model development data science lifecycle.

We have two example notebooks in the GitHub repository: AbaloneExample and DirectMarketing.

Complete the following steps in the Abalone example notebook:

  1. Install or update the necessary packages and libraries.
  2. Import the necessary libraries and instantiate the required variables, such as the SageMaker client and Amazon Simple Storage Service (Amazon S3) buckets.
  3. Create an Amazon DataZone domain and a project within the domain.

You can use an existing project if you already have one. This step is optional; we reference the Amazon DataZone project ID when creating the SageMaker model package. For overall governance across your data and model lifecycle, this helps correlate the business unit or domain, the data, and the corresponding model.

The following screenshot shows the Amazon DataZone welcome page for a test domain.

In Amazon DataZone, projects enable a group of users to collaborate on various business use cases that involve creating assets in project inventories and thereby making them discoverable by all project members, and then publishing, discovering, subscribing to, and consuming assets in the Amazon DataZone catalog. Project members consume assets from the Amazon DataZone catalog and produce new assets using one or more analytical workflows. Project members can be owners or contributors.
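If you don’t already have a domain and project, the following is a minimal boto3 sketch of creating them; the domain name, project name, and execution role ARN are placeholders rather than values from the example notebook.

import boto3

datazone = boto3.client("datazone")

# Placeholder names and role ARN -- replace with values for your account.
domain = datazone.create_domain(
    name="ml-governance-domain",
    domainExecutionRole="arn:aws:iam::<account>:role/<DataZoneExecutionRole>",
)

project = datazone.create_project(
    domainIdentifier=domain["id"],
    name="abalone-ml-project",
)
print(project["id"])  # use this as the project ID referenced in the notebook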

You can gather the project ID on the project details page, as shown in the following screenshot.

In the notebook, we refer to the project ID as follows:

project_id = "5rn1teh0tv85rb"
  4. Prepare a SageMaker model package group.

A model group contains a group of versioned models. We refer to the Amazon DataZone project ID when we create the model package group, as shown in the following screenshot. It’s mapped to the custom_details field.
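A minimal boto3 sketch of creating the model package group might look like the following; the group name is a placeholder, and recording the project ID in the description is just one illustrative option (the notebook itself surfaces it through the model card’s custom_details field).

import boto3

sm_client = boto3.client("sagemaker")

model_package_group_name = "AbaloneModelPackageGroup"  # placeholder name

sm_client.create_model_package_group(
    ModelPackageGroupName=model_package_group_name,
    ModelPackageGroupDescription=(
        f"Abalone example model group, associated with Amazon DataZone project {project_id}"
    ),
)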

  5. Update the details for the model card, including the intended use and owner:
# These classes are part of the SageMaker Python SDK; depending on the SDK version, the
# enums may also be importable from sagemaker.model_card.schema_constraints.
from sagemaker.model_card import (
    AdditionalInformation,
    BusinessDetails,
    IntendedUses,
    ModelCard,
    ModelCardStatusEnum,
    ModelOverview,
    RiskRatingEnum,
)

# sagemaker_session was instantiated earlier in the notebook.
model_overview = ModelOverview(
    #model_description="This is an example model used for a Python SDK demo of unified Amazon SageMaker Model Registry and Model Cards.",
    #problem_type="Binary Classification",
    #algorithm_type="Logistic Regression",
    model_creator="DEMO-Model-Registry-ModelCard-Unification",
    #model_owner="datascienceteam",
)
intended_uses = IntendedUses(
    purpose_of_model="Test model card.",
    intended_uses="Not used except this test.",
    factors_affecting_model_efficiency="No.",
    risk_rating=RiskRatingEnum.LOW,
    explanations_for_risk_rating="Just an example.",
)
business_details = BusinessDetails(
    business_problem="The business problem that your model is used to solve.",
    business_stakeholders="The stakeholders who have the interest in the business that your model is used for.",
    line_of_business="Services that the business is offering.",
)
additional_information = AdditionalInformation(
    ethical_considerations="Your model ethical consideration.",
    caveats_and_recommendations="Your model's caveats and recommendations.",
    custom_details={"custom details1": "details value"},
)
my_card = ModelCard(
    name="mr-mc-unification",
    status=ModelCardStatusEnum.DRAFT,
    model_overview=model_overview,
    intended_uses=intended_uses,
    business_details=business_details,
    additional_information=additional_information,
    sagemaker_session=sagemaker_session,
)

This data is used to update the created model package. The SageMaker model package helps create a deployable model that you can use to get real-time inferences by creating a hosted endpoint or to run batch transform jobs.

The model card information shown as model_card=my_card in the following code snippet can be passed to the pipeline during the model register step:

from sagemaker.workflow.model_step import ModelStep

# Register the trained model as a new version in the model package group,
# attaching the model card created above
register_args = model.register(
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.large"],
    transform_instances=["ml.m5.large"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
    drift_check_baselines=drift_check_baselines,
    model_card=my_card,
)

step_register = ModelStep(name="RegisterAbaloneModel", step_args=register_args)

Alternatively, you can pass it as follows:

from sagemaker.workflow.step_collections import RegisterModel

# Equivalent registration using the RegisterModel step collection,
# passing the model card the same way
step_register = RegisterModel(
    name="MarketingRegisterModel",
    estimator=xgb_train,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
    model_card=my_card,
)

The notebook then invokes a run of the SageMaker pipeline, which includes preprocessing, training, and evaluation steps. The pipeline can also be started from an event or from the Pipelines UI.
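Assuming the pipeline object has been assembled earlier in the notebook as pipeline and your SageMaker execution role is available as role, starting a run typically looks like the following sketch.

# Register (or update) the pipeline definition, then start a run
pipeline.upsert(role_arn=role)
execution = pipeline.start()
execution.wait()          # block until the run completes
execution.list_steps()    # inspect the status of each step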

After the pipeline is complete, you can navigate to Amazon SageMaker Studio, where you can see a model package on the Models page.

You can view details such as the business details, intended use, and more on the Overview tab under Audit, as shown in the following screenshots.

The Amazon DataZone project ID is captured in the Documentation section.

You can view performance metrics under Train as well.

Evaluation details like model quality, bias pre-training, bias post-training, and explainability can be reviewed on the Evaluate tab.

Optionally, you can view the model card details from the model package itself.

Additionally, you can update the audit details of the model by choosing Edit in the upper-right corner. When you're done, choose Save to apply your changes to the model card.

You can also update the model's deployment status.

You can track the different statuses and activity as well.

Lineage

ML lineage is crucial for tracking the origin, evolution, and dependencies of data, models, and code used in ML workflows, providing transparency and traceability. It helps with reproducibility and debugging, making it straightforward to understand and address issues.

Model lineage tracking captures and retains information about the stages of an ML workflow, from data preparation and training to model registration and deployment. You can view the lineage details of a registered model version in SageMaker Model Registry using SageMaker ML lineage tracking, as shown in the following screenshot. ML model lineage tracks the metadata associated with your model training and deployment workflows, including training jobs, datasets used, pipelines, endpoints, and the actual models. You can also choose a graph node to view more details, such as the datasets and images used in that step.
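If you want to inspect lineage programmatically rather than in the console, a sketch along these lines can work. It assumes model_package_arn holds the ARN of the model version you registered earlier and uses the SageMaker Python SDK's lineage Artifact helper together with the query_lineage API.

import boto3
from sagemaker.lineage.artifact import Artifact

sm_client = boto3.client("sagemaker")

# Look up the lineage artifact that represents the registered model package
model_artifact = next(iter(Artifact.list(source_uri=model_package_arn)))

# Query upstream and downstream entities (datasets, training jobs, endpoints)
response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn],
    Direction="Both",
    IncludeEdges=True,
)
for vertex in response["Vertices"]:
    print(vertex["Arn"], vertex.get("Type"))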

Clean up

If you created resources while using the notebook in this post, follow the instructions in the notebook to clean up those resources.

Conclusion

In this post, we discussed a solution to use a unified model registry with other AWS services to govern your ML use case and models throughout the entire ML lifecycle in your organization. We walked through an end-to-end architecture for developing an AI use case that embeds governance controls, from use case inception to model building, model validation, and model deployment in production. We demonstrated through code how to register a model and update it with governance, technical, and business metadata in SageMaker Model Registry.

We encourage you to try out this solution and share your feedback in the comments section.


About the authors

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle.

Neelam Koshiya is a Principal Solutions Architect (GenAI specialist) at AWS. With a background in software engineering, she moved organically into an architecture role. Her current focus is to help enterprise customers with their ML/GenAI journeys for strategic business outcomes. Her area of depth is machine learning. In her spare time, she enjoys reading and being outdoors.

Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.

Saumitra Vikaram is a Senior Software Engineer at AWS. He is focused on AI/ML technology, ML model management, ML governance, and MLOps to improve overall organizational efficiency and productivity.
