Derive generative AI-powered insights from ServiceNow with Amazon Q Business

Effective customer support, project management, and knowledge management are critical aspects of providing efficient customer relationship management. ServiceNow is a platform for incident tracking, knowledge management, and project management functions for software projects and has become an indispensable part of many organizations’ workflows to ensure success of the customer and the product. However, extracting valuable insights from the vast amount of data stored in ServiceNow often requires manual effort and building specialized tooling. Users such as support engineers, project managers, and product managers need to be able to ask questions about an incident or a customer, or get answers from knowledge articles in order to provide excellent customer support. Organizations use ServiceNow to manage workflows, such as IT services, ticketing systems, configuration management, and infrastructure changes across IT systems. Generative artificial intelligence (AI) provides the ability to take relevant information from a data source such as ServiceNow and provide well-constructed answers back to the user.

Building a generative AI-based conversational application integrated with relevant data sources requires an enterprise to invest time, money, and people. First, you need to build connectors to the data sources. Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach, where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve and rank the answers, and build a feature-rich web application. Additionally, you need to hire and staff a large team to build, maintain, and manage such a system.

Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take action using the data and expertise found in your company’s information repositories, code, and enterprise systems (such as ServiceNow, among others). Amazon Q provides out-of-the-box native data source connectors that can index content into a built-in retriever and uses an LLM to provide accurate, well-written answers. A data source connector is a component of Amazon Q that helps integrate and synchronize data from multiple repositories into one index.

Amazon Q Business offers prebuilt connectors to a large number of data sources, including ServiceNow, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, and many more, and helps you create your generative AI solution with minimal configuration. For a full list of supported data source connectors, see Amazon Q Business connectors.

You can use the Amazon Q Business ServiceNow Online data source connector to connect to the ServiceNow Online platform and index ServiceNow entities such as knowledge articles, Service Catalogs, and incident entries, along with the metadata and document access control lists (ACLs).

This post shows how to configure the Amazon Q ServiceNow connector to index your ServiceNow platform and take advantage of generative AI searches in Amazon Q. We use an example of an illustrative ServiceNow platform to discuss technical topics related to AWS services.

Find accurate answers from content in ServiceNow using Amazon Q Business

After you integrate Amazon Q Business with ServiceNow, you can ask questions based on the content of the indexed documents, such as:

  • How do I troubleshoot an invalid IP configuration on a network router? – This could be derived from an internal knowledge article on that topic
  • Which form do I use to request a new email account? – This could be derived from an internal Service Catalog entry
  • Is there a previous incident on the topic of resetting cloud root user password? – This could be derived from an internal incident entry

Overview of the ServiceNow connector

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. Amazon Q Business offers multiple data source connectors that can connect to your data sources and help you create your generative AI solution with minimal configuration.

To crawl and index content in ServiceNow, we configure the Amazon Q Business ServiceNow connector as a data source in your Amazon Q Business application.

When you connect Amazon Q Business to a data source and initiate the data synchronization process, Amazon Q Business crawls and adds documents from the data source to its index.

Types of documents

Let’s look at what counts as a document in the context of the Amazon Q Business ServiceNow connector.

The Amazon Q Business ServiceNow connector supports crawling of the following entities in ServiceNow:

  • Knowledge articles – Each article is considered a single document
  • Knowledge article attachments – Each attachment is considered a single document
  • Service Catalog – Each catalog item is considered a single document
  • Service Catalog attachments – Each catalog attachment is considered a single document
  • Incidents – Each incident is considered a single document
  • Incident attachments – Each incident attachment is considered a single document

Although not all metadata is available at the time of writing, you can also configure field mappings. Field mappings allow you to map ServiceNow field names to Amazon Q index field names. This includes both default field mappings created automatically by Amazon Q, as well as custom field mappings that you can create and edit. Refer to ServiceNow data source connector field mappings documentation for more information.
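
As a rough illustration of what a field mapping expresses, the following sketch pairs ServiceNow field names with index field names and rejects duplicates. The field names (`short_description`, `sn_short_description`, and so on) and the record keys are assumptions chosen for illustration, not values taken from this post; check the field mappings documentation for the names your connector accepts.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FieldMapping:
    data_source_field: str            # ServiceNow field name (hypothetical)
    index_field: str                  # Amazon Q index field name (hypothetical)
    index_field_type: str = "STRING"  # assumed default type for this sketch


def build_field_mappings(pairs):
    """Turn (servicenow_field, index_field) pairs into mapping records.

    Raises ValueError if two ServiceNow fields target the same index field.
    """
    seen = set()
    mappings = []
    for source, target in pairs:
        if target in seen:
            raise ValueError(f"duplicate index field: {target}")
        seen.add(target)
        mappings.append(asdict(FieldMapping(source, target)))
    return mappings


mappings = build_field_mappings([
    ("short_description", "sn_short_description"),
    ("sys_updated_on", "sn_updated_at"),
])
```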

Authentication

The Amazon Q Business ServiceNow connector supports two types of authentication methods:

  • Basic authentication – ServiceNow host URL, user name, and password
  • OAuth 2.0 authentication with Resource Owner Password Flow – ServiceNow host URL, user name, password, client ID, and client secret
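
The two authentication methods differ only in which credentials they require. The sketch below validates a set of credentials against the chosen method and serializes them as JSON, the form in which they would typically be stored as a secret. The key names (`username`, `password`, `clientId`, `clientSecret`) are assumptions for this example; confirm the exact keys in the connector documentation before relying on them.

```python
import json

# Assumed key names for each authentication method (verify against the docs).
BASIC_KEYS = ("username", "password")
OAUTH2_KEYS = BASIC_KEYS + ("clientId", "clientSecret")


def build_secret_payload(auth_type, **credentials):
    """Return a JSON string holding exactly the credentials the method needs."""
    if auth_type == "basic":
        required = BASIC_KEYS
    elif auth_type == "oauth2":
        required = OAUTH2_KEYS
    else:
        raise ValueError(f"unsupported auth type: {auth_type}")

    missing = [key for key in required if not credentials.get(key)]
    if missing:
        raise ValueError(f"missing credentials: {', '.join(missing)}")

    # Drop any extra keys so the secret holds only what the method requires.
    return json.dumps({key: credentials[key] for key in required})


payload = build_secret_payload(
    "oauth2",
    username="svc_amazonq",          # placeholder values
    password="example-password",
    clientId="example-client-id",
    clientSecret="example-client-secret",
)
```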

Supported ServiceNow versions

ServiceNow names its platform releases after cities, which makes it easier to tell versions and their associated features apart. At the time of writing, the following versions are natively supported in the Amazon Q Business ServiceNow connector:

  • San Diego
  • Tokyo
  • Rome
  • Vancouver
  • Others

ACL crawling

To maintain a secure environment, Amazon Q Business now requires ACL and identity crawling for all connected data sources. When preparing to connect Amazon Q Business applications to AWS IAM Identity Center, you need to enable ACL indexing and identity crawling and re-synchronize your connector.

Amazon Q Business enforces data security by supporting the crawling of ACLs and identity information from connected data sources. Indexing documents with ACLs is crucial for maintaining data security, because documents without ACLs are considered public.

If you need to index documents without ACLs, make sure they’re explicitly marked as public in your data source. When connecting a ServiceNow data source, Amazon Q Business crawls ACL information, including user and group information, from your ServiceNow instance. With ACL crawling, you can filter chat responses based on the end-user’s document access level, making sure users only see information they’re authorized to access.

In ServiceNow, user IDs are mapped from user emails and exist on files with set access permissions. This mapping allows Amazon Q Business to effectively enforce access controls based on the user’s identity and permissions within the ServiceNow environment.

Refer to How Amazon Q Business connector crawls ServiceNow ACLs for more information.

Overview of solution

Amazon Q is a generative AI-powered assistant that helps customers answer questions, provide summaries, generate content, and complete tasks based on data in their company repository. It also exists as a learning tool for AWS users who want to ask questions about services and best practices in the cloud. You can use the Amazon Q connector for ServiceNow Online to crawl your ServiceNow domain and index service tickets, guides, and community posts to discover answers for your questions faster.

Amazon Q understands and respects your existing identities, roles, and permissions and uses this information to personalize its interactions. If a user doesn’t have permission to access data without Amazon Q, they can’t access it using Amazon Q either. The following table outlines which documents each user is authorized to access for our use case. For a complete list of ServiceNow roles, refer to the ServiceNow documentation. The documents used in this example are a subset of AWS public documents from re:Post, preloaded into ServiceNow with access restrictions.

# | First Name | Last Name | Document type authorized for access | ServiceNow Roles
1 | John | Stiles | Knowledge Articles, Service Catalog, and Incidents | knowledge, catalog, incident_manager
2 | Mary | Major | Knowledge Articles and Service Catalog | knowledge, catalog
3 | Mateo | Jackson | Incidents | incident_manager
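
To make the expected outcome concrete, the following sketch models the table as data and filters a list of documents by a user's roles. It illustrates the behavior you should observe later in the walkthrough; it is not how Amazon Q Business enforces access, which relies on the ACLs crawled from ServiceNow. The document type labels and the sample documents are hypothetical.

```python
# Role-to-document-type mapping taken from the table above.
ROLE_TO_DOC_TYPES = {
    "knowledge": {"knowledge_article"},
    "catalog": {"service_catalog"},
    "incident_manager": {"incident"},
}

USER_ROLES = {
    "John Stiles": ["knowledge", "catalog", "incident_manager"],
    "Mary Major": ["knowledge", "catalog"],
    "Mateo Jackson": ["incident_manager"],
}


def allowed_doc_types(roles):
    """Union of the document types granted by each role."""
    allowed = set()
    for role in roles:
        allowed |= ROLE_TO_DOC_TYPES.get(role, set())
    return allowed


def visible_documents(documents, roles):
    """Keep only the documents whose type the roles permit."""
    allowed = allowed_doc_types(roles)
    return [doc for doc in documents if doc["type"] in allowed]


# Hypothetical sample documents for illustration.
documents = [
    {"title": "Vector search with Amazon Redshift", "type": "knowledge_article"},
    {"title": "What is Amazon QLDB?", "type": "incident"},
]

mary_docs = visible_documents(documents, USER_ROLES["Mary Major"])
mateo_docs = visible_documents(documents, USER_ROLES["Mateo Jackson"])
```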

In this post, we show how to use the Amazon Q Business ServiceNow connector to index data from your ServiceNow platform for intelligent search.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Configure your ServiceNow connection

In your ServiceNow platform, complete the following steps to create OAuth 2.0 credentials that your Amazon Q Business application can use:

  1. In ServiceNow, on the All menu, expand System OAuth and choose Application Registry.

ServiceNow console

  2. Choose New.

ServiceNow System OAuth App Registry

  3. Choose Create an OAuth API endpoint for external clients.

ServiceNow System OAuth App Registry Create Endpoint

  4. For Name, enter a unique name.
  5. Fill out the remaining parameters according to your requirements and choose Submit.

Note down the client ID and client secret to use in later steps.

ServiceNow Create OAuth Token

Create an Amazon Q Business application

Complete the following steps to create an Amazon Q Business application:

  1. On the Amazon Q console, choose Getting started in the navigation pane.
  2. Under Amazon Q Business Pro, choose Q Business to subscribe.

QBusiness Create App

  3. On the Amazon Q Business console, choose Get started.

QBusiness CreateApp2

  4. On the Applications page, choose Create application.

QBusiness CreateApp3

  5. On the Create application page, provide your application details.
  6. Choose Create.

Make sure the Amazon Q Business application is connected to IAM Identity Center. For more information, see Setting up Amazon Q Business with IAM Identity Center as identity provider.

QBusiness CreateApp4

  7. On the Select retriever page, select Use native retriever for your retriever and select Starter for the index provisioning type.
  8. Choose Next.

QBusiness CreateApp5

  9. On the Connect data sources page, choose Next without connecting to any data source (we do that in the next section).

QBusiness CreateApp6

QBusiness CreateApp7

  10. On the Add groups and users page, choose Add groups and users.

QBusiness CreateApp7

  11. Add any groups and users to access the application.

For more details, refer to Adding users and subscriptions to an Amazon Q Business application.

  12. Choose Create application.

QBusiness CreateApp8
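
If you prefer to script the console steps above, the following is a minimal sketch of the same sequence: create the application, create a Starter index, and attach a native retriever. It assumes a boto3 `qbusiness` client and its `create_application`, `create_index`, and `create_retriever` operations with the parameter names shown; treat those names, and the display name suffixes, as assumptions to verify against the SDK reference. User and group assignment is not covered here.

```python
def create_q_business_app(client, name, role_arn, identity_center_instance_arn):
    """Create an application, a Starter index, and a native retriever.

    `client` is expected to behave like boto3.client("qbusiness").
    Returns the IDs of the created resources.
    """
    app = client.create_application(
        displayName=name,
        roleArn=role_arn,
        identityCenterInstanceArn=identity_center_instance_arn,
    )
    app_id = app["applicationId"]

    # "STARTER" mirrors the Starter index provisioning type chosen in the console.
    index = client.create_index(
        applicationId=app_id,
        displayName=f"{name}-index",
        type="STARTER",
    )
    index_id = index["indexId"]

    # A real script would likely need to wait for the index to become
    # active before creating the retriever; that wait is omitted here.
    retriever = client.create_retriever(
        applicationId=app_id,
        type="NATIVE_INDEX",
        displayName=f"{name}-retriever",
        configuration={"nativeIndexConfiguration": {"indexId": index_id}},
    )

    return {
        "applicationId": app_id,
        "indexId": index_id,
        "retrieverId": retriever["retrieverId"],
    }
```

You would call it as `create_q_business_app(boto3.client("qbusiness"), "servicenow-demo", role_arn, instance_arn)`, where both ARNs come from your own account.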

Configure the data source using the Amazon Q ServiceNow Online connector

Now let’s configure the ServiceNow Online data source connector with the Amazon Q application that we created in the previous section.

  1. On the Amazon Q console, navigate to the Applications page and choose the application you just created.

Q Business - Connector Config1

  2. In the Data sources section, choose Add data source.

Q Business - Connector Config2

  3. Search for and choose the ServiceNow Online connector.

Q Business - Connector Config3

  4. Provide the name, ServiceNow host, and version information.

If your ServiceNow version isn’t on the dropdown menu, choose Others.

Q Business - Connector Config4

  5. Choose Create and add new secret to create a new secret to connect with the ServiceNow platform account.

Q Business - Connector Config5

  6. Provide the connection information based on the OAuth2 endpoint created in ServiceNow previously, then choose Save.

Q Business - Connector Config6

  7. Leave the defaults for the VPC and Identity crawler.
  8. For IAM role, choose Create a new service role (Recommended) and keep the default role name.

Q Business - Connector Config7

  9. Choose entities that you want to bring over from ServiceNow.

This example shows knowledge articles, Service Catalog items, and incidents. The Filter query option helps curate the list of items that you want to bring into Amazon Q. When you use a query, you can specify multiple knowledge bases, including private knowledge bases. For more details on how to build ServiceNow filters, refer to Filters. For additional query building resources, see Specifying documents to index with a query.

Q Business - Connector Config8

Q Business - Connector Config9

Q Business - Connector Config10

  10. For Sync mode, select Full sync.
  11. For Sync run schedule, choose Run on demand.

Q Business - Connector Config11

  12. Leave the remaining options as default and choose Add data source.

Q Business - Connector Config12

  13. When the data source status shows as Active, initiate data synchronization by choosing Sync now.

Q Business - Connector Config12

Wait until the synchronization status changes to Completed before continuing to the next steps.
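
Instead of watching the console, you could poll the sync status from a script. The sketch below assumes a boto3 `qbusiness` client whose `list_data_source_sync_jobs` operation returns a `history` list of jobs with `startTime` and `status` fields, and that a finished job reports one of the statuses in `TERMINAL_STATUSES`. Those names and values are assumptions to check against the SDK reference; the console label Completed may correspond to a differently named API status.

```python
import time

# Assumed terminal status values for a sync job.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "INCOMPLETE"}


def wait_for_sync(client, application_id, index_id, data_source_id,
                  poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll the most recent sync job until it reaches a terminal status.

    Returns the terminal status string, or raises TimeoutError.
    """
    for _ in range(max_polls):
        response = client.list_data_source_sync_jobs(
            applicationId=application_id,
            indexId=index_id,
            dataSourceId=data_source_id,
        )
        history = response.get("history", [])
        if history:
            # Do not rely on ordering; pick the job that started last.
            latest = max(history, key=lambda job: job["startTime"])
            if latest["status"] in TERMINAL_STATUSES:
                return latest["status"]
        sleep(poll_seconds)
    raise TimeoutError("sync did not finish within the polling window")
```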

Q Business Connector Config13

For information about common issues encountered and related troubleshooting steps, refer to Troubleshooting data source connectors.
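
The Filter query option mentioned in the entity selection step takes a ServiceNow encoded query, in which conditions are joined with `^` to mean AND. As a small sketch, the helper below assembles such a string from field and value pairs. The fields used (`workflow_state`, `active`) are illustrative assumptions; use the fields and values that apply to your own tables.

```python
def build_encoded_query(conditions):
    """Join (field, value) equality conditions into a ServiceNow encoded query.

    Conditions are combined with '^', which ServiceNow treats as AND.
    """
    parts = []
    for field, value in conditions:
        text = str(value)
        if "^" in field or "^" in text:
            # '^' is the condition separator, so it cannot appear unescaped.
            raise ValueError("conditions must not contain '^'")
        parts.append(f"{field}={text}")
    return "^".join(parts)


# Illustrative filter: published, active knowledge articles.
query = build_encoded_query([
    ("workflow_state", "published"),
    ("active", "true"),
])
```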

Run queries with the Amazon Q web experience

Now that the data synchronization is complete, you can start exploring insights from Amazon Q. You have three users for testing: John with admin access, Mary with access to knowledge articles and the Service Catalog, and Mateo with access only to incidents. In the following steps, you sign in as each user and ask various questions to see what responses Amazon Q provides based on the document types each user is permitted to access. You also test edge cases where users try to access information from restricted sources to validate the access control functionality.

  1. On the details page of the new Amazon Q application, navigate to the Web experience settings tab and choose the link under Deployed URL. This will open a new tab with a preview of the UI and options to customize according to your needs.

Q Business - Web Experience1

  2. Log in to the application as John Stiles first, using the credentials for the user that you added to the Amazon Q application.

Q Business - Web Experience2

  3. After the login is successful, choose the application that you just created.

Q Business - Web Application3

  4. From there, you’ll be redirected to the Amazon Q assistant UI, where you can start asking questions using natural language and get insights from your ServiceNow platform.

Q Business - Web Experience4

  5. Let’s run some queries to see how Amazon Q can answer questions related to synchronized data. John has access to all ServiceNow document types. When asked “How do I upgrade my EKS cluster to the latest version”, Amazon Q will provide a summary pulling information from the related knowledge article, highlighting the sources at the end of each excerpt.

QBusiness-ServiceNow-Connector

  6. Still logged in as John, when asked “What is Amazon QLDB?”, Amazon Q will provide a summary pulling information from the related ServiceNow incident.

QBusiness-ServiceNow-Connector

  7. Sign out as user John. Start a new incognito browser session or use a different browser. Copy the web experience URL and sign in as user Mary. Repeat these steps each time you need to sign in as a different user. Mary only has access to knowledge articles and the Service Catalog, with no incident access. When asked “How do I perform vector search with Amazon Redshift”, Amazon Q will provide a summary pulling information from the related knowledge article, highlighting the source.

QBusiness-ServiceNow-Connector

  8. However, when asked “What is Amazon QLDB?”, Amazon Q responds that it could not find relevant information. This is because Mary does not have access to ServiceNow incidents, which is the only place where the answer to that question can be found.

QBusiness-ServiceNow-Connector

  9. Sign out as user Mary. Start a new incognito browser session or use a different browser. Copy the web experience URL and sign in as user Mateo. Mateo only has access to incidents, with no knowledge article or Service Catalog access. When asked “What is Amazon QLDB?”, Amazon Q will provide a summary pulling information from the related incident, highlighting the source.

QBusiness-ServiceNow-Connector

  10. However, when asked “How do I perform vector search with Amazon Redshift?”, Amazon Q responds that it could not find relevant information. This is because Mateo does not have access to ServiceNow knowledge articles, which is the only place where the answer to this question can be found.

QBusiness-ServiceNow-Connector

Try out the assistant with additional queries, such as:

  • How do you set up a new BlackBerry device?
  • How do I set up S3 object replication?
  • How do I resolve empty log issues in CloudWatch?
  • How do I troubleshoot 403 Access Denied errors from Amazon S3?

Frequently asked questions

In this section, we provide answers to frequently asked questions.

Amazon Q Business is unable to answer your questions

If you get the response “Sorry, I could not find relevant information to complete your request,” this may be due to a few reasons:

  • No permissions – ACLs applied to your account don’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources.
  • Email ID doesn’t match user ID – In rare scenarios, a user may have a different email ID associated with Amazon Q in IAM Identity Center than what is associated in the ServiceNow user profile. In such cases, make sure the Amazon Q user profile is updated to recognize the ServiceNow email ID through the update-user command in the AWS Command Line Interface (AWS CLI) or the related API call.
  • Data connector sync failed – Your data connector may have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.
  • Empty or private ServiceNow projects – Private or empty projects aren’t crawled during the sync run.

If none of these reasons apply to your use case, open a support case and work with your technical account manager to get this resolved.
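
For the email mismatch case in the list above, the following sketch shows roughly what an `update-user` request might look like when issued through an SDK instead of the AWS CLI. It assumes a boto3 `qbusiness` client with an `update_user` operation that accepts `applicationId`, `userId`, and a `userAliasesToUpdate` list; treat these parameter names as assumptions and confirm them in the API reference. All IDs and email addresses are placeholders.

```python
def build_alias_update(application_id, identity_center_email,
                       index_id, data_source_id, servicenow_email):
    """Build the request that maps an Amazon Q user to their ServiceNow email."""
    return {
        "applicationId": application_id,
        "userId": identity_center_email,
        "userAliasesToUpdate": [
            {
                "indexId": index_id,
                "dataSourceId": data_source_id,
                # The ID by which ServiceNow knows this user.
                "userId": servicenow_email,
            }
        ],
    }


def apply_alias_update(client, request):
    """Send the request; `client` should behave like boto3.client("qbusiness")."""
    return client.update_user(**request)


request = build_alias_update(
    application_id="example-app-id",
    identity_center_email="mary.major@example.com",
    index_id="example-index-id",
    data_source_id="example-data-source-id",
    servicenow_email="mmajor@example.org",
)
```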

How to generate responses from authoritative data sources

If you want Amazon Q Business to only generate responses from authoritative data sources, you can configure this using the Amazon Q Business application global controls under Admin controls and guardrails.

  1. Log in to the Amazon Q Business console as an Amazon Q Business application administrator.
  2. Navigate to the application and choose Admin controls and guardrails in the navigation pane.
  3. Choose Edit in the Global controls section to set these options.

For more information, refer to Admin controls and guardrails in Amazon Q Business.
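
The same global control can, as far as we understand the API, also be set programmatically. This short sketch assumes a boto3 `qbusiness` client with an `update_chat_controls_configuration` operation and a `responseScope` value of `ENTERPRISE_CONTENT_ONLY`; verify both names against the SDK reference before use.

```python
def restrict_to_enterprise_content(client, application_id):
    """Limit responses to content from the connected data sources.

    `client` is expected to behave like boto3.client("qbusiness").
    """
    return client.update_chat_controls_configuration(
        applicationId=application_id,
        # Assumed enum value meaning "answer only from enterprise data".
        responseScope="ENTERPRISE_CONTENT_ONLY",
    )
```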

Q Business - Troubleshooting

Amazon Q Business responds using old (stale) data even though your data source is updated

Each Amazon Q Business data connector can be configured with its own sync run schedule. Checking the sync status and sync schedule for your data connector reveals when the last sync ran successfully. The sync run schedule may be set to run at a scheduled time of day, week, or month, or to run on demand. If it’s set to run on demand, the sync has to be invoked manually. When the sync run is complete, verify the sync history to make sure the run has successfully synced all new content. Refer to Sync run schedule for more information about each option.
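
One way to reason about staleness is to compare the time of the last successful sync with how fresh you need the answers to be. The sketch below works on a list of sync history entries; the entry shape (`status`, `endTime`) and the `SUCCEEDED` value are assumptions about what the sync history returns, and the 24-hour threshold is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def last_successful_sync(history):
    """Return the end time of the most recent successful sync, or None."""
    end_times = [
        job["endTime"]
        for job in history
        if job.get("status") == "SUCCEEDED" and job.get("endTime") is not None
    ]
    return max(end_times) if end_times else None


def is_stale(history, max_age, now=None):
    """True if no successful sync finished within `max_age` of `now`."""
    now = now or datetime.now(timezone.utc)
    last = last_successful_sync(history)
    return last is None or now - last > max_age


# Example history with one failed and one successful run.
now = datetime(2024, 6, 3, 12, 0, tzinfo=timezone.utc)
history = [
    {"status": "SUCCEEDED", "endTime": datetime(2024, 6, 1, 9, 0, tzinfo=timezone.utc)},
    {"status": "FAILED", "endTime": datetime(2024, 6, 3, 9, 0, tzinfo=timezone.utc)},
]
stale = is_stale(history, max_age=timedelta(hours=24), now=now)
```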

Clean up

To avoid incurring future charges, clean up any resources created as part of this solution. Delete the Amazon Q ServiceNow Online connector data source, OAuth API endpoint created in ServiceNow, and the Q Business application. Also, delete the user management setup in IAM Identity Center.

Conclusion

In this post, we discussed how to configure the Amazon Q ServiceNow Online connector to crawl and index service tickets, community posts, and knowledge guides. We showed how generative AI-based search in Amazon Q enables your business leaders and agents to discover insights from your ServiceNow content quicker. This is all available through a user-friendly interface with Amazon Q Business doing the undifferentiated heavy lifting.

To learn more about the Amazon Q Business connector for ServiceNow Online, refer to Connecting ServiceNow Online to Amazon Q Business.


About the Authors

Prabhakar Chandrasekaran is a Senior Technical Account Manager with AWS Enterprise Support. Prabhakar enjoys helping customers build cutting-edge AI/ML solutions on the cloud. He also works with enterprise customers providing proactive guidance and operational assistance, helping them improve the value of their solutions when using AWS. Prabhakar holds six AWS and seven other professional certifications. With over 20 years of professional experience, Prabhakar was a data engineer and a program leader in the financial services space prior to joining AWS.

Lakshmi Dogiparti is a Software Development Engineer at Amazon Web Services. She works on the Amazon Q and Amazon Kendra connector design, development, integration and test operations.

Vijai Gandikota is a Principal Product Manager in the Amazon Q and Amazon Kendra organization of Amazon Web Services. He is responsible for the Amazon Q and Amazon Kendra connectors, ingestion, security, and other aspects of the Amazon Q and Amazon Kendra services.

Intelligent healthcare forms analysis with Amazon Bedrock

Generative artificial intelligence (AI) provides an opportunity for improvements in healthcare by combining and analyzing structured and unstructured data across previously disconnected silos. Generative AI can help raise the bar on efficiency and effectiveness across the full scope of healthcare delivery.

The healthcare industry generates and collects a significant amount of unstructured textual data, including clinical documentation such as patient information, medical history, and test results, as well as non-clinical documentation like administrative records. This unstructured data can impact the efficiency and productivity of clinical services, because it’s often found in various paper-based forms that can be difficult to manage and process. Streamlining the handling of this information is crucial for healthcare providers to improve patient care and optimize their operations.

Handling large volumes of data, extracting unstructured data from multiple paper forms or images, and comparing it with the standard or reference forms can be a long and arduous process, prone to errors and inefficiencies. However, advancements in generative AI solutions have introduced automated approaches that offer a more efficient and reliable solution for comparing multiple documents.

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using the AWS tools without having to manage the infrastructure.

In this post, we explore using the Anthropic Claude 3 large language model (LLM) on Amazon Bedrock. Amazon Bedrock provides access to several LLMs, such as Anthropic Claude 3, which can be used to generate semi-structured data relevant to the healthcare industry. This can be particularly useful for creating various healthcare-related forms, such as patient intake forms, insurance claim forms, or medical history questionnaires.

Solution overview

To provide a high-level understanding of how the solution works before diving deeper into the specific elements and the services used, we discuss the architectural steps required to build our solution on AWS. We illustrate the key elements of the solution, giving you an overview of the various components and their interactions.

We then examine each of the key elements in more detail, exploring the specific AWS services that are used to build the solution, and discuss how these services work together to achieve the desired functionality. This provides a solid foundation for further exploration and implementation of the solution.

Part 1: Standard forms: Data extraction and storage

The following diagram highlights the key elements of a solution for data extraction and storage with standard forms.

Figure 1: Architecture – Standard Form – Data Extraction & Storage.

The standard form processing steps are as follows:

  1. A user uploads images of paper forms (PDF, PNG, JPEG) to Amazon Simple Storage Service (Amazon S3), a highly scalable and durable object storage service.
  2. Amazon Simple Queue Service (Amazon SQS) is used as the message queue. Whenever a new form is uploaded, an event notification is sent to the SQS queue.
    1. If an S3 object is not processed, then after two tries it will be moved to the SQS dead-letter queue (DLQ), which can be configured further with an Amazon Simple Notification Service (Amazon SNS) topic to notify the user through email.
  3. The SQS message invokes an AWS Lambda function. The Lambda function is responsible for processing the new form data.
  4. The Lambda function reads the new S3 object and passes it to the Amazon Textract API to process the unstructured data and generate a hierarchical, structured output. Amazon Textract is an AWS service that can extract text, handwriting, and data from scanned documents and images. This approach allows for the efficient and scalable processing of complex documents, enabling you to extract valuable insights and data from various sources.
  5. The Lambda function passes the converted text to Anthropic Claude 3 on Amazon Bedrock to generate a list of questions.
  6. Lastly, the Lambda function stores the question list in Amazon S3.
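
The first steps of the Lambda function can be sketched as two small helpers: one that finds the uploaded S3 objects referenced by an SQS event, and one that pulls the text lines out of an Amazon Textract response. This is a minimal sketch, not the authors' implementation. It assumes that S3 event notifications are delivered directly to the queue (so each SQS record body is an S3 event in JSON) and that the Textract response contains `Blocks` entries of type `LINE`, as text detection output does.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_event(event):
    """Return (bucket, key) pairs for every S3 object named in an SQS event."""
    refs = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])  # the S3 notification, as JSON text
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            refs.append((bucket, key))
    return refs


def lines_from_textract(response):
    """Collect the text of each LINE block, in the order returned."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


# Example inputs (hypothetical bucket and key).
event = {"Records": [{"body": json.dumps({"Records": [
    {"s3": {"bucket": {"name": "forms-bucket"},
            "object": {"key": "standard/intake+form.png"}}}
]})}]}
textract_response = {"Blocks": [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "Patient name:"},
    {"BlockType": "WORD", "Text": "Patient"},
    {"BlockType": "LINE", "Text": "Date of birth:"},
]}
```

The joined lines, for example `"\n".join(lines_from_textract(textract_response))`, would then be the text passed to the model.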

Amazon Bedrock API call to extract form details

We call an Amazon Bedrock API twice in the process for the following actions:

  • Extract questions from the standard or reference form – The first API call is made to extract a list of questions and sub-questions from the standard or reference form. This list serves as a baseline or reference point for comparison with other forms. By extracting the questions from the reference form, we can establish a benchmark against which other forms can be evaluated.
  • Extract questions from the custom form – The second API call is made to extract a list of questions and sub-questions from the custom form or the form that needs to be compared against the standard or reference form. This step is necessary because we need to analyze the custom form’s content and structure to identify its questions and sub-questions before we can compare them with the reference form.

By having the questions extracted and structured separately for both the reference and custom forms, the solution can then pass these two lists to the Amazon Bedrock API for the final comparison step. This approach maintains the following:

  • Accurate comparison – The API has access to the structured data from both forms, making it straightforward to identify matches, mismatches, and provide relevant reasoning
  • Efficient processing – Separating the extraction process for the reference and custom forms helps avoid redundant operations and optimizes the overall workflow
  • Observability and interoperability – Keeping the questions separate enables better visibility, analysis, and integration of the questions from different forms
  • Hallucination avoidance – By following a structured approach and relying on the extracted data, the solution helps avoid generating or hallucinating content, providing integrity in the comparison process

This two-step approach uses the capabilities of the Amazon Bedrock API while optimizing the workflow, enabling accurate and efficient form comparison, and promoting observability and interoperability of the questions involved.

See the following code (API Call):

import json

import boto3
from botocore.config import Config


def get_response_from_claude3(context, prompt_data):
    # Build the request body in the Anthropic Messages format used by Amazon Bedrock.
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "system": """You are an expert form analyzer and can understand different sections and subsections within a form and can find all the questions  being asked. You can find similarities and differences at the question level between different types of forms.""",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"""Given the following document(s): {context} \n {prompt_data}"""},
                ],
            }
        ],
    })
    modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
    # Raise the read timeout (in seconds) to allow for long generations.
    config = Config(read_timeout=1000)
    bedrock = boto3.client('bedrock-runtime', config=config)
    response = bedrock.invoke_model(body=body, modelId=modelId)
    response_body = json.loads(response.get("body").read())
    # The generated text is in the first content block of the response.
    answer = response_body.get("content")[0].get("text")
    return answer

User prompt to extract fields and list them

We provide the following user prompt to Anthropic Claude 3 to extract the fields from the raw text and list them for comparison as shown in step 3B (of Figure 3: Data Extraction & Form Field comparison).

get_response_from_claude3(response, f""" Create a summary of the different sections in the form, then
                                         for each section create a list of all questions and sub questions asked in the
                                         whole form and group by section including signature, date, reviews and approvals. 
                                         Then concatenate all questions and return a single numbered list, Be very detailed.""")

The following figure illustrates the output from Amazon Bedrock with a list of questions from the standard or reference form.

Figure 2:  Standard Form Sample Question List

Store this question list in Amazon S3 so it can be used for comparison with other forms, as shown in Part 2 of the process below.
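
Storing the list could look like the following sketch, which writes the model output next to a key derived from the original form's key. The `questions/` prefix and the `.txt` suffix are hypothetical naming choices, and `s3_client` is expected to behave like `boto3.client("s3")`.

```python
import posixpath


def question_list_key(form_key, prefix="questions"):
    """Derive the output key: same path as the form, with a .txt extension."""
    stem, _extension = posixpath.splitext(form_key)
    return f"{prefix}/{stem}.txt"


def store_question_list(s3_client, bucket, form_key, question_list):
    """Upload the question list as UTF-8 text and return the key used."""
    key = question_list_key(form_key)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=question_list.encode("utf-8"),
        ContentType="text/plain",
    )
    return key
```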

Part 2: Data extraction and form field comparison

The following diagram illustrates the architecture for the next step, which is data extraction and form field comparison.

Figure 3: Data Extraction & Form Field comparison

Steps 1 and 2 are similar to those in Figure 1, but are repeated for the forms to be compared against the standard or reference forms. The next steps are as follows:

  3. The SQS message invokes a Lambda function. The Lambda function is responsible for processing the new form data.
    A. The raw text is extracted by Amazon Textract using a Lambda function. The extracted raw text is then passed to Step 3B for further processing and analysis.
    B. Anthropic Claude 3 generates a list of questions from the custom form that needs to be compared with the standard form. Both question lists are then passed to Amazon Bedrock, which compares the extracted raw text with the standard or reference raw text to identify differences and anomalies, and provides insights and recommendations relevant to the healthcare industry by category. It then generates the final output in JSON format for further processing and dashboarding. The Amazon Bedrock API call and user prompt from Step 5 (Figure 1: Architecture – Standard Form – Data Extraction & Storage) are reused for this step to generate a question list from the custom form.

We discuss Steps 4–6 in the next section.

The following screenshot shows the output from Amazon Bedrock with a list of questions from the custom form.

Figure 4:  Custom Form Sample Question List

Final comparison using Anthropic Claude 3 on Amazon Bedrock:

The following examples show the results from the comparison exercise using Amazon Bedrock with Anthropic Claude 3, showing one that matched and one that didn’t match with the reference or standard form.

The following is the user prompt for forms comparison:

categories = ['Personal Information','Work History','Medical History','Medications and Allergies','Additional Questions','Physical Examination','Job Description','Examination Results']
forms = f"Form 1 : {reference_form_question_list}, Form 2 : {custom_form_question_list}"

The following is the first call:

match_result = (get_response_from_claude3(forms, f""" Go through questions and sub questions {start}- {processed} in Form 2 return the question whether it matches with any question /sub question/field in Form 1 in terms of meaning and context and provide reasoning, or if it does not match with any question/sub question/field in Form 1 and provide reasoning. Treat each sub question as its own question and the final output should be a numbered list with the same length as the number of questions and sub questions in Form 2. Be concise"""))
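The prompt refers to a question range through the `start` and `processed` variables, which suggests the comparison is run in batches. The post does not show how these values are produced, so the following is only a sketch under that assumption; `question_batches` and `batch_size` are hypothetical names.

```python
def question_batches(total_questions, batch_size):
    """Yield inclusive, 1-indexed (start, processed) question ranges."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = 1
    while start <= total_questions:
        # The last batch may be shorter than batch_size.
        processed = min(start + batch_size - 1, total_questions)
        yield start, processed
        start = processed + 1
```

For a form with 45 questions and a batch size of 20, this yields the ranges 1 to 20, 21 to 40, and 41 to 45, each of which can be substituted into the prompt for one call.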

The following is the second call:

get_response_from_claude3(match_result, 
f""" Go through all the questions and sub questions in the Form 2 Results and turn this into a JSON object called 'All Questions' which has the keys 'Question' with only the matched or unmatched question, 'Match' with valid values of yes or no, and 'Reason' which is the reason of match or no match, 'Category' placing the question in one of the categories in this list: {categories} . Do not omit any questions in output.""")

The following screenshot shows the questions matched with the reference form.

The following screenshot shows the questions that didn’t match with the reference form.

The steps from the preceding architecture diagram continue as follows:

4. The SQS queue invokes a Lambda function.

5. The Lambda function invokes an AWS Glue job and monitors for completion.

a. The AWS Glue job processes the final JSON output from the Amazon Bedrock model in tabular format for reporting.

6. Amazon QuickSight is used to create interactive dashboards and visualizations, allowing healthcare professionals to explore the analysis, identify trends, and make informed decisions based on the insights provided by Anthropic Claude 3.
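The tabular conversion in step 5a can be illustrated without AWS Glue. The sketch below flattens the JSON returned by the second Amazon Bedrock call into CSV text. It assumes the value of 'All Questions' is a list of objects with the four keys named in the prompt; the actual AWS Glue job is not shown in this post and may work differently.

```python
import csv
import io
import json

COLUMNS = ["Question", "Match", "Reason", "Category"]


def questions_to_csv(model_output):
    """Convert the 'All Questions' JSON object into CSV text for reporting."""
    records = json.loads(model_output)["All Questions"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        # Normalize the Match flag so dashboards can filter on yes/no reliably.
        row = dict(record, Match=str(record.get("Match", "")).strip().lower())
        writer.writerow(row)
    return buffer.getvalue()
```

Normalizing the Match value guards against the model returning "Yes" in one row and "yes" in another, which would otherwise split a dashboard filter into two groups.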

The following screenshot shows a sample QuickSight dashboard.

       

Next steps

Many healthcare providers are investing in digital technology, such as electronic health records (EHRs) and electronic medical records (EMRs), to streamline data collection and storage, allowing appropriate staff to access records for patient care. Additionally, digitized health records provide the convenience of electronic forms and remote data editing for patients. Electronic health records offer a more secure and accessible record system, reducing data loss and facilitating data accuracy. Similar solutions can be used to capture the data from these paper forms into EHRs.

Conclusion

Generative AI solutions like Amazon Bedrock with Anthropic Claude 3 can significantly streamline the process of extracting and comparing unstructured data from paper forms or images. By automating the extraction of form fields and questions, and intelligently comparing them against standard or reference forms, this solution offers a more efficient and accurate approach to handling large volumes of data. The integration of AWS services like Lambda, Amazon S3, Amazon SQS, and QuickSight provides a scalable and robust architecture for deploying this solution. As healthcare organizations continue to digitize their operations, such AI-powered solutions can play a crucial role in improving data management, maintaining compliance, and ultimately enhancing patient care through better insights and decision-making.


About the Authors

Satish Sarapuri is a Sr. Data Architect, Data Lake at AWS. He helps enterprise-level customers build high-performance, highly available, cost-effective, resilient, and secure generative AI, data mesh, data lake, and analytics platform solutions on AWS, through which customers can make data-driven decisions to gain impactful outcomes for their business and help them on their digital and data transformation journey. In his spare time, he enjoys spending time with his family and playing tennis.

Harpreet Cheema is a Machine Learning Engineer at the AWS Generative AI Innovation Center. He is passionate about machine learning and tackling data-oriented problems. In his role, he focuses on developing and delivering machine learning focused solutions for customers across different domains.

Deborah Devadason is a Senior Advisory Consultant in the Professional Service team at Amazon Web Services. She is a results-driven and passionate Data Strategy specialist with over 25 years of consulting experience across the globe in multiple industries. She leverages her expertise to solve complex problems and accelerate business-focused journeys, thereby creating a stronger backbone for the digital and data transformation journey.

Harness the power of AI and ML using Splunk and Amazon SageMaker Canvas

As the scale and complexity of data handled by organizations increase, traditional rules-based approaches to analyzing the data alone are no longer viable. Instead, organizations are increasingly looking to take advantage of transformative technologies like machine learning (ML) and artificial intelligence (AI) to deliver innovative products, improve outcomes, and gain operational efficiencies at scale. Furthermore, the democratization of AI and ML through AWS and AWS Partner solutions is accelerating its adoption across all industries.

For example, a health-tech company may be looking to improve patient care by predicting the probability that an elderly patient may become hospitalized by analyzing both clinical and non-clinical data. This will allow them to intervene early, personalize the delivery of care, and make the most efficient use of existing resources, such as hospital bed capacity and nursing staff.

AWS offers the broadest and deepest set of AI and ML services and supporting infrastructure, such as Amazon SageMaker and Amazon Bedrock, to help you at every stage of your AI/ML adoption journey, including adoption of generative AI. Splunk, an AWS Partner, offers a unified security and observability platform built for speed and scale.

As the diversity and volume of data increase, it is vital to understand how that data can be harnessed at scale by using the complementary capabilities of the two platforms. For organizations looking beyond the use of out-of-the-box Splunk AI/ML features, this post explores how Amazon SageMaker Canvas, a no-code ML development service, can be used in conjunction with data collected in Splunk to drive actionable insights. We also demonstrate how to use the generative AI capabilities of SageMaker Canvas to speed up your data exploration and help you build better ML models.

Use case overview

In this example, a health-tech company offering remote patient monitoring is collecting operational data from wearables using Splunk. These device metrics and logs are ingested into and stored in a Splunk index, a repository of incoming data. Within Splunk, this data is used to fulfill context-specific security and observability use cases by Splunk users, such as monitoring the security posture and uptime of devices and performing proactive maintenance of the fleet.

Separately, the company uses AWS data services, such as Amazon Simple Storage Service (Amazon S3), to store data related to patients, such as patient information, device ownership details, and clinical telemetry data obtained from the wearables. These could include exports from customer relationship management (CRM), configuration management database (CMDB), and electronic health record (EHR) systems. In this example, they have access to an extract of patient information and hospital admission records that reside in an S3 bucket.

The following table illustrates the different data explored in this example use case.

| Description | Feature Name | Storage | Example Source |
| --- | --- | --- | --- |
| Age of patient | age | AWS | EHR |
| Units of alcohol consumed by patient every week | alcohol_consumption | AWS | EHR |
| Tobacco usage by patient per week | tabacco_use | AWS | EHR |
| Average systolic blood pressure of patient | avg_systolic | AWS | Wearables |
| Average diastolic blood pressure of patient | avg_diastolic | AWS | Wearables |
| Average resting heart rate of patient | avg_resting_heartrate | AWS | Wearables |
| Patient admission record | admitted | AWS | EHR |
| Number of days the device has been active over a period | num_days_device_active | Splunk | Wearables |
| Average end of the day battery level over a period | avg_eod_device_battery_level | Splunk | Wearables |

This post describes an approach with two key components:

  • The two data sources are stored alongside each other using a common AWS data engineering pipeline. Data is presented to the personas that need access using a unified interface.
  • An ML model to predict hospital admissions (admitted) is developed using the combined dataset and SageMaker Canvas. Professionals without a background in ML are empowered to analyze the data using no-code tooling.

The solution allows custom ML models to be developed from a broader variety of clinical and non-clinical data sources to cater for different real-life scenarios. For example, it can be used to answer questions such as “If patients have a propensity to have their wearables turned off and there is no clinical telemetry data available, can the likelihood that they are hospitalized still be accurately predicted?”

AWS data engineering pipeline

The adaptable approach detailed in this post starts with an automated data engineering pipeline to make data stored in Splunk available to a wide range of personas, including business intelligence (BI) analysts, data scientists, and ML practitioners, through a SQL interface. This is achieved by using the pipeline to transfer data from a Splunk index into an S3 bucket, where it will be cataloged.

The approach is shown in the following diagram.

The diagram shows an architecture overview of data engineering pipeline. The components marked in the diagram are listed below.

Figure 1: Architecture overview of data engineering pipeline

The automated AWS data pipeline consists of the following steps:

  1. Data from wearables is stored in a Splunk index where it can be queried by users, such as security operations center (SOC) analysts, using the Splunk search processing language (SPL). Splunk’s out-of-the-box AI/ML capabilities, such as the Splunk Machine Learning Toolkit (Splunk MLTK) and purpose-built models for security and observability use cases (for example, for anomaly detection and forecasting), can be utilized inside the Splunk Platform. Using these Splunk ML features allows you to derive contextualized insights quickly without the need for additional AWS infrastructure or skills.
  2. Some organizations may look to develop custom, differentiated ML models, or want to build AI-enabled applications using AWS services for their specific use cases. To facilitate this, an automated data engineering pipeline is built using AWS Step Functions. The Step Functions state machine is configured with an AWS Lambda function to retrieve data from the Splunk index using the Splunk Enterprise SDK for Python. The SPL query requested through this REST API call is scoped to only retrieve the data of interest.
      1. Lambda supports container images. This solution uses a Lambda function that runs a Docker container image. This allows larger data manipulation libraries, such as pandas and PyArrow, to be included in the deployment package.
      2. If a large volume of data is being exported, the code may need to run for longer than the maximum possible duration, or require more memory than supported by Lambda functions. If this is the case, Step Functions can be configured to directly run a container task on Amazon Elastic Container Service (Amazon ECS).
  3. For authentication and authorization, the Splunk bearer token is securely retrieved from AWS Secrets Manager by the Lambda function before calling the Splunk /search REST API endpoint. This bearer authentication token lets users access the REST endpoint using an authenticated identity.
  4. Data retrieved by the Lambda function is transformed (if required) and uploaded to the designated S3 bucket alongside other datasets. This data is partitioned and compressed, and stored in storage and performance-optimized Apache Parquet file format.
  5. As its last step, the Step Functions state machine runs an AWS Glue crawler to infer the schema of the Splunk data residing in the S3 bucket, and catalogs it for wider consumption as tables using the AWS Glue Data Catalog.
  6. Wearable data exported from Splunk is now available to users and applications through the Data Catalog as a table. Analytics tooling such as Amazon Athena can now be used to query the data using SQL.
  7. As data stored in your AWS environment grows, it is essential to have centralized governance in place. AWS Lake Formation allows you to simplify permissions management and data sharing to maintain security and compliance.
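To make the orchestration in steps 2 and 5 more concrete, the following sketch builds a two-state Amazon States Language definition: a Lambda task that exports the Splunk data, followed by an AWS SDK integration task that starts the AWS Glue crawler. The state names, function name, and crawler name are hypothetical. The state machine defined by the AWS SAM template in the accompanying repository may be structured differently.

```python
import json


def build_state_machine_definition(function_name, crawler_name):
    """Return an Amazon States Language definition as a JSON string."""
    definition = {
        "Comment": "Export Splunk data to S3, then catalog it",
        "StartAt": "ExportSplunkData",
        "States": {
            "ExportSplunkData": {
                "Type": "Task",
                # Optimized Lambda integration: invoke the export function.
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": function_name,
                    "Payload.$": "$",
                },
                "Next": "StartCrawler",
            },
            "StartCrawler": {
                "Type": "Task",
                # AWS SDK integration: call the Glue StartCrawler API.
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)
```

Note that StartCrawler returns as soon as the crawl is requested, so a production workflow that needs to wait for the catalog to be updated would add a polling loop after this state.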

An AWS Serverless Application Model (AWS SAM) template is available to deploy all AWS resources required by this solution. This template can be found in the accompanying GitHub repository.

Refer to the README file for required prerequisites, deployment steps, and the process to test the data engineering pipeline solution.

AWS AI/ML analytics workflow

After the data engineering pipeline’s Step Functions state machine successfully completes and wearables data from Splunk is accessible alongside patient healthcare data using Athena, we use an example approach based on SageMaker Canvas to drive actionable insights.

SageMaker Canvas is a no-code visual interface that empowers you to prepare data, build, and deploy highly accurate ML models, streamlining the end-to-end ML lifecycle in a unified environment. You can prepare and transform data through point-and-click interactions and natural language, powered by Amazon SageMaker Data Wrangler. You can also tap into the power of automated machine learning (AutoML) and automatically build custom ML models for regression, classification, time series forecasting, natural language processing, and computer vision, supported by Amazon SageMaker Autopilot.

In this example, we use the service to classify whether a patient is likely to be admitted to a hospital over the next 30 days based on the combined dataset.

The approach is shown in the following diagram.

The diagram shows an architecture overview of ML development. Important components of the solution are listed below.

Figure 2: Architecture overview of ML development

The solution consists of the following steps:

  1. An AWS Glue crawler crawls the data stored in the S3 bucket. The Data Catalog exposes this data found in the folder structure as tables.
  2. Athena provides a query engine to allow people and applications to interact with the tables using SQL.
  3. SageMaker Canvas uses Athena as a data source to allow the data stored in the tables to be used for ML model development.
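Outside of SageMaker Canvas, applications can query the same tables through the Athena API. The following sketch submits a preview query with the `start_query_execution` call. The database name and results location are placeholders you would supply, and the client is passed in as a parameter so the function stays easy to test.

```python
PREVIEW_SQL = "SELECT * FROM splunk_ops_data LIMIT 10"


def start_athena_query(athena_client, sql, database, output_location):
    """Submit a SQL statement to Athena and return the query execution ID."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

A typical call looks like `start_athena_query(boto3.client("athena"), PREVIEW_SQL, database_name, "s3://your-bucket/athena-results/")`. Athena runs queries asynchronously, so the returned ID is then used to poll for completion and fetch results.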

Solution overview

SageMaker Canvas allows you to build a custom ML model using a dataset that you have imported. In the following sections, we demonstrate how to create, explore, and transform a sample dataset, use natural language to query the data, check for data quality, create additional steps for the data flow, and build, test, and deploy an ML model.

Prerequisites

Before proceeding, refer to Getting started with using Amazon SageMaker Canvas to make sure you have the required prerequisites in place. Specifically, validate that the AWS Identity and Access Management (IAM) role your SageMaker domain is using has a policy attached with sufficient permissions to access Athena, AWS Glue, and Amazon S3 resources.

Create the dataset

SageMaker Canvas supports Athena as a data source. Data from wearables and patient healthcare data residing across your S3 bucket is accessed using Athena and the Data Catalog. This allows this tabular data to be directly imported into SageMaker Canvas to start your ML development.

To create your dataset, complete the following steps:

  1. On the SageMaker Canvas console, choose Data Wrangler in the navigation pane.
  2. On the Import and prepare dropdown menu, choose Tabular as the dataset type to denote that the imported data consists of rows and columns.
The screenshot shows how tabular data is imported using SageMaker Data Wrangler. Tabular from the import and prepare option is highlighted.

Figure 3: Importing tabular data using SageMaker Data Wrangler

  3. For Select a data source, choose Athena.

On this page, you will see your Data Catalog database and tables listed, named patient_data and splunk_ops_data.

  4. Join the tables with an inner join on the user_id and id columns to create one overarching dataset that can be used during ML model development.
  5. Under Import settings, enter unprocessed_data for Dataset name.
  6. Choose Import to complete the process.
The screenshot shows how tabular data is joined using SageMaker Data Wrangler. 2 tables discovered from Athena are highlighted, alongside the user id fields that are used to join the 2 tables together.

Figure 4: Joining data using SageMaker Data Wrangler

The combined dataset is now available to explore and transform using SageMaker Data Wrangler.
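For readers less familiar with join semantics, the following sketch shows what an inner join does to the two tables, using plain Python dictionaries as rows. It assumes that user_id belongs to patient_data and id belongs to splunk_ops_data; the post does not state which table holds which column. SageMaker Data Wrangler performs the join for you, so this is purely illustrative.

```python
def inner_join(patients, device_rows):
    """Inner-join patient rows to device rows where user_id equals id."""
    devices_by_id = {}
    for row in device_rows:
        devices_by_id.setdefault(row["id"], []).append(row)
    joined = []
    for patient in patients:
        # Patients without any matching device rows are dropped.
        for device in devices_by_id.get(patient["user_id"], []):
            joined.append({**patient, **device})
    return joined
```

Because the join is an inner join, a patient with no device records, or a device with no matching patient, does not appear in the combined dataset.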

Explore and transform the dataset

SageMaker Data Wrangler enables you to transform and analyze the source dataset through data flows while still maintaining a no-code approach.

The previous step automatically created a data flow in the SageMaker Canvas console which we have renamed to data_prep_data_flow.flow. Additionally, two steps are automatically generated, as listed in the following table.

| Step | Name | Description |
| --- | --- | --- |
| 1 | Athena Source | Sets the unprocessed_data dataset as source of the data flow |
| 2 | Data types | Sets column types of unprocessed_data dataset |

Before we create additional transform steps, let’s explore two SageMaker Canvas features that can help us focus on the right actions.

Use natural language to query the data

SageMaker Data Wrangler also provides generative AI capabilities called Chat for data prep powered by a large language model (LLM). This feature allows you to explore your data using natural language without any background in ML or SQL. Furthermore, any contextualized recommendations returned by the generative AI model can be introduced directly back into the data flow without writing any code.

In this section, we present some example prompts to demonstrate this in action. These examples have been selected to illustrate the art of the possible. We recommend that you experiment with different prompts to gain the best results for your particular use cases.

Example 1: Identify Splunk default fields

In this first example, we want to know whether there are Splunk default fields that we could potentially exclude from our dataset prior to ML model development.

  1. In SageMaker Data Wrangler, open your data flow.
  2. Choose Step 2 Data types, and choose Chat for data prep.
  3. In the Chat for data prep pane, you can enter prompts in natural language to explore and transform the data. For example:
Prompt: This data has been exported from a Splunk index. Which columns are likely to be Splunk default fields? 
Response: _bkt, _cd, _indextime, _raw, _si, _sourcetype, _time are likely Splunk default fields based on the Summary section showing they are mostly metadata/index fields. 
Prompt: OK. Let's drop _bkt, _cd, _indextime, _raw, _si, _sourcetype, and _time. 
Response: The code drops the columns '_bkt', '_cd', '_indextime', '_raw', '_si', '_sourcetype' and '_time' from the dataframe df using the drop() method.

In this example, the generative AI LLM has correctly identified Splunk default fields that could be safely dropped from the dataset.

  4. Choose Add to steps to add this identified transformation to the data flow.
The screenshot shows how SageMaker Data Wrangler’s chat for data prep is used to identify Splunk’s default fields. Conversation has been captured above the screenshot. Add to steps button is highlighted in the screenshot.

Figure 5: Using SageMaker Data Wrangler’s chat for data prep to identify Splunk’s default fields
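The response in Example 1 indicates that the generated code calls drop() on a dataframe. The dependency-free sketch below shows the same effect on rows held as plain dictionaries. It is an illustration of the transformation, not the code that Chat for data prep generates.

```python
SPLUNK_DEFAULT_FIELDS = {
    "_bkt", "_cd", "_indextime", "_raw", "_si", "_sourcetype", "_time",
}


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [
        {name: value for name, value in row.items() if name not in columns}
        for row in rows
    ]
```

Calling `drop_columns(rows, SPLUNK_DEFAULT_FIELDS)` leaves only the columns that carry patient or device information.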

Example 2: Identify additional columns that could be dropped

We now want to identify any further columns that could be dropped without being too specific about what we’re looking for. We want the LLM to make the suggestions based on the data, and provide us with the rationale. For example:

Prompt: List out the names of other columns that we should drop before ML development. Give me a list of columns, along with the reasons why. 
Response: The code drops the timestamp, punct, id, index and linecount columns from the dataset. These columns do not contain useful information for machine learning and would add noise to the models.

In addition to the Splunk default fields identified earlier, the generative AI model is now proposing the removal of columns such as timestamp, punct, id, index, and linecount that don’t appear to be conducive to ML model development.

The screenshot shows how SageMaker Data Wrangler’s chat for data prep is used to identify additional fields that can be dropped. Conversation has been captured above the screenshot.

Figure 6: Using SageMaker Data Wrangler’s chat for data prep to identify additional fields that can be dropped

Example 3: Calculate average age column in dataset

You can also use the generative AI model to perform Text2SQL tasks in which you can simply ask questions of the data using natural language. This is useful if you want to validate the content of the dataset.

In this example, we want to know what the average patient age value is within the dataset:

Prompt: What is the average age column of the dataset? 
Response: The average age in the dataset is 67 years.

By expanding View code, you can see what SQL statements the LLM has constructed using its Text2SQL capabilities. This gives you full visibility into how the results are being returned.

The screenshot shows how SageMaker Data Wrangler’s chat for data prep is used to run SQL statements. Under view code, the screenshot shows SELECT AVG(age) FROM df; which is the SQL statement chat for data prep has returned.

Figure 7: Using SageMaker Data Wrangler’s chat for data prep to run SQL statements
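To see what the generated statement computes, you can run the same SQL against a small in-memory table. In this sketch, SQLite stands in for the query engine behind Chat for data prep, which this post does not describe, and the table contains only an age column.

```python
import sqlite3


def average_age(ages):
    """Load ages into an in-memory table named df and run the AVG query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE df (age INTEGER)")
        conn.executemany("INSERT INTO df (age) VALUES (?)", [(a,) for a in ages])
        (result,) = conn.execute("SELECT AVG(age) FROM df;").fetchone()
        return result
    finally:
        conn.close()
```

With the illustrative ages 60, 70, and 71, the query returns 67.0. These values are made up for the example and are not taken from the dataset.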

Check for data quality

SageMaker Canvas also provides exploratory data analysis (EDA) capabilities that allow you to gain deeper insights into the data prior to the ML model build step. With EDA, you can generate visualizations and analyses to validate whether you have the right data, and whether your ML model build is likely to yield results that are aligned to your organization’s expectations.

Example 1: Create a Data Quality and Insights Report

Complete the following steps to create a Data Quality and Insights Report:

  1. While in the data flow step, choose the Analyses tab.
  2. For Analysis type, choose Data Quality and Insights Report.
  3. For Target column, choose admitted.
  4. For Problem type, choose Classification.

This performs an analysis of the data that you have and provides information such as the number of missing values and outliers.

The screenshot shows how SageMaker Data Wrangler’s data quality and insights report is used to perform analysis of the data. It shows a summary of dataset characteristics, such as number of features, number of rows, missing values, duplicated rows and data validity.

Figure 8: Running SageMaker Data Wrangler’s data quality and insights report

Refer to Get Insights On Data and Data Quality for details on how to interpret the results of this report.

Example 2: Create a Quick Model

In this second example, choose Quick Model for Analysis type and for Target column, choose admitted. The Quick Model estimates the expected predicted quality of the model.

By running the analysis, the estimated F1 score (a measure of predictive performance) of the model and feature importance scores are displayed.

The screenshot shows how SageMaker Data Wrangler’s quick model feature is used to assess the potential accuracy of the model. It has determined that the model achieved an F1 score of 0.76, and that systolic blood pressure, average end of day device battery level, average number of days device is active and age values all have an impact on the hospital admission prediction.

Figure 9: Running SageMaker Data Wrangler’s quick model feature to assess the potential accuracy of the model
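The F1 score combines precision and recall into a single number, which is useful when one class (such as admitted patients) is less common than the other. The following sketch computes it from confusion-matrix counts. The counts in the usage note are invented to illustrate the arithmetic and are not taken from the Quick Model report.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Compute F1 as the harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

For example, 38 true positives with 12 false positives and 12 false negatives give a precision and recall of 0.76 each, and therefore an F1 score of 0.76.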

SageMaker Canvas supports many other analysis types. By reviewing these analyses in advance of your ML model build, you can continue to engineer the data and features to gain sufficient confidence that the ML model will meet your business objectives.

Create additional steps in the data flow

In this example, we have decided to update our data_prep_data_flow.flow data flow to implement additional transformations. The following table summarizes these steps.

| Step | Transform | Description |
| --- | --- | --- |
| 3 | Chat for data prep | Removes Splunk default fields identified. |
| 4 | Chat for data prep | Removes additional fields identified as being unhelpful to ML model development. |
| 5 | Group by | Groups together the rows by user_id and calculates an average of time-ordered numerical fields from Splunk. This is performed to convert the ML problem type from time series forecasting into a simple two-category prediction of target feature (admitted) using averages of the input values over a given time period. Alternatively, SageMaker Canvas also supports time series forecasting. |
| 6 | Drop column (manage columns) | Drops remaining columns that are unnecessary for our ML development, such as columns with high cardinality (for example, user_id). |
| 7 | Parse column as type | Converts numerical value types, for example from Float to Long. This is performed to make sure values, such as those in unit of days, remain integers after calculations. |
| 8 | Parse column as type | Converts additional columns that need to be parsed (each column requires a separate step). |
| 9 | Drop duplicates (manage rows) | Drops duplicate rows to avoid overfitting. |
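The group by and drop duplicates transforms (steps 5 and 9) can be sketched in plain Python to show their effect on the data. The function names are hypothetical, and SageMaker Data Wrangler applies these transforms without any code, so treat this as an illustration of the logic only.

```python
from collections import defaultdict


def average_by_user(rows, numeric_fields):
    """Group rows by user_id and average each numeric field (step 5)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["user_id"]].append(row)
    result = []
    for user_id, group in grouped.items():
        averaged = {"user_id": user_id}
        for field in numeric_fields:
            averaged[field] = sum(r[field] for r in group) / len(group)
        result.append(averaged)
    return result


def drop_duplicates(rows):
    """Remove exact duplicate rows while keeping first-seen order (step 9)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Averaging per user collapses the time-ordered device readings into one row per patient, which is what turns the problem into a two-category prediction of the admitted column.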

To create a new transform, view the data flow, then choose Add transform on the last step.

The screenshot shows how a transform can be added to a data flow in SageMaker Data Wrangler. The add transform option on the final step is highlighted.

Figure 10: Using SageMaker Data Wrangler to add a transform to a data flow

Choose Add transform, and proceed to choose a transform type and its configuration.

The screenshot shows how a transform can be added to a data flow in SageMaker Data Wrangler. The add transform option on the final step is highlighted.

Figure 11: Using SageMaker Data Wrangler to add a transform to a data flow

The following screenshot shows our newly updated end-to-end data flow featuring multiple steps. In this example, we ran the analyses at the end of the data flow.

The screenshot shows the end-to-end data flow in SageMaker Data Wrangler. The steps shown in the data flow are described in the table above.

Figure 12: Showing the end-to-end SageMaker Canvas Data Wrangler data flow

If you want to incorporate this data flow into a productionized ML workflow, SageMaker Canvas can create a Jupyter notebook that exports your data flow to Amazon SageMaker Pipelines.

Develop the ML model

To get started with ML model development, complete the following steps:

  1. Choose Create model directly from the last step of the data flow.
The screenshot shows how a model is created from the data flow in SageMaker Data Wrangler. Create model option is highlighted on the final data flow step.

Figure 13: Creating a model from the SageMaker Data Wrangler data flow

  2. For Dataset name, enter a name for your transformed dataset (for example, processed_data).
  3. Choose Export.
The screenshot shows how the exported dataset is named in SageMaker Data Wrangler. A name, processed_data, is being entered into the dataset name field.

Figure 14: Naming the exported dataset to be used by the model in SageMaker Data Wrangler

This step will automatically create a new dataset.

  4. After the dataset has been created successfully, choose Create model to begin the ML model creation.
The screenshot shows how the model is then created from the exported dataset using SageMaker Data Wrangler. The create model link at the bottom of the screen is being highlighted.

Figure 15: Creating the model in SageMaker Data Wrangler

  5. For Model name, enter a name for the model (for example, my_healthcare_model).
  6. For Problem type, select Predictive analysis.
  7. Choose Create.
The screenshot shows how the model is named and predictive analysis type is selected in SageMaker Canvas. Model name my_healthcare_model is being entered, and the predictive analysis option is being selected.

Figure 16: Naming the model in SageMaker Canvas and selecting the predictive analysis type

You are now ready to progress through the Build, Analyze, Predict, and Deploy stages to develop and operationalize the ML model using SageMaker Canvas.

  1. On the Build tab, for Target column, choose the column you want to predict (admitted).
  2. Choose Quick build to build the model.

The Quick build option has a shorter build time, but the Standard build option generally achieves higher accuracy.

The screenshot shows how the target column to predict for the model is selected in SageMaker Canvas. Field admitted has been chosen in the target column drop-down. The quick build button is highlighted.

Figure 17: Selecting the target column to predict in SageMaker Canvas

After a few minutes, on the Analyze tab, you will be able to view the accuracy of the model, along with column impact, scoring, and other advanced metrics. For example, we can see that a feature from the wearables data captured in Splunk—average_num_days_device_active—has a strong impact on whether the patient is likely to be admitted or not, along with their age. As such, the health-tech company may proactively reach out to elderly patients who tend to keep their wearables off to minimize the risk of their hospitalization.

The screenshot shows how the results from the model quick build are displayed in SageMaker Canvas. For the specific column impact selected, it shows that there is a strong correlation between the average number of days a device has been active and the probability of the patient’s admission. Model accuracy is 82% with an F1 score of 0.609.

Figure 18: Displaying the results from the model quick build in SageMaker Canvas

When you’re happy with the results from the Quick build, repeat the process with a Standard build to make sure you have an ML model with higher accuracy that can be deployed.

Test the ML model

Our ML model has now been built. If you’re satisfied with its accuracy, you can use it to make predictions on net new data on the Predict tab. Predictions can be made either in batch (a list of patients) or for a single entry (one patient).

Experiment with different values and choose Update prediction. The ML model will respond with a prediction for the new values that you have entered.

In this example, the ML model has identified a 64.5% probability that this particular patient will be admitted to hospital in the next 30 days. The health-tech company will likely want to prioritize the care of this patient.

The screenshot shows how the results from a single prediction using the developed model are displayed in SageMaker Canvas. A prediction has been made for an 88-year-old patient. The model has returned a 64.487% probability that they will be admitted to hospital.

Figure 19: Displaying the results from a single prediction using the model in SageMaker Canvas

Deploy the ML model

It is now possible for the health-tech company to build applications that can use this ML model to make predictions. ML models developed in SageMaker Canvas can be operationalized using a broader set of SageMaker services.

To deploy the ML model, complete the following steps:

  1. On the Deploy tab, choose Create Deployment.
  2. Specify Deployment name, Instance type, and Instance count.
  3. Choose Deploy to make the ML model available as a SageMaker endpoint.

In this example, we changed the instance type to ml.m5.4xlarge and reduced the instance count to 1 before deployment.

The screenshot shows how the developed model is deployed using SageMaker Canvas. The ml.m5.4xlarge instance type with an instance count of 1 has been selected.

Figure 20: Deploying the model using SageMaker Canvas

At any time, you can directly test the endpoint from SageMaker Canvas on the Test deployment tab of the deployed endpoint listed under Operations on the SageMaker Canvas console.

Refer to the Amazon SageMaker Canvas Developer Guide for detailed steps to take your ML model development through its full development lifecycle and build applications that can consume the ML model to make predictions.

Clean up

Refer to the instructions in the README file to clean up the resources provisioned for the AWS data engineering pipeline solution.

SageMaker Canvas bills you for the duration of the session, and we recommend logging out of SageMaker Canvas when you are not using it. Refer to Logging out of Amazon SageMaker Canvas for more details. Furthermore, if you deployed a SageMaker endpoint, make sure you have deleted it.

Conclusion

This post explored a no-code approach involving SageMaker Canvas that can drive actionable insights from data stored across both Splunk and AWS platforms using AI/ML techniques. We also demonstrated how you can use the generative AI capabilities of SageMaker Canvas to speed up your data exploration and build ML models that are aligned to your business’s expectations.

Learn more about AI on Splunk and ML on AWS.


About the Authors

Alan Peaty

Alan Peaty is a Senior Partner Solutions Architect, helping Global Systems Integrators (GSIs), Global Independent Software Vendors (GISVs), and their customers adopt AWS services. Prior to joining AWS, Alan worked as an architect at systems integrators such as IBM, Capita, and CGI. Outside of work, Alan is a keen runner who loves to hit the muddy trails of the English countryside, and is an IoT enthusiast.

Brett Roberts

Brett Roberts is the Global Partner Technical Manager for AWS at Splunk, leading the technical strategy to help customers better secure and monitor their critical AWS environments and applications using Splunk. Brett was a member of the Splunk Trust and holds several Splunk and AWS certifications. Additionally, he co-hosts a community podcast and blog called Big Data Beard, exploring trends and technologies in the analytics and AI space.

Arnaud Lauer

Arnaud Lauer is a Principal Partner Solutions Architect in the Public Sector team at AWS. He enables partners and customers to understand how to best use AWS technologies to translate business needs into solutions. He brings more than 18 years of experience in delivering and architecting digital transformation projects across a range of industries, including public sector, energy, and consumer goods.

How Deltek uses Amazon Bedrock for question and answering on government solicitation documents

This post is co-written by Kevin Plexico and Shakun Vohra from Deltek.

Question and answering (Q&A) using documents is a commonly used application in various use cases like customer support chatbots, legal research assistants, and healthcare advisors. Retrieval Augmented Generation (RAG) has emerged as a leading method for using the power of large language models (LLMs) to interact with documents in natural language.

This post provides an overview of a custom solution developed by the AWS Generative AI Innovation Center (GenAIIC) for Deltek, a globally recognized standard for project-based businesses in both government contracting and professional services. Deltek serves over 30,000 clients with industry-specific software and information solutions.

In this collaboration, the AWS GenAIIC team created a RAG-based solution for Deltek to enable Q&A on single and multiple government solicitation documents. The solution uses AWS services including Amazon Textract, Amazon OpenSearch Service, and Amazon Bedrock. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) and LLMs from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Deltek is continuously working on enhancing this solution to better align it with their specific requirements, such as supporting file formats beyond PDF and implementing more cost-effective approaches for their data ingestion pipeline.

What is RAG?

RAG is a process that optimizes the output of LLMs by allowing them to reference authoritative knowledge bases outside of their training data sources before generating a response. This approach addresses some of the challenges associated with LLMs, such as presenting false, outdated, or generic information, or creating inaccurate responses due to terminology confusion. RAG enables LLMs to generate more relevant, accurate, and contextual responses by cross-referencing an organization’s internal knowledge base or specific domains, without the need to retrain the model. It provides organizations with greater control over the generated text output and offers users insights into how the LLM generates the response, making it a cost-effective approach to improve the capabilities of LLMs in various contexts.

The main challenge

Applying RAG for Q&A on a single document is straightforward, but applying the same across multiple related documents poses some unique challenges. For example, when using question answering on documents that evolve over time, it is essential to consider the chronological sequence of the documents if the question is about a concept that has transformed over time. Not considering the order could result in providing an answer that was accurate at a past point but is now outdated based on more recent information across the collection of temporally aligned documents. Properly handling temporal aspects is a key challenge when extending question answering from single documents to sets of interlinked documents that progress over the course of time.

Solution overview

As an example use case, we describe Q&A on two temporally related documents: a long draft request-for-proposal (RFP) document, and a related subsequent government response to a request-for-information (RFI response), providing additional and revised information.

The solution develops a RAG approach in two steps.

The first step is data ingestion, as shown in the following diagram. This includes a one-time processing of PDF documents. The application component here is a user interface with minor processing such as splitting text and calling the services in the background. The steps are as follows:

  1. The user uploads documents to the application.
  2. The application uses Amazon Textract to get the text and tables from the input documents.
  3. The text embedding model processes the text chunks and generates embedding vectors for each text chunk.
  4. The embedding representations of text chunks along with related metadata are indexed in OpenSearch Service.

The second step is Q&A, as shown in the following diagram. In this step, the user asks a question about the ingested documents and expects a response in natural language. The application component here is a user interface with minor processing such as calling different services in the background. The steps are as follows:

  1. The user asks a question about the documents.
  2. The application retrieves an embedding representation of the input question. The embedding vector maps the question from text to a space of numeric representations.
  3. The application uses the question embedding to perform a semantic search on OpenSearch Service and find relevant text chunks from the documents (also called context). It then passes the retrieved context and the query to Amazon Bedrock to generate a response.
  4. The question and context are combined and fed as a prompt to the LLM. The language model generates a natural language response to the user’s question.

We used Amazon Textract in our solution, which can convert PDFs, PNGs, JPEGs, and TIFFs into machine-readable text. It also formats complex structures like tables for easier analysis. In the following sections, we provide an example to demonstrate Amazon Textract’s capabilities.

OpenSearch is an open source and distributed search and analytics suite derived from Elasticsearch. It uses a vector database structure to efficiently store and query large volumes of data. OpenSearch Service currently has tens of thousands of active customers with hundreds of thousands of clusters under management processing hundreds of trillions of requests per month. We used OpenSearch Service and its underlying vector database to do the following:

  • Index documents into the vector space, allowing related items to be located in proximity for improved relevancy
  • Quickly retrieve related document chunks at the question answering step using approximate nearest neighbor search across vectors

The vector database inside OpenSearch Service enabled efficient storage and fast retrieval of related data chunks to power our question answering system. By modeling documents as vectors, we could find relevant passages even without explicit keyword matches.
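The idea of matching by vector proximity rather than by keywords can be illustrated with a small, self-contained sketch. It is not the OpenSearch Service implementation (which uses approximate nearest neighbor search at scale); it is a brute-force cosine similarity ranking over a toy in-memory index. The function names, chunk labels, and the three-dimensional vectors are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(question_vec, indexed_chunks, k=2):
    """Rank (text, vector) pairs by similarity to the question vector."""
    scored = [
        (cosine_similarity(question_vec, vec), text)
        for text, vec in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index: made-up chunk labels with made-up 3-dimensional embeddings.
toy_index = [
    ("chunk about project sizes", [0.9, 0.1, 0.0]),
    ("chunk about submission deadlines", [0.0, 0.2, 0.9]),
    ("chunk about revised project sizes", [0.8, 0.3, 0.1]),
]
question_vec = [1.0, 0.2, 0.0]  # stand-in for an embedded question

print(top_k_chunks(question_vec, toy_index, k=2))
```

With these toy vectors, the two project-size chunks rank above the deadline chunk, even though no keyword comparison takes place.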

Text embedding models are machine learning (ML) models that map words or phrases from text to dense vector representations. Text embeddings are commonly used in information retrieval systems like RAG for the following purposes:

  • Document embedding – Embedding models are used to encode the document content and map them to an embedding space. It is common to first split a document into smaller chunks such as paragraphs, sections, or fixed size chunks.
  • Query embedding – User queries are embedded into vectors so they can be matched against document chunks by performing semantic search.

For this post, we used the Amazon Titan model, Amazon Titan Embeddings G1 – Text v1.2, which accepts up to 8,000 tokens and outputs a numerical vector of 1,536 dimensions. The model is available through Amazon Bedrock.
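As a rough sketch of how an embedding could be requested, the following helper calls the Bedrock runtime `invoke_model` API with a Titan embeddings request body. The helper name `embed_text` is hypothetical, and the model ID `amazon.titan-embed-text-v1` is an assumption about which identifier corresponds to the model named above; verify it in your account and Region. The client is passed in so the function can be exercised without AWS credentials.

```python
import json

# Assumed model ID for Amazon Titan Embeddings G1 - Text; confirm before use.
TITAN_EMBED_MODEL_ID = "amazon.titan-embed-text-v1"


def embed_text(bedrock_runtime, text):
    """Return the embedding vector for `text` using a Bedrock runtime client.

    `bedrock_runtime` is expected to behave like
    boto3.client("bedrock-runtime").
    """
    response = bedrock_runtime.invoke_model(
        modelId=TITAN_EMBED_MODEL_ID,
        body=json.dumps({"inputText": text}),
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream containing a JSON document.
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

In practice, the client would be created with `boto3.client("bedrock-runtime")`, and the same helper could serve both document chunks at ingestion time and user questions at query time.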

Amazon Bedrock provides ready-to-use FMs from top AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. It offers a single interface to access these models and build generative AI applications while maintaining privacy and security. We used Anthropic Claude v2 on Amazon Bedrock to generate natural language answers given a question and a context.

In the following sections, we look at the two stages of the solution in more detail.

Data ingestion

First, the draft RFP and RFI response documents are processed to be used at the Q&A time. Data ingestion includes the following steps:

  1. Documents are passed to Amazon Textract to be converted into text.
  2. To better enable our language model to answer questions about tables, we created a parser that converts tables from the Amazon Textract output into CSV format. Transforming tables into CSV improves the model’s comprehension. For instance, the following figures show part of an RFI response document in PDF format, followed by its corresponding extracted text. In the extracted text, the table has been converted to CSV format and sits among the rest of the text.
  3. For long documents, the extracted text may exceed the LLM’s input size limitation. In these cases, we can divide the text into smaller, overlapping chunks. The chunk sizes and overlap proportions may vary depending on the use case. We apply section-aware chunking (chunking is performed independently on each document section), which we discuss in our example use case later in this post.
  4. Some classes of documents may follow a standard layout or format. This structure can be used to optimize data ingestion. For example, RFP documents tend to have a certain layout with defined sections. Using the layout, each document section can be processed independently. Also, if a table of contents exists but is not relevant, it can potentially be removed. We provide a demonstration of detecting and using document structure later in this post.
  5. The embedding vector for each text chunk is retrieved from an embedding model.
  6. In the last step, the embedding vectors are indexed into an OpenSearch Service database. In addition to the embedding vector, the text chunk and document metadata such as document name, document section name, or document release date are also added to the index as text fields. The document release date is useful metadata when documents are related chronologically, so that the LLM can identify the most up-to-date information. The following code snippet shows the index body:
index_body = {
    "embedding_vector": embedding_vector,  # embedding vector of a text chunk
    "text_chunk": text_chunk,              # the text chunk itself
    "document_name": document_name,        # document name
    "section_name": section_name,          # document section name
    "release_date": release_date,          # document release date
    # more metadata can be added
}
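The table-to-CSV conversion mentioned in step 2 can be sketched as follows. This is not the parser used in the solution; it assumes that the table cells have already been pulled out of the Amazon Textract response into a simple list of dictionaries with 1-based `row` and `col` positions (mirroring the RowIndex and ColumnIndex values of Textract CELL blocks). The function name and the sample cells are illustrative.

```python
import csv
import io


def cells_to_csv(cells):
    """Convert a list of {"row", "col", "text"} cells into CSV text.

    Row and column positions are 1-based. Missing cells become empty strings.
    """
    n_rows = max(cell["row"] for cell in cells)
    n_cols = max(cell["col"] for cell in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        grid[cell["row"] - 1][cell["col"] - 1] = cell["text"]

    buffer = io.StringIO()
    # The csv module quotes fields that contain commas or quotes.
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()


# Made-up cells for illustration only.
sample_cells = [
    {"row": 1, "col": 1, "text": "Category"},
    {"row": 1, "col": 2, "text": "Value"},
    {"row": 2, "col": 1, "text": "A"},
    {"row": 2, "col": 2, "text": "1,000"},
]
print(cells_to_csv(sample_cells))
```

The resulting CSV text can then be placed inline with the surrounding extracted text before chunking, so that a table and its nearby explanation stay together.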

Q&A

In the Q&A phase, users can submit a natural language question about the draft RFP and RFI response documents ingested in the previous step. First, semantic search is used to retrieve text chunks relevant to the user’s question. Then, the question is augmented with the retrieved context to create a prompt. Finally, the prompt is sent to Amazon Bedrock for an LLM to generate a natural language response. The detailed steps are as follows:

  1. An embedding representation of the input question is retrieved from the Amazon Titan embedding model on Amazon Bedrock.
  2. The question’s embedding vector is used to perform semantic search on OpenSearch Service and find the top K relevant text chunks. The following is an example of a search body passed to OpenSearch Service. For more details see the OpenSearch documentation on structuring a search query.
search_body = {
    "size": top_K,
    "query": {
        "script_score": {
            "query": {
                "match_all": {},  # skip full text search
            },
            "script": {
                "lang": "knn",
                "source": "knn_score",
                "params": {
                    # must match the vector field name used in the index body
                    "field": "embedding_vector",
                    "query_value": question_embedding,
                    "space_type": "cosinesimil"
                }
            }
        }
    }
}

  3. Any retrieved metadata, such as section name or document release date, is used to enrich the text chunks and provide more information to the LLM, such as the following:
    def opensearch_result_to_context(os_res: dict) -> str:
        """
        Convert OpenSearch result to context
        Args:
        os_res (dict): Amazon OpenSearch results
        Returns:
        context (str): Context to be included in LLM's prompt
        """
        data = os_res["hits"]["hits"]
        context = []
        for item in data:
            text = item["_source"]["text_chunk"]
            doc_name = item["_source"]["document_name"]
            section_name = item["_source"]["section_name"]
            release_date = item["_source"]["release_date"]
            context.append(
                f"<<Context>>: [Document name: {doc_name}, Section name: {section_name}, Release date: {release_date}] {text}"
            )
        context = "\n \n ------ \n \n".join(context)
        return context

  4. The input question is combined with retrieved context to create a prompt. In some cases, depending on the complexity or specificity of the question, an additional chain-of-thought (CoT) prompt may need to be added to the initial prompt in order to provide further clarification and guidance to the LLM. The CoT prompt is designed to walk the LLM through the logical steps of reasoning and thinking that are required to properly understand the question and formulate a response. It lays out a type of internal monologue or cognitive path for the LLM to follow in order to comprehend the key information within the question, determine what kind of response is needed, and construct that response in an appropriate and accurate way. We use the following CoT prompt for this use case:
"""
Context below includes a few paragraphs from draft RFP and RFI response documents:

Context: {context}

Question: {question}

Think step by step:

1- Find all the paragraphs in the context that are relevant to the question.
2- Sort the paragraphs by release date.
3- Use the paragraphs to answer the question.

Note: Pay attention to the updated information based on the release dates.
"""
  5. The prompt is passed to an LLM on Amazon Bedrock to generate a response in natural language. We use the following inference configuration for the Anthropic Claude v2 model on Amazon Bedrock. The temperature parameter is usually set to zero for reproducibility and to reduce the risk of LLM hallucination. For regular RAG applications, top_k and top_p are usually set to 250 and 1, respectively. Set max_tokens_to_sample to the maximum number of tokens expected to be generated (1 token is approximately 3/4 of a word). See Inference parameters for more details.
{
    "temperature": 0,
    "top_k": 250,
    "top_p": 1,
    "max_tokens_to_sample": 300,
    "stop_sequences": ["\n\nHuman:\n\n"]
}
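To show how the prompt and the inference configuration could come together, here is a minimal sketch that fills a prompt template, wraps it in the Human/Assistant turns expected by the Claude v2 text completion format, and sends it through the Bedrock runtime `invoke_model` API. The helper names are hypothetical, the model ID `anthropic.claude-v2` is an assumption to verify for your account and Region, and the client is injected so the request-building logic can be checked offline.

```python
import json

# Inference configuration matching the values discussed above.
INFERENCE_CONFIG = {
    "temperature": 0,
    "top_k": 250,
    "top_p": 1,
    "max_tokens_to_sample": 300,
    "stop_sequences": ["\n\nHuman:\n\n"],
}


def build_request_body(prompt_template, context, question, config=None):
    """Fill the template and return a JSON request body for Claude v2."""
    config = INFERENCE_CONFIG if config is None else config
    prompt = prompt_template.format(context=context, question=question)
    # Claude v2 text completions expect Human/Assistant turn markers.
    body = {"prompt": f"\n\nHuman: {prompt}\n\nAssistant:", **config}
    return json.dumps(body)


def generate_answer(bedrock_runtime, request_body, model_id="anthropic.claude-v2"):
    """Invoke the model and return the generated completion text.

    `bedrock_runtime` is expected to behave like
    boto3.client("bedrock-runtime").
    """
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=request_body,
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response["body"].read())["completion"]
```

Here `prompt_template` would be a template string containing `{context}` and `{question}` placeholders, such as the CoT prompt shown earlier, and `context` would be the output of opensearch_result_to_context.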

Example use case

As a demonstration, we describe an example of Q&A on two related documents: a draft RFP document in PDF format with 167 pages, and an RFI response document in PDF format with 6 pages released later, which includes additional information and updates to the draft RFP.

The following is an example question asking if the project size requirements have changed, given the draft RFP and RFI response documents:

Have the original scoring evaluations changed? If yes, what are the new project sizes?

The following figure shows the relevant sections of the draft RFP document that contain the answers.

The following figure shows the relevant sections of the RFI response document that contain the answers.

For the LLM to generate the correct response, the retrieved context from OpenSearch Service should contain the tables shown in the preceding figures, and the LLM should be able to infer the order of the retrieved contents from metadata, such as release dates, and generate a readable response in natural language.

The following are the data ingestion steps:

  1. The draft RFP and RFI response documents are uploaded to Amazon Textract to extract text and tables as the content. Additionally, we used regular expressions to identify document sections and the table of contents (see the following figures, respectively). The table of contents can be removed for this use case because it doesn’t contain any relevant information.

  2. We split each document section independently into smaller chunks with some overlap. For this use case, we used a chunk size of 500 tokens with an overlap of 100 tokens (1 token is approximately 3/4 of a word). We used a BPE tokenizer, where each token corresponds to about 4 bytes.
  3. An embedding representation of each text chunk is obtained using the Amazon Titan Embeddings G1 – Text v1.2 model on Amazon Bedrock.
  4. Each text chunk is stored into an OpenSearch Service index along with metadata such as section name and document release date.
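The section-aware chunking in steps 1 and 2 can be sketched as follows. This is a simplified illustration rather than the solution's code: the heading pattern is a hypothetical regular expression (real RFP headings will differ), and whitespace-separated words stand in for BPE tokens, so the sizes are only approximate.

```python
import re


def split_into_sections(text, heading_pattern=r"^SECTION [A-Z]\b.*$"):
    """Split text into (section_name, body) pairs at heading lines.

    The default heading pattern is a made-up example.
    """
    sections = []
    current_name, current_lines = "PREAMBLE", []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if re.match(heading_pattern, line):
            if current_lines:
                sections.append((current_name, " ".join(current_lines)))
            current_name, current_lines = line, []
        elif line:
            current_lines.append(line)
    if current_lines:
        sections.append((current_name, " ".join(current_lines)))
    return sections


def chunk_tokens(tokens, chunk_size=500, overlap=100):
    """Split a token list into chunks that share `overlap` tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end
    return chunks


def section_aware_chunks(text, chunk_size=500, overlap=100):
    """Chunk each section independently and keep the section name."""
    results = []
    for name, body in split_into_sections(text):
        tokens = body.split()  # whitespace words as a stand-in for BPE tokens
        for chunk in chunk_tokens(tokens, chunk_size, overlap):
            results.append({"section_name": name, "text_chunk": " ".join(chunk)})
    return results
```

Because chunks never cross a section boundary, each chunk can carry its section name as metadata when it is written to the index.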

The Q&A steps are as follows:

  1. The input question is first transformed to a numeric vector using the embedding model. The vector representation is used for semantic search and retrieval of relevant context in the next step.
  2. The top K relevant text chunks and their metadata are retrieved from OpenSearch Service.
  3. The opensearch_result_to_context function and the prompt template (defined earlier) are used to create the prompt given the input question and retrieved context.
  4. The prompt is sent to the LLM on Amazon Bedrock to generate a response in natural language. The following is the response generated by Anthropic Claude v2, which matched the information presented in the draft RFP and RFI response documents. The question was “Have the original scoring evaluations changed? If yes, what are the new project sizes?” Using CoT prompting, the model can correctly answer the question.

Key features

The solution contains the following key features:

  • Section-aware chunking – Identify document sections and split each section independently into smaller chunks with some overlaps to optimize data ingestion.
  • Table to CSV transformation – Convert tables extracted by Amazon Textract into CSV format to improve the language model’s ability to comprehend and answer questions about tables.
  • Adding metadata to index – Store metadata such as section name and document release date along with text chunks in the OpenSearch Service index. This allows the language model to identify the most up-to-date or relevant information.
  • CoT prompt – Design a chain-of-thought prompt to provide further clarification and guidance to the language model on the logical steps needed to properly understand the question and formulate an accurate response.

These contributions helped improve the accuracy and capabilities of the solution for answering questions about documents. In fact, based on Deltek’s subject matter experts’ evaluations of LLM-generated responses, the solution achieved a 96% overall accuracy rate.

Conclusion

This post outlined an application of generative AI for question answering across multiple government solicitation documents. The solution discussed was a simplified presentation of a pipeline developed by the AWS GenAIIC team in collaboration with Deltek. We described an approach to enable Q&A on lengthy documents published separately over time. Using Amazon Bedrock and OpenSearch Service, this RAG architecture can scale for enterprise-level document volumes. Additionally, a prompt template was shared that uses CoT logic to guide the LLM in producing accurate responses to user questions. Although this solution is simplified, this post aimed to provide a high-level overview of a real-world generative AI solution for streamlining review of complex proposal documents and their iterations.

Deltek is actively refining and optimizing this solution to ensure it meets their unique needs. This includes expanding support for file formats other than PDF, as well as adopting more cost-efficient strategies for their data ingestion pipeline.

Learn more about prompt engineering and generative AI-powered Q&A in the Amazon Bedrock Workshop. For technical support or to contact AWS generative AI specialists, visit the GenAIIC webpage.


About the Authors

Kevin Plexico is Senior Vice President of Information Solutions at Deltek, where he oversees research, analysis, and specification creation for clients in the Government Contracting and AEC industries. He leads the delivery of GovWin IQ, providing essential government market intelligence to over 5,000 clients, and manages the industry’s largest team of analysts in this sector. Kevin also heads Deltek’s Specification Solutions products, producing premier construction specification content including MasterSpec® for the AIA and SpecText.

Shakun Vohra is a distinguished technology leader with over 20 years of expertise in Software Engineering, AI/ML, Business Transformation, and Data Optimization. At Deltek, he has driven significant growth, leading diverse, high-performing teams across multiple continents. Shakun excels in aligning technology strategies with corporate goals, collaborating with executives to shape organizational direction. Renowned for his strategic vision and mentorship, he has consistently fostered the development of next-generation leaders and transformative technological solutions.

Amin Tajgardoon is an Applied Scientist at the AWS Generative AI Innovation Center. He has an extensive background in computer science and machine learning. In particular, Amin’s focus has been on deep learning and forecasting, prediction explanation methods, model drift detection, probabilistic generative models, and applications of AI in the healthcare domain.

Anila Joshi has more than a decade of experience building AI solutions. As an Applied Science Manager at AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and accelerate the adoption of AWS services with customers by helping customers ideate, identify, and implement secure generative AI solutions.

Yash Shah and his team of scientists, specialists, and engineers at the AWS Generative AI Innovation Center work with some of AWS’s most strategic customers, helping them realize the art of the possible with generative AI by driving business value. Yash has been with Amazon for more than 7.5 years and has worked with customers across healthcare, sports, manufacturing, and software across multiple geographic regions.

Jordan Cook is an accomplished AWS Sr. Account Manager with nearly two decades of experience in the technology industry, specializing in sales and data center strategy. Jordan leverages his extensive knowledge of Amazon Web Services and deep understanding of cloud computing to provide tailored solutions that enable businesses to optimize their cloud infrastructure, enhance operational efficiency, and drive innovation.

Cisco achieves 50% latency improvement using Amazon SageMaker Inference faster autoscaling feature

This post is co-authored with Travis Mehlinger and Karthik Raghunathan from Cisco.

Webex by Cisco is a leading provider of cloud-based collaboration solutions which includes video meetings, calling, messaging, events, polling, asynchronous video and customer experience solutions like contact center and purpose-built collaboration devices. Webex’s focus on delivering inclusive collaboration experiences fuels our innovation, which leverages AI and Machine Learning, to remove the barriers of geography, language, personality, and familiarity with technology. Its solutions are underpinned with security and privacy by design. Webex works with the world’s leading business and productivity apps – including AWS.

Cisco’s Webex AI (WxAI) team plays a crucial role in enhancing these products with AI-driven features and functionalities. In the past year, the team has increasingly focused on building artificial intelligence (AI) capabilities powered by large language models (LLMs) to improve productivity and experience for users. Notably, the team’s work extends to Webex Contact Center, a cloud-based omni-channel contact center solution that empowers organizations to deliver exceptional customer experiences. By integrating LLMs, the WxAI team enables advanced capabilities such as intelligent virtual assistants, natural language processing, and sentiment analysis, allowing Webex Contact Center to provide more personalized and efficient customer support. However, as these LLMs grew to contain hundreds of gigabytes of data, the WxAI team faced challenges in efficiently allocating resources and starting applications with the embedded models. To optimize its AI/ML infrastructure, Cisco migrated its LLMs to Amazon SageMaker Inference, improving speed, scalability, and price-performance.

This post highlights how Cisco implemented the new faster autoscaling feature. For more details on Cisco’s use cases, solution, and benefits, see How Cisco accelerated the use of generative AI with Amazon SageMaker Inference.

In this post, we will discuss the following:

  1. An overview of Cisco’s use case and architecture
  2. An introduction to the new faster autoscaling feature
    1. Single model real-time endpoint
    2. Deployment using Amazon SageMaker InferenceComponents
  3. The performance improvements Cisco saw with the faster autoscaling feature for generative AI inference
  4. Next steps

Cisco’s Use-case: Enhancing Contact Center Experiences

Webex is applying generative AI to its contact center solutions, enabling more natural, human-like conversations between customers and agents. The AI can generate contextual, empathetic responses to customer inquiries, as well as automatically draft personalized emails and chat messages. This helps contact center agents work more efficiently while maintaining a high level of customer service.

Architecture

Initially, WxAI embedded LLM models directly into the application container images running on Amazon Elastic Kubernetes Service (Amazon EKS). However, as the models grew larger and more complex, this approach faced significant scalability and resource utilization challenges. Operating the resource-intensive LLMs through the applications required provisioning substantial compute resources, which slowed down processes like allocating resources and starting applications. This inefficiency hampered WxAI’s ability to rapidly develop, test, and deploy new AI-powered features for the Webex portfolio.

To address these challenges, the WxAI team turned to SageMaker Inference, a fully managed AI inference service that allows seamless deployment and scaling of models independently from the applications that use them. By decoupling the LLM hosting from the Webex applications, WxAI could provision the necessary compute resources for the models without impacting the core collaboration and communication capabilities.

“The applications and the models work and scale fundamentally differently, with entirely different cost considerations, by separating them rather than lumping them together, it’s much simpler to solve issues independently.”

– Travis Mehlinger, Principal Engineer at Cisco. 

This architectural shift has enabled Webex to harness the power of generative AI across its suite of collaboration and customer engagement solutions.

Today, SageMaker endpoints use auto scaling based on invocations per instance. However, it takes approximately 6 minutes to detect the need to scale.

Introducing new Predefined metric types for faster autoscaling

Cisco Webex AI team wanted to improve their inference auto scaling times, so they worked with Amazon SageMaker to improve inference.

Amazon SageMaker’s real-time inference endpoint offers a scalable, managed solution for hosting Generative AI models. This versatile resource can accommodate multiple instances, serving one or more deployed models for instant predictions. Customers have the flexibility to deploy either a single model or multiple models using SageMaker InferenceComponents on the same endpoint. This approach allows for efficient handling of diverse workloads and cost-effective scaling.

To optimize real-time inference workloads, SageMaker employs application automatic scaling (auto scaling). This feature dynamically adjusts both the number of instances in use and the quantity of model copies deployed (when using inference components), responding to real-time changes in demand. When traffic to the endpoint surpasses a predefined threshold, auto scaling increases the available instances and deploys additional model copies to meet the heightened demand. Conversely, as workloads decrease, the system automatically removes unnecessary instances and model copies, effectively reducing costs. This adaptive scaling ensures that resources are optimally utilized, balancing performance needs with cost considerations in real-time.

Working with Cisco, Amazon SageMaker has released a new sub-minute, high-resolution predefined metric type, SageMakerVariantConcurrentRequestsPerModelHighResolution, for faster autoscaling and reduced detection time. This high-resolution metric has been shown to reduce scaling detection times by up to 6x (compared to the existing SageMakerVariantInvocationsPerInstance metric), thereby improving overall end-to-end inference latency by up to 50% on endpoints hosting Generative AI models like Llama3-8B.

With this release, SageMaker real-time endpoints also emit new ConcurrentRequestsPerModel and ConcurrentRequestsPerModelCopy CloudWatch metrics, which are better suited for monitoring and scaling Amazon SageMaker endpoints hosting LLMs and FMs.
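As a rough illustration of how the new metric type might be used, the following Python sketch builds the request for a target tracking policy that could be passed to the Application Auto Scaling put_scaling_policy API. The endpoint name, variant name, target value, and cooldowns are placeholder assumptions, not values from Cisco’s deployment.

```python
def concurrency_scaling_policy(endpoint_name, variant_name, target_concurrency,
                               scale_in_cooldown=180, scale_out_cooldown=180):
    """Build keyword arguments for Application Auto Scaling's put_scaling_policy.

    All argument values are illustrative assumptions; tune them per workload.
    """
    return {
        "PolicyName": f"{endpoint_name}-{variant_name}-concurrency",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target number of concurrent requests per model (assumed value).
            "TargetValue": float(target_concurrency),
            "PredefinedMetricSpecification": {
                # High-resolution metric type named in this post.
                "PredefinedMetricType":
                    "SageMakerVariantConcurrentRequestsPerModelHighResolution",
            },
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
        },
    }


def apply_policy(autoscaling_client, policy_kwargs):
    """Send the policy using a client that exposes put_scaling_policy."""
    return autoscaling_client.put_scaling_policy(**policy_kwargs)
```

With boto3 installed and credentials configured, the client could be created with boto3.client("application-autoscaling"). The scalable target must be registered before the policy is attached.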

Cisco’s evaluation of the faster autoscaling feature for GenAI inference

Cisco evaluated Amazon SageMaker’s new predefined metric types for faster autoscaling on their Generative AI workloads. They observed up to a 50% improvement in end-to-end inference latency by using the new SageMakerVariantConcurrentRequestsPerModelHighResolution metric, compared to the existing SageMakerVariantInvocationsPerInstance metric.

The setup involved running their Generative AI models on SageMaker’s real-time inference endpoints. SageMaker’s autoscaling feature dynamically adjusted both the number of instances and the quantity of model copies deployed to meet real-time changes in demand. The new high-resolution SageMakerVariantConcurrentRequestsPerModelHighResolution metric reduced scaling detection times by up to 6x, enabling faster autoscaling and lower latency.

In addition, SageMaker now emits new CloudWatch metrics, including ConcurrentRequestsPerModel and ConcurrentRequestsPerModelCopy, which are better suited for monitoring and scaling endpoints hosting large language models (LLMs) and foundation models (FMs). This enhanced autoscaling capability has been a game-changer for Cisco, helping to improve the performance and efficiency of their critical Generative AI applications.

“We are really pleased with the performance improvements we’ve seen from Amazon SageMaker’s new autoscaling metrics. The higher-resolution scaling metrics have significantly reduced latency during initial load and scale-out on our Gen AI workloads. We’re excited to do a broader rollout of this feature across our infrastructure.”

– Travis Mehlinger, Principal Engineer at Cisco.

Cisco further plans to work with SageMaker Inference to drive improvements in the remaining variables that affect autoscaling latency, such as model download and load times.

Conclusion

Cisco’s Webex AI team continues to use Amazon SageMaker Inference to power generative AI experiences across its Webex portfolio. Cisco’s evaluation of faster autoscaling in SageMaker showed latency improvements of up to 50% on its GenAI inference endpoints. As the WxAI team continues to push the boundaries of AI-driven collaboration, its partnership with Amazon SageMaker will be crucial in informing upcoming improvements and advanced GenAI inference capabilities. With this new feature, Cisco looks forward to further optimizing its AI inference performance by rolling it out broadly across multiple Regions and delivering even more impactful generative AI features to its customers.


About the Authors

Travis Mehlinger is a Principal Software Engineer in the Webex Collaboration AI group, where he helps teams develop and operate cloud-native AI and ML capabilities to support Webex AI features for customers around the world. In his spare time, Travis enjoys cooking barbecue, playing video games, and traveling around the US and UK to race go-karts.

Karthik Raghunathan is the Senior Director for Speech, Language, and Video AI in the Webex Collaboration AI Group. He leads a multidisciplinary team of software engineers, machine learning engineers, data scientists, computational linguists, and designers who develop advanced AI-driven features for the Webex collaboration portfolio. Prior to Cisco, Karthik held research positions at MindMeld (acquired by Cisco), Microsoft, and Stanford University.

Praveen Chamarthi is a Senior AI/ML Specialist with Amazon Web Services. He is passionate about AI/ML and all things AWS. He helps customers across the Americas to scale, innovate, and operate ML workloads efficiently on AWS. In his spare time, Praveen loves to read and enjoys sci-fi movies.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, multi-tenant models, cost optimizations, and making deployment of Generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Ravi Thakur is a Sr Solutions Architect Supporting Strategic Industries at AWS, and is based out of Charlotte, NC. His career spans diverse industry verticals, including banking, automotive, telecommunications, insurance, and energy. Ravi’s expertise shines through his dedication to solving intricate business challenges on behalf of customers, utilizing distributed, cloud-native, and well-architected design patterns. His proficiency extends to microservices, containerization, AI/ML, Generative AI, and more. Today, Ravi empowers AWS Strategic Customers on personalized digital transformation journeys, leveraging his proven ability to deliver concrete, bottom-line benefits.

Read More

How Cisco accelerated the use of generative AI with Amazon SageMaker Inference

This post is co-authored with Travis Mehlinger and Karthik Raghunathan from Cisco.

Webex by Cisco is a leading provider of cloud-based collaboration solutions, including video meetings, calling, messaging, events, polling, asynchronous video, and customer experience solutions like contact center and purpose-built collaboration devices. Webex’s focus on delivering inclusive collaboration experiences fuels their innovation, which uses artificial intelligence (AI) and machine learning (ML), to remove the barriers of geography, language, personality, and familiarity with technology. Its solutions are underpinned with security and privacy by design. Webex works with the world’s leading business and productivity apps—including AWS.

Cisco’s Webex AI (WxAI) team plays a crucial role in enhancing these products with AI-driven features and functionalities, and in the past year has increasingly focused on capabilities powered by large language models (LLMs) to improve user productivity and experiences. Notably, the team’s work extends to Webex Contact Center, a cloud-based omni-channel contact center solution that empowers organizations to deliver exceptional customer experiences. By integrating LLMs, the WxAI team enables advanced capabilities such as intelligent virtual assistants, natural language processing (NLP), and sentiment analysis, allowing Webex Contact Center to provide more personalized and efficient customer support. However, as these LLMs grew to contain hundreds of gigabytes of data, the WxAI team faced challenges in efficiently allocating resources and starting applications with the embedded models. To optimize its AI/ML infrastructure, Cisco migrated its LLMs to Amazon SageMaker Inference, improving speed, scalability, and price-performance.

This post highlights how Cisco implemented new functionalities and migrated existing workloads to Amazon SageMaker inference components for their industry-specific contact center use cases. By integrating generative AI, they can now analyze call transcripts to better understand customer pain points and improve agent productivity. Cisco has also implemented conversational AI experiences, including chatbots and virtual agents that can generate human-like responses, to automate personalized communications based on customer context. Additionally, they are using generative AI to extract key call drivers, optimize agent workflows, and gain deeper insights into customer sentiment. Cisco’s adoption of SageMaker Inference has enabled them to streamline their contact center operations and provide more satisfying, personalized interactions that address customer needs.

In this post, we discuss the following:

  • Cisco’s business use cases and outcomes
  • How Cisco accelerated the use of generative AI powered by LLMs for their contact center use cases with the help of SageMaker Inference
  • Cisco’s generative AI inference architecture, which is built as a robust and secure foundation, using various services and features such as SageMaker Inference, Amazon Bedrock, Kubernetes, Prometheus, Grafana, and more
  • How Cisco uses an LLM router and auto scaling to route requests to appropriate LLMs for different tasks while simultaneously scaling their models for resiliency and performance efficiency
  • How the solutions in this post impacted Cisco’s business roadmap and strategic partnership with AWS
  • How Cisco helped SageMaker Inference build new capabilities to deploy generative AI applications at scale

Enhancing collaboration and customer engagement with generative AI: Webex’s AI-powered solutions

In this section, we discuss Cisco’s AI-powered use cases.

Meeting summaries and insights

For Webex Meetings, the platform uses generative AI to automatically summarize meeting recordings and transcripts. This extracts the key takeaways and action items, helping distributed teams stay informed even if they missed a live session. The AI-generated summaries provide a concise overview of important discussions and decisions, allowing employees to quickly get up to speed. Beyond summaries, Webex’s generative AI capabilities also surface intelligent insights from meeting content. This includes identifying action items, highlighting critical decisions, and generating personalized meeting notes and to-do lists for each participant. These insights help make meetings more productive and hold attendees accountable.

Enhancing contact center experiences

Webex is also applying generative AI to its contact center solutions, enabling more natural, human-like conversations between customers and agents. The AI can generate contextual, empathetic responses to customer inquiries, as well as automatically draft personalized emails and chat messages. This helps contact center agents work more efficiently while maintaining a high level of customer service.

Webex customers realize positive outcomes with generative AI

Webex’s adoption of generative AI is driving tangible benefits for customers. Clients using the platform’s AI-powered meeting summaries and insights have reported productivity gains. Webex customers using the platform’s generative AI for contact centers have handled hundreds of thousands of calls with improved customer satisfaction and reduced handle times, enabling more natural, empathetic conversations between agents and clients. Webex’s strategic integration of generative AI is empowering users to work smarter and deliver exceptional experiences.

For more details on how Webex is harnessing generative AI to enhance collaboration and customer engagement, see Webex | Exceptional Experiences for Every Interaction on the Webex blog.

Using SageMaker Inference to optimize resources for Cisco

Cisco’s WxAI team is dedicated to delivering advanced collaboration experiences powered by cutting-edge ML. The team develops a comprehensive suite of AI and ML features for the Webex ecosystem, including audio intelligence capabilities like noise removal and optimizing speaker voices, language intelligence for transcription and translation, and video intelligence features like virtual backgrounds. At the forefront of WxAI’s innovations is the AI-powered Webex Assistant, a virtual assistant that provides voice-activated control and seamless meeting support in multiple languages. To build these sophisticated capabilities, WxAI uses LLMs, which can contain up to hundreds of gigabytes of training data.

Initially, WxAI embedded LLM models directly into the application container images running on Amazon Elastic Kubernetes Service (Amazon EKS). However, as the models grew larger and more complex, this approach faced significant scalability and resource utilization challenges. Operating the resource-intensive LLMs through the applications required provisioning substantial compute resources, which slowed down processes like allocating resources and starting applications. This inefficiency hampered WxAI’s ability to rapidly develop, test, and deploy new AI-powered features for the Webex portfolio. To address these challenges, the WxAI team turned to SageMaker Inference—a fully managed AI inference service that allows seamless deployment and scaling of models independently from the applications that use them. By decoupling the LLM hosting from the Webex applications, WxAI could provision the necessary compute resources for the models without impacting the core collaboration and communication capabilities.

“The applications and the models work and scale fundamentally differently, with entirely different cost considerations; by separating them rather than lumping them together, it’s much simpler to solve issues independently.”

– Travis Mehlinger, Principal Engineer at Cisco.

This architectural shift has enabled Webex to harness the power of generative AI across its suite of collaboration and customer engagement solutions.

Solution overview: Improving efficiency and reducing costs by migrating to SageMaker Inference

To address the scalability and resource utilization challenges faced with embedding LLMs directly into their applications, the WxAI team migrated to SageMaker Inference. By taking advantage of this fully managed service for deploying LLMs, Cisco unlocked significant performance and cost-optimization opportunities. Key benefits include the ability to deploy multiple LLMs behind a single endpoint for faster scaling and improved response latencies, as well as cost savings. Additionally, the WxAI team implemented an LLM proxy to simplify access to LLMs for Webex teams, enable centralized data collection, and reduce operational overhead. With SageMaker Inference, Cisco can efficiently manage and scale their LLM deployments, harnessing the power of generative AI across the Webex portfolio while maintaining optimal performance, scalability, and cost-effectiveness.

The following diagram illustrates the WxAI architecture on AWS.

The architecture is built on a robust and secure AWS foundation:

  • The architecture uses AWS services like Application Load Balancer, AWS WAF, and EKS clusters for seamless ingress, threat mitigation, and containerized workload management.
  • The LLM proxy (a microservice deployed on an EKS pod as part of the Service VPC) simplifies the integration of LLMs for Webex teams, providing a streamlined interface and reducing operational overhead. The LLM proxy supports LLM deployments on SageMaker Inference, Amazon Bedrock, or other LLM providers for Webex teams.
  • The architecture uses SageMaker Inference for optimized model deployment, auto scaling, and routing mechanisms.
  • The system integrates Loki for logging, Amazon Managed Service for Prometheus for metrics, and Grafana for unified visualization, seamlessly integrated with Cisco SSO.
  • The Data VPC houses the data layer components, including Amazon ElastiCache for caching and Amazon Relational Database Service (Amazon RDS) for database services, providing efficient data access and management.
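To make the role of the LLM proxy more concrete, here is a minimal Python sketch of a task-based router that forwards each request to a configured provider and records it for centralized data collection. This is not Cisco’s implementation: the task names, model identifiers, and stub backends are hypothetical, and a real backend would call the provider’s API (for example, SageMaker Runtime or Amazon Bedrock) instead of returning a formatted string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A backend takes (model_id, prompt) and returns the generated text.
Backend = Callable[[str, str], str]


@dataclass(frozen=True)
class Route:
    provider: str   # key into the proxy's backends
    model_id: str   # hypothetical endpoint or model identifier


@dataclass
class LLMProxy:
    routes: Dict[str, Route]
    backends: Dict[str, Backend]
    request_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def invoke(self, task: str, prompt: str) -> str:
        """Route a request by task name and record it for later analysis."""
        if task not in self.routes:
            raise KeyError(f"no route configured for task {task!r}")
        route = self.routes[task]
        backend = self.backends[route.provider]
        self.request_log.append((task, route.provider, route.model_id))
        return backend(route.model_id, prompt)


# Stub backends standing in for real provider clients (assumptions).
def fake_sagemaker(model_id: str, prompt: str) -> str:
    return f"[sagemaker:{model_id}] {prompt}"


def fake_bedrock(model_id: str, prompt: str) -> str:
    return f"[bedrock:{model_id}] {prompt}"


proxy = LLMProxy(
    routes={
        "call_driver_extraction": Route("sagemaker", "flan-t5-call-driver"),
        "meeting_summary": Route("bedrock", "example-summary-model"),
    },
    backends={"sagemaker": fake_sagemaker, "bedrock": fake_bedrock},
)
```

Callers only name a task, so a model can be moved between providers by changing the route table rather than the calling application.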

Use case overview: Contact center topic analytics

A key focus area for the WxAI team is to enhance the capabilities of the Webex Contact Center platform. A typical Webex Contact Center installation has hundreds of agents handling many interactions through various channels like phone calls and digital channels. Webex’s AI-powered Topic Analytics feature extracts the key reasons customers are calling by analyzing aggregated historical interactions and clustering them into meaningful topic categories, as shown in the following screenshot. The contact center administrator can then use these insights to optimize operations, enhance agent performance, and ultimately deliver a more satisfactory customer experience.

The Topic Analytics feature is powered by a pipeline of three models: a call driver extraction model, a topic clustering model, and a topic labeling model, as illustrated in the following diagram.

The model details are as follows:

  • Call driver extraction – This generative model summarizes the primary reason or intent (referred to as the call driver) behind a customer’s call. Accurate automatic tagging of calls with call drivers helps contact center supervisors and administrators quickly understand the primary reason for any historical call. One of the key considerations when solving this problem was selecting the right model to balance quality and operational costs. The WxAI team chose the FLAN-T5 model on SageMaker Inference and instruction fine-tuned it for extracting call drivers from call transcripts. FLAN-T5 is a powerful text-to-text transfer transformer model that performs various natural language understanding and generation tasks. This workload had a global footprint, deployed in the us-east-2, eu-west-2, eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, and ca-central-1 AWS Regions.
  • Topic clustering – Although automatically tagging every contact center interaction with its call driver is a useful feature in itself, analyzing these call drivers in an aggregated fashion over a large batch of calls can uncover even more interesting trends and insights. The topic clustering model achieves this by clustering all the individually extracted call drivers from a large batch of calls into different topic clusters. It does this by creating a semantic embedding for each call driver and employing an unsupervised hierarchical clustering technique that operates on the vector embeddings. This results in distinct and coherent topic clusters where semantically similar call drivers are grouped together.
  • Topic labeling – The topic labeling model is a generative model that creates a descriptive name to serve as the label for each topic cluster. Several LLMs were prompt-tuned and evaluated in a few-shot setting to choose the ideal model for the label generation task. Finally, Llama2-13b-chat, with its ability to better capture contextual nuances and semantics of natural language conversation, was selected for its accuracy, performance, and cost-effectiveness. Additionally, Llama2-13b-chat was deployed on SageMaker inference components, using specific hardware like g4dn and g5 instances to keep operating costs relatively low compared to other LLMs.
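The topic clustering step can be sketched in a few lines of Python. The example below is only an illustration of unsupervised hierarchical (agglomerative) clustering over embeddings, not the model Cisco uses: the three-dimensional vectors are made-up stand-ins for real semantic embeddings, and the average-linkage rule and distance threshold are assumptions.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def agglomerative_clusters(embeddings, threshold):
    """Merge the closest clusters (average linkage) until none is within threshold."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(cosine_distance(embeddings[i], embeddings[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        # i < j, so deleting index j does not shift index i.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy call drivers with invented embeddings (assumptions for illustration).
call_drivers = [
    "reset password",
    "forgot login password",
    "refund for late delivery",
    "package arrived late",
]
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]

topics = agglomerative_clusters(embeddings, threshold=0.2)
for members in topics:
    print([call_drivers[i] for i in members])
```

Each resulting group of call drivers would then be passed to the topic labeling model to generate a descriptive name.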

This solution also used the auto scaling capabilities of SageMaker to dynamically adjust the number of instances between a desired minimum of 1 and maximum of 30. This approach provides efficient resource utilization while maintaining high throughput, allowing the WxAI platform to handle batch jobs overnight and scale to hundreds of inferences per minute during peak hours. By deploying the model on SageMaker Inference with auto scaling, the WxAI team was able to deliver reliable and accurate responses to customer interactions for their Topic Analytics use case.

By accurately pinpointing the call driver, the system can suggest appropriate actions, resources, and next steps to the agent, streamlining the customer support process, further leading to personalized and accurate responses to customer questions.

To handle fluctuating demand and optimize resource utilization, the WxAI team implemented auto scaling for their SageMaker Inference endpoints. They configured the endpoints to scale from a minimum to a maximum instance count based on GPU utilization. Additionally, the LLM proxy routed requests between the different LLMs deployed on SageMaker Inference. This proxy abstracts the complexities of communicating with various LLM providers and enables centralized data collection and analysis. This led to enhanced generative AI workflows, optimized latency, and personalized use case implementations.
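A configuration along these lines could be expressed with Application Auto Scaling as in the following Python sketch. It builds the request payloads for register_scalable_target and put_scaling_policy using a customized CloudWatch metric for GPU utilization. The endpoint and variant names, the 60% target, and the default capacity bounds are assumptions for illustration, not Cisco’s actual settings.

```python
def scalable_target(endpoint_name, variant_name, min_instances=1, max_instances=30):
    """Keyword arguments for register_scalable_target (capacity bounds assumed)."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


def gpu_utilization_policy(endpoint_name, variant_name, target_percent=60.0):
    """Keyword arguments for put_scaling_policy tracking average GPU utilization."""
    return {
        "PolicyName": f"{endpoint_name}-{variant_name}-gpu-utilization",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Assumed target; choose a value based on load testing.
            "TargetValue": float(target_percent),
            "CustomizedMetricSpecification": {
                "MetricName": "GPUUtilization",
                "Namespace": "/aws/sagemaker/Endpoints",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint_name},
                    {"Name": "VariantName", "Value": variant_name},
                ],
                "Statistic": "Average",
            },
        },
    }


def configure_autoscaling(autoscaling_client, endpoint_name, variant_name):
    """Register the target, then attach the policy, using the given client."""
    autoscaling_client.register_scalable_target(
        **scalable_target(endpoint_name, variant_name))
    return autoscaling_client.put_scaling_policy(
        **gpu_utilization_policy(endpoint_name, variant_name))
```

The client passed to configure_autoscaling could be boto3.client("application-autoscaling"). Keeping the payloads in plain functions makes them easy to review and unit test before they touch a live endpoint.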

Benefits

Through the strategic adoption of AWS AI services, Cisco’s WxAI team has realized significant benefits, enabling them to build cutting-edge, AI-powered collaboration capabilities more rapidly and cost-effectively:

  • Improved development and deployment cycle time – By decoupling models from applications, the team has streamlined processes like bug fixes, integration testing, and feature rollouts across environments, accelerating their overall development velocity.
  • Simplified engineering and delivery – The clear separation of concerns between the lean application layer and resource-intensive model layer has simplified engineering efforts and delivery, allowing the team to focus on innovation rather than infrastructure complexities.
  • Reduced costs – By using fully managed services like SageMaker Inference, the team has offloaded infrastructure management overhead. Additionally, capabilities like asynchronous inference and multi-model endpoints have enabled significant cost optimization without compromising performance or availability.
  • Scalability and performance – Services like SageMaker Inference and Amazon Bedrock, combined with technologies like NVIDIA Triton Inference Server on SageMaker, have empowered the WxAI team to scale their AI/ML workloads reliably and deliver high-performance inference for demanding use cases.
  • Accelerated innovation – The partnership with AWS has given the WxAI team access to cutting-edge AI services and expertise, enabling them to rapidly prototype and deploy innovative capabilities like the AI-powered Webex Assistant and advanced contact center AI features.

Cisco’s contributions to SageMaker Inference: Enhancing generative AI inference capabilities

Building upon the success of their strategic migration to SageMaker Inference, Cisco has been instrumental in partnering with the SageMaker Inference team to build and enhance key generative AI capabilities within the SageMaker platform. Since the early days of generative AI, Cisco has provided the SageMaker Inference team with valuable inputs and expertise, enabling the introduction of several new features and optimizations:

  • Cost and performance optimizations for generative AI inference – Cisco helped the SageMaker Inference team develop innovative techniques to optimize the use of accelerators, enabling SageMaker Inference to reduce foundation model (FM) deployment costs by 50% on average and latency by 20% on average with inference components. This breakthrough delivers significant cost savings and performance improvements for customers running generative AI workloads on SageMaker.
  • Scaling improvements for generative AI inference – Cisco’s expertise in distributed systems and auto scaling has also helped the SageMaker team develop advanced capabilities to better handle the scaling requirements of generative AI models. These improvements reduce auto scaling times by up to 40% and auto scaling detection by 6 times, so customers can rapidly scale their generative AI workloads on SageMaker to meet spikes in demand without compromising performance.
  • Streamlined generative AI model deployment for inference – Recognizing the need for simplified generative AI model deployment, Cisco collaborated with AWS to introduce the ability to deploy open source LLMs and FMs with just a few clicks. This user-friendly functionality removes the complexity traditionally associated with deploying these advanced models, empowering more customers to harness the power of generative AI.
  • Simplified inference deployment for Kubernetes customers – Cisco’s deep expertise in Kubernetes and container technologies helped the SageMaker team develop new Kubernetes Operator-based inference capabilities. These innovations make it straightforward for customers running applications on Kubernetes to deploy and manage generative AI models, reducing LLM deployment costs by 50% on average.
  • Using NVIDIA Triton Inference Server for generative AI – Cisco worked with AWS to integrate the NVIDIA Triton Inference Server, a high-performance model serving container managed by SageMaker, to power generative AI inference on SageMaker Inference. This enabled the WxAI team to scale their AI/ML workloads reliably and deliver high-performance inference for demanding generative AI use cases.
  • Packaging generative AI models more efficiently – To further simplify the generative AI model lifecycle, Cisco worked with AWS to enhance the capabilities in SageMaker for packaging LLMs and FMs for deployment. These improvements make it straightforward to prepare and deploy these generative AI models, accelerating their adoption and integration.
  • Improved documentation for generative AI – Recognizing the importance of comprehensive documentation to support the growing generative AI ecosystem, Cisco collaborated with the AWS team to enhance the SageMaker documentation. This includes detailed guides, best practices, and reference materials tailored specifically for generative AI use cases, helping customers quickly ramp up their generative AI initiatives on the SageMaker platform.

By closely partnering with the SageMaker Inference team, Cisco has played a pivotal role in driving the rapid evolution of generative AI Inference capabilities in SageMaker. The features and optimizations introduced through this collaboration are empowering AWS customers to unlock the transformative potential of generative AI with greater ease, cost-effectiveness, and performance.

“Our partnership with the SageMaker Inference product team goes back to the early days of generative AI, and we believe the features we have built in collaboration, from cost optimizations to high-performance model deployment, will broadly help other enterprises rapidly adopt and scale generative AI workloads on SageMaker, unlocking new frontiers of innovation and business transformation.”

– Travis Mehlinger, Principal Engineer at Cisco.

Conclusion

By using AWS services like SageMaker Inference and Amazon Bedrock for generative AI, Cisco’s WxAI team has been able to optimize their AI/ML infrastructure, enabling them to build and deploy AI-powered features more efficiently, reliably, and cost-effectively. This strategic approach has unlocked significant benefits for Cisco in deploying and scaling its generative AI capabilities for the Webex platform. Cisco’s own journey with generative AI, as showcased in this post, offers valuable lessons and insights for other users of SageMaker Inference.

Recognizing the impact of generative AI, Cisco has played a crucial role in shaping the future of these capabilities within SageMaker Inference. By providing valuable insights and hands-on collaboration, Cisco has helped AWS develop a range of powerful features that are making generative AI more accessible and scalable for organizations. From optimizing infrastructure costs and performance to streamlining model deployment and scaling, Cisco’s contributions have been instrumental in enhancing the SageMaker Inference service.

Moving forward, the Cisco-AWS partnership aims to drive further advancements in areas like conversational and generative AI inference. As generative AI adoption accelerates across industries, Cisco’s Webex platform is designed to scale and streamline user experiences through various use cases discussed in this post and beyond. You can expect to see ongoing innovation from this collaboration in SageMaker Inference capabilities, as Cisco and SageMaker Inference continue to push the boundaries of what’s possible in the world of AI.

For more information on Webex Contact Center’s Topic Analytics feature and related AI capabilities, refer to The Webex Advantage: Navigating Customer Experience in the Age of AI on the Webex blog.


About the Authors

Travis Mehlinger is a Principal Software Engineer in the Webex Collaboration AI group, where he helps teams develop and operate cloud-centered AI and ML capabilities to support Webex AI features for customers around the world. In his spare time, Travis enjoys cooking barbecue, playing video games, and traveling around the US and UK to race go-karts.

Karthik Raghunathan is the Senior Director for Speech, Language, and Video AI in the Webex Collaboration AI Group. He leads a multidisciplinary team of software engineers, machine learning engineers, data scientists, computational linguists, and designers who develop advanced AI-driven features for the Webex collaboration portfolio. Prior to Cisco, Karthik held research positions at MindMeld (acquired by Cisco), Microsoft, and Stanford University.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Ravi Thakur is a Senior Solutions Architect at AWS, based in Charlotte, NC. He specializes in solving complex business challenges using distributed, cloud-centered, and well-architected patterns. Ravi’s expertise includes microservices, containerization, AI/ML, and generative AI. He empowers AWS strategic customers on digital transformation journeys, delivering bottom-line benefits. In his spare time, Ravi enjoys motorcycle rides, family time, reading, movies, and traveling.

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Madhur Prashant is an AI and ML Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.


Discover insights from Box with the Amazon Q Box connector


Seamless access to content and insights is crucial for delivering exceptional customer experiences and driving successful business outcomes. Box, a leading cloud content management platform, serves as a central repository for diverse digital assets and documents in many organizations. An enterprise Box account typically contains a wealth of materials, including documents, presentations, knowledge articles, and more. However, extracting meaningful information from the vast amount of Box data can be challenging without the right tools and capabilities. Employees in roles such as customer support, project management, and product management require the ability to effortlessly query Box content, uncover relevant insights, and make informed decisions that address customer needs effectively.

Building a generative artificial intelligence (AI)-powered conversational application that is seamlessly integrated with your enterprise’s relevant data sources requires time, money, and people. First, you need to develop connectors to those data sources. Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve and rank the answers, and build a feature-rich web application. You also need to hire and staff a large team to build, maintain, and manage such a system.

Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take action using the data and expertise found in your company’s information repositories, code, and enterprise systems (such as Box, among others). Amazon Q provides out-of-the-box native data source connectors that can index content into a built-in retriever and uses an LLM to provide accurate, well-written answers. A data source connector is a component of Amazon Q that helps integrate and synchronize data from multiple repositories into one index.

Amazon Q Business offers multiple prebuilt connectors to a large number of data sources, including Box Content Cloud, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, and many more, and helps you create your generative AI solution with minimal configuration. For a full list of Amazon Q Business supported data source connectors, see Amazon Q Business connectors.

In this post, we guide you through the process of configuring and integrating Amazon Q Business with your Box Content Cloud. This will enable your support, project management, product management, leadership, and other teams to quickly obtain accurate answers to their questions from the documents stored in your Box account.

Find accurate answers from Box documents using Amazon Q Business

After you integrate Amazon Q Business with Box, you can ask questions based on the documents stored in your Box account. For example:

  • Natural language search – You can search for information within documents located in any folder by using conversational language, simplifying the process of finding desired data without the need to remember specific keywords or filters.
  • Summarization – You can ask Amazon Q Business to summarize contents of documents to meet your needs. This enables you to quickly understand the main points and find relevant information in your documents without having to scan through individual document descriptions manually.

Overview of the Box connector for Amazon Q Business

To crawl and index contents in Box, you can configure the Amazon Q Business Box connector as a data source in your Amazon Q Business application. When you connect Amazon Q Business to a data source and initiate the sync process, Amazon Q Business crawls and indexes documents from the data source into its index.

Types of documents

Let’s look at what is considered a document in the context of the Amazon Q Business Box connector. A document is a collection of information that consists of a title, the content (or the body), metadata (data about the document), and access control list (ACL) information, which makes sure answers are provided only from documents that the user has access to.

The Amazon Q Business Box connector supports crawling of the following entities in Box:

  • Files – Each file is considered a single document
  • Comments – Each comment is considered a single document
  • Tasks – Each task is considered a single document
  • Web links – Each web link is considered a single document

Additionally, Box users can create custom objects and custom metadata fields. Amazon Q supports the crawling and indexing of these custom objects and custom metadata.

The Amazon Q Business Box connector also supports the indexing of a rich set of metadata from the various entities in Box. It further provides the ability to map these source metadata fields to Amazon Q index fields for indexing this metadata. These field mappings allow you to map Box field names to Amazon Q index field names. There are two types of metadata fields that Amazon Q connectors support:

  • Reserved or default fields – These are required with each document, such as the title, creation date, or author
  • Custom metadata fields – These are fields created in the data source in addition to what the data source already provides

Refer to Box data source connector field mappings for more information.
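
As a rough illustration of what a field mapping does, the following Python sketch renames source metadata keys to index field names. The Box field names and index field names used here are placeholders chosen for the example and are assumptions, not the connector’s actual defaults.

```python
# Illustrative only: the field names below are assumptions, not the connector's defaults.
FIELD_MAPPINGS = {
    "name": "_document_title",       # reserved index field (assumed name)
    "created_at": "_created_at",     # reserved index field (assumed name)
    "department": "box_department",  # custom metadata field (hypothetical)
}


def apply_field_mappings(box_metadata: dict, mappings: dict) -> dict:
    """Return index attributes for the mapped source fields; unmapped fields are dropped."""
    return {
        index_field: box_metadata[source_field]
        for source_field, index_field in mappings.items()
        if source_field in box_metadata
    }
```

In this sketch, a source field without a mapping (for example, a file size) simply doesn’t reach the index, which is why you add a custom mapping for any extra attribute you want available for retrieval or filtering.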

Authentication

Before you index content from Box, you first need to establish a secure connection between the Amazon Q Business connector for Box and your Box cloud instance. To do so, you need to authenticate with the data source. Let’s look at the supported authentication mechanisms for the Box connector.

The Amazon Q Box connector supports Box JWT (JSON Web Token) authentication as its authentication method. This approach requires the configuration of several parameters, including the Box client ID, client secret, public key ID, private key, and passphrase. By implementing this token-based JWT authentication, the Amazon Q Business assistant can securely connect to and interact with data stored within the Box platform on behalf of your organization.

Refer to JWT Auth in the Box Developer documentation for more information on setting up and managing JWT tokens in Box.
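
The connector performs the JWT exchange for you, but it can help to see how the parameters fit together. The following Python sketch builds the unsigned claim set for an enterprise-level Box JWT assertion. The claim names and the 60-second expiry limit reflect our reading of the Box developer documentation and should be treated as assumptions; the function name is hypothetical.

```python
import secrets
import time

# Audience value for the assertion, per the Box docs (assumed to be current).
BOX_TOKEN_URL = "https://api.box.com/oauth2/token"


def build_box_jwt_claims(client_id: str, enterprise_id: str, ttl_seconds: int = 45) -> dict:
    """Build the claim set for an enterprise-level Box JWT assertion (unsigned)."""
    if not 0 < ttl_seconds <= 60:
        raise ValueError("Box expects the assertion to expire within 60 seconds")
    return {
        "iss": client_id,                   # Box client ID
        "sub": enterprise_id,               # Box enterprise ID
        "box_sub_type": "enterprise",
        "aud": BOX_TOKEN_URL,
        "jti": secrets.token_hex(32),       # unique identifier for this assertion
        "exp": int(time.time()) + ttl_seconds,
    }
```

In a full client, these claims would then be signed with the passphrase-protected private key, with the public key ID sent as the key identifier, and exchanged for an access token. That signing step is omitted here because it needs a cryptography library.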

Supported Box subscriptions

To integrate Amazon Q Business with Box using the Box connector, access to Box Enterprise or Box Enterprise Plus plans is required. Both plans provide the necessary capabilities to create a custom application, download a JWT token as an administrator, and then configure the connector to ingest relevant data from Box.

Secure querying with ACL crawling, identity crawling, and User Store

The success of Amazon Q Business applications hinges on two key factors: making sure end-users only see responses generated from documents they have access to, and maintaining the privacy and security of each user’s conversation history. Amazon Q Business achieves this by validating the user’s identity every time they access the application, and using this to restrict tasks and answers to the user’s authorized documents. This is accomplished through the integration of AWS IAM Identity Center, which serves as the authoritative identity source and validates users. You can configure IAM Identity Center to use your enterprise identity provider (IdP)—such as Okta or Microsoft Entra ID—as the identity source.

ACLs and identity crawling are enabled by default and can’t be disabled. The Box connector automatically retrieves user identities and ACLs from the connected data sources. This allows Amazon Q Business to filter chat responses based on the end-user’s document access level, so they only see the information they are authorized to view. If you need to index documents without ACLs, you must explicitly mark them as public in your data source. For more information on how the Amazon Q Business connector crawls Box ACLs, refer to How Amazon Q Business connector crawls Box ACLs.

In the Box platform, an administrative user can provision additional user accounts and assign varying permission levels, such as viewer, editor, or co-owner, to files or folders. Fine-grained access is further enhanced through the Amazon Q User Store, which is an Amazon Q data source connector feature that streamlines user and group management across all the data sources attached to your application. This granular permission mapping enables Amazon Q Business to efficiently enforce access controls based on the user’s identity and permissions within the Box environment. For more information on the Amazon Q Business User Store, refer to Understanding Amazon Q Business User Store.
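
Conceptually, ACL-based filtering means a document is only eligible as a source if it is public or its ACL lists the requesting user. The following Python sketch is a simplified model of that idea, not Amazon Q Business’s actual implementation; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    title: str
    folder: str
    allowed_users: set = field(default_factory=set)  # emails taken from the crawled ACL
    is_public: bool = False                          # explicitly marked public at the source


def visible_documents(user_email: str, documents: list) -> list:
    """Keep only documents that are public or whose ACL lists the user."""
    return [d for d in documents if d.is_public or user_email in d.allowed_users]
```

Matching on email address in this model mirrors why the walkthrough asks you to use the same email addresses in IAM Identity Center and in Box.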

Solution overview

In this post, we walk through the steps to configure a Box connector for an Amazon Q Business application. We use an existing Amazon Q application and configure the Box connector to sync data from specific Box folders, map relevant Box fields to the Amazon Q index, initiate the data sync, and then query the ingested Box data using the Amazon Q web experience.

As part of querying the Amazon Q Business application, we cover how to ask natural language questions on documents present in your Box folders and get back relevant results and insights using Amazon Q Business.

Prerequisites

For this walkthrough, you need the following:

Create users in IAM Identity Center

For this post, you need to create three sample users in IAM Identity Center. One user will act as the admin user; the other two will serve as department-specific users. This is to simulate the configuration of user-level access control on distinct folders within your Box account. Make sure to use the same email addresses when creating the users in your Box account.

Complete the following steps to create the users in IAM Identity Center:

  1. On the IAM Identity Center console, choose Users in the navigation pane.
  2. Choose Add user.
  3. For Username, enter a user name. For example, john_doe.
  4. For Password, select Send an email to this user with password setup instructions.
  5. For Email address and Confirm email address, enter your email address.
  6. For First name and Last name, enter John and Doe, respectively. You can also provide your preferred first and last names if necessary.
  7. Keep all other fields as default and choose Next.

  8. On the Add user to groups page, keep everything as default and choose Next.
  9. Verify the details on the Review and add user page, then choose Add user.

The user will get an email containing a link to join IAM Identity Center.

  10. Choose Accept Invitation and set up a password for your user. Remember to note it down for testing the Amazon Q Business application later.
  11. If required by your organization, complete the multi-factor authentication (MFA) setup for this user to enhance security during sign-in.
  12. Confirm that you can log in as the first user using the credentials you created in the previous step.
  13. Repeat the previous steps to create your second department-specific user. Use a different email address for this user. For example, set Username as mary_major, First name as Mary, and Last name as Major. Alternatively, you can use your own values if preferred.
  14. Verify that you can log in as the second user using the credentials you created in the previous step.
  15. Repeat the previous steps to create the third user, who will serve as the admin. Use your Box admin user’s email address for this account, and choose your preferred user name, first name, and last name. For this example, saanvi_sarkar will act as the admin user.
  16. Confirm that you can log in as the admin user using the credentials you created in the previous step.

This concludes the setup of all three users in the IAM Identity Center, each with unique email addresses.
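
If you prefer to script user creation instead of using the console, a minimal sketch with the AWS SDK for Python might look like the following. It assumes `client` is a boto3 `identitystore` client and that you know your identity store ID; the function name is hypothetical. The console flow above also sends the password setup email, which this sketch does not cover.

```python
def create_identity_center_user(client, identity_store_id, username, given_name, family_name, email):
    """Create one IAM Identity Center user and return its UserId.

    `client` is expected to be boto3.client("identitystore").
    """
    response = client.create_user(
        IdentityStoreId=identity_store_id,
        UserName=username,
        DisplayName=f"{given_name} {family_name}",
        Name={"GivenName": given_name, "FamilyName": family_name},
        # Use the same email address that the user has in Box.
        Emails=[{"Value": email, "Type": "work", "Primary": True}],
    )
    return response["UserId"]
```

Passing the client in as a parameter keeps the function easy to test with a stub and leaves credential and Region handling to the caller.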

Create two users in your Box account

For this example, you need two demo users in your Box account in addition to the admin user. Complete the following steps to create these two demo users, using the same email addresses you used when setting up these users in IAM Identity Center:

  1. Log in to your Box Enterprise Admin Console as an admin user.
  2. Choose Users & Groups in the navigation pane.

On the Managed Users tab, the admin user is listed by default.

  3. To create your first department-specific user, choose Add Users, then choose Add Users Manually.
  4. Enter the same name and email address that you used while creating this first department-specific user in IAM Identity Center. For example, use John Doe for Name and his email address for Email. You don’t need to specify groups or folders.
  5. Select the acknowledgement check box to agree to the payment method for adding this new user to your Box account.
  6. Choose Next.
  7. On the Add Users page, choose Complete to agree and add this new user to your Box account.
  8. To create your second department-specific user, choose Add Users, then choose Add Users Manually.
  9. Enter the same name and email address that you used while creating this second department-specific user in IAM Identity Center. For example, use Mary Major for Name and her email address for Email. You don’t need to specify groups or folders.

You now have all three users provisioned in your Box account.

Create a custom Box application for Amazon Q

Before you configure the Box data source connector in Amazon Q Business, you create a custom Box application in your Box account.

Complete the following steps to create an application and configure its authentication method:

  1. Log in to your Box Enterprise Developer Console as an admin user.
  2. Choose My Apps in the navigation pane.
  3. Choose Create New App.
  4. Choose Custom App.

  5. For App name, enter a name for your app. For example, AmazonQConnector.
  6. For Purpose, choose Other.
  7. For Please specify, enter Other.
  8. Leave the other options blank and choose Next.
  9. For Authentication Method, select Server Authentication (with JWT).
  10. Choose Create App.
  11. In My Apps, choose your created app and go to the Configuration tab.
  12. In the App Access Level section, choose App + Enterprise Access.
  13. In the Application Scopes section, select the following permissions:
    1. Write all files and folders stored in Box
    2. Manage users
    3. Manage groups
    4. Manage enterprise properties
  14. In the Advanced Features section, select Make API calls using the as-user header.
  15. In the Add and Manage Public Keys section, choose Generate a Public/Private Keypair.
  16. Complete the two-step verification process and choose OK to download the JSON file to your computer.
  17. Choose Save Changes.
  18. On the Authorization tab, choose Review and Submit.
  19. In the Review App Authorization Submission pop-up, for App description, enter AmazonQConnector and choose Submit.

Your Box Enterprise owner needs to approve the application before you can use it. Complete the following steps to complete the authorization:

  1. Log in to your Box Enterprise Admin Console as the admin user.
  2. Choose Apps in the navigation pane and choose the Custom Apps Manager tab to view the apps that need to be authorized.
  3. Choose the AmazonQConnector app that says Pending Authorization.
  4. Choose the options menu (three dots) and choose Authorize App.

  5. Choose Authorize in the Authorize App pop-up.

This will authorize your AmazonQConnector application and change the status to Authorized.

You can review the downloaded JSON file in your computer’s downloads directory. It contains the client ID, client secret, public key ID, private key, passphrase, and enterprise ID, which you’ll need when creating the Box data source in a later step.
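
To pull these six values out of the file without copying them by hand, you could use a small helper like the following sketch. The nested key layout (`boxAppSettings` and `appAuth`) is an assumption based on the format Box commonly uses for this file, so check it against your own download; the function name is hypothetical.

```python
import json


def load_box_credentials(config_text: str) -> dict:
    """Flatten the Box JWT config JSON into the six values the connector asks for."""
    config = json.loads(config_text)
    app = config["boxAppSettings"]   # assumed top-level key
    auth = app["appAuth"]            # assumed nested key holding the key pair details
    return {
        "clientID": app["clientID"],
        "clientSecret": app["clientSecret"],
        "publicKeyID": auth["publicKeyID"],
        "privateKey": auth["privateKey"],
        "passphrase": auth["passphrase"],
        "enterpriseID": config["enterpriseID"],
    }
```

Because the file contains the private key and its passphrase, treat it like any other credential: avoid committing it to source control, and delete local copies when you no longer need them.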

Add sample documents to your Box account

In this step, upload sample documents to your Box account. Later, you use the Amazon Q Box data source connector to crawl and index these documents.

  1. Download the zip file to your computer.
  2. Extract the files to a folder called AWS_Whitepapers.

  3. Log in to your Box Enterprise account as an admin user.
  4. Upload the AWS_Whitepapers folder to your Box account.

At the time of writing, this folder contains 6 folders and 60 files within them.

Set user-specific permissions on folders in your Box account

In this step, you set up user-level access control for two users on two separate folders in your Box account.

For this ACL simulation, consider the two department-specific users created earlier. Assume John is part of the machine learning (ML) team, so he needs access only to the Machine_Learning folder contents, whereas Mary belongs to the database team, so she needs access only to the Databases folder contents.

Log in to your Box account as an admin and grant viewer access to each user for their respective folder. This restricts each user to seeing only the contents of their assigned folder.

The Machine_Learning folder is accessible to the owner and user John Doe only.

The Databases folder is accessible to the owner and user Mary Major only.

Configure the Box connector for your Amazon Q Business application

Complete the following steps to configure your Box connector for Amazon Q Business:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Select the application you want to add the Box connector to.
  3. On the Actions menu, choose Edit.

  4. On the Update application page, leave all values unchanged and choose Update.
  5. On the Update retriever page, leave all values unchanged and choose Next.
  6. On the Connect data sources page, on the All tab, search for Box.
  7. Choose the plus sign next to the Box connector.
  8. On the Add data source page, for Data source name, enter a name, for example, box-data-source.
  9. Open the JSON file you downloaded from the Box Developer Console.

The file contains values for clientID, clientSecret, publicKeyID, privateKey, passphrase, and enterpriseID.

  10. In the Source section, for Box enterprise ID, enter the value of the enterpriseID key from the JSON file.
  11. For Authorization, no change is needed because by default the ACLs are set to ON for the Box data source connector.
  12. In the Authentication section, under AWS Secrets Manager secret, choose Create and add a new secret.
  13. For Secret name, enter a name for the secret, for example, connector. The prefix QBusiness-Box- is automatically added for you.
  14. For the remaining fields, enter the corresponding values from the downloaded JSON file.
  15. Choose Save to add the secret.
  16. In the Configure VPC and Security group section, use the default setting (No VPC) for this post.
  17. Identity crawling is enabled by default, so no changes are necessary.
  18. In the IAM role section, choose Create a new role (Recommended) and enter a role name, for example, box-role.

For more information on the required permissions to include in the IAM role, see IAM roles for data sources.

  19. In the Sync scope section, in addition to file contents, you can include Box web links, comments, and tasks to your index. Use the default setting (unchecked) for this post.
  20. In the Additional configuration section, you can choose to include or exclude regular expression (regex) patterns. These regex patterns can be applied based on the file name, file type, or file path. For this demo, we skip the regex patterns configuration.
  21. In the Sync mode section, select New, modified, or deleted content sync.
  22. In the Sync run schedule section, choose Run on demand.
  23. In the Field Mappings section, keep the default settings.

After the data source is created, you can modify field mappings and add custom field attributes. You can access field mapping by editing the data source.

  24. Choose Add data source and wait for the data source to be created.

It can take a few seconds for the required roles and the connector to be created.

After the data source is created, you’re redirected to the Connect data sources page to add more data sources as needed.

  25. For this walkthrough, choose Next.
  26. In the Update groups and users section, choose Add groups and users to add the groups and users from IAM Identity Center set up by your administrator.
  27. In the Add or assign users and groups pop-up, select Assign existing users and groups to add existing users configured in your connected IAM Identity Center and choose Next.

Optionally, if you have permissions to add users to connected IAM Identity Center, you can select Add new users.

  28. On the Assign users and groups page, choose Get Started.
  29. In the search box, enter John Doe and choose his user name.
  30. Add the second user, Mary Major, by entering her name in the search box.
  31. Optionally, you can add the admin user to this application.
  32. Choose Assign to add these users to this Amazon Q app.
  33. In the Groups and users section, choose the Users tab, where you will see no subscriptions configured currently.
  34. Choose Manage access and subscriptions to configure the subscription.
  35. On the Manage access and subscriptions page, choose the Users tab.
  36. Select your users.
  37. Choose Change subscription and choose Update subscription tier.
  38. On the Confirm subscription change page, for New subscription, choose Business Pro.
  39. Choose Confirm.
  40. Verify the changed subscription for all three users, then choose Done.
  41. Choose Update application to complete adding and setting up the Box data connector for Amazon Q Business.
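
The console creates the Secrets Manager secret for you in the steps above. If you manage secrets through code instead, a sketch along these lines could create an equivalent one. It assumes `client` is a boto3 `secretsmanager` client, and the key names inside the secret are an assumption modeled on the names in the Box JSON file; verify them against the connector documentation before relying on this. The function name is hypothetical.

```python
import json

SECRET_PREFIX = "QBusiness-Box-"  # prefix the console adds, per the walkthrough


def store_box_secret(client, name_suffix: str, credentials: dict) -> str:
    """Create a Secrets Manager secret holding the Box JWT credentials; return its ARN.

    `client` is expected to be boto3.client("secretsmanager").
    """
    # The enterprise ID is entered separately in the Source section, so it is left out.
    keys = ("clientID", "clientSecret", "publicKeyID", "privateKey", "passphrase")
    payload = {key: credentials[key] for key in keys}
    response = client.create_secret(
        Name=f"{SECRET_PREFIX}{name_suffix}",
        SecretString=json.dumps(payload),
    )
    return response["ARN"]
```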

Configure Box field mappings

To help you structure data for retrieval and chat filtering, Amazon Q Business crawls data source document attributes or metadata and maps them to fields in your Amazon Q index. Amazon Q has reserved fields that it uses when querying your application. When possible, Amazon Q automatically maps these built-in fields to attributes in your data source.

If a built-in field doesn’t have a default mapping, or if you want to map additional index fields, use the custom field mappings to specify how a data source attribute maps to your Amazon Q application.

  1. On the Amazon Q Business console, choose your application.
  2. Under Data sources, select your data source.
  3. On the Actions menu, choose Edit.

  4. In the Field mappings section, select the required fields to crawl under Files and folders, Comments, Tasks, and Web Links that are available and choose Update.

 When selecting all items, make sure you navigate through each page by choosing the page numbers and selecting Select All on every page to include all mapped items.

Index sample documents from the Box account

The Box connector setup for Amazon Q is now complete. Because you configured the data source sync schedule to run on demand, you need to start it manually.

In the Data sources section, choose the data source box-data-source and choose Sync now.

The Current sync state changes to Syncing – crawling, then to Syncing – indexing.

After a few minutes, the Current sync state changes to Idle, the Last sync status changes to Successful, and the Sync run history section shows more details, including the number of documents added.

In this example, Amazon Q scanned and added all 60 files from the AWS_Whitepapers Box folder.
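
The on-demand sync can also be started and monitored from code. The following sketch assumes `client` is a boto3 `qbusiness` client and that you have the application, index, and data source IDs at hand. The set of in-progress status values is an assumption, and the function name is hypothetical.

```python
import time

# Status values treated as "still running" (assumed; check the API reference).
IN_PROGRESS = {"SYNCING", "SYNCING_INDEXING", "STOPPING"}


def run_sync_and_wait(client, application_id, index_id, data_source_id,
                      poll_seconds=30, max_polls=60):
    """Start a data source sync job and poll until it leaves the in-progress states."""
    ids = {"applicationId": application_id, "indexId": index_id, "dataSourceId": data_source_id}
    execution_id = client.start_data_source_sync_job(**ids)["executionId"]
    for _ in range(max_polls):
        history = client.list_data_source_sync_jobs(**ids).get("history", [])
        # Find the job we started among the returned sync runs.
        job = next((j for j in history if j.get("executionId") == execution_id), None)
        if job and job.get("status") not in IN_PROGRESS:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"Sync job {execution_id} still running after {max_polls} polls")
```

The returned job entry carries the final status, which corresponds to what the console shows under Last sync status and Sync run history.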

Query Box data using the Amazon Q web experience

Now that the data synchronization is complete, you can start exploring insights from Amazon Q. In the newly created Amazon Q application, choose Customize web experience to open a new tab with a preview of the UI and options to customize according to your needs.

You can customize the Title, Subtitle, and Welcome message as needed, which will be reflected in the UI.

For this walkthrough, we use the defaults and choose View web experience to be redirected to the login page for the Amazon Q application.

  1. Log in to the application as your first department-specific user, John Doe, using the credentials for the user that were added to the Amazon Q application.

When the login is successful, you’ll be redirected to the Amazon Q assistant UI, where you can start asking questions using natural language and get insights from your Box index.

  2. Enter a prompt in the Amazon Q Business AI assistant at the bottom, such as “What AWS AI/ML service can I use to convert text from one language to another?” Press Enter or choose the arrow icon to generate the response. You can also try your own prompts.

Because John Doe has access to the Machine_Learning folder, Amazon Q Business successfully processed his ML-related query and displayed the response. You can choose Sources to view the source files that contributed to the response, which helps you verify the answer.

  3. Let’s attempt a different prompt related to the Databases folder, which John doesn’t have access to. Enter the prompt “How to reduce the amount of read traffic and connections to my Amazon RDS database?” or choose your own database-related prompt. Press Enter or choose the arrow icon to generate the response.

As anticipated, because John lacks access to the Databases folder, Amazon Q Business responds that it couldn’t generate a reply from the documents he can access.

  4. Go back to the Amazon Q Business Applications page and choose your application again.
  5. This time, open the web experience URL in private mode to initiate a new session, avoiding interference with the previous session.
  6. Log in as Mary Major, the second department-specific user. Use her user name, password, and any MFA you set up initially.
  7. Enter a prompt in the Amazon Q Business AI assistant at the bottom, such as “How to reduce the amount of read traffic and connections to my Amazon RDS database?” Press Enter or choose the arrow icon to generate the response. You can also try your own prompts.

Because Mary has access to the Databases folder, Amazon Q Business successfully processed her database-related query and displayed the response. You can choose Sources to view the source files that contributed to generating the response.

  8. Now, let’s attempt a prompt that contains information from the Machine_Learning folder, which Mary isn’t authorized to access. Enter the prompt “What AWS AI/ML service can I use to convert text from one language to another?” or choose your own ML-related prompt.

As anticipated, the Amazon Q Business application will indicate it couldn’t generate a response because Mary lacks access to the Machine_Learning folder.

The preceding test scenarios illustrate the functionality of the Amazon Q Box connector in crawling and indexing documents along with their associated ACLs. With this mechanism, only users with the relevant permissions can access the respective folders and files within the linked Box account.

Congratulations! You’ve effectively utilized Amazon Q to unveil answers and insights derived from the content indexed from your Box account.
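
The same questions can be sent programmatically. The following is a minimal sketch, assuming `client` is a boto3 `qbusiness` client created with credentials that identify the end user (so that document ACLs still apply). The response field names used here are our understanding of the API and should be checked against the API reference; the function name is hypothetical.

```python
def ask_amazon_q(client, application_id: str, question: str) -> dict:
    """Send one question and return the answer text plus the titles of cited sources.

    `client` is expected to be boto3.client("qbusiness").
    """
    response = client.chat_sync(applicationId=application_id, userMessage=question)
    sources = [s.get("title", "") for s in response.get("sourceAttributions", [])]
    return {"answer": response.get("systemMessage", ""), "sources": sources}
```

The list of sources plays the same role as the Sources option in the web experience: it shows which indexed documents contributed to the answer.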

Frequently asked questions

In this section, we provide guidance on frequently asked questions.

Amazon Q Business is unable to answer your questions

If you get the response “Sorry, I could not find relevant information to complete your request,” this may be due to a few reasons:

  • No permissions – ACLs applied to your Box account don’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources.
  • Data connector sync failed – Your data connector may have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.
  • Incorrect regex pattern – Validate the correct definition of the regex include or exclude pattern when setting up the Box data source.

If none of these reasons apply to your use case, open a support case and work with your technical account manager to get this resolved.
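
To check an include or exclude pattern before running a sync, you can test it locally against a few sample paths. The following sketch uses Python’s `re` module. The rule that exclusion takes precedence over inclusion is an assumption about the connector’s behavior, and the connector’s regex dialect may differ from Python’s; the function name is hypothetical.

```python
import re


def should_crawl(file_path: str, include_patterns: list, exclude_patterns: list) -> bool:
    """Exclusion wins; with no include patterns, every non-excluded path is crawled."""
    if any(re.search(p, file_path) for p in exclude_patterns):
        return False
    if not include_patterns:
        return True
    return any(re.search(p, file_path) for p in include_patterns)
```

For example, with an include pattern of `\.pdf$` and an exclude pattern of `/Databases/`, a PDF in the Databases folder would be skipped under this model, which is one way an overly broad pattern can leave Amazon Q Business without documents to answer from.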

How to generate responses from authoritative data sources

If you want Amazon Q Business to generate responses only from authoritative data sources, guardrails can help. Within the application settings, you can specify the authorized data repositories, such as content management systems and knowledge bases, from which the assistant is permitted to retrieve and synthesize information. By defining these approved data sources as guardrails, you can instruct Amazon Q Business to use only reliable, up-to-date, and trustworthy information, reducing the risk of incorporating data from non-authoritative or potentially unreliable sources.

Additionally, Amazon Q Business offers the capability to define content filters as part of Guardrails for Amazon Bedrock. These filters can specify the types of content, topics, or keywords deemed appropriate and aligned with your organization’s policies and standards. By incorporating these content-based guardrails, you can further refine the assistant’s responses to make sure they align with your authoritative information and messaging. The integration of Amazon Q Business with IAM Identity Center also serves as a critical guardrail, allowing you to validate user identities and align ACLs to make sure end-users only receive responses based on their authorized data access.

Amazon Q Business responds using old (stale) data even though your data source is updated

If you find that Amazon Q Business is responding with outdated or stale data, you can use the relevance tuning and boosting features to surface the latest documents. The relevance tuning functionality allows you to adjust the weightings assigned to various document attributes, such as recency, to prioritize the most recent information. Boosting can also be used to explicitly elevate the ranking of the latest documents, making sure they are prominently displayed in the assistant’s responses. For more information on relevance tuning, refer to Boosting chat responses using relevance tuning.

Additionally, it’s important to review the sync schedule and status for your data connectors. Verifying the sync frequency and the last successful sync run can help identify any issues with data freshness. Adjusting the sync schedule or running manual syncs, as needed, can help keep the data up to date and improve the relevance of the Amazon Q Business responses. For more information, refer to Sync run schedule.

Clean up

To avoid incurring additional costs, remove the resources created for this solution. Deleting the Amazon Q application also removes the associated index and data connectors. However, you must delete the IAM roles and secrets created during the Amazon Q application setup separately.

Complete the following steps to delete the Amazon Q application, secret, and IAM role:

  1. On the Amazon Q Business console, select the application that you created.
  2. On the Actions menu, choose Delete and confirm the deletion.
  3. On the Secrets Manager console, select the secret that was created for the Box connector.
  4. On the Actions menu, choose Delete.
  5. Select the waiting period as 7 days and choose Schedule deletion.
  6. On the IAM console, select the role that was created during the Amazon Q application creation.
  7. Choose Delete and confirm the deletion.
  8. Delete the AWS_Whitepapers folder and its contents from your Box account.
  9. Delete the two demo users that you created in your Box Enterprise account.
  10. On the IAM Identity Center console, choose Users in the navigation pane.
  11. Select the three demo users that you created and choose Delete users to remove these users.
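
If you prefer to script the AWS side of this cleanup (steps 1 through 7), the following is a minimal Python sketch rather than a definitive procedure. It assumes you pass in boto3 clients for `qbusiness`, `secretsmanager`, and `iam`; the function name and the identifier arguments are hypothetical placeholders. It doesn't cover the Box or IAM Identity Center steps, and it ignores pagination when listing role policies.

```python
def clean_up(qbusiness, secretsmanager, iam, application_id, secret_id, role_name):
    """Delete the application, schedule secret deletion, and delete the IAM role."""
    # Deleting the application also removes its index and data connectors.
    qbusiness.delete_application(applicationId=application_id)

    # Schedule the secret for deletion with a 7-day waiting period.
    secretsmanager.delete_secret(SecretId=secret_id, RecoveryWindowInDays=7)

    # IAM refuses to delete a role that still has policies, so remove them first.
    attached = iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]
    for policy in attached:
        iam.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])
    for policy_name in iam.list_role_policies(RoleName=role_name)["PolicyNames"]:
        iam.delete_role_policy(RoleName=role_name, PolicyName=policy_name)
    iam.delete_role(RoleName=role_name)
```

You would call it with something like `clean_up(boto3.client("qbusiness"), boto3.client("secretsmanager"), boto3.client("iam"), ...)`, supplying the IDs and names from your own account.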

Conclusion

The Amazon Q Box connector allows organizations to seamlessly integrate their Box files into the powerful generative AI capabilities of Amazon Q. By following the steps outlined in this post, you can quickly configure the Box connector as a data source for Amazon Q and initiate synchronization of your Box information. The native field mapping options enable you to customize exactly which Box data to include in Amazon Q’s index.

Amazon Q can serve as a powerful assistant capable of providing rich insights and summaries about your Box files directly from natural language queries.

The Amazon Q Box integration represents a valuable tool for software teams to gain AI-driven visibility into their organization’s document repository. By bridging Box’s industry-leading content management with Amazon’s cutting-edge generative AI, teams can drive productivity, make better informed decisions, and unlock deeper insights into their organization’s knowledge base. As generative AI continues advancing, integrations like this will become critical for organizations aiming to deliver streamlined, data-driven software development lifecycles.

To learn more about the Amazon Q connector for Box, refer to Connecting Box to Amazon Q.


About the Authors

Maran Chandrasekaran is a Senior Solutions Architect at Amazon Web Services, working with our enterprise customers. Outside of work, he loves to travel and ride his motorcycle in Texas Hill Country.

Senthil Kamala Rathinam is a Solutions Architect at Amazon Web Services specializing in data and analytics. He is passionate about helping customers design and build modern data platforms. In his free time, Senthil loves to spend time with his family and play badminton.

Vijai Gandikota is a Principal Product Manager in the Amazon Q and Amazon Kendra organization of Amazon Web Services. He is responsible for the Amazon Q and Amazon Kendra connectors, ingestion, security, and other aspects of the Amazon Q and Amazon Kendra services.

How Twilio generated SQL using Looker Modeling Language data with Amazon Bedrock

This post is co-written with Aishwarya Gupta, Apurva Gawad, and Oliver Cody from Twilio.

Today’s leading companies trust Twilio’s Customer Engagement Platform (CEP) to build direct, personalized relationships with their customers everywhere in the world. Twilio enables companies to use communications and data to add intelligence and security to every step of the customer journey, from sales and marketing to growth, customer service, and many more engagement use cases in a flexible, programmatic way. Across 180 countries, millions of developers and hundreds of thousands of businesses use Twilio to create personalized experiences for their customers. As one of the largest AWS customers, Twilio engages with data, artificial intelligence (AI), and machine learning (ML) services to run their daily workloads.

Data is the foundational layer for all generative AI and ML applications. Managing and retrieving the right information can be complex, especially for data analysts working with large data lakes and complex SQL queries. To address this, Twilio partnered with AWS to develop a virtual assistant that helps their data analysts find and retrieve relevant data from Twilio’s data lake by converting user questions asked in natural language to SQL queries. This virtual assistant tool uses Amazon Bedrock, a fully managed generative AI service that provides access to high-performing foundation models (FMs) and capabilities like Retrieval Augmented Generation (RAG). RAG optimizes language model outputs by extending the models’ capabilities to specific domains or an organization’s internal data for tailored responses.

This post highlights how Twilio enabled natural language-driven data exploration of business intelligence (BI) data with RAG and Amazon Bedrock.

Twilio’s use case

Twilio wanted to provide an AI assistant to help their data analysts find data in their data lake. They used the metadata layer (schema information) over their data lake consisting of views (tables) and models (relationships) from their data reporting tool, Looker, as the source of truth. Looker is an enterprise platform for BI and data applications that helps data analysts explore and share insights in real time.

Twilio implemented RAG using Anthropic Claude 3 on Amazon Bedrock to develop a virtual assistant tool called AskData for their data analysts. This tool converts questions from data analysts asked in natural language (such as “Which table contains customer address information?”) into a SQL query using the schema information available in Looker Modeling Language (LookML) models and views. The analysts can run this generated SQL directly, saving them the time to first identify the tables containing relevant information and then write a SQL query to retrieve the information.

The AskData tool provides ease of use and efficiency to its users:

  • Users need accurate information about the data in a quick and accessible manner to make business decisions. Providing a tool to minimize their time spent finding tables and writing SQL queries allows them to focus more on business outcomes and less on logistical tasks.
  • Users typically reach out to the engineering support channel when they have questions about data that is deeply embedded in the data lake or if they can’t access it using various queries. Having an AI assistant can reduce the engineering time spent in responding to these queries and provide answers more quickly.

Solution overview

In this post, we show you a step-by-step implementation and design of the AskData tool designed to serve as an AI assistant for Twilio’s data analysts. We discuss the following:

  • How to use a RAG approach to retrieve the relevant LookML metadata corresponding to users’ questions with the help of efficient data chunking and indexing and generate SQL queries from natural language
  • How to select the optimal large language model (LLM) for your use case from Amazon Bedrock
  • How analysts can query the data using natural language questions
  • The benefits of using RAG for data analysis, including increased productivity and reduced engineering overhead of finding the data (tables) and writing SQL queries

This solution uses Amazon Bedrock, Amazon Relational Database Service (Amazon RDS), Amazon DynamoDB, and Amazon Simple Storage Service (Amazon S3). The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

  1. An end-user (data analyst) asks a question in natural language about the data that resides within a data lake.
  2. The application combines the question with metadata (schema information) stored in Amazon RDS and the conversation history stored in DynamoDB to build personalized context for the user’s question:
    • The RDS database (PostgreSQL with pgvector) stores the LookML tables and views as embeddings that are retrieved through a vector similarity search.
    • The DynamoDB table stores the previous conversation history with this user.
  3. The context and natural language question are passed to an FM on Amazon Bedrock (in this case, Anthropic Claude 3 Haiku), which responds with a personalized SQL query that the user can use to retrieve accurate information from the data lake. The following is the prompt template that is used for generating the SQL query:
Human: The context information below represents the LookML data for Looker views and models. 
Using this context data, please generate a presto SQL query that will return the correct result for the user's question. 
Please provide a SQL query with the correct syntax, table names, and column names based on the provided LookML data.

<instructions>

1. Use the correct underlying SQL table names (table name in sql_table_name) 
and column names (use column names from the dimensions of the view as they are the correct column names). 
Use the following as an example:

{{example redacted}}

2. Join tables as necessary to get the correct result. 
- Avoid unnecessary joins if not explicitly requested by the user.

3. Avoid unnecessary filters if not explicitly requested by the user.

4. If the view has a derived table, use the derived query to answer question 
using table names and column names from derived query. Use the following as an example:

{{example redacted}}

5. The schema name is represented as <schema>.<table_name> within the LookML views. 
Use the existing schema name or "public" as the schema name if no schema is specified.

</instructions>

This is the chat history from previous messages:

<chat_history>

{chat_history}

</chat_history>

<context>

{context}

</context>

This is the user question:

<question>

{question}

</question>

Assistant: Here is a SQL query for the user question:
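
To make the placeholder substitution concrete, here is a minimal Python sketch that fills the `{chat_history}`, `{context}`, and `{question}` slots of a shortened copy of the template. The function name, the `role: text` history format, and the blank-line separator between LookML documents are assumptions for illustration, not details from the AskData implementation.

```python
# Shortened copy of the template above; only the three placeholders are kept.
PROMPT_TEMPLATE = """This is the chat history from previous messages:

<chat_history>
{chat_history}
</chat_history>

<context>
{context}
</context>

This is the user question:

<question>
{question}
</question>"""


def render_prompt(chat_history, context_docs, question):
    """Fill the template from (role, text) history pairs and LookML documents."""
    history_text = "\n".join(f"{role}: {text}" for role, text in chat_history)
    context_text = "\n\n".join(context_docs)
    # str.format does not re-parse substituted values, so braces inside
    # LookML content are inserted literally.
    return PROMPT_TEMPLATE.format(
        chat_history=history_text, context=context_text, question=question
    )
```

A framework such as LangChain performs the same substitution through its prompt template classes; the plain `str.format` version only shows what ends up in the model input.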

The solution comprises four main steps:

  1. Use semantic search on LookML metadata to retrieve the relevant tables and views corresponding to the user questions.
  2. Use FMs on Amazon Bedrock to generate accurate SQL queries based on the retrieved table and view information.
  3. Create a simple web application using LangChain and Streamlit.
  4. Refine the application using methods such as prompt engineering, tuning inference parameters, and improving the LookML content.

Prerequisites

To implement the solution, you should have an AWS account, model access to your choice of FM on Amazon Bedrock, and familiarity with DynamoDB, Amazon RDS, and Amazon S3.

Access to Amazon Bedrock FMs isn’t granted by default. To gain access to an FM, an AWS Identity and Access Management (IAM) user with sufficient permissions needs to request access to it through the Amazon Bedrock console. After access is provided to a model, it is available for the users in the account.

To manage model access, choose Model access in the navigation pane on the Amazon Bedrock console. The model access page lets you view a list of available models, the output modality of the model, whether you have been granted access to it, and the End User License Agreement (EULA). You should review the EULA for terms and conditions of using a model before requesting access to it. For information about model pricing, refer to Amazon Bedrock pricing.

Model access

Structure and index the data

In this solution, we use the RAG approach to retrieve the relevant schema information from LookML metadata corresponding to users’ questions and then generate a SQL query using this information.

This solution uses two separate collections that are created in our vector store: one for Looker views and another for Looker models. We used the sentence-transformers/all-mpnet-base-v2 model for creating vector embeddings and PostgreSQL with pgvector as our vector database. As long as the LookML file doesn’t exceed the context window of the LLM used to generate the final response, we don’t split the file into chunks and instead pass the file in its entirety to the embeddings model. The vector similarity search is able to find the correct files that contain the LookML tables and views relevant to the user’s question. We can pass the entire LookML file contents to the LLM, taking advantage of its large context window, and the LLM is able to pick the schemas for the relevant tables and views to generate the SQL query.

The two subsets of LookML metadata provide distinct types of information about the data lake. Views represent individual tables, and models define the relationships between those tables. By separating these components, we can first retrieve the relevant views based on the user’s question, and then use those results to identify the associated models that capture the relationships between the retrieved views.
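
The two-step lookup can be sketched in plain Python as follows. This is an illustrative toy, not the production pgvector queries: the embeddings are hand-written two-dimensional vectors, the dictionary fields (`name`, `embedding`, `views`) are hypothetical, and the second step matches models by the view names they reference, which is one possible way to link the two collections.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, collection, k):
    """Return the k documents most similar to the query vector."""
    ranked = sorted(
        collection, key=lambda doc: cosine(query_vec, doc["embedding"]), reverse=True
    )
    return ranked[:k]


def retrieve(query_vec, views, models, k=2):
    """Step 1: find relevant views. Step 2: find models that reference them."""
    hits = top_k(query_vec, views, k)
    hit_names = {view["name"] for view in hits}
    related = [m for m in models if hit_names & set(m["views"])]
    return hits, related
```

In a real deployment, the first step would be a similarity query against the views collection in PostgreSQL with pgvector, using embeddings from the sentence-transformers model mentioned earlier.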

This two-step procedure provides a more comprehensive understanding of the relevant tables and their relationships to the user question. The following diagram shows how both subsets of metadata are stored as embeddings in separate collections for enhanced retrieval. The LookML view and model information is brought into Amazon S3 through a separate data pipeline (not shown).

Content ingestion into vector db

Select the optimal LLM for your use case

Selecting the right LLM for any use case is essential. Every use case has different requirements for context length, token size, and the ability to handle various tasks like summarization, task completion, chatbot applications, and so on. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon within a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

This solution is implemented using Anthropic Claude 3, available through Amazon Bedrock. Anthropic Claude 3 is chosen for two main reasons:

  • Increased context window – Anthropic Claude 3 can handle up to 200,000 tokens in its context, allowing for processing larger LookML queries and tables. This expanded capacity is crucial when dealing with complex or extensive data, so the LLM has access to the necessary information for accurate and informed responses to the user.
  • Enhanced reasoning abilities – Anthropic Claude 3 demonstrates enhanced performance when working with larger contexts, enabling it to better understand and respond to user queries that require a deeper comprehension of the views, models, and their relationships. You can gain granular control over the reasoning capabilities using several prompt engineering techniques.

Build a web application

This solution uses LangChain and Streamlit to build a web application and integrate Amazon Bedrock into it. LangChain is a framework specifically designed to simplify the creation of applications using LLMs, and it’s straightforward to use Amazon Bedrock through LangChain using the Amazon Bedrock component available in LangChain. We use Streamlit to develop the frontend for this web application.

So that data analysts can interact with the tool conversationally and get queries that extract relevant data from their data lake, this solution implements a chat engine using the ConversationalRetrievalChain mechanism, which lets you pass a custom vector store retriever, prompt, and conversation history to the LLM to generate personalized answers to user questions. To store the chat history, we use DynamoDB with the user session ID as the primary key. DynamoDB is a highly scalable and durable NoSQL database service, enabling you to efficiently store and retrieve chat histories for multiple user sessions concurrently. The following screenshot shows an example of the chat interface developed using Streamlit.

AskData user interface
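
For the chat history storage described earlier, a minimal Python sketch might look like the following. It assumes a boto3 DynamoDB `Table` resource is passed in, and the attribute names `session_id` and `messages` are hypothetical; the actual AskData table layout isn't described in this post.

```python
def load_history(table, session_id):
    """Return the stored list of turns for a session, or an empty list."""
    item = table.get_item(Key={"session_id": session_id}).get("Item")
    return item["messages"] if item else []


def append_turn(table, session_id, question, answer):
    """Append one question/answer turn and write the session item back."""
    messages = load_history(table, session_id)
    messages.append({"question": question, "answer": answer})
    # Read-modify-write is not atomic; concurrent writers to the same
    # session would need a conditional write or a list-append update.
    table.put_item(Item={"session_id": session_id, "messages": messages})
    return messages
```

Keeping one item per session makes each lookup a single key read, at the cost of rewriting the whole history on every turn. DynamoDB items are limited to 400 KB, so long conversations would need trimming or a per-turn item design.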

The following is an example scenario to get an idea of the user workflow when interacting with AskData:

User Question - how do I find a customer address?

AskData Response - Based on the provided LookML data, 
you can find the customer address by querying the raw_mailing_addresses table, 
which contains address-related information. 
Here is the SQL query to retrieve the customer address:

SELECT
    account,
    address1,
    address2,
    city,
    province,
    postal_code,
    iso_country
FROM
    raw_mailing_addresses

This query will return the account SID (which can be used to identify the customer), the first line of the address, the second line of the address (if any), the city, the province or state, the postal code, and the ISO country code for all entries in the raw_mailing_addresses table. If you have a specific customer’s account and you want to retrieve the address for that customer, you can add a WHERE clause to the query:

SELECT
    account,
    address1,
    address2,
    city,
    province,
    postal_code,
    iso_country
FROM
    raw_mailing_addresses
WHERE
    account = 'YourCustomerAccount' -- Replace with the actual account

Replace 'YourCustomerAccount' with the actual account of the customer whose address you want to find.

Optimize the application

Although using an LLM to answer user questions about data is efficient, it comes with recognized limitations, such as the tendency of the LLM to generate inaccurate responses, often due to hallucinated information. To enhance the accuracy of our application and reduce hallucinations, we did the following:

  • Set the temperature for the LLM to 0.1 to reduce the LLM’s propensity for overly creative responses.
  • Added instructions in the prompt to only generate the SQL query based on the context (schema, chat history) being provided in the prompt.
  • Meticulously removed duplicate and redundant entries from the LookML data before it was ingested into the vector database.
  • Added a user feedback mechanism (a rating from 1–5 with an optional text input for comments) to the AskData UI. We used the feedback to improve the quality of our data, prompts, and inference parameter settings.
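
As an illustration of the temperature setting in the first item, the following Python sketch builds a request body for an Anthropic Claude 3 model on Amazon Bedrock with `temperature` set to 0.1. The `max_tokens` value and the function name are assumptions; the post doesn't state the other inference parameters that were used.

```python
import json

# Model ID for Anthropic Claude 3 Haiku on Amazon Bedrock.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"


def build_request(prompt, temperature=0.1, max_tokens=1024):
    """Serialize a Messages API request body with a low sampling temperature."""
    return json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,  # assumed value, tune for your queries
            "temperature": temperature,
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": prompt}]}
            ],
        }
    )
```

With boto3, this body can be sent using a `bedrock-runtime` client, for example `client.invoke_model(modelId=MODEL_ID, body=build_request(prompt))`, and the generated text read from the `content` list in the JSON response body.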

Based on user feedback, the application achieved a net promoter score (NPS) of 40, surpassing the initial target score of 35. We set this target with the following factors in mind: the lack of relevant information for specific user questions within the LookML data, specific rules related to the structure of SQL queries that might need to be added, and the expectation that the LLM would sometimes make a mistake in spite of all the measures we put in place.
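
For readers unfamiliar with the metric, NPS is the percentage of promoters minus the percentage of detractors. The post doesn't say how the 1–5 ratings were mapped to those groups, so the following Python sketch uses an assumed mapping (5 is a promoter, 1 to 3 is a detractor, 4 is passive) purely to show the arithmetic.

```python
def net_promoter_score(ratings, promoter_min=5, detractor_max=3):
    """Percentage of promoters minus percentage of detractors.

    The thresholds are an assumed adaptation of NPS to a 1-5 scale.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(r >= promoter_min for r in ratings)
    detractors = sum(r <= detractor_max for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)
```

Under this mapping, ten responses made up of six 5s, two 4s, and two 2s give (6 − 2) / 10 × 100 = 40.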

Conclusion

In this post, we illustrated how to use generative AI to significantly enhance the efficiency of data analysts. By using LookML as metadata for our data lake, we constructed vector stores for views (tables) and models (relationships). With the RAG framework, we efficiently retrieved pertinent information from these stores and provided it as context to the LLM alongside user queries and any previous chat history. The LLM then seamlessly generated SQL queries in response.

Our development process was streamlined thanks to various AWS services, particularly Amazon Bedrock, which facilitated the integration of the LLM for query responses, and Amazon RDS, which served as our vector store.

Get started with Amazon Bedrock today, and leave your feedback and questions in the comments section.


About the Authors

Apurva Gawad is a Senior Data Engineer at Twilio specializing in building scalable systems for data ingestion and empowering business teams to derive valuable insights from data. She has a keen interest in AI exploration, blending technical expertise with a passion for innovation. Outside of work, she enjoys traveling to new places, always seeking fresh experiences and perspectives.

Aishwarya Gupta is a Senior Data Engineer at Twilio focused on building data systems to empower business teams to derive insights. She enjoys traveling and exploring new places, foods, and cultures.

Oliver Cody is a Senior Data Engineering Manager at Twilio with over 28 years of professional experience, leading multidisciplinary teams across EMEA, NAMER, and India. His experience spans all things data across various domains and sectors. He has focused on developing innovative data solutions, significantly optimizing performance and reducing costs.

Amit Arora is an AI and ML specialist architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Johnny Chivers is a Senior Solutions Architect working within the Strategic Accounts team at AWS. With over 10 years of experience helping customers adopt new technologies, he guides them through architecting end-to-end solutions spanning infrastructure, big data, and AI.
