Derive generative AI powered insights from Alation Cloud Services using Amazon Q Business Custom Connector

This blog post is co-written with Gene Arnold from Alation.

To build a generative AI-based conversational application integrated with relevant data sources, an enterprise needs to invest time, money, and people. First, you need to build connectors to the data sources. Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach, where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve data, rank the answers, and build a feature-rich web application. Additionally, you might need to hire and staff a large team to build, maintain, and manage such a system.

Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take actions using the data and expertise found in your company’s information repositories, code, and enterprise systems. To do this, Amazon Q Business provides out-of-the-box native data source connectors that can index content into a built-in retriever, and it uses an LLM to provide accurate, well-written answers. A data source connector is a component of Amazon Q Business that helps to integrate and synchronize data from multiple repositories into one index. Amazon Q Business offers prebuilt connectors to a large number of data sources, including ServiceNow, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, and many more. For a full list of supported data source connectors, see Amazon Q Business connectors.

However, many organizations store relevant information in the form of unstructured data on company intranets or within file systems on corporate networks that are inaccessible to Amazon Q Business using its native data source connectors. You can now use the custom data source connector within Amazon Q Business to upload content to your index from a wider range of data sources.

Using an Amazon Q Business custom data source connector, you can gain insights into your organization’s third-party applications with the integration of generative AI and natural language processing. This post shows how to configure an Amazon Q Business custom connector and derive insights by creating a generative AI-powered conversation experience on AWS using Amazon Q Business, while using access control lists (ACLs) to restrict access to documents based on user permissions.

Alation is a data intelligence company serving more than 600 global enterprises, including 40% of the Fortune 100. Customers rely on Alation to realize the value of their data and AI initiatives. Headquartered in Redwood City, California, Alation is an AWS Specialization Partner and AWS Marketplace Seller with Data and Analytics Competency. Organizations trust Alation’s platform for self-service analytics, cloud transformation, data governance, and AI-ready data, fostering innovation at scale. In this post, we will showcase a sample of how Alation’s business policies can be integrated with an Amazon Q Business application using a custom data source connector.

Finding accurate answers from content in custom data sources using Amazon Q Business

After you integrate Amazon Q Business with data sources such as Alation, users can ask questions that are answered from the descriptions of the indexed documents. For example:

  1. What are the top sections of the HR benefits policies?
  2. Who are the data stewards for my proprietary database sources?

Overview of a custom connector

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. Amazon Q Business offers multiple prebuilt data source connectors that can connect to your data sources and help you create your generative AI solution with minimal configuration. However, if you have valuable data residing in locations that those prebuilt connectors cannot reach, you can use a custom connector.

When you connect Amazon Q Business to a data source and initiate the data synchronization process, Amazon Q Business crawls and adds documents from the data source to its index.

You would typically use an Amazon Q Business custom connector when you have a repository that Amazon Q Business doesn’t yet provide a data source connector for. With a custom connector, Amazon Q Business only provides metric information that you can use to monitor your data source sync jobs. You must create and run the crawler that determines the documents your data source indexes. A simple architectural representation of the steps involved is shown in the following figure.

Architecture Diagram
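
As a rough sketch of the flow in the figure, a custom crawler wraps its uploads in a sync job: it starts the job, pushes documents with the returned execution ID, and then stops the job. The function name `run_custom_sync` and the `fetch_documents` callable are hypothetical names used only for illustration; the three client calls are the Amazon Q Business API operations used later in this post.

```python
from typing import Callable, Dict, List


def run_custom_sync(client, application_id: str, index_id: str,
                    data_source_id: str,
                    fetch_documents: Callable[[], List[Dict]]) -> str:
    """Run one sync cycle: start the job, upload documents, stop the job.

    `client` is expected to behave like boto3.client('qbusiness').
    `fetch_documents` is a hypothetical stand-in for your own crawler and is
    assumed to return a small list that fits in a single upload call.
    """
    ids = dict(applicationId=application_id, indexId=index_id,
               dataSourceId=data_source_id)
    execution_id = client.start_data_source_sync_job(**ids)["executionId"]
    try:
        docs = fetch_documents()
        if docs:
            client.batch_put_document(
                applicationId=application_id,
                indexId=index_id,
                dataSourceSyncId=execution_id,
                documents=docs,
            )
    finally:
        # Always close the sync job so the data source is not left syncing.
        client.stop_data_source_sync_job(**ids)
    return execution_id
```

Passing the client in as a parameter keeps the crawler logic easy to test with a stub before you point it at a real application.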

Solution overview

The solution shown here, which integrates Alation’s business policies, is for demonstration purposes only. We recommend running similar scripts only on your own data sources after consulting with the team that manages them, and following the terms of service for the sources that you fetch data from. The steps for other custom data sources are very similar, except for the part that connects to Alation and fetches data from it. To crawl and index content in Alation, you configure an Amazon Q Business custom connector as a data source in your Amazon Q Business application.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Configure your Alation connection

In your Alation cloud account, create an OAuth2 client application that can be consumed from an Amazon Q Business application.

  1. In Alation, sign in as a user with administrator privileges, navigate to the settings page, and choose Authentication (https://[[your-domain]].alationcloud.com/admin/auth/).

Alation Admin Settings

  1. In the OAuth Client Applications section, choose Add.

Alation OAuth Client Applications

  1. Enter an easily identifiable application name, and choose Save.

Create OAuth Client Application

  1. Take note of the OAuth client application data—the Client ID and the Client Secret—created and choose Close.

OAuth Client ID

  1. As a security best practice, we recommend storing the client application data in AWS Secrets Manager. In the AWS Management Console, navigate to AWS Secrets Manager and add a new secret. Enter the Client_Id and Client_Secret values copied from the previous step.

AWS Secrets Manager - Create Secret1

  1. Provide a name and description for the secret and choose Next.

AWS Secrets Manager - Create Secret2

  1. Leave the defaults and choose Next.

AWS Secrets Manager - Create Secret3

  1. Choose Store in the last page.

AWS Secrets Manager - Create Secret4
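
The console steps above can also be scripted. The following is a minimal sketch, assuming the key names `Client_Id` and `Client_Secret` and the secret name `alation_test` that appear in the code later in this post; `build_secret_string` and `store_alation_secret` are hypothetical helper names.

```python
import json


def build_secret_string(client_id: str, client_secret: str) -> str:
    """Serialize the OAuth client credentials as a JSON secret string."""
    return json.dumps({"Client_Id": client_id, "Client_Secret": client_secret})


def store_alation_secret(secrets_client, name: str, client_id: str,
                         client_secret: str) -> str:
    """Create the secret and return its ARN.

    `secrets_client` is expected to be boto3.client('secretsmanager').
    """
    response = secrets_client.create_secret(
        Name=name,
        Description="Alation OAuth client application credentials",
        SecretString=build_secret_string(client_id, client_secret),
    )
    return response["ARN"]
```

For example, `store_alation_secret(boto3.client('secretsmanager'), "alation_test", client_id, client_secret)` would create the secret that the notebook reads later.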

Create sample Alation policies

In our example, you create three different sets of Alation policies for a fictional organization named Unicorn Rentals. Grouped as Workplace, HR, and Regulatory, each policy contains a roughly two-page summary of crucial organizational items of interest. You can find details on how to create policies in the Alation documentation.

Alation Create Policies

On the Amazon Q Business side, let’s assume that we want to enforce the following access policies. Users and access are set up through the code illustrated in later sections.

#   First name   Last name   Policies authorized for access
1   Alejandro    Rosalez     Workplace, HR, and Regulatory
2   Sofia        Martinez    Workplace and HR
3   Diego        Ramirez     Workplace and Regulatory

Create an Amazon Q Business application

  1. Sign in to the AWS Management Console and navigate to Amazon Q Business from the search bar at the top.
  1. On the Amazon Q Business console, choose Get Started.

Amazon Q Business Console

  1. On the Applications page, choose Create application.

Amazon Q Business List Applications

  1. In the first step of the Create application wizard, keep the default values. Additionally, choose the users who require access to the Amazon Q Business application by including them through the IAM Identity Center settings.

Q Business Create Application1

  1. In the access management settings page, you would create and add users via AWS IAM Identity Center.

Q Business Create Application2

  1. Once all users are added, choose Create.

Q Business Create Application3

  1. After the application is created, take note of the Application ID value from the landing page.

Q Business Create Application4

  1. Next, choose an index type for the Amazon Q Business application. Choose the native retriever option.

Q Business Create Index1

Q Business Create Index2

  1. After the index is created, verify that the status has changed to Active. You can then take a note of the Index ID.

Q Business Create Index3

  1. Next, add the custom data source.

Q Business Add Data Source1

  1. Search for Custom data source and choose the plus sign next to it.

Q Business Add Data Source2

  1. Provide a name and description for the custom data source.

Q Business Add Data Source2

  1. Once done, choose Add data source.

Q Business Add Data Source4

  1. After the data source is added and its status is Active, take note of the Data source ID.

Q Business Add Data Source4

Load policy data from Alation to Amazon Q Business using the custom connector

Now let’s load the Alation data into Amazon Q Business using the correct access permissions. The code examples that follow are also available on the accompanying GitHub code repository.

  1. With the connector ready, move over to the SageMaker Studio notebook and perform data synchronization operations by invoking Amazon Q Business APIs.
  2. To start, retrieve the Alation OAuth client application credentials stored in Secrets Manager.
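    import json

    import boto3
    from botocore.exceptions import ClientError

    secrets_manager_client = boto3.client('secretsmanager')
    secret_name = "alation_test"

    try:
        get_secret_value_response = secrets_manager_client.get_secret_value(
            SecretId=secret_name
        )
        # Parse the JSON secret string; avoid eval() on secret contents.
        secret = json.loads(get_secret_value_response['SecretString'])
    except ClientError as e:
        raise e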

  1. Next, initiate the connection using the OAuth client application credentials from Alation.
    import requests
    from requests.auth import HTTPBasicAuth

    base_url = "https://[[your-domain]].alationcloud.com"
    token_url = "/oauth/v2/token/"
    introspect_url = "/oauth/v2/introspect/"
    jwks_url = "/oauth/v2/.well-known/jwks.json/"

    # Request an access token using the client credentials grant.
    api_url = base_url + token_url
    data = {
            "grant_type": "client_credentials",
           }
    client_id = secret['Client_Id']
    client_secret = secret['Client_Secret']

    auth = HTTPBasicAuth(username=client_id, password=client_secret)
    response = requests.post(url=api_url, data=data, auth=auth)
    # Fail early on an HTTP error instead of printing the token response.
    response.raise_for_status()

    access_token = response.json().get('access_token','')

    # Verify the token against the introspection endpoint.
    api_url = base_url + introspect_url + "?verify_token=true"
    data = {
            "token": access_token,
           }
    response = requests.post(url=api_url, data=data, auth=auth)
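
One way to make the token handling more defensive is to validate the response before using it, because an empty token only surfaces later as a failed API call. The helper below is a small sketch; `extract_access_token` is a hypothetical name, and it assumes only that a successful token response is HTTP 200 with an `access_token` field, as the code above implies.

```python
def extract_access_token(status_code: int, payload: dict) -> str:
    """Return the access token, or raise a descriptive error."""
    if status_code != 200:
        raise RuntimeError(f"Token request failed with HTTP {status_code}")
    token = payload.get("access_token", "")
    if not token:
        raise RuntimeError("Token response did not contain an access_token")
    return token
```

With the `requests` response from the token call, this could be used as `access_token = extract_access_token(response.status_code, response.json())`.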
    

  1. You then configure policy type level user access. This section can be customized based on how user access information is stored on any data sources. Here, we assume a pre-set access based on the user’s email IDs.
    primary_principal_list = []
    workplace_policy_principals = []
    hr_policy_principals = []
    regulatory_policy_principals = []
    
    principal_user_email_ids = ['alejandro_rosalez@example.com', 'sofia_martinez@example.com', 'diego_ramirez@example.com']
    
    workplace_policy_email_ids = ['alejandro_rosalez@example.com', 'sofia_martinez@example.com', 'diego_ramirez@example.com']
    hr_policy_email_ids = ['alejandro_rosalez@example.com', 'sofia_martinez@example.com']
    regulatory_policy_email_ids = ['alejandro_rosalez@example.com', 'diego_ramirez@example.com']
    
    for workplace_policy_member in workplace_policy_email_ids:
        workplace_policy_members_dict = { 'user': { 'id': workplace_policy_member, 'access': 'ALLOW', 'membershipType': 'DATASOURCE' }}
        workplace_policy_principals.append(workplace_policy_members_dict)
        if workplace_policy_member not in primary_principal_list:
            primary_principal_list.append(workplace_policy_member)
    
    for hr_policy_member in hr_policy_email_ids:
        hr_policy_members_dict = { 'user': { 'id': hr_policy_member, 'access': 'ALLOW', 'membershipType': 'DATASOURCE' }}
        hr_policy_principals.append(hr_policy_members_dict)
        if hr_policy_member not in primary_principal_list:
            primary_principal_list.append(hr_policy_member)
            
    for regulatory_policy_member in regulatory_policy_email_ids:
        regulatory_policy_members_dict = { 'user': { 'id': regulatory_policy_member, 'access': 'ALLOW', 'membershipType': 'DATASOURCE' }}
        regulatory_policy_principals.append(regulatory_policy_members_dict)
        if regulatory_policy_member not in primary_principal_list:
            primary_principal_list.append(regulatory_policy_member)
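
The three loops above follow the same pattern, so the mapping can also be expressed more compactly. The following is an equivalent sketch that uses the same example email IDs and principal structure; the names `POLICY_ACCESS`, `build_principals`, `principals_by_policy`, and `all_principals` are illustrative and not part of the original code.

```python
from typing import Dict, List

# Policy type -> email IDs allowed to read documents of that type.
POLICY_ACCESS: Dict[str, List[str]] = {
    "workplace": ["alejandro_rosalez@example.com",
                  "sofia_martinez@example.com",
                  "diego_ramirez@example.com"],
    "hr": ["alejandro_rosalez@example.com", "sofia_martinez@example.com"],
    "regulatory": ["alejandro_rosalez@example.com", "diego_ramirez@example.com"],
}


def build_principals(email_ids: List[str]) -> List[dict]:
    """Build the ALLOW principal entries for a list of user email IDs."""
    return [
        {"user": {"id": email, "access": "ALLOW", "membershipType": "DATASOURCE"}}
        for email in email_ids
    ]


principals_by_policy = {
    policy: build_principals(emails) for policy, emails in POLICY_ACCESS.items()
}

# Unique users across all policy types, in first-seen order.
all_principals = list(
    dict.fromkeys(email for emails in POLICY_ACCESS.values() for email in emails)
)
```

Here `principals_by_policy["hr"]` plays the role of `hr_policy_principals`, and `all_principals` plays the role of `primary_principal_list`.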

  1. You then pull individual policy details from Alation. Repeat this step for all three policy types: Workplace, HR, and Regulatory.
    import re

    import requests

    def cleanhtml(raw_html):
        # Assumed helper (not defined in the original post): strips HTML tags
        # from the rich-text fields that Alation returns.
        return re.sub(r'<[^>]+>', '', raw_html)

    # Replace [[Workplace/HR/Regulatory]] with the policy type to fetch.
    url = "https://[[your-domain]].alationcloud.com/integration/v1/business_policies/?limit=200&skip=0&search=[[Workplace/HR/Regulatory]]&deleted=false"

    headers = {
        "accept": "application/json",
        "TOKEN": access_token
    }

    response = requests.get(url, headers=headers)
    policy_data = ""

    # Concatenate the title and description of each policy into one text document.
    for policy in response.json():
        if policy["title"] is not None:
            policy_title = cleanhtml(policy["title"])
        else:
            policy_title = "None"
        if policy["description"] is not None:
            policy_description = cleanhtml(policy["description"])
        else:
            policy_description = "None"
        temp_data = policy_title + ":\n" + policy_description + "\n\n"
        policy_data += temp_data
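
The request above fetches at most 200 policies in one call. If a policy type has more entries, one option is to page through the results. This is a sketch only: it assumes that `skip` and `limit` act as an offset and a page size, as the query string suggests, and that a page shorter than `limit` marks the end. `fetch_all_policies` and the `fetch_page` callable are hypothetical names.

```python
from typing import Callable, Dict, List


def fetch_all_policies(fetch_page: Callable[[int, int], List[Dict]],
                       limit: int = 200) -> List[Dict]:
    """Collect every policy by calling fetch_page(skip, limit) repeatedly.

    `fetch_page` is a hypothetical callable that performs the HTTP GET for
    one page and returns the decoded JSON list.
    """
    policies: List[Dict] = []
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        policies.extend(page)
        if len(page) < limit:
            # A short (or empty) page is assumed to mean there is no more data.
            return policies
        skip += limit
```

In the notebook, `fetch_page` could wrap the `requests.get` call shown above and substitute its `skip` and `limit` arguments into the URL.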
    

  1. The next step is to define the Amazon Q Business application, index, and data source information that you created in the previous steps.
    qbusiness_client = boto3.client('qbusiness')
    application_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    index_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    data_source_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

  1. Now you explicitly create the users in Amazon Q Business. Individual user access to different policy type data sets is configured later.
    for principal in primary_principal_list:
        create_user_response = qbusiness_client.create_user(
            applicationId=application_id,
            userId=principal,
            userAliases=[
                {
                    'indexId': index_id,
                    'dataSourceId': data_source_id,
                    'userId': principal
                },
            ],
        )
    
    for principal in primary_principal_list:
        get_user_response = qbusiness_client.get_user(
            applicationId=application_id,
            userId=principal
        )
        for user_alias in get_user_response['userAliases']:
            if "dataSourceId" in user_alias:
                print(user_alias['userId'])
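
Rerunning the notebook calls `create_user` again for users that already exist. A minimal sketch of an idempotent variant follows; `ensure_user` is a hypothetical helper, and it assumes that the service reports an existing user with a `ConflictException`, which you should confirm against the API reference.

```python
def ensure_user(client, application_id: str, index_id: str,
                data_source_id: str, email: str) -> bool:
    """Create the user if missing.

    Returns True if the user was created and False if it already existed.
    `client` is expected to behave like boto3.client('qbusiness').
    """
    try:
        client.create_user(
            applicationId=application_id,
            userId=email,
            userAliases=[
                {
                    'indexId': index_id,
                    'dataSourceId': data_source_id,
                    'userId': email,
                },
            ],
        )
        return True
    except client.exceptions.ConflictException:
        # Assumption: raised when the user already exists in the application.
        return False
```

The first loop above could then call `ensure_user(qbusiness_client, application_id, index_id, data_source_id, principal)` for each entry in `primary_principal_list`.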

  1. For each policy type data set (Workplace, HR, and Regulatory), we execute the following three steps.
    1. Start an Amazon Q Business data source sync job.
      start_data_source_sync_job_response = qbusiness_client.start_data_source_sync_job(
          dataSourceId = data_source_id,
          indexId = index_id,
          applicationId = application_id
      )
      job_execution_id = start_data_source_sync_job_response['executionId']

    1. Encode and batch upload data with user access mapping.
      import hashlib

      # Derive a deterministic document ID from the policy content.
      workplace_policy_document_id = hashlib.shake_256(policy_data.encode('utf-8')).hexdigest(128)
      docs = [
          {
              "id": workplace_policy_document_id,
              "content": {
                  'blob': policy_data.encode('utf-8')
              },
              "contentType": "PLAIN_TEXT",
              "title": "Unicorn Rentals – Workplace Policy",
              # For the other policy types, adjust the title and use
              # hr_policy_principals or regulatory_policy_principals.
              "accessConfiguration": { 'accessControls': [ { 'principals': workplace_policy_principals } ] }
          }
      ]

      batch_put_document_response = qbusiness_client.batch_put_document(
          applicationId = application_id,
          indexId = index_id,
          dataSourceSyncId = job_execution_id,
          documents = docs,
      )
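
This walkthrough uploads a single document per policy type. For larger data sets, the upload has to be split, because `batch_put_document` accepts only a limited number of documents per call (10 at the time of writing; check the API reference for the current limit). The sketch below also collects the `failedDocuments` entries that the response reports. `chunked` and `put_documents` are hypothetical helper names.

```python
from typing import Dict, Iterator, List

MAX_BATCH_SIZE = 10  # Assumed per-call document limit; verify in the API reference.


def chunked(items: List[Dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[Dict]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_documents(client, application_id: str, index_id: str,
                  sync_id: str, docs: List[Dict]) -> List[Dict]:
    """Upload docs in batches and return the documents the service rejected."""
    failed: List[Dict] = []
    for batch in chunked(docs):
        response = client.batch_put_document(
            applicationId=application_id,
            indexId=index_id,
            dataSourceSyncId=sync_id,
            documents=batch,
        )
        failed.extend(response.get("failedDocuments", []))
    return failed
```

An empty return value means every batch was accepted for indexing; acceptance is not the same as indexing having finished, which is why the next step polls the document status.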

    1. Stop the data source sync job and wait for the data set to be indexed.
      import time

      stop_data_source_sync_job_response = qbusiness_client.stop_data_source_sync_job(
          dataSourceId = data_source_id,
          indexId = index_id,
          applicationId = application_id
      )
      # Poll for up to one hour until the document reaches a terminal status.
      terminal_statuses = {"INDEXED", "UPDATED", "FAILED", "DOCUMENT_FAILED_TO_INDEX"}
      max_time = time.time() + 1*60*60
      found = False
      try:
          while time.time() < max_time and not found:
              list_documents_response = qbusiness_client.list_documents(
                  applicationId=application_id,
                  indexId=index_id
              )
              for document in list_documents_response.get("documentDetailList", []):
                  if document["documentId"] == workplace_policy_document_id:
                      status = document["status"]
                      print(status)
                      found = status in terminal_statuses
              if not found:
                  # Wait before polling again, including when the document is not listed yet.
                  time.sleep(10)
      except Exception as e:
          print(f"Exception when calling API: {e}")
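
The polling loop reads only the first page of `list_documents` results. When an index holds many documents, the one you are waiting for can be on a later page. The following sketch follows the `nextToken` value across pages; `find_document_status` is a hypothetical helper name.

```python
from typing import Optional


def find_document_status(client, application_id: str, index_id: str,
                         document_id: str) -> Optional[str]:
    """Return the status of document_id, or None if it is not listed.

    `client` is expected to behave like boto3.client('qbusiness').
    """
    kwargs = {"applicationId": application_id, "indexId": index_id}
    while True:
        response = client.list_documents(**kwargs)
        for document in response.get("documentDetailList", []):
            if document.get("documentId") == document_id:
                return document.get("status")
        next_token = response.get("nextToken")
        if not next_token:
            return None
        # Request the next page of results.
        kwargs["nextToken"] = next_token
```

Inside the polling loop, the status check could then become `status = find_document_status(qbusiness_client, application_id, index_id, workplace_policy_document_id)`.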

  1. Go back to the Amazon Q Business console and see if the data uploads were successful.

Q Business Sync Status1

  1. Find and open the custom data source from the list of data sources.

Q Business Sync Status2

  1. Ensure the ingested documents are added in the Sync history tab and are in the Completed status.

Q Business Sync Status3

  1. Also ensure the Last sync status for the custom data source connector is Completed.

Q Business Sync Status5

Run queries with the Amazon Q Business web experience

Now that the data synchronization is complete, you can start exploring insights from Amazon Q Business. With the newly created Amazon Q Business application, select the Web Application settings tab and navigate to the auto-created URL. This will open a new tab with a preview of the user interface and options that you can customize to fit your use case.

Q Business Web Experience

  1. Sign in as user Alejandro Rosalez. As you might recall, Alejandro has access to all three policy type data sets (Workplace, HR, and Regulatory).
    1. Start by asking a question about HR policy, such as “Per the HR Payroll Policy of Unicorn Rentals, what are some additional voluntary deductions taken from employee paychecks?” Note how Amazon Q Business provides an answer and also shows where it pulled the answer from.

    Q Business Web Experience

    1. Next, ask a question about a Regulatory policy: “Per the PCI DSS compliance policy of Unicorn Rentals, how is the third-party service provider access to cardholder information protected?” The result includes the summarized answer on PCI DSS compliance and also shows sources where it gathered the data from.

    Q Business Web Experience

    1. Lastly, see how Amazon Q Business responds when asked a question about a generic workplace policy: “What does Unicorn Rentals do to protect information of children under the age of 13?” In this case, the application returns the answer and marks it as a Workplace policy question.

    Q Business Web Experience

  1. Let’s next sign in as Sofia Martinez. Sofia has access to HR and Workplace policy types, but not to Regulatory policies.
    1. Start by asking a question about HR policy: “Per the HR Payroll Policy of Unicorn Rentals, list the additional voluntary deductions taken from employee paychecks.” Note how Amazon Q Business lists the deductions and cites the policy that the answer was gathered from.

    Q Business Web Experience

    1. Next, ask a Regulatory policy question: “What are the record keeping requirements mentioned in the ECOA compliance policy of Unicorn Rentals?” Note how Amazon Q Business responds in context, stating that Sofia does not have access to that data.

    Q Business Web Experience

  1. Finally, sign in as Diego Ramirez. Diego has access to Workplace and Regulatory policies but not to HR policies.
    1. Start by asking the same Regulatory policy question: “Per the PCI DSS compliance policy of Unicorn Rentals, how is third-party service provider access to cardholder information protected?” Because Diego has access to Regulatory policy data, the expected answer is generated.

    Q Business Web Experience

    1. Next, ask a question about an HR policy: “Per the HR Compensation Policy of Unicorn Rentals, how is job pricing determined?” Note how Amazon Q Business responds in context, stating that Diego does not have access to that data.

    Q Business Web Experience

Troubleshooting

If you’re unable to get answers to any of your questions and get the message “Sorry, I could not find relevant information to complete your request,” check to see if any of the following issues apply:

  • No permissions: The ACLs applied to your account don’t allow you to query certain data sources. If this is the case, reach out to your application administrator to ensure that your ACLs are configured to access the data sources.
  • EmailID not matching UserID: In rare scenarios, a user might have a different email ID associated with the Amazon Q Business Identity Center connection than is associated in the data source’s user profile. Make sure that the Amazon Q Business user profile is updated to recognize the email ID using the update-user CLI command or the related API call.
  • Data connector sync failed: The data connector failed to synchronize information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm that the synchronization is successful.
  • Empty or private data sources: Private or empty projects will not be crawled during the synchronization run.

If none of the above apply, open a support case to get the issue resolved.
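
For the email ID mismatch case in the list above, the related API call is `update_user`. The following is a minimal sketch of mapping an Identity Center email ID to a different user ID in the data source; `map_user_alias` and its parameter names are hypothetical.

```python
def map_user_alias(client, application_id: str, index_id: str,
                   data_source_id: str, identity_center_email: str,
                   data_source_user_id: str) -> dict:
    """Point an Amazon Q Business user at the ID used in the data source ACLs.

    `client` is expected to behave like boto3.client('qbusiness').
    """
    return client.update_user(
        applicationId=application_id,
        userId=identity_center_email,
        userAliasesToUpdate=[
            {
                'indexId': index_id,
                'dataSourceId': data_source_id,
                'userId': data_source_user_id,
            },
        ],
    )
```

After the alias is updated, documents whose ACL lists `data_source_user_id` should become visible to the user who signs in with `identity_center_email`.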

Clean up

To avoid incurring future charges, clean up any resources created as part of this solution. Delete the Amazon Q Business custom connector data source and client application created in Alation and the Amazon Q Business application. Next, delete the Secrets Manager secret with Alation OAuth client application credential data. Also, delete the user management setup in IAM Identity Center and the SageMaker Studio domain.
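
Part of this cleanup can be scripted. The sketch below covers only the Amazon Q Business and Secrets Manager resources; the Alation client application, the IAM Identity Center users, and the SageMaker Studio domain still need to be removed separately. `clean_up` is a hypothetical helper, and because these deletions run asynchronously, a real script might need to wait between calls.

```python
def clean_up(qbusiness_client, secrets_client, application_id: str,
             index_id: str, data_source_id: str, secret_name: str) -> None:
    """Delete the custom data source, the application, and the secret.

    The clients are expected to be boto3.client('qbusiness') and
    boto3.client('secretsmanager').
    """
    qbusiness_client.delete_data_source(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
    qbusiness_client.delete_application(applicationId=application_id)
    # Schedule the secret for deletion with the minimum recovery window.
    secrets_client.delete_secret(SecretId=secret_name, RecoveryWindowInDays=7)
```

The recovery window keeps the secret restorable for seven days in case it was deleted by mistake.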

Conclusion

In this post, we discussed how to configure the Amazon Q Business custom connector to crawl and index policies from Alation as a sample data source. We showed how you can use Amazon Q Business generative AI-based search to enable your business leaders and agents to discover insights from your enterprise data.

To learn more about the Amazon Q Business custom connector, see the Amazon Q Business developer guide. Alation Data Catalog is available for purchase through AWS Marketplace. Speak to your Alation account representative for custom purchase options. For any additional information, contact your Alation business partner.

AWS Partner Network Alation

Alation – AWS Partner Spotlight

Alation is an AWS Specialization Partner that has pioneered the modern data catalog and is making the leap into a full-service source for data intelligence. Alation is passionate about helping enterprises create thriving data cultures where anyone can find, understand, and trust data.

Contact Alation | Partner Overview | AWS Marketplace


About the Authors

Gene Arnold is a Product Architect with Alation’s Forward Deployed Engineering team. A curious learner with over 25 years of experience, Gene focuses on how to sharpen selling skills and constantly explores new product lines.

Prabhakar Chandrasekaran is a Senior Technical Account Manager with AWS Enterprise Support. Prabhakar enjoys helping customers build cutting-edge AI/ML solutions on the cloud. He also works with enterprise customers providing proactive guidance and operational assistance, helping them improve the value of their solutions when using AWS. Prabhakar holds eight AWS and seven other professional certifications. With over 21 years of professional experience, Prabhakar was a data engineer and a program leader in the financial services space prior to joining AWS.

Sindhu Jambunathan is a Senior Solutions Architect at AWS, specializing in supporting ISV customers in the data and generative AI vertical to build scalable, reliable, secure, and cost-effective solutions on AWS. With over 13 years of industry experience, she joined AWS in May 2021 after a successful tenure as a Senior Software Engineer at Microsoft. Sindhu’s diverse background includes engineering roles at Qualcomm and Rockwell Collins, complemented by a Master of Science in Computer Engineering from the University of Florida. Her technical expertise is balanced by a passion for culinary exploration, travel, and outdoor activities.

Prateek Jain is a Sr. Solutions Architect with AWS, based out of Atlanta, Georgia. He is passionate about generative AI and helping customers build amazing solutions on AWS. In his free time, he enjoys spending time with family and playing tennis.
