Integrate Amazon Bedrock Knowledge Bases with Microsoft SharePoint as a data source

Amazon Bedrock Knowledge Bases gives foundation models (FMs) and agents in Amazon Bedrock contextual information from your company’s private data sources for Retrieval Augmented Generation (RAG), so they can deliver more relevant, accurate, and customized responses. Amazon Bedrock Knowledge Bases offers a fully managed RAG experience.

The set of data sources you can connect to a knowledge base keeps expanding. This post shows how to use one of these data source connectors: Microsoft SharePoint, an integrated content management and collaboration tool that many organizations use to store, organize, and share their internal data. See Data source connectors for the full list of supported data source connectors.

Solution overview

The following are some pertinent features of the SharePoint data source within Amazon Bedrock Knowledge Bases:

It provides access to the information stored in SharePoint. The RAG architecture queries and retrieves relevant information from the SharePoint source to provide contextual responses based on the user’s input.
It extracts structured data, metadata, and other information from documents ingested from SharePoint to return relevant search results for the user’s query.
It syncs incremental SharePoint content updates on an ongoing basis.
It provides source attribution for the responses generated by the FM.

In the following sections, we walk through the steps to create a knowledge base, configure your data source, and test the solution.

Prerequisites

The following are the prerequisites necessary to implement Amazon Bedrock Knowledge Bases with SharePoint as a connector:

An AWS account with an AWS Identity and Access Management (IAM) role and user with least privilege permissions to create and manage the necessary resources and components for the application. If you don’t have an AWS account, see How do I create and activate a new Amazon Web Services account?
A Microsoft account and a Microsoft SharePoint Online subscription to create and publish the application using the steps outlined in this post. If you don’t have these, ask your organization’s administrators to create a sandbox for you to experiment in, or create a new account and trial subscription as needed to complete the steps.

Create a knowledge base and connect to the data source

Complete the following steps to set up a knowledge base on Amazon Bedrock and connect to a SharePoint data source:

On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
Choose Create knowledge base.

In the Knowledge base details section, optionally change the default name and enter a description for your knowledge base.
In the IAM permissions section, select an IAM role that provides Amazon Bedrock permission to access other AWS services. You can let Amazon Bedrock create the service role or choose a custom role that you have created.
In the Choose data source section, select SharePoint.
Optionally, add tags to your knowledge base. For more information, see Tag resources.
Choose Next.
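If you choose a custom role instead of letting Amazon Bedrock create one, the role must trust the Amazon Bedrock service and be able to read the Secrets Manager secret that holds your SharePoint credentials. The following Python sketch builds those two policy fragments as dictionaries. It is a minimal sketch, not a complete role: the account ID and Region are placeholders, the confused-deputy conditions follow the pattern in the AWS documentation as we understand it, and a working role also needs permissions for the embeddings model and the vector store, which are not shown.

```python
import json


def bedrock_kb_trust_policy(account_id: str, region: str) -> dict:
    """Trust policy letting the Amazon Bedrock service assume a custom role.

    The aws:SourceAccount and aws:SourceArn conditions restrict which account
    and which knowledge bases can use the role (a confused-deputy guard).
    """
    source_arn = f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {"aws:SourceArn": source_arn},
                },
            }
        ],
    }


def secret_read_statement(secret_arn: str) -> dict:
    """Identity-policy statement that lets the role read the SharePoint secret."""
    return {
        "Effect": "Allow",
        "Action": ["secretsmanager:GetSecretValue"],
        "Resource": [secret_arn],
    }


# Placeholder account ID and Region; prints a document you could paste into IAM.
print(json.dumps(bedrock_kb_trust_policy("111122223333", "us-east-1"), indent=2))
```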

In the Name and Description section, optionally change the default data source name and enter a description of the data source.
In the Source section, provide the following information:
1. For Site URLs, enter the site URLs to use for crawling and indexing the content for RAG.
2. For Domain, enter the domain name associated with the data source. For example, if the site URL is https://deloittedasits.sharepoint.com/xyz.aspx, the domain value would be deloittedasits.
3. Under Advanced settings, keep the default selections.
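The domain is the host name prefix in front of .sharepoint.com. As a minimal sketch, assuming your sites use the standard <domain>.sharepoint.com naming (custom host names are not handled), you can derive the value, and check that every site URL points at the same tenant, with the Python standard library. The helper names are hypothetical.

```python
from urllib.parse import urlparse

SHAREPOINT_SUFFIX = ".sharepoint.com"


def sharepoint_domain(site_url: str) -> str:
    """Return the domain part of a SharePoint Online site URL."""
    host = urlparse(site_url).hostname or ""  # hostname is lowercased
    if not host.endswith(SHAREPOINT_SUFFIX) or host == SHAREPOINT_SUFFIX[1:]:
        raise ValueError(f"not a <domain>.sharepoint.com URL: {site_url}")
    return host[: -len(SHAREPOINT_SUFFIX)]


def common_domain(site_urls: list) -> str:
    """Assumption: all site URLs in one data source share a single domain."""
    domains = {sharepoint_domain(url) for url in site_urls}
    if len(domains) != 1:
        raise ValueError(f"expected exactly one domain, found {sorted(domains)}")
    return domains.pop()
```

For the example URL in the step above, sharepoint_domain returns deloittedasits.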

While converting your data into embeddings, Amazon Bedrock encrypts your data with a key that AWS owns and manages by default. To use your own AWS Key Management Service (AWS KMS) key, choose Customize encryption settings (Advanced) and choose a key. For more information, see Encryption of transient data storage during data ingestion.

You can also choose from the following options for the data deletion policy for your data source:

Delete – Deletes all underlying data belonging to the data source from the vector store upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the underlying data. This flag is ignored if an AWS account is deleted.
Retain – Retains all underlying data in your vector store upon deletion of a knowledge base or data source resource.
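If you create the data source through the API rather than the console, the deletion policy and the optional customer managed key are passed as separate parameters. The sketch below assumes the CreateDataSource parameters dataDeletionPolicy (DELETE or RETAIN) and serverSideEncryptionConfiguration.kmsKeyArn, and that the latter corresponds to the console's Customize encryption settings option; the helper name ingestion_options is hypothetical.

```python
from typing import Optional

DELETION_POLICIES = ("DELETE", "RETAIN")


def ingestion_options(deletion_policy: str, kms_key_arn: Optional[str] = None) -> dict:
    """Extra CreateDataSource parameters for deletion policy and encryption."""
    policy = deletion_policy.strip().upper()
    if policy not in DELETION_POLICIES:
        raise ValueError(f"deletion policy must be one of {DELETION_POLICIES}")
    options = {"dataDeletionPolicy": policy}
    if kms_key_arn:
        # Customer managed key for transient data during ingestion; leave it
        # out to keep the default key that AWS owns and manages.
        options["serverSideEncryptionConfiguration"] = {"kmsKeyArn": kms_key_arn}
    return options
```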

For more information on managing your data source, see Manage a data source.

In the Authentication section, the supported authentication method is set to OAuth 2.0.
1. For Tenant ID, enter your tenant ID. Refer to the section Register a new application in the Microsoft Azure Portal in this post to get the tenant ID.
2. For AWS Secrets Manager secret, enter the AWS Secrets Manager secret that holds your SharePoint credentials. Refer to the section Create a Secrets Manager secret for the SharePoint data source in this post to create the secret.

The SharePoint data source will need credentials to connect to the SharePoint Online site using the Microsoft Graph API. To facilitate this, create a new Secrets Manager secret. These credentials will not be used in any access logs for the SharePoint Online Site.
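For readers who prefer to script the data source, the console fields in this procedure map roughly onto the dataSourceConfiguration parameter of the Amazon Bedrock CreateDataSource API. The following sketch assembles that structure. The field names (hostType, domain, siteUrls, tenantId, authType, credentialsSecretArn) and the OAUTH2_CLIENT_CREDENTIALS value reflect our reading of the API reference and should be checked against the current documentation; the helper name is hypothetical.

```python
def sharepoint_data_source_configuration(
    tenant_id: str, domain: str, site_urls: list, secret_arn: str
) -> dict:
    """dataSourceConfiguration for a SharePoint Online data source (sketch)."""
    if not site_urls:
        raise ValueError("at least one site URL is required")
    return {
        "type": "SHAREPOINT",
        "sharePointConfiguration": {
            "sourceConfiguration": {
                "hostType": "ONLINE",                   # SharePoint Online
                "domain": domain,                       # e.g. "deloittedasits"
                "siteUrls": list(site_urls),
                "tenantId": tenant_id,                  # Directory (tenant) ID
                "authType": "OAUTH2_CLIENT_CREDENTIALS",  # OAuth 2.0
                "credentialsSecretArn": secret_arn,     # Secrets Manager secret
            }
        },
    }


# Usage with boto3 (not executed here):
#   agent = boto3.client("bedrock-agent")
#   agent.create_data_source(
#       knowledgeBaseId="<knowledge base ID>",
#       name="sharepoint-data-source",
#       dataSourceConfiguration=sharepoint_data_source_configuration(...),
#   )
```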

In the Metadata Settings section, optionally select any content types that you want to include or exclude.

In the Content chunking and parsing section, select Default.

Choose Next.
In the Embeddings model section, select Titan Embeddings G1 – Text or another embeddings model as appropriate.
In the Vector database section, select Quick create a new vector store to create a vector store for the embeddings.
Choose Next.
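In the API, the embeddings model is identified by its foundation model ARN. A minimal sketch, assuming the model ID amazon.titan-embed-text-v1 for Titan Embeddings G1 – Text. Note that Quick create a new vector store is a console convenience, so a CreateKnowledgeBase call would also need a storageConfiguration pointing at a vector store you have already created, which is omitted here. The helper names are hypothetical.

```python
TITAN_EMBEDDINGS_G1_TEXT = "amazon.titan-embed-text-v1"  # assumed model ID


def foundation_model_arn(region: str, model_id: str = TITAN_EMBEDDINGS_G1_TEXT) -> str:
    """Foundation model ARNs leave the account field empty."""
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


def vector_knowledge_base_configuration(
    region: str, model_id: str = TITAN_EMBEDDINGS_G1_TEXT
) -> dict:
    """knowledgeBaseConfiguration for a vector knowledge base (sketch)."""
    return {
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": foundation_model_arn(region, model_id),
        },
    }
```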

On the Review and create page, verify the selections you made and choose Create.

The knowledge base with SharePoint as the data source is now created. However, you must sync the data source to crawl the site URLs and index the associated content.

To initiate this process, on the knowledge base details page, select your data source and choose Sync.
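The Sync button starts an ingestion job. If you want to trigger and monitor the sync from code, the following sketch calls start_ingestion_job and then polls get_ingestion_job on a boto3 bedrock-agent client until the job reaches a terminal status. The polling interval, the timeout, and the set of terminal statuses are assumptions to adjust for your environment; the client is passed in so the function can be exercised without AWS credentials.

```python
import time

# Assumed statuses after which an ingestion job no longer changes.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(client, knowledge_base_id: str, data_source_id: str,
                     poll_seconds: float = 15.0, timeout_seconds: float = 3600.0,
                     sleep=time.sleep) -> dict:
    """Start an ingestion job (the console's Sync) and wait for it to finish."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id, dataSourceId=data_source_id
    )["ingestionJob"]
    waited = 0.0
    while job["status"] not in TERMINAL_STATUSES:
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"ingestion job {job['ingestionJobId']} still {job['status']}"
            )
        sleep(poll_seconds)
        waited += poll_seconds
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job
```

With boto3, pass boto3.client("bedrock-agent") as client and check the returned job's status for FAILED.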

Register a new application in the Microsoft Azure Portal

In this section, we register a new application in the Microsoft Azure Portal. We capture the tenant ID from this step to use when configuring the data source for Amazon Bedrock Knowledge Bases. Complete the following steps:

Open the Azure Portal and log in with your Microsoft account. If you don’t have an account, you can create one or contact your organization’s administration team.
On the App registrations page, choose New registration.
Provide the following information:
1. For Name, provide the name for your application. Let’s refer to this application as TargetApp. Amazon Bedrock Knowledge Bases uses TargetApp to connect to the SharePoint site to crawl and index the data.
2. For Who can use this application or access this API, choose Accounts in this organizational directory only (<Tenant name> only – Single tenant).
3. Choose Register.
4. Note down the Application (client) ID and the Directory (tenant) ID on the Overview page. You’ll need them later when asked for TargetApp-ClientId and TenantId.
Choose API permissions in the navigation pane.
Configure the permissions as follows:
1. Choose Add a permission.
2. Choose Microsoft Graph.
3. Choose Delegated permissions.
4. Choose User.Read.All in the User section.
5. Choose GroupMember.Read.All in the GroupMember section.
6. Choose Sites.FullControl.All in the Sites section.
7. Choose Add permissions. These permissions allow the app to read user profiles and group memberships in your organization’s directory and to access your SharePoint sites.
8. On the options menu (three dots), choose Remove permission.
9. Remove the original User.Read – Delegated permission.
10. Choose Grant admin consent for the default directory.

Choose Certificates & secrets in the navigation pane.
1. Choose New client secret.
2. For Description, enter a description, such as description of my client secret.
3. Choose a value for Expires. In production, you’ll need to manually rotate your secret before it expires.
4. Choose Add.
5. Note down the value for your new secret. You’ll need it later when asked for your client secret (TargetApp-ClientSecret).
Optionally, choose Owners to add any additional owners for the application. Owners will be able to manage permissions of the Azure AD app (TargetApp).
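Both the tenant ID and the application (client) ID are GUIDs, while the client secret value is not; the portal also shows a GUID-shaped Secret ID next to the secret value, which is easy to copy by mistake. As an optional sanity check before you store these values, the following sketch (the helper name is hypothetical) normalizes the two IDs with Python’s uuid module and rejects anything that is not a well-formed GUID.

```python
import uuid


def normalize_azure_ids(tenant_id: str, client_id: str) -> tuple:
    """Return the tenant and client IDs as lowercase hyphenated GUIDs.

    Raises ValueError naming the field if either value is not a GUID.
    """
    normalized = []
    for label, value in (("tenant ID", tenant_id), ("client ID", client_id)):
        try:
            normalized.append(str(uuid.UUID(value.strip())))
        except ValueError as exc:
            raise ValueError(f"{label} is not a GUID: {value!r}") from exc
    return tuple(normalized)
```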

Create a Secrets Manager secret for the SharePoint data source

Complete the following steps to create a Secrets Manager secret to connect to the SharePoint Online sites listed as site URLs within the data source:

On the Secrets Manager console, choose Store a new secret.
For Secret type, select Other type of secret.
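For Key/value pairs, enter the following keys and their values:
1. username – the SharePoint Online user name
2. password – the password for that user
3. clientId – the TargetApp-ClientId value you noted earlier
4. clientSecret – the TargetApp-ClientSecret value you noted earlier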
For Encryption key, choose aws/secretsmanager.
Choose Next.
In the Secret name and description section, enter the name of the secret and an optional description.
Add any associated tags in the Tags section.
Leave Resource permissions and Replication secret as default.
Choose Next.
In the Configure rotation section, leave as default or modify according to your organizational policies.
Choose Next.
Review the options you selected and choose Store.
On the secret details page, note your secret ARN value to use as the secret when configuring the SharePoint data source for your knowledge base.
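The same secret can be created programmatically. The following sketch builds the four key/value pairs listed above as a JSON SecretString and passes it to the Secrets Manager create_secret operation on a boto3 client. When no KmsKeyId is given, Secrets Manager uses the aws/secretsmanager key, matching the console step. The helper names are hypothetical, and the secret name is up to you.

```python
import json

SECRET_KEYS = ("username", "password", "clientId", "clientSecret")


def sharepoint_secret_string(username: str, password: str,
                             client_id: str, client_secret: str) -> str:
    """JSON SecretString with the four keys the data source expects."""
    values = dict(zip(SECRET_KEYS, (username, password, client_id, client_secret)))
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    return json.dumps(values)


def store_sharepoint_secret(client, name: str, secret_string: str,
                            description: str = "") -> str:
    """Create the secret with a boto3 secretsmanager client; return its ARN."""
    request = {"Name": name, "SecretString": secret_string}
    if description:
        request["Description"] = description
    # No KmsKeyId: Secrets Manager falls back to the aws/secretsmanager key.
    return client.create_secret(**request)["ARN"]
```

Read the credentials from a secure source (for example, environment variables) rather than hardcoding them when you call these helpers.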

Test the solution

Complete the following steps to test the knowledge base you created:

On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
Select the knowledge base you created and choose Test.

Choose an appropriate model for testing and choose Apply.

Enter your question for the content housed in the SharePoint site.
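Outside the console, you can ask the same kind of question with the RetrieveAndGenerate operation of the bedrock-agent-runtime client. The sketch below sends a question, returns the generated answer, and collects the SharePoint URLs cited in the response, which is the source attribution described in the solution overview. The knowledge base ID and model ARN are placeholders you must supply, and the response parsing assumes citations carry a sharePointLocation.url field.

```python
def ask_knowledge_base(client, knowledge_base_id: str, model_arn: str, question: str):
    """Query the knowledge base; return (answer, cited SharePoint URLs)."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    answer = response["output"]["text"]

    # Collect each distinct SharePoint URL cited in the response, in order.
    sources = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            location = reference.get("location", {})
            url = location.get("sharePointLocation", {}).get("url")
            if url and url not in sources:
                sources.append(url)
    return answer, sources
```

With boto3, pass boto3.client("bedrock-agent-runtime") as client.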

Clean up

If you created a new knowledge base to experiment using this post and don’t plan to use it further, delete the knowledge base so that your AWS account doesn’t accumulate costs. For instructions, see Manage a knowledge base.
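If you scripted the setup, a cleanup sketch along the following lines deletes each data source and then the knowledge base through a boto3 bedrock-agent client. Deleting the data sources first is our assumption of a sensible order, not a documented requirement, and because deletions may complete asynchronously you might need to retry the final call. As noted in the data deletion policy description, the vector store itself is not deleted, and the Secrets Manager secret and the Azure app registration are separate resources that you remove on their own.

```python
def delete_knowledge_base_and_sources(client, knowledge_base_id: str) -> list:
    """Delete every data source of a knowledge base, then the knowledge base."""
    data_source_ids = []
    request = {"knowledgeBaseId": knowledge_base_id}
    while True:  # follow nextToken pagination of ListDataSources
        page = client.list_data_sources(**request)
        data_source_ids += [s["dataSourceId"] for s in page.get("dataSourceSummaries", [])]
        token = page.get("nextToken")
        if not token:
            break
        request["nextToken"] = token

    for data_source_id in data_source_ids:
        client.delete_data_source(
            knowledgeBaseId=knowledge_base_id, dataSourceId=data_source_id
        )
    client.delete_knowledge_base(knowledgeBaseId=knowledge_base_id)
    return data_source_ids
```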

Conclusion

In this post, we showed you how to configure Amazon Bedrock Knowledge Bases with SharePoint Online as a data source. By connecting SharePoint Online as a data source, employees can interact with the organization’s knowledge and data stored in SharePoint using natural language, making it straightforward to find relevant information, extract key points, and derive valuable insights. This can significantly improve productivity, decision-making, and knowledge sharing within the organization.

Try this feature on the Amazon Bedrock console today! See Amazon Bedrock Knowledge Bases to learn more.

About the Authors

Surendar Gajavelli is a Sr. Solutions Architect based out of Nashville, Tennessee. He is a passionate technology enthusiast who enjoys working with customers and helping them build innovative solutions.

Abhi Patlolla is a Sr. Solutions Architect based out of the New York City region, helping customers in their cloud transformation, AI/ML, and data initiatives. He is a strategic and technical leader, advising executives and engineers on cloud strategies to foster innovation and positive impact.
