Connect Amazon Q Business to Microsoft SharePoint Online using least privilege access controls

Amazon Q Business is the generative artificial intelligence (AI) assistant that empowers employees with your company’s knowledge and data. Microsoft SharePoint Online is used by many organizations as a secure place to store, organize, share, and access their internal data. With generative AI, employees can get answers to their questions, summarize content, or generate insights from data stored in SharePoint Online. Using Amazon Q Business Connectors, you can connect SharePoint Online data to an Amazon Q Business application and start gaining insights from your data quickly.

This post demonstrates how to use Amazon Q Business with SharePoint Online as the data source to provide answers, generate summaries, and present insights using least privilege access controls and best practices recommended by the Microsoft SharePoint Dev Support Team.

Solution overview

In this post, we walk you through the process of setting up an Amazon Q Business application that connects to your SharePoint Online sites using an out-of-the-box Amazon Q Business Connector and configuring it using the Sites.Selected application permission scope. The Sites.Selected permission is important because many organizations implement policies that prevent granting read access on all sites (Sites.Read.All) or full control (Sites.FullControl.All) to any connector.

The solution approach respects users’ existing identities, roles, and permissions by enabling identity crawling and access control lists (ACLs) on the Amazon Q Business connector for SharePoint Online using secure credentials facilitated through AWS Secrets Manager. If a user doesn’t have permissions to access certain data without Amazon Q Business, then they can’t access it using Amazon Q Business either. Only the data the user has access to is used to support the user query.

Prerequisites

The following are the prerequisites necessary to deploy the solution:

  • An AWS account with an AWS Identity and Access Management (IAM) role and user with permissions to create and manage the necessary resources and components for the application. If you don’t have an AWS account, see How do I create and activate a new Amazon Web Services account?
  • An Amazon Q Business application. If you haven’t set one up yet, see Creating an Amazon Q Business application environment.
  • A Microsoft account and a SharePoint Online subscription to create and publish the application using the steps outlined in this post. If you don’t have this, check with your organization admins to create sandboxes for you to experiment in, or create a new account and trial subscription as needed to complete the steps.
  • An application in Microsoft Entra ID with Sites.FullControl.All application-level permissions, along with its client ID and client secret. This application won’t be used by the Amazon Q Business connector, but it’s needed to grant Sites.Selected permissions exclusively to the target application.

Register a new app in the Microsoft Azure portal

Complete the following steps to register a new app in the Microsoft Azure portal:

  1. Log in to the Azure Portal with your Microsoft account.
  2. Choose New registration.
    1. For Name, provide the name for your application. For this post, we use the name TargetApp. The Amazon Q Business application uses TargetApp to connect to the SharePoint Online site to crawl and index the data.
    2. For Who can use this application or access this API, choose Accounts in this organizational directory only (<Tenant name> only – Single tenant).
    3. Choose Register.
  3. Note down the application (client) ID and the directory (tenant) ID on the Overview page. You’ll need them later when asked for TargetApp-ClientId and TenantId.
  4. Choose API permissions under Manage in the navigation pane.
  5. Choose Add a permission to allow the application to read data in your organization’s directory about the signed-in user.
    1. Choose Microsoft Graph.
    2. Choose Delegated permissions.
    3. Choose User.Read.All from the User section.
    4. Choose GroupMember.Read.All from the GroupMember section.
    5. Choose Sites.Selected from the Sites section.
    6. Choose Add permissions.
  6. On the options menu (three dots), choose Remove permission.
  7. Remove the original User.Read – Delegated permission.
  8. Choose Grant admin consent for Default Directory.

Registering an App and setting permissions

  1. Choose Certificates & secrets in the navigation pane.
  2. Choose New client secret.
    1. For Description, enter a description.
    2. Choose a value for Expires. Note that in production, you’ll need to manually rotate your secret before it expires.
    3. Choose Add.
    4. Note down the value for your new secret. You’ll need it later when asked for your client secret (TargetApp-ClientSecret).
  3. Optionally, choose Owners to add any additional owners for the application. Owners will be able to manage permissions of the Azure AD application (TargetApp).

Use the Graph API to grant permissions to the application on the SharePoint Online site

In this step, you define which of your SharePoint Online sites TargetApp is granted access to. The Amazon Q Business application uses TargetApp to connect to those SharePoint Online sites to crawl and index their data.

For this post, we use Postman, a platform for using APIs, to grant permissions. To grant permissions to a specific SharePoint Online site, you need to have another Azure AD application, which we refer to as AdminApp, with Sites.FullControl.All permissions.

If you don’t have the prerequisite AdminApp, follow the previous steps to register AdminApp and for Application Permissions, grant Sites.FullControl.All permissions. As mentioned in the prerequisites, AdminApp will be used only to grant SharePoint Online sites access permissions to TargetApp.

We use the ClientId and ClientSecret values of AdminApp from the Azure AD application to get an AccessToken value.

  1. Create a POST request in Postman with the URL https://login.microsoftonline.com/{TenantId}/oauth2/v2.0/token.
  2. In the body of the request, choose x-www-form-urlencoded and set the following key-value pairs:
    1. Set client_id to AdminApp-ClientId.
    2. Set client_secret to AdminApp-ClientSecret.
    3. Set grant_type to client_credentials.
    4. Set scope to https://graph.microsoft.com/.default.

Get access token

  1. Choose Send.
  2. From the returned response, copy the value of access_token. You need it in a later step when asked for the bearer token.
  3. Use the value of access_token from the previous step to grant permissions to TargetApp.
    1. Get the SiteId of the SharePoint Online site by visiting your site URL (for example, https://<yourcompany>.sharepoint.com/sites/{SiteName}) in a browser. You need to log in to the site by providing valid credentials to access the site.
    2. Edit the URL in the browser address bar to append /_api/site/id at the end of {SiteName} to get the SiteId. You need this SiteId in the next step.

Getting site id

  1. Create another POST request in Postman using the URL https://graph.microsoft.com/v1.0/sites/{SiteId}/permissions. Replace {SiteId} in the URL of the request with the SiteId from the previous step.

You can repeat this step for each site you want to include in the Amazon Q Business SharePoint Online connector.

  1. Choose Bearer Token for Type on the Authorization tab.
  2. Enter the value of access_token from earlier for Token.

Grant permissions to target app

  1. For the payload, select raw and enter the following JSON code (replace the <<TargetApp-ClientId>> and <<TargetApp-Name>> values):
{
    "roles": [
        "fullcontrol"
    ],
    "grantedToIdentities": [
        {
            "application": {
                "id": "<<TargetApp-clientId>>",
                "displayName": "<<TargeApp-Name>>"
            }
        }
    ]
}

Complete granting access

  1. Choose Send to complete the process of granting SharePoint Online sites access to the TargetApp Azure AD application.
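
If you prefer to script these Microsoft Graph calls instead of using Postman, the following Python sketch performs the same token request and permission grant. It assumes you have the requests library installed; the placeholder values are the tenant ID, AdminApp credentials, SiteId, and TargetApp details you noted in the earlier steps.

    import requests

    # Placeholder values noted in the earlier steps
    tenant_id = "<TenantId>"
    admin_client_id = "<AdminApp-ClientId>"          # AdminApp is used only to grant access
    admin_client_secret = "<AdminApp-ClientSecret>"
    site_id = "<SiteId>"                             # from https://<yourcompany>.sharepoint.com/sites/{SiteName}/_api/site/id
    target_client_id = "<TargetApp-ClientId>"
    target_display_name = "<TargetApp-Name>"

    # Step 1: Get an access token for AdminApp using the client credentials flow
    token_response = requests.post(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data={
            "client_id": admin_client_id,
            "client_secret": admin_client_secret,
            "grant_type": "client_credentials",
            "scope": "https://graph.microsoft.com/.default",
        },
    )
    token_response.raise_for_status()
    access_token = token_response.json()["access_token"]

    # Step 2: Grant TargetApp permissions on the selected site (repeat for each site)
    grant_response = requests.post(
        f"https://graph.microsoft.com/v1.0/sites/{site_id}/permissions",
        headers={"Authorization": f"Bearer {access_token}"},
        json={
            "roles": ["fullcontrol"],
            "grantedToIdentities": [
                {"application": {"id": target_client_id, "displayName": target_display_name}}
            ],
        },
    )
    grant_response.raise_for_status()
    print(grant_response.json())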

Configure the Amazon Q Business SharePoint Online connector

Complete the following steps to configure the Amazon Q Business application’s SharePoint Online connector:

  1. On the Amazon Q Business console, choose Add Data source.
  2. Search for and choose SharePoint.
  3. Give it a name and description (optional).
  4. Choose SharePoint Online for Hosting method under Source settings.
  5. For Site URLs specific to your SharePoint repository, provide the full URL of each SharePoint site that you want to include in crawling and indexing.
    1. If the full URL of the site is https://<yourcompany>.sharepoint.com/sites/anycompany, use <yourcompany> as the value for Domain.
  6. Choose OAuth 2.0 authentication for Authentication method.
  7. Provide the value of TenantId for TenantId.

The SharePoint connector needs credentials to connect to the SharePoint Online site using the Microsoft Graph API. To facilitate this, create a new Secrets Manager secret. These credentials will not be used in any access logs for the SharePoint Online site.
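
The console steps that follow create this secret interactively. If you prefer to create it programmatically, the following is a minimal boto3 sketch; the secret name and key names are illustrative assumptions, so align them with what the Amazon Q Business SharePoint connector expects for OAuth 2.0 authentication.

    import json

    import boto3

    secrets_client = boto3.client("secretsmanager")

    # Secret name and key names below are illustrative; match them to the
    # fields the SharePoint connector prompts for (user name, password,
    # client ID, and client secret of TargetApp).
    response = secrets_client.create_secret(
        Name="QBusiness-SharePointOnline-secret",
        SecretString=json.dumps(
            {
                "userName": "<site collection administrator user name>",
                "password": "<site collection administrator password>",
                "clientId": "<TargetApp-ClientId>",
                "clientSecret": "<TargetApp-ClientSecret>",
            }
        ),
    )
    print(response["ARN"])  # Reference this ARN if you configure the data source through the API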

  1. Choose Create and add a new secret.
  2. Enter a name for the secret.
  3. Enter the user name and password of a SiteCollection administrator on the sites included in the Amazon Q repository.
  4. Enter your client ID and client secret that you got from registering TargetApp in the previous steps.
  5. Choose Save.

Create Secret

  1. Choose Create a new service role to create an IAM role, and enter a name for the role.
  2. For Sync scope, choose Select entities and choose All (or specify the combination of items to sync).
  3. Choose a sync option based on your needs (on demand or at a frequency of your choice). For this post, we choose on-demand.
  4. Choose Add data source.
  5. After the data source is created, choose Sync now to start the crawling and indexing.
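
If you prefer to start the sync programmatically rather than choosing Sync now in the console, a minimal boto3 sketch follows; the application, index, and data source IDs are placeholders for the resources you created.

    import boto3

    qbusiness = boto3.client("qbusiness")

    # Replace the placeholder IDs with the values from your Amazon Q Business application
    response = qbusiness.start_data_source_sync_job(
        applicationId="<application-id>",
        indexId="<index-id>",
        dataSourceId="<sharepoint-data-source-id>",
    )
    print(response)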

Test the solution

To test the solution, you can add users and groups, assign subscriptions, and test user and group access within your Amazon Q Business application.

Clean up

If you’re only experimenting using the steps in this post, delete your application from the Azure Portal and delete the Amazon Q application from the Amazon Q console to avoid incurring costs.

Conclusion

In this post, we discussed how to configure the Amazon Q Business SharePoint Online connector using least privilege access controls that work with site-level least privileges to crawl and index SharePoint Online site content securely. We also demonstrated how to retain and apply ACLs while responding to user conversations.

Organizations can now use their existing SharePoint Online data to gain better insights, generate summaries, and get answers to natural language queries in a conversational way using Amazon Q Business. By connecting SharePoint Online as a data source, employees can interact with the organization’s knowledge and data stored in SharePoint using natural language, making it effortless to find relevant information, extract key points, and derive valuable insights. This can significantly improve productivity, decision-making, and knowledge sharing within the organization.

Try out the solution in this post, and leave your feedback and questions in the comments section.


About the Authors

Surendar Gajavelli is a Sr. Solutions Architect based out of Nashville, TN. He is a passionate technology enthusiast who enjoys working with customers and helping them build innovative solutions.

Abhi Patlolla is a Sr. Solutions Architect based out of the NYC region, helping customers in their cloud transformation, AI/ML, and data initiatives. He is a strategic and technical leader, advising executives and engineers on cloud strategies to foster innovation and positive impact.

Read More

Improve the productivity of your customer support and project management teams using Amazon Q Business and Atlassian Jira

Effective customer support and project management are critical aspects of customer relationship management. Atlassian Jira, a platform for issue tracking and project management functions for software projects, has become an indispensable part of many organizations’ workflows to ensure the success of the customer and the product. However, extracting valuable insights from the vast amount of data stored in Jira often requires manual effort and building specialized tooling. Users such as support engineers, project managers, and product managers need to be able to ask questions about a project, issue, or customer in order to serve their customers’ needs effectively. Generative AI provides the ability to take relevant information from a data source and provide well-constructed answers back to the user.

Building a generative AI-based conversational application that is integrated with the data sources containing the relevant content an enterprise needs requires time, money, and people. You first need to build connectors to the data sources. Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach, where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve and rank the answers, and build a feature-rich web application. You also need to hire and staff a large team to build, maintain, and manage such a system.

Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take action using the data and expertise found in your company’s information repositories, code, and enterprise systems (such as Jira, among others). Amazon Q provides out-of-the-box native data source connectors that can index content into a built-in retriever and uses an LLM to provide accurate, well-written answers. A data source connector is a component of Amazon Q that helps integrate and synchronize data from multiple repositories into one index.

Amazon Q Business offers multiple prebuilt connectors to a large number of data sources, including Atlassian Jira, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, and many more, and helps you create your generative AI solution with minimal configuration. For a full list of Amazon Q Business supported data source connectors, see Amazon Q Business connectors.

In this post, we walk you through configuring and integrating Amazon Q Business with Jira to enable your support, project management, product management, leadership, and other teams to quickly get accurate answers to their questions related to the content in Jira projects, issues, and more.

Find accurate answers from content in Jira using Amazon Q Business

After you integrate Amazon Q Business with Jira, users can ask natural language questions about the content indexed from Jira. This enables the following use cases:

  • Natural language search – Users can search for tasks, issues, or other project-related information using conversational language, making it straightforward to find the desired data without having to remember specific keywords or filters
  • Summarization – Users can request a concise summary of all issues, tasks, or other entities matching their search query, allowing them to quickly grasp the key points without having to sift through individual document descriptions manually
  • Query clarification – If a user’s query is ambiguous or lacks sufficient context, Amazon Q Business can engage in a dialogue to clarify the intent, so the user receives the most relevant and accurate results

Overview of Jira connector for Amazon Q Business

To crawl and index contents in Jira, you can configure the Amazon Q Business Jira connector as a data source in your Amazon Q Business application. When you connect Amazon Q Business to a data source and initiate the sync process, Amazon Q Business crawls and indexes documents from the data source into its index.

Types of documents

In Amazon Q Business, a document is a unit of data. Let’s look at what is considered a document in the context of the Amazon Q Business Jira connector. A document is a collection of information that consists of a title, the content (or the body), metadata (data about the document), and access control list (ACL) information to make sure answers are provided from documents that the user has access to.

The Amazon Q Business Jira connector supports crawling of the following entities in Jira:

  • Projects – Each project is considered a single document
  • Issues – Each issue is considered a single document
  • Comments – Each comment is considered a single document
  • Attachments – Each attachment is considered a single document
  • Worklogs – Each worklog is considered a single document

Additionally, Jira users can create custom objects and custom metadata fields. Amazon Q supports the crawling and indexing of these custom objects and custom metadata.

The Amazon Q Business Jira connector also supports indexing a rich set of metadata from the various entities in Jira. It further provides the ability to map these source metadata fields to Amazon Q index fields. These field mappings allow you to map Jira field names to Amazon Q index field names. There are three types of metadata fields that Amazon Q connectors support:

  • Default fields – These are required with each document, such as the title, creation date, author, and so on.
  • Optional fields – These are provided by the data source. The administrator can optionally choose one or more of these fields if they contain important and relevant information to obtain accurate answers.
  • Custom metadata fields – These are fields created in the data source in addition to what the data source already provides.

Refer to Jira data source connector field mappings for more information.

Authentication

Before you index the content from Jira, you need to establish a secure connection between the Amazon Q Business connector for Jira and your Jira Cloud instance. To establish a secure connection, you need to authenticate with the data source. You can authenticate Amazon Q Business to Jira using basic authentication with a Jira ID and Jira API token.

To authenticate using basic authentication, you create a secret using AWS Secrets Manager with your Jira ID and Jira API token. If you use the AWS Management Console, you can choose to create a new secret or use an existing one. If you use the API, you must provide the Amazon Resource Name (ARN) of an existing secret when you use the CreateDataSource operation.
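
As a rough illustration of the API path, the following sketch creates the secret with boto3 and passes its ARN to CreateDataSource. The secret key names and the connector configuration document are simplified assumptions for illustration only; refer to the Jira connector documentation for the exact schema it expects.

    import json

    import boto3

    secrets = boto3.client("secretsmanager")
    qbusiness = boto3.client("qbusiness")

    # Create the secret that holds the Jira ID and API token.
    # Key names are illustrative; match them to what the Jira connector expects.
    secret = secrets.create_secret(
        Name="QBusiness-Jira-secret",
        SecretString=json.dumps({"jiraId": "user@example.com", "jiraCredential": "<api-token>"}),
    )

    # Pass the secret ARN when creating the data source. The configuration
    # document below is a simplified illustration, not the authoritative schema.
    response = qbusiness.create_data_source(
        applicationId="<application-id>",
        indexId="<index-id>",
        displayName="jira-connector",
        roleArn="<data-source-role-arn>",
        configuration={
            "type": "JIRA",
            "secretArn": secret["ARN"],
            "syncMode": "FULL_CRAWL",
            "connectionConfiguration": {
                "repositoryEndpointMetadata": {"jiraAccountUrl": "https://yourcompany.atlassian.net/"}
            },
        },
    )
    print(response["dataSourceId"])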

Refer to Manage API tokens for your Atlassian account for more information on creating and managing API tokens in Jira.

Secure querying with ACL crawling, identity crawling, and user store

Secure querying is a critical feature that makes sure users receive answers only from documents they’re authorized to access. Amazon Q Business implements this security measure through a two-step process. First, it indexes ACLs associated with each document. This indexing is vital for data security, because any document without an ACL is treated as public. Second, when a query is made, the system considers both the user’s credentials (typically their email address) and the query content. This dual-check mechanism means that the results are not only relevant to the query but also confined to documents the user has permission to view. By using ACLs and user authentication, Amazon Q Business maintains a robust barrier against unauthorized data access while delivering pertinent information to users.

If you need to index documents without ACLs, you must make sure they’re explicitly marked as public in your data source. Refer to Allow anonymous access to projects to enable public access to documents. Refer to How Amazon Q Business connector crawls Jira ACLs for more information about crawling Jira ACLs.

Solution overview

In this post, we walk through the steps to configure a Jira connector for an Amazon Q Business application. We use an existing Amazon Q application and configure the Jira connector to sync data from specific Jira projects and issue types, map relevant Jira fields to the Amazon Q index, initiate the data sync, and then query the ingested Jira data using Amazon Q’s web experience.

As part of querying Jira documents using the Amazon Q Business application, we demonstrate how to ask natural language questions on your Jira issues, projects, and other issue types and get back relevant results and insights using Amazon Q Business.

Prerequisites

You should have the following:

  • An AWS account with an Amazon Q Business application already created. If you haven’t set one up yet, see Creating an Amazon Q Business application environment.
  • An Atlassian Jira Cloud account with the permissions needed to create an API token for authentication.
  • Users and groups set up in AWS IAM Identity Center to assign to your Amazon Q Business application.

Configure the Jira connector for an Amazon Q Business application

Complete the following steps to configure the connector:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Select the application that you want to add the Jira connector to.
  3. On the Actions menu, choose Edit.

Edit Amazon Q application

  1.  On the Update application page, leave all values as default and choose Update.

Update Amazon Q application

  1. On the Update retriever page, leave all values as default and choose Next.

Update the retriever

  1. On the Connect data sources page, on the All tab, search for Jira in the search field.
  2. Choose the plus sign on the Jira connector.

Add Jira connector

  1. In the Name and description section, enter a name and description.
  2. In the Source section, enter your company’s Jira account URL in the format https://yourcompany.atlassian.net/.

Enter Jira domain

  1. In the Authentication section, choose Create and add new secret.
  2. Enter a name for your Secrets Manager secret.
  3. For Jira ID, enter the user name for the API token.
  4. For Password/Token, enter the API token details.
  5. Choose Save.

See Manage API tokens for your Atlassian account for details on how to create an API token.

Save Jira authentication

  1. In the IAM role section, for IAM role, choose Create a new service role (recommended).

Create IAM role

  1. In the Sync Scope section, you can select All projects or Only specific projects.
  2. By default, the Jira connector indexes all content from the projects. Optionally, you can choose to sync only specific Jira entities by selecting the appropriate options under Additional configuration.

Select sync scope

  1. In the Sync mode section, choose New, modified, or deleted content sync.

Select sync mode

  1. In the Sync run schedule section, choose your desired frequency. For this post, we choose Run on demand.
  2. Choose Add data source and wait for the data source to be created.

Select run schedule

After the data source is created, you’re redirected to the Connect data sources page to add more data sources as needed.

  1. For this walkthrough, choose Next.
  2. On the Update groups and users page, choose Add groups and users.

The users and groups that you add in this section are from the AWS IAM Identity Center users and groups set up by your administrator.

Add users to application

  1. In the Add or assign users and groups pop-up, select Assign existing users and groups to add existing users configured in your connected IAM Identity Center. Optionally, if you have permissions to add users, you can select Add new users.
  2. Choose Next.

Assign existing users

  1. In the Assign users and groups pop-up, search for users by user display name or groups by group name.
  2. Choose the users or groups you want to add and choose Assign.

This closes the pop-up. The groups and users that you added should now be available on the Groups or Users tabs.

Search for Users

For each group or user entry, an Amazon Q Business subscription tier needs to be assigned.

  1. To enable subscription for a group, on the Update groups and users page, choose the Groups tab (if individual users need to be assigned a subscription, choose the Users tab).
  2. Under the Current subscription column, choose Choose subscription and choose a subscription (Q Business Lite or Q Business Pro).
  3. Choose Update application to complete adding and setting up the Jira data connector for Amazon Q Business.

Assign a subscription

Configure Jira field mappings

To help you structure data for retrieval and chat filtering, Amazon Q Business crawls data source document attributes or metadata and maps them to fields in your Amazon Q index. Amazon Q has reserved fields that it uses when querying your application. When possible, Amazon Q automatically maps these built-in fields to attributes in your data source.

If a built-in field doesn’t have a default mapping, or if you want to map additional index fields, use the custom field mappings to specify how a data source attribute maps to your Amazon Q application.

  1. On the Amazon Q Business console, choose your application.
  2. Under Data sources, select your data source.
  3. On the Actions menu, choose Edit.

Edit data source

  1. In the Field mappings section, select the required fields to crawl under Projects, Issues, and any other available issue types, and then save your changes.

When selecting all items, make sure you navigate through each page by choosing the page numbers and selecting Select All on every page to include all mapped items.

Edit field mapping

The Jira connector setup for Amazon Q is now complete. To test the connectivity to Jira and initiate the data synchronization, choose Sync now. The initial sync process may take several minutes to complete.

When the sync is complete, on the Sync history tab, you can see the sync status along with a summary of how many total items were added, deleted, modified, and failed during the sync process.

Query Jira data using the Amazon Q web experience

Now that the data synchronization is complete, you can start exploring insights from Amazon Q. In the newly created Amazon Q application, choose Customize web experience to open a new tab with a preview of the UI and options to customize as per your needs.

You can customize the Title, Subtitle, and Welcome message fields according to your needs, which will be reflected in the UI.

Configure web experience

For this walkthrough, we use the defaults and choose View web experience to be redirected to the login page for the Amazon Q application.

Log in to the application using the credentials for the user that were added to the Amazon Q application. After the login is successful, you’re redirected to the Amazon Q assistant UI, where you can ask questions using natural language and get insights from your Jira index.

Login to Amazon Q application

The Jira data source connected to this Amazon Q application has a sample IT software management project with tasks related to the project launch and related issues. We demonstrate how the Amazon Q application lets you ask questions on issues within this project using natural language and receive responses and insights for those queries.

Let’s begin by asking Amazon Q to provide a list of the top three challenges encountered during the project launch. The following screenshot displays the response, listing the top three documents associated with launch issues. The response also includes Sources, which contain links to all the matching documents. Choosing any of those links will redirect you to the corresponding Jira page with the relevant issue or task.

Query launch related issues

For the second query, we ask Amazon Q if there were any website-related issues. The following screenshot displays the response, which includes a summary of website-related issues along with corresponding Jira ticket links.

Query website issues

Frequently asked questions

In this section, we provide guidance on frequently asked questions.

Amazon Q Business is unable to answer your questions

If you get the response “Sorry, I could not find relevant information to complete your request,” this may be due to a few reasons:

  • No permissions – ACLs applied to your account don’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources.
  • Data connector sync failed – Your data connector may have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.
  • Empty or private Jira projects – Private or empty projects aren’t crawled during the sync run.

If none of these reasons apply to your use case, open a support case and work with your technical account manager to get this resolved.

How to generate responses from authoritative data sources

If you want Amazon Q Business to only generate responses from authoritative data sources, you can configure this using the Amazon Q Business application global controls under Admin controls and guardrails.

  1. Log in to the Amazon Q Business console as an Amazon Q Business application administrator.
  2. Navigate to the application and choose Admin controls and guardrails in the navigation pane.
  3. Choose Edit in the Global controls section to set these options.

For more information, refer to Admin controls and guardrails in Amazon Q Business.

Admin Controls & Guardrails

Amazon Q Business responds using old (stale) data even though your data source is updated

Each Amazon Q Business data connector can be configured with a unique sync run schedule frequency. Verifying the sync status and sync schedule frequency for your data connector reveals when the last sync ran successfully. It could be that your data connector’s sync run schedule is either set to sync at a scheduled time of day, week, or month. If it’s set to run on demand, the sync has to be manually invoked. When the sync run is complete, verify the sync history to make sure the run has successfully synced all new issues. Refer to Sync run schedule for more information about each option.

Check run schedule

Check sync history

Clean up

To prevent incurring additional costs, it’s essential to clean up and remove any resources created during the implementation of this solution. Specifically, you should delete the Amazon Q application, which will consequently remove the associated index and data connectors. However, any IAM roles and secrets created during the Amazon Q application setup process need to be removed separately. Failing to clean up these resources may result in ongoing charges, so it’s crucial to take the necessary steps to completely remove all components related to this solution.

Complete the following steps to delete the Amazon Q application, secret, and IAM role:

  1. On the Amazon Q Business console, select the application that you created.
  2. On the Actions menu, choose Delete and confirm the deletion.

Delete Amazon Q application

  1. On the Secrets Manager console, select the secret that was created for the Jira connector.
  2. On the Actions menu, choose Delete.
  3. Select the waiting period as 7 days and choose Schedule deletion.

Schedule Secrets deletion

  1. On the IAM console, select the role that was created during the Amazon Q application creation.
  2. Choose Delete and confirm the deletion.

Conclusion

The Amazon Q Jira connector allows organizations to seamlessly integrate their Jira projects, issues, and data into the powerful generative AI capabilities of Amazon Q. By following the steps outlined in this post, you can quickly configure the Jira connector as a data source for Amazon Q and initiate synchronization of your Jira information. The native field mapping options enable you to customize exactly which Jira data to include in the Amazon Q index.

Amazon Q can serve as a powerful assistant capable of providing rich insights and summaries about your Jira projects and issues from natural language queries. The Jira plugin further extends this functionality by allowing users to create new Jira issues from within the AI assistant interface.

The Amazon Q Jira integration represents a valuable tool for software teams to gain AI-driven visibility into their development workflows and pain points. By bridging Jira’s industry-leading project management with Amazon’s cutting-edge generative AI, teams can drive productivity, make better informed decisions, and unlock deeper insights into their software operations. As generative AI continues advancing, integrations like this will become critical for organizations aiming to deliver streamlined, data-driven software development lifecycles.

To learn more about the Amazon Q connector for Jira, refer to Connecting Jira to Amazon Q Business.


About the Authors

Praveen Chamarthi is a Senior AI/ML Specialist with Amazon Web Services. He is passionate about AI/ML and all things AWS. He helps customers across the Americas scale, innovate, and operate ML workloads efficiently on AWS. In his spare time, Praveen loves to read and enjoys sci-fi movies.

Srikanth Reddy is a Senior AI/ML Specialist with Amazon Web Services. He is responsible for providing deep, domain specific expertise to enterprise customers, helping them leverage AWS’s AI and ML capabilities to their fullest potential.

Ge Jiang is a Software Development Engineer Manager in the Amazon Q and Amazon Kendra organization of Amazon Web Services. She is responsible for the design and development of features for the Amazon Q and Amazon Kendra connectors.

Vijai Gandikota is a Principal Product Manager in the Amazon Q and Amazon Kendra organization of Amazon Web Services. He is responsible for the Amazon Q and Amazon Kendra connectors, ingestion, security, and other aspects of the Amazon Q and Amazon Kendra services.

Read More

Apple Intelligence Foundation Language Models

We present foundation language models developed to power Apple Intelligence features, including a ∼3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied…Apple Machine Learning Research

DataComp-LM: In Search of the Next Generation of Training Sets for Language Models

This paper was accepted at the NeurIPS Datasets and Benchmarks Workshop at NeurIPS 2024
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B…Apple Machine Learning Research

Amazon SageMaker inference launches faster auto scaling for generative AI models

Today, we are excited to announce a new capability in Amazon SageMaker inference that can help you reduce the time it takes for your generative artificial intelligence (AI) models to scale automatically. You can now use sub-minute metrics and significantly reduce overall scaling latency for generative AI models. With this enhancement, you can improve the responsiveness of your generative AI applications as demand fluctuates.

The rise of foundation models (FMs) and large language models (LLMs) has brought new challenges to generative AI inference deployment. These advanced models often take seconds to process, while sometimes handling only a limited number of concurrent requests. This creates a critical need for rapid detection and auto scaling to maintain business continuity. Organizations implementing generative AI seek comprehensive solutions that address multiple concerns: reducing infrastructure costs, minimizing latency, and maximizing throughput to meet the demands of these sophisticated models. However, they prefer to focus on solving business problems rather than doing the undifferentiated heavy lifting to build complex inference platforms from the ground up.

SageMaker provides industry-leading capabilities to address these inference challenges. It offers endpoints for generative AI inference that reduce FM deployment costs by 50% on average and latency by 20% on average by optimizing the use of accelerators. The SageMaker inference optimization toolkit, a fully managed model optimization feature in SageMaker, can deliver up to two times higher throughput while reducing costs by approximately 50% for generative AI performance on SageMaker. Besides optimization, SageMaker inference also provides streaming support for LLMs, enabling you to stream tokens in real time rather than waiting for the entire response. This allows for lower perceived latency and more responsive generative AI experiences, which are crucial for use cases like conversational AI assistants. Lastly, SageMaker inference provides the ability to deploy a single model or multiple models using SageMaker inference components on the same endpoint using advanced routing strategies to effectively load balance to the underlying instances backing an endpoint.

Faster auto scaling metrics

To optimize real-time inference workloads, SageMaker employs Application Auto Scaling. This feature dynamically adjusts the number of instances in use and the quantity of model copies deployed, responding to real-time changes in demand. When in-flight requests surpass a predefined threshold, auto scaling increases the available instances and deploys additional model copies to meet the heightened demand. Similarly, as the number of in-flight requests decreases, the system automatically removes unnecessary instances and model copies, effectively reducing costs. This adaptive scaling makes sure resources are optimally utilized, balancing performance needs with cost considerations in real time.

With today’s launch, SageMaker real-time endpoints now emit two new sub-minute Amazon CloudWatch metrics: ConcurrentRequestsPerModel and ConcurrentRequestsPerCopy. ConcurrentRequestsPerModel is the metric used for SageMaker real-time endpoints; ConcurrentRequestsPerCopy is used when SageMaker real-time inference components are used.

These metrics provide a more direct and accurate representation of the load on the system by tracking the actual concurrency or the number of simultaneous requests being handled by the containers (in-flight requests), including the requests queued inside the containers. The concurrency-based target tracking and step scaling policies focus on monitoring these new metrics. When the concurrency levels increase, the auto scaling mechanism can respond by scaling out the deployment, adding more container copies or instances to handle the increased workload. By taking advantage of these high-resolution metrics, you can now achieve significantly faster auto scaling, reducing detection time and improving the overall scale-out time of generative AI models. You can use these new metrics for endpoints created with accelerator instances like AWS Trainium, AWS Inferentia, and NVIDIA GPUs.
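
You can also observe these metrics directly in CloudWatch. The following sketch queries the new concurrency metric for an endpoint; the namespace and dimension names are assumptions based on standard SageMaker endpoint metrics, so verify them against what your endpoint actually emits.

    from datetime import datetime, timedelta, timezone

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/SageMaker",  # assumed namespace for endpoint metrics
        MetricName="ConcurrentRequestsPerModel",
        Dimensions=[
            {"Name": "EndpointName", "Value": "<endpoint-name>"},
            {"Name": "VariantName", "Value": "AllTraffic"},
        ],
        StartTime=end - timedelta(minutes=5),
        EndTime=end,
        Period=10,  # the new metrics are emitted at sub-minute (10-second) intervals
        Statistics=["Maximum"],
    )
    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Maximum"])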

In addition, you can enable streaming responses back to the client on models deployed on SageMaker. Many current solutions track a session or concurrency metric only until the first token is sent to the client and then mark the target instance as available. SageMaker can track a request until the last token is streamed to the client instead of until the first token. This way, clients can be directed to instances or GPUs that are less busy, avoiding hotspots. Additionally, tracking concurrency also helps you make sure requests that are in-flight and queued are treated alike for alerting on the need for auto scaling. With this capability, you can make sure your model deployment scales proactively, accommodating fluctuations in request volumes and maintaining optimal performance by minimizing queuing delays.

In this post, we detail how the new ConcurrentRequestsPerModel and ConcurrentRequestsPerCopy CloudWatch metrics work, explain why you should use them, and walk you through the process of implementing them for your workloads. These new metrics allow you to scale your LLM deployments more effectively, providing optimal performance and cost-efficiency as the demand for your models fluctuates.

Components of auto scaling

The following figure illustrates a typical scenario of how a SageMaker real-time inference endpoint scales out to handle an increase in concurrent requests. This demonstrates the automated and responsive nature of scaling in SageMaker. In this example, we walk through the key steps that occur when the inference traffic to a SageMaker real-time endpoint starts to increase and concurrency to the model deployed on every instance goes up. We show how the system monitors the traffic, invokes an auto scaling action, provisions new instances, and ultimately load balances the requests across the scaled-out resources. Understanding this scaling process is crucial for making sure your generative AI models can handle fluctuations in demand and provide a seamless experience for your customers. By the end of this walkthrough, you’ll have a clear picture of how SageMaker real-time inference endpoints can automatically scale to meet your application’s needs.

Let’s dive into the details of this scaling scenario using the provided figure.

The key steps are as follows:

  1. Increased inference traffic (t0) – At some point, the traffic to the SageMaker real-time inference endpoint starts to increase, indicating a potential need for additional resources. The increase in traffic leads to a higher number of concurrent requests required for each model copy or instance.
  2. CloudWatch alarm monitoring (t0 → t1) – An auto scaling policy uses CloudWatch to monitor metrics, sampling it over a few data points within a predefined time frame. This makes sure the increased traffic is a sustained change in demand, not a temporary spike.
  3. Auto scaling trigger (t1) – If the metric crosses the predefined threshold, the CloudWatch alarm goes into an InAlarm state, invoking an auto scaling action to scale up the resources.
  4. New instance provisioning and container startup (t1 → t2) – During the scale-up action, new instances are provisioned if required. The model server and container are started on the new instances. When the instance provisioning is complete, the model container initialization process begins. After the server successfully starts and passes the health checks, the instances are registered with the endpoint, enabling them to serve incoming traffic requests.
  5. Load balancing (t2) – After the container health checks pass and the container reports as healthy, the new instances are ready to serve inference requests. All requests are now automatically load balanced between the two instances using the pre-built routing strategies in SageMaker.

This approach allows the SageMaker real-time inference endpoint to react quickly and handle the increased traffic with minimal impact to the clients.

Application Auto Scaling supports target tracking and step scaling policies. Each have their own logic to handle scale-in and scale-out:

  • Target tracking works to scale out by adding capacity to reduce the difference between the metric value (ConcurrentRequestsPerModel/Copy) and the target value set. When the metric (ConcurrentRequestsPerModel/Copy) is below the target value, Application Auto Scaling scales in by removing capacity.
  • Step scaling works to scale capacity using a set of adjustments, known as step adjustments. The size of the adjustment varies based on the magnitude of the alarm breach of the metric value (ConcurrentRequestsPerModel/Copy).

By using these new metrics, auto scaling can now be invoked and scale out significantly faster compared to the older SageMakerVariantInvocationsPerInstance predefined metric type. This decrease in the time to measure and invoke a scale-out allows you to react to increased demand significantly faster than before (under 1 minute). This works especially well for generative AI models, which are typically concurrency-bound and can take many seconds to complete each inference request.

Using the new high-resolution metrics allows you to greatly decrease the time it takes to scale up an endpoint using Application Auto Scaling. These high-resolution metrics are emitted at 10-second intervals, allowing for faster invoking of scale-out procedures. For models with less than 10 billion parameters, this can be a significant percentage of the time it takes for an end-to-end scaling event. For larger model deployments, this can be up to 5 minutes shorter before a new copy of your FM or LLM is ready to service traffic.

Get started with faster auto scaling

Getting started with using the metrics is straightforward. You can use the following steps to create a new scaling policy to benefit from faster auto scaling. In this example, we deploy a Meta Llama 3 model that has 8 billion parameters on a G5 instance type, which uses NVIDIA A10G GPUs. The model can fit entirely on a single GPU, and we can use auto scaling to scale up the number of inference components and G5 instances based on our traffic. The full notebook can be found on GitHub for SageMaker Single Model Endpoints and SageMaker with inference components.

  1. After you create your SageMaker endpoint, you define a new auto scaling target for Application Auto Scaling. In the following code block, you set as_min_capacity and as_max_capacity to the minimum and maximum number of instances you want to set for your endpoint, respectively. If you’re using inference components (shown later), you can use instance auto scaling and skip this step.
    autoscaling_client = boto3.client("application-autoscaling", region_name=region)
    
    # Register scalable target
    scalable_target = autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=as_min_capacity,
        MaxCapacity=as_max_capacity,  # Replace with your desired maximum instances
    )

  2. After you create your new scalable target, you can define your policy. You can choose between using a target tracking policy or step scaling policy. In the following target tracking policy, we have set TargetValue to 5. This means we’re asking auto scaling to scale up if the number of concurrent requests per model is equal to or greater than five.
    # Create Target Tracking Scaling Policy
    target_tracking_policy_response = autoscaling_client.put_scaling_policy(
        PolicyName="SageMakerEndpointScalingPolicy",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 5.0,  # Scaling triggers when endpoint receives 5 ConcurrentRequestsPerModel
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantConcurrentRequestsPerModelHighResolution"
            },
            "ScaleInCooldown": 180,  # Cooldown period after scale-in activity
            "ScaleOutCooldown": 180,  # Cooldown period after scale-out activity
        },
    )

If you would like to configure a step scaling policy, refer to the following notebook.
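
As a rough sketch of what that looks like, a step scaling setup pairs a StepScaling policy on the same scalable target with a high-resolution CloudWatch alarm on the new metric that invokes it. The thresholds, step sizes, namespace, and dimension names below are illustrative assumptions; the snippet reuses autoscaling_client, resource_id, and region from the earlier code.

    import boto3

    # Create a step scaling policy for the same scalable target
    step_policy = autoscaling_client.put_scaling_policy(
        PolicyName="SageMakerEndpointStepScalingPolicy",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="StepScaling",
        StepScalingPolicyConfiguration={
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Maximum",
            "Cooldown": 180,
            "StepAdjustments": [
                # Add 1 instance for small breaches and 2 instances for larger ones
                {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 5, "ScalingAdjustment": 1},
                {"MetricIntervalLowerBound": 5, "ScalingAdjustment": 2},
            ],
        },
    )

    # Attach a high-resolution CloudWatch alarm on the new metric to invoke the policy
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        AlarmName="ConcurrentRequestsPerModel-step-scaling",
        Namespace="AWS/SageMaker",  # assumed namespace; verify for your endpoint
        MetricName="ConcurrentRequestsPerModel",
        Dimensions=[
            {"Name": "EndpointName", "Value": "<endpoint-name>"},
            {"Name": "VariantName", "Value": "AllTraffic"},
        ],
        Statistic="Maximum",
        Period=10,  # high-resolution metric
        EvaluationPeriods=3,
        Threshold=5.0,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[step_policy["PolicyARN"]],
    )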

That’s it! Traffic now invoking your endpoint will be monitored with concurrency tracked and evaluated against the policy you specified. Your endpoint will scale up and down based on the minimum and maximum values you provided. In the preceding example, we set a cooldown period for scaling in and out to 180 seconds, but you can change this based on what works best for your workload.

SageMaker inference components

If you’re using inference components to deploy multiple generative AI models on a SageMaker endpoint, you can complete the following steps:

  1. After you create your SageMaker endpoint and inference components, you define a new auto scaling target for Application Auto Scaling:
    autoscaling_client = boto3.client("application-autoscaling", region_name=region)
    
    # Register scalable target
    scalable_target = autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
        MinCapacity=as_min_capacity,
        MaxCapacity=as_max_capacity,  # Replace with your desired maximum instances
    )

  2. After you create your new scalable target, you can define your policy. In the following code, we set TargetValue to 5. By doing so, we’re asking auto scaling to scale up if the number of concurrent requests per model is equal to or greater than five.
    # Create Target Tracking Scaling Policy
    target_tracking_policy_response = autoscaling_client.put_scaling_policy(
        PolicyName="SageMakerInferenceComponentScalingPolicy",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 5.0,  # Scaling triggers when endpoint receives 5 ConcurrentRequestsPerCopy
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerInferenceComponentConcurrentRequestsPerCopyHighResolution"
            },
            "ScaleInCooldown": 180,  # Cooldown period after scale-in activity
            "ScaleOutCooldown": 180,  # Cooldown period after scale-out activity
        },
    )

You can use the new concurrency-based target tracking auto scaling policies in tandem with existing invocation-based target tracking policies. When a container experiences a crash or failure, the resulting requests are typically short-lived and may be responded to with error messages. In such scenarios, the concurrency-based auto scaling policy can detect the sudden drop in concurrent requests, potentially causing an unintentional scale-in of the container fleet. However, the invocation-based policy can act as a safeguard, avoiding the scale-in if there is still sufficient traffic being directed to the remaining containers. With this hybrid approach, container-based applications can achieve a more efficient and adaptive scaling behavior. The balance between concurrency-based and invocation-based policies allows the system to respond appropriately to various operational conditions, such as container failures, sudden spikes in traffic, or gradual changes in workload patterns. This enables the container infrastructure to scale up and down more effectively, optimizing resource utilization and providing reliable application performance.
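
As a sketch of that hybrid setup, you can attach a second, invocation-based target tracking policy to the same scalable target used in the earlier snippets (the target value below is illustrative):

    # Invocation-based policy alongside the concurrency-based one, reusing
    # autoscaling_client and resource_id from the earlier code
    invocation_policy_response = autoscaling_client.put_scaling_policy(
        PolicyName="SageMakerEndpointInvocationScalingPolicy",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 10.0,  # illustrative invocations per instance per minute
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 180,
            "ScaleOutCooldown": 180,
        },
    )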

Sample runs and results

With the new metrics, we have observed improvements in the time required to invoke scale-out events. To test the effectiveness of this solution, we completed some sample runs with Meta Llama models (Llama 2 7B and Llama 3 8B). Prior to this feature, detecting the need for auto scaling could take over 6 minutes, but with this new feature, we were able to reduce that time to less than 45 seconds. For generative AI models such as Meta Llama 2 7B and Llama 3 8B, we have been able to reduce the overall end-to-end scale-out time by approximately 40%.

The following figures illustrate the results of sample runs for Meta Llama 3 8B.

The following figures illustrate the results of sample runs for Meta Llama 2 7B.

As a best practice, it’s important to optimize your container, model artifacts, and bootstrapping processes to be as efficient as possible. Doing so can help minimize deployment times and improve the responsiveness of AI services.

Conclusion

In this post, we detailed how the ConcurrentRequestsPerModel and ConcurrentRequestsPerCopy metrics work, explained why you should use them, and walked you through the process of implementing them for your workloads. We encourage you to try out these new metrics and evaluate whether they improve your FM and LLM workloads on SageMaker endpoints. You can find the notebooks on GitHub.

Special thanks to our partners from Application Auto Scaling for making this launch happen: Ankur Sethi, Vasanth Kumararajan, Jaysinh Parmar, Mona Zhao, Miranda Liu, Fatih Tekin, and Martin Wang.


About the Authors

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends. You can find him on LinkedIn.

Praveen Chamarthi is a Senior AI/ML Specialist with Amazon Web Services. He is passionate about AI/ML and all things AWS. He helps customers across the Americas scale, innovate, and operate ML workloads efficiently on AWS. In his spare time, Praveen loves to read and enjoys sci-fi movies.

Dr. Changsha Ma is an AI/ML Specialist at AWS. She is a technologist with a PhD in Computer Science, a master’s degree in Education Psychology, and years of experience in data science and independent consulting in AI/ML. She is passionate about researching methodological approaches for machine and human intelligence. Outside of work, she loves hiking, cooking, hunting food, and spending time with friends and families.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Kunal Shah is a software development engineer at Amazon Web Services (AWS) with 7+ years of industry experience. His passion lies in deploying machine learning (ML) models for inference, and he is driven by a strong desire to learn and contribute to the development of AI-powered tools that can create real-world impact. Beyond his professional pursuits, he enjoys watching historical movies, traveling and adventure sports.

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Read More

Tracing the path to self-adapting AI agents

The games industry has long been a frontier of innovation for AI. In the early 2000s, programmers hand-coded neural networks to breathe life into virtual worlds, creating engaging AI characters that interact with players. Fast forward two decades, neural networks have grown from their humble beginnings to colossal architectures with billions of parameters, powering real-world applications like ChatGPT and Microsoft Copilots. The catalyst for this seismic shift in AI scale and capability is the advent of automatic optimization. AutoDiff frameworks like PyTorch and Tensorflow have democratized scalable gradient-based end-to-end optimization. This breakthrough has been instrumental in the development of Large Foundation Models (LFMs) that now sit at the core of AI.

Today, the AI systems we interact with are more than just neural network models. They contain intricate workflows that seamlessly integrate customized machine learning models, orchestration code, retrieval modules, and various tools and functions. These components work in concert to create the sophisticated AI experiences that have become an integral part of our digital lives. Nonetheless, up to now, we do not have tools to automatically train these extra components. They are handcrafted through extensive engineering, just like how neural networks were engineered in the early 2000s.

End-to-end automatic optimization of AI systems

The latest research from Microsoft and Stanford University introduces Trace (opens in new tab), a groundbreaking framework poised to revolutionize the automatic optimization of AI systems. Here are three highlights of the transformative potential of Trace:

  • End-to-end optimization: Trace treats AI systems as computational graphs, akin to neural networks, and optimizes them end-to-end through a generalized back-propagation approach.
  • Dynamic adaptation: It handles the dynamic nature of AI systems, where the graph can change with varying inputs and parameters and needs to adapt to various kinds of feedback.
  • Versatile applications: Trace can optimize heterogeneous parameters (such as prompts and code) in AI systems. Empirical studies showcase Trace’s ability to optimize diverse problems, including hyperparameter tuning, large language model (LLM) agents, and robot control, often outperforming specialized optimizers.

In a nutshell, Trace is a new AutoDiff-like tool for training AI systems without using gradients. This generalization is made possible by a new mathematical formulation of optimization, Optimization with Trace Oracle (OPTO), which can describe end-to-end optimization of AI systems with general feedback (such as numerical losses, natural language, and errors). Instead of propagating gradients, which are not well-defined for AI systems beyond neural networks, Trace propagates Minimal Subgraphs which can then be used to also recover gradients where applicable. Trace is implemented as a PyTorch-like Python library with which users can easily create AI systems and refine them, akin to training neural networks.
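To make the idea of propagating feedback instead of gradients concrete, here is a minimal, self-contained Python sketch of the concept. It is not the Trace library's actual API; the names Node, minimal_subgraph, and TextFeedbackOptimizer are illustrative stand-ins.

class Node:
    def __init__(self, name, value, parents=(), trainable=False):
        self.name = name
        self.value = value
        self.parents = list(parents)   # edges of the computational graph
        self.trainable = trainable

def minimal_subgraph(output):
    """Collect the ancestors of the output node, i.e., the subgraph that produced it."""
    seen, stack, nodes = set(), [output], []
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        nodes.append(node)
        stack.extend(node.parents)
    return nodes

class TextFeedbackOptimizer:
    """Stand-in for an LLM-based optimizer such as OptoPrime."""
    def step(self, subgraph, feedback):
        trainable = [n for n in subgraph if n.trainable]
        # A real optimizer would show the subgraph and the feedback to an LLM
        # and ask it to propose new values for the trainable nodes.
        print(f"Feedback: {feedback!r}; trainable parameters: {[n.name for n in trainable]}")

# Forward pass: a tiny two-step workflow with one trainable parameter.
prompt = Node("prompt", "Say hello", trainable=True)
reply = Node("reply", "hi", parents=[prompt])
TextFeedbackOptimizer().step(minimal_subgraph(reply), "Too informal; be polite")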

In this blog post, we are excited to announce the release of the Trace Python library (opens in new tab). With the help of demos, we’ll show you how this powerful tool can be used to build AI agents that learn and adapt from their experiences, eliminating the need for specialized engineering.


Warm up: Building a Battleship game AI agent through learning

To start, consider building an AI agent for the classic Battleship board game. In Battleship, a player needs to devise strategies to cleverly locate and attack the opponent’s ships on a hidden board as fast as possible. To build an AI agent with Trace, one simply needs to program the workflow and declare the parameters, like programming a neural network architecture. Here we will design an agent with two components: a reason function and an act function, as illustrated in Figure 1a. We provide a basic description of what these two functions should do as docstrings. We leave the function bodies blank and set them to be trainable. At this point, the agent doesn’t know how the Battleship API works. It must not only learn how to play the game, but also learn how to use the unknown API.

The agent’s policy is defined as the composition of a reason step and an act step. The code for both steps is marked as trainable and initialized as trivial functions. A basic description of how each function is supposed to behave is provided as docstrings in the function definitions.
Figure 1a: Write a Trace-trainable policy.
The agent’s policy is optimized by a simple but generic training loop, which mimics neural network training. First the agent’s policy and an iterative optimizer for it are declared. In each iteration, the agent’s policy takes a board configuration as input and outputs a target location. The environment returns feedback on whether the target successfully hits a ship or not. Alternatively, when the agent’s policy triggers any execution error, the error is used as feedback. Then the feedback is propagated to the parameters in the trainable policy for updates.
Figure 1b: Optimize using a PyTorch-like API.

We iteratively train this AI agent to play the game through a simple Python for loop, seen in Figure 1b. In each iteration, the agent (that is, the policy) sees the board configuration and tries to shoot at a target location on a training board. The environment returns in text whether it’s a hit or a miss. Then, we run Trace to propagate this environment feedback through the agent’s decision logic to update the parameters (for example, the policy is like a two-layer network with a reason layer and an act layer). These iterations mimic how a human programmer might approach the problem: run the policy, change the code based on the observed feedback, try different heuristics, and perhaps rewrite the code a few times to fix any execution errors by using stack traces.
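As a rough illustration of this loop structure (and not the actual Trace or Battleship code), the following runnable Python sketch uses stand-ins for the environment, the trainable policy, and the LLM-based optimizer; only the shape of the loop is meant to match the description above.

import random

class ToyBattleshipBoard:
    """Toy 5x5 environment with one hidden single-cell ship (a stand-in for the real game)."""
    def __init__(self):
        self.ship = (random.randrange(5), random.randrange(5))
    def shoot(self, target):
        return "hit" if target == self.ship else "miss"

class Policy:
    """Two trainable steps, reason and act, initialized to trivial behavior."""
    def __init__(self):
        self.reason_code = "pick the first unexplored square"   # trainable "code" parameter
        self.act_code = "return coordinate (0, 0)"              # trainable "code" parameter
    def __call__(self, board_state):
        return (0, 0)   # the initial policy always shoots the same cell

class MockLLMOptimizer:
    """Stand-in: a real optimizer (for example, OptoPrime) would rewrite the trainable code."""
    def step(self, policy, feedback):
        print(f"iteration feedback: {feedback}")

env, policy, optimizer = ToyBattleshipBoard(), Policy(), MockLLMOptimizer()
for iteration in range(3):
    try:
        target = policy(board_state=None)   # forward pass through reason + act
        feedback = env.shoot(target)        # environment feedback: "hit" or "miss"
    except Exception as error:              # execution errors also become feedback
        feedback = repr(error)
    optimizer.step(policy, feedback)        # propagate feedback to the trainable parameters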

In Figure 2, we show the results of this learning agent, where the agent is trained by an LLM-based optimizer OptoPrime in Trace. The performance is measured as the scores of the agent playing on new randomly generated games (different from the training board). We see that the agent understands the Battleship game and proposes the enumeration strategy after one iteration; then, after a few more tries, it starts to develop complex strategies for playing the game.

The experimental results show that Trace can quickly learn complex behaviors for Battleship in a few iterations. At iteration 0, the agent is initialized to output a constant coordinate. At iteration 1, the agent learns the simple strategy of enumerating the board. After a few more iterations (e.g., iteration 7), the agent learns a complex strategy to balance unexplored squares vs. adjacent squares to previous hits. In comparison, the state-of-the-art LLM optimizer OPRO only achieves less than 1/3 of Trace’s performance in this problem.
Figure 2: Trace optimizes Code-as-Parameter to create a complex Battleship AI from scratch, compared with state-of-the-art LLM-based optimizer OPRO.

Super-fast reinforcement learning agent for robot control

We can extend the same idea of end-to-end optimization to train more complicated AI systems. In this example, we want to learn a policy code to control a robotic manipulator. Compared to the Battleship example, the problem here has a longer horizon, since the policy would need to drive the robot for multiple time steps before receiving any feedback. Traditionally, such a problem is framed as a reinforcement learning (RL) problem, and usually learning a policy with RL requires tens of thousands of training episodes. We show Trace can be used to effectively solve such a problem, with just dozens of episodes — a 1,000 times speed-up. We trace an entire episode and perform end-to-end updates through these steps (using the same OptoPrime optimizer). In this way, effectively, Trace performs back-propagation through time (BPTT (opens in new tab)).

We conduct experiments using a simulated Sawyer robot arm in the Meta-World (opens in new tab) environment of LLF-Bench (opens in new tab), as shown in Figure 3. The agent needs to decide a target pose for the robot, which is then used as a set point for a position controller, to perform a pick-and-place task. Each episode has 10 timesteps, which results in a graph of depth around 30. The agent receives language feedback as intermediate observations (from LLF-Bench) and, at the end, feedback in text about success and the episode return (that is, the cumulative reward for RL). Like the Battleship example, we initialize the policy code to be a dummy function and let it adapt through interactions, demonstrated in Figure 4. We repeatedly train the agent starting from one initial condition, then test it on 10 new held-out initial conditions to assess generalization. Very quickly, after 13 episodes, we see that the agent learns complex rules to solve the problem, as shown in Figure 3 and Figure 4.

The video shows how the robot agent performs on new configurations which are not seen during training. At iteration 0, the robot’s policy is initialized to stay at its initial position.
The video shows how the robot agent performs on new configurations which are not seen during training. At iteration 1, the robot learns to reach the goal but does not grasp the object, which leads to failure in this pick and place task.
The video shows how the robot agent performs on new configurations which are not seen during training. The robot learns to grasp the object starting from iteration 3 but fails to successfully place and drop the object at the goal correctly. Nonetheless, after dropping the object incorrectly, the robot would attempt to pick up the object and try again. This behavior continues until iteration 12.
The video shows how the robot agent performs on new configurations which are not seen during training. At iteration 13, the robot learns a generalizable policy to perform pick and place successfully.

Figure 3: Trace rapidly learns a robot controller in the MetaWorld simulated environment that generalizes to new initial conditions. The video shows Trace learns a policy to successfully perform the pick-and-place task after 13 episodes. From left to right: iteration 0, iteration 1, iteration 3, iteration 9, iteration 13.

The robot’s control policy is initialized to simply output a zero vector, which would make the robot stay at the initial configuration.
Initial control code
The control policy learned after 13 iterations is complex decision logic, with many rules to decide when to grasp, how to grasp, and when to release. The decision boundary is never told to the robot and is learned through trial and error in the environment.
Learned control code after 13 episodes 

Figure 4. Trace adapts an initial dummy control policy into a complex, generalizable control policy.

Finale: Self-adapting multi-agent LLM systems

Trace is not limited to code optimization. The Trace framework supports optimizing heterogeneous parameters, including code, prompts, and hyperparameters. Here we demonstrate Trace’s ability to optimize the prompts of multiple LLM agents solving complex household tasks in the VirtualHome (opens in new tab) simulated environment.

Many tasks require multi-agent collaboration to solve efficiently. But crafting the right prompts for multiple LLM agents requires careful engineering. Trace can seamlessly optimize agents’ behaviors based on environmental feedback. Trace automatically constructs the interaction graph of agents and updates each agent’s behavior factoring in the behavior of other agents. Then the agents can automatically evolve to acquire specialized capabilities such as behavioral roles, freeing system designers from the painstaking process of hand-tuning multiple LLM prompts.

We use Trace and OptoPrime to improve ReAct agents that have been carefully orchestrated (opens in new tab) to complete the VirtualHome tasks. In each step, an agent can interact with the environment (like opening a cabinet) or send a message to another agent when they see each other. We declare the plan of each LLM-based agent (a part of its prompt) as a trainable parameter and use the reward as feedback. The experimental results are shown in Figure 5, where agents optimized by Trace complete the tasks using fewer actions and environment interactions. We observed fascinating emergent pro-social behaviors from the agents, without them being explicitly told to communicate, as illustrated in Figure 6. This pro-social interaction behavior changes with different tasks. For example, the agents did not communicate with each other for the task of “book reading,” but they collaborated when asked to “put forks and plates into a dishwasher,” which we show in Figure 7. We also observed other patterns such as role specialization, where one agent took the lead in a given task and the other agent followed to assist.

The multi-agent system optimized by Trace requires fewer steps to complete each task (Read Book from 22 to 10 steps; Put Dishwasher from 21 to 19 steps; Prepare Food from 21 to 18 steps).
Figure 5: We show the number of environment interaction actions taken to succeed in each task. The Trace-optimized agents take fewer steps to succeed and are thus more efficient in this environment.
The video shows example behaviors of the agents in the three tasks in VirtualHome.

Figure 6: Demo videos of how Trace agents behave to finish each of the three tasks.

[send_message]  to : I am handing you the . Please grab another piece of cutlery or plate to help! 
[send_message]  to : Can you also hand me the  you are holding?
[send_message]  to : Here's the . I'll go grab the  now. 
...
[send_message]  to : Let's head to the kitchen and put the  and  into the dishwasher.

Figure 7: Trace learns pro-social behavior in the Dishwasher task. Trace-optimized agents send messages to attempt to collaborate, whereas the simple ReAct agents only carry out the tasks.

Trace heralds a new era of interactive agents that adapt automatically using various feedback types. This innovation could be the key to unlocking the full potential of AI systems, making them more efficient and responsive than ever before. After witnessing the awesome power of Deep Neural Networks, stay tuned for the next revolution in AI design — Deep Agent Networks!

The post Tracing the path to self-adapting AI agents appeared first on Microsoft Research.

Read More

Find answers accurately and quickly using Amazon Q Business with the SharePoint Online connector

Find answers accurately and quickly using Amazon Q Business with the SharePoint Online connector

Amazon Q Business is a fully managed, generative artificial intelligence (AI)-powered assistant that helps enterprises unlock the value of their data and knowledge. With Amazon Q, you can quickly find answers to questions, generate summaries and content, and complete tasks by using the information and expertise stored across your company’s various data sources and enterprise systems. At the core of this capability are native data source connectors that seamlessly integrate and index content from multiple repositories into a unified index. This enables the Amazon Q large language model (LLM) to provide accurate, well-written answers by drawing from the consolidated data and information. The data source connectors act as a bridge, synchronizing content from disparate systems like Salesforce, Jira, and SharePoint into a centralized index that powers the natural language understanding and generative abilities of Amazon Q.

To make this integration process as seamless as possible, Amazon Q Business offers multiple pre-built connectors to a wide range of data sources, including Atlassian Jira, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, and many more. This allows you to create your generative AI solution with minimal configuration. For a full list of Amazon Q supported data source connectors, see Supported connectors.

One of the key integrations for Amazon Q is with Microsoft SharePoint Online. SharePoint is a widely used collaborative platform that allows organizations to manage and share content, knowledge, and applications to improve productivity and decision-making. By integrating Amazon Q with SharePoint, businesses can empower their employees to access information and insights from SharePoint more efficiently and effectively.

With the Amazon Q and SharePoint Online integration, business users can do the following:

  • Get instant answers – Users can ask natural language questions and Amazon Q will provide accurate, up-to-date answers by searching and synthesizing information from across the organization’s SharePoint sites and content.
  • Accelerate research and analysis – Instead of manually searching through SharePoint documents, users can use Amazon Q to quickly find relevant information, summaries, and insights to support their research and decision-making.
  • Streamline content creation – Amazon Q can assist in generating drafts, outlines, and even complete content pieces (such as reports, articles, or presentations) by drawing on the knowledge and data stored in SharePoint.
  • Automate workflows and tasks – Amazon Q can be configured to complete routine tasks and queries (such as generating status reports, answering FAQs, or requesting information) by interacting with the relevant SharePoint data and applications.
  • Enhance collaboration – By making SharePoint content more accessible and actionable through Amazon Q, the integration facilitates better knowledge sharing, problem-solving, and collaboration across the organization.

In this post, we guide you through the process of setting up the SharePoint Online connector in Amazon Q Business. This will enable your organization to use the power of generative AI to unlock the full value of your SharePoint investment and empower your workforce to work smarter and more efficiently.

Find accurate answers from content in Microsoft SharePoint using Amazon Q Business

After you integrate Amazon Q Business with Microsoft SharePoint, users can ask questions from the body of the document. For this post, we use a SharePoint Online site named HR Policies that has information about the travel policy, state disability insurance policy, payroll taxes, and paid family leave program for California stored in document libraries. Some of the questions you can ask Amazon Q Business might include the following:

  • Is there a leave plan in California for new parents?
  • Can I claim disability insurance during this time?
  • Before applying for leave, I want to submit my expense report; how can I do it?
  • Is there any limit on spending on a business trip?
  • How can I calculate UI and ETT?

Overview of the data source

SharePoint is a website-based collaboration system that is used as a secure place to store, organize, share, and access information from any device. SharePoint empowers teamwork with dynamic and productive team sites for every project team, department, and division.

SharePoint is available in two options: SharePoint Server and SharePoint Online. SharePoint Server is a locally hosted platform that your company owns and operates. You’re responsible for everything from the server architecture and Active Directory to file storage. SharePoint Server 2016, SharePoint Server 2019, and SharePoint Server Subscription Edition are the active SharePoint Server releases. SharePoint Online is a cloud-based service provided directly by Microsoft, which takes care of identity management, architecture, and site management. SharePoint Server and SharePoint Online contain pages, files, attachments, links, events, and comments that can be crawled by the Amazon Q SharePoint connectors for SharePoint Server and SharePoint Online.

SharePoint Online and SharePoint Server offer a site content space where site owners can view a list of all pages, libraries, and lists for their site. The site content space also provides access to add lists, pages, document libraries, and more.

HR Policies SharePoint Site document library folder structure

Pages are the contents stored on webpages; these are meant to display information to the end-user.

SharePoint Site Pages

A document library provides a secure place to store files where you and your coworkers can find them easily. You can work on them together and access them from any device at any time.

document library with files

A list is one of the data storage mechanisms within SharePoint. It provides the UI to view the items in a list. You can add, edit, and delete items or view individual items.

SharePoint List

Overview of the SharePoint Online connector for Amazon Q Business

To crawl and index contents from SharePoint Online, you can configure the Amazon Q Business SharePoint Online connector as a data source in your Amazon Q Business application. When you connect Amazon Q Business to a data source and initiate the sync process, Amazon Q Business crawls and indexes documents from the data source into its index.

Let’s look at what is considered a document in the context of the Amazon Q Business SharePoint Online connector. A document is a collection of information that consists of a title, the content (or the body), metadata (data about the document), and access control list (ACL) information to make sure answers are provided from documents that the user has access to.

The following entities in SharePoint are crawled and indexed as documents along with their metadata and access control information:

  • Files
  • Events
  • Pages
  • Links
  • Attachments
  • Comments

Amazon Q Business crawls data source document attributes or metadata and maps them to fields in your Amazon Q index. Refer to Amazon Q Business SharePoint Online data source connector field mappings for more details.

Configure and prepare the Amazon Q connector

Before you index the content from Microsoft SharePoint Online, you first need to establish a secure connection between the Amazon Q Business connector for SharePoint Online and your SharePoint Online instance. To establish a secure connection, you need to authenticate with the data source.

The following are the supported authentication mechanisms for the SharePoint connector:

  • Basic Authentication
  • OAuth 2.0 with Resource Owner Password Credentials Flow
  • Azure AD App-Only (OAuth 2.0 Certificate)
  • SharePoint App-Only with Client Credentials Flow
  • OAuth 2.0 with Refresh Token Flow

Secure querying with ACL crawling, identity crawling, and user store

Secure querying means that a user’s query returns answers only from documents that the user has access to, and not from documents that the user does not have access to. To enable users to do secure querying, Amazon Q Business honors the ACLs of the documents. Amazon Q Business does this by first supporting the indexing of ACLs. Indexing documents with ACLs is crucial for maintaining data security, because documents without ACLs are considered public. At query time, the user’s credentials (email address) are passed along with the query so that answers from documents that are relevant to the query and that the user is authorized to access are displayed.

A document’s ACL contains information such as the user’s email address and the local groups or federated groups (if Microsoft SharePoint is integrated with an identity provider (IdP) such as Azure Active Directory/Entra ID) that have access to the document. The SharePoint online data source can be optionally connected to an IdP such as Okta or Microsoft Entra ID. In this case, the documents in SharePoint Online can have the federated group information.

When a user logs in to a web application to conduct a search, the user’s credentials (such as an email address) need to match an entry in the document’s ACL for results to be returned from that document. The web application that the user uses to retrieve answers would be connected to an IdP or AWS IAM Identity Center. The user’s credentials from the IdP or IAM Identity Center are referred to here as the federated user credentials. The federated user credentials, such as the email address, are passed along with the query so that Amazon Q can return the answers from the documents that this user has access to. However, sometimes this user’s federated credentials may not be present in the SharePoint Online data source or the SharePoint document’s ACLs. Instead, the user’s local user alias, the local groups that this local user alias is a part of, or the federated groups that the federated user is a part of are available in the document’s ACL. Therefore, there is a need to map the federated user credential to the local user alias, local groups, or federated groups in the document ACL.

To map this federated user’s email address to the local user aliases, local groups, or federated groups, certain Amazon Q Business connectors, including the SharePoint Online connector, provide an identity crawler to load the identity information (local user alias, local groups, federated groups, and their mappings, along with any other mappings to a federated user) from the connected data sources into a user store. At query time, Amazon Q Business retrieves the associated local user aliases, local groups, and any federated groups from the user store and uses that along with the query for securely retrieving passages from documents that the user has access to.

If you need to index documents without ACLs, you must make sure they’re explicitly marked as public in your data source.

Refer to How Amazon Q Business connector crawls SharePoint (Online) ACLs for more details.

Amazon Q indexes the documents with ACLs and sets the user’s email address or user principal name for the user and the group name [site URL hash value | group name] for the local group in the ACL. If the SharePoint Online data source is connected to an IdP such as Azure AD/Entra ID or Okta, the AD group name visible in the SharePoint site is set as the federated group ACL. The identity crawler loads these principals, along with the available mappings, into the user store. Any additional mappings need to be set in the user store using the user store APIs.
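For example, a mapping from a federated email address to a local SharePoint alias could be added with the Amazon Q Business CreateUser API. The following boto3 sketch is illustrative only; the IDs and the alias value are placeholders, and you should confirm the exact parameters against the current Amazon Q Business API reference.

import boto3

qbusiness = boto3.client("qbusiness")

# Map a federated user (email) to a local SharePoint alias in the user store.
# All IDs and the alias value below are placeholders for your environment.
qbusiness.create_user(
    applicationId="your-q-business-application-id",
    userId="carlos.salazar@example.com",   # federated user credential (email)
    userAliases=[
        {
            "indexId": "your-index-id",
            "dataSourceId": "your-sharepoint-data-source-id",
            "userId": "i:0#.f|membership|csalazar@yourtenant.onmicrosoft.com",   # local alias
        }
    ],
)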

Overview of solution

This post presents the steps to create a certificate and private key, configure Azure AD (either using the Azure AD console or a PowerShell script), and configure Amazon Q Business.

For this post, we use a SharePoint Online site named HR Policies that hosts policy documents in a Documents library and payroll tax documents in a Payroll Taxes library to walk you through the solution.

In one of the scenarios that we validate, a SharePoint user (Carlos Salazar) is part of the SharePoint site members group, and he has access only to policy documents in the Documents library.

SharePoint Document Library with HR Travel policy and other policy document

Carlos Salazar can receive responses for queries related to HR policies, as shown in the following example.

Amazon Q Business Web application with question and response on leave plan in California for new parents

However, for questions related to payroll tax, he did not receive any response.

Amazon Q Business Web application with question and response

Another SharePoint user (John Doe) is part of the SharePoint site owners group and has access to both the Documents and Payroll Taxes libraries.

document library with Payroll tax files

John Doe receives responses for queries related to payroll taxes, as shown in the following example.

Amazon Q Business Web application with question and response on "how can i calculate UI and ETT"

Prerequisites

You should meet the following prerequisites:

  • The user performing these steps should be a global administrator on Azure AD/Entra ID.
  • Configure Microsoft Entra ID and IAM Identity Center integration.
  • You need a Microsoft Windows instance to run PowerShell scripts and commands with PowerShell 7.4.1+. Details of the required PowerShell modules are described later in this post.
  • The user should have administrator permissions on the Windows instance.
  • Make sure that the user running these PowerShell commands has the right M365 license (for example, M365 E3).

Create the certificate and private key

In Azure AD, when configuring App-Only authentication, you typically use a certificate to request access. Anyone with the certificate’s private key can use the app and the permissions granted to the app. We create and configure a self-signed X.509 certificate that will be used to authenticate Amazon Q against Azure AD, while requesting the App-Only access token. The following steps walk you through the setup of this model.

For this post, we use Windows PowerShell to run a few PowerShell commands. You can use an existing Windows instance or spin up a Windows EC2 instance or Windows workstation to run the PowerShell commands.

You can use the following PowerShell script to create a self-signed certificate. You can also generate the self-signed certificate through the New-PnPAzureCertificate command.

  1. Run the following command:
.\Create-SelfSignedCertificate.ps1 -CommonName "<amazonqbusinessdemo>" -StartDate <StartDate in yyyy-mm-dd format> -EndDate <EndDate in yyyy-mm-dd format>

You will be asked to give a password to encrypt your private key, and both the .PFX file and the .CER file will be exported to the current folder (where you ran the PowerShell script from). Verify that you now have a .cer and .pfx file.

  1. Upload this .cer file to an S3 location that your Amazon Q IAM role has GetObject permissions for. You can let Amazon Q create this role for you in future steps outlined later in this post, and the correct permissions will be added for you if you choose.

Now you extract the private key contents from the .pfx file and save it for Amazon Q connector configuration. This .pfx file will be present in the folder where you have saved the certificate.

  1. Run the following command to extract the private key:
openssl pkcs12 -in [amazonqbusinessdemo.pfx] -nocerts -out [amazonqbusinessdemo.key]

You will be prompted for the import password. Enter the password that you used to protect your key pair when you created the .pfx file (the client ID, in our case). You will be prompted again to provide a new password to protect the .key file that you are creating. Store the password to your key file in a secure place to avoid misuse. (When you enter a password in the Windows CMD window, nothing is displayed; type your password and press Enter.)

  1. Run the following command to decrypt the private key:
openssl rsa -in [amazonqbusinessdemo.key] -out [amazonqbusinessdemo-decrypted.key]
  1. Run the following command to extract the certificate:
openssl pkcs12 -in [amazonqbusinessdemo.pfx] -clcerts -nokeys -out [amazonqbusinessdemo.crt]

This decrypted key and certificate will be used by the connector for authentication purposes.

  1. Upload the X.509 certificate (ending with .crt) to an S3 bucket. This will be used when configuring the SharePoint Online connector for Amazon Q.
    1. Verify that the contents of the file amazonqbusinessdemo-decrypted.key start with the standard BEGIN PRIVATE KEY header.
    2. Copy and paste the contents of the amazonqbusinessdemo-decrypted.key for use later in our Amazon Q setup.
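If you prefer to script these last two checks, the following Python sketch verifies the private key header and uploads the certificate to Amazon S3 with boto3; the file names and the bucket are placeholders based on the example names above.

import boto3

# Verify the decrypted key starts with the standard BEGIN PRIVATE KEY header.
with open("amazonqbusinessdemo-decrypted.key") as key_file:
    first_line = key_file.readline().strip()
assert first_line.startswith("-----BEGIN") and "PRIVATE KEY" in first_line, first_line

# Upload the X.509 certificate to a bucket that the Amazon Q IAM role can read.
boto3.client("s3").upload_file(
    Filename="amazonqbusinessdemo.crt",
    Bucket="your-cert-bucket",                    # placeholder bucket name
    Key="certificates/amazonqbusinessdemo.crt",
)
print("Key header verified and certificate uploaded.")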

Configure Azure AD

You can configure Azure AD using either of the following methods:

  • Using the Azure AD console GUI. This is a manual step-by-step process.
  • Using the provided PowerShell script. This is an automated process that takes in the inputs and configures the required permissions.

Follow the steps for either option to complete the Azure AD configuration.

Configure Azure AD using the Azure AD console

To configure Azure AD using the GUI, you first register an Azure AD application in the Azure AD tenant that is linked to the SharePoint Online/O365 tenant. For more details, see Granting access via Azure AD App-Only.

  1. Open the Office 365 Admin Center using the account of a user member of the Tenant Global Admins group.
  2. Navigate to Microsoft Azure Portal.
  3. Search for and choose App registrations.

Azure Portal for App registration

  1. Choose New registration.

Azure Portal for App registration step

  1. Enter a name for your application, select who can use this application, and choose Register.

Azure Portal for App registration with name and account types field

An application will be created. You will see a page like the following screenshot.

  1. Note the application (client) ID and the directory (tenant) ID.

These IDs will be different than what is shown in the screenshot.

Azure App with client ID and Tenant ID

Now you can configure the newly registered application for SharePoint permissions.

  1. Choose API permissions in the navigation pane.
  2. Choose Add a permission to add the permissions to your application.

Azure App API Permission tab

  1. Choose SharePoint from the list of applications.

Azure App API Permission tab Request API Permission

  1. Configure permissions.

There are two different ways to configure SharePoint permissions.

To configure permissions to access multiple SharePoint site collections (using Azure AD App-Only permissions), select Sites.FullControl.All to allow full control permissions to all the SharePoint site collections and to read the ACLs from these site collections.

azure app registration request API permission tab

This permission requires admin consent in a tenant before it can be used. To do so, choose Grant admin consent for <organization name> and choose Yes to confirm.

azure app Grant Admin consent confirmation

Alternatively, to configure permissions to access specific SharePoint site collections, select Sites.Selected to allow access to a subset of site collections without a signed-in user. The specific site collections and the permissions granted will be configured in SharePoint Online.

Request API permission for Sites and Add permission

This permission requires admin consent in a tenant before it can be used. To do so, choose Grant admin consent for <organization name> and choose Yes to confirm.

Azure App Grand Admin consent confirmation page

Next, you grant Azure AD app permissions to one or more SharePoint site collections. Make sure the following prerequisites are in place:

  • You must have Windows Server/Workstation with PowerShell 7.4.1+.
  • The user running these PowerShell commands must have the right M365 license (for example, M365 E3).
  • Install the PowerShell modules using Install-Module -Name PnP.PowerShell -AllPreRelease.
  • If this is your first time running PowerShell commands, run the Connect-PnPOnline -Url <site collection url> -PnPManagementShell PowerShell command and complete the consent process to use PnP cmdlets. Alternatively, run the Register-PnPManagementShellAccess cmdlet, which grants access to the tenant for the PnP management shell multi-tenant Azure AD application.
  1. Open PowerShell and connect to SharePoint Online using the Connect-PnPOnline command:
Connect-PnPOnline -Url <sitecollectionUrl> -PnPManagementShell
  1. Add the Azure AD app to one or more specific site collection permissions using Grant-PnPAzureADAppSitePermission:
Grant-PnPAzureADAppSitePermission -AppId <app-id> -DisplayName <displayname> -Site [<sitecollectionurl>] -Permissions <FullControl> 

If you want to configure permissions to more than one SharePoint Online site collection, then you must repeat the preceding PowerShell commands for every collection.

Now you’re ready to connect the certificate.

  1. Choose Certificates & secrets in the navigation pane.
  2. On the Certificates tab, choose Upload certificate.

Azure App registration Certificate and Secrets page

  1. Choose the .cer file you generated earlier and choose Add to upload it.

Upload Certificate by Add option

This completes the configuration on the Azure AD side.

Configure Azure AD using the provided PowerShell script

The user running this PowerShell script should be an Azure AD tenant admin or have tenant admin permissions. Additionally, as a prerequisite, install the MS Graph PowerShell SDK.

Complete the following steps to run the PowerShell script:

  1. Run the PowerShell script and follow the instructions.

This script will do the following:

  • Register a new application in Azure AD/Entra ID
  • Configure the required SharePoint permissions
  • Provide admin consent for the permissions

The output from the PowerShell script will look like the following screenshot.

PowerShell Script for Certificate

  1. If you chose Sites.Selected as the permission to target a specific SharePoint site collection, continue with the steps mentioned earlier to configure that site collection.
  2. If you have more than one SharePoint site collection to be crawled, repeat the previous step to configure each collection.

Configure Amazon Q

Make sure you have set up Amazon Q Business with Entra ID as the IdP, as mentioned in the prerequisites. Also, make sure the email IDs are in lowercase letters when creating the users in Entra ID.

Follow the instructions in Connecting Amazon Q Business to SharePoint (Online) using the console.

For Step 9 (Authentication), we choose Azure AD App-Only authentication and configure it as follows:

  • For Tenant ID, enter the tenant ID of your SharePoint account. This is the directory (tenant) ID of your registered Azure application in the Azure Portal, as shown in the following screenshot (the IDs will be different for your setup).

Azure App for Application client id and Tenant ID

  • For Certificate path, enter the full S3 path to your certificate (for example, s3://certBucket/azuread.crt). This is the Azure AD self-signed X.509 certificate to authenticate the connector for Azure AD. This certificate was created earlier.
  • For AWS Secrets Manager secret, create a secret in AWS Secrets Manager to store your SharePoint authentication credentials:
    • For Secret name, enter a name for your secret.
    • For Client ID, enter the Azure AD client ID generated when you registered SharePoint in Azure AD. This is the application (client) ID created in the Azure Portal when registering the SharePoint application in Azure, as described earlier.
    • For Private key, enter the private key to authenticate the connector for Azure AD. This is the decrypted private key (the contents of amazonqbusinessdemo-decrypted.key) that you extracted from the .pfx file created when registering your Azure SharePoint application, as described earlier. Enter the decrypted key contents in their entirety. Choose Show private key to verify they match the contents of your decrypted key file.

Secret created in AWS Secret Manager
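If you prefer to create the secret programmatically, the following boto3 sketch shows the general shape; the secret name and the JSON field names (clientId and privateKey here) are assumptions that you should confirm against the SharePoint (Online) connector documentation.

import json
import boto3

# Read the decrypted private key extracted earlier from the .pfx file.
with open("amazonqbusinessdemo-decrypted.key") as key_file:
    private_key = key_file.read()

secretsmanager = boto3.client("secretsmanager")
secretsmanager.create_secret(
    Name="QBusiness-SharePointOnline-AzureADAppOnly",   # placeholder secret name
    SecretString=json.dumps({
        "clientId": "<application (client) ID from the Azure portal>",   # placeholder
        "privateKey": private_key,
    }),
)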

Continue with the rest of the steps in Connecting Amazon Q Business to SharePoint (Online) using the console.

Access the web experience on Amazon Q

To access the web experience, complete the following steps:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Choose the application you created.
  3. Choose the link under Web experience URL to browse Amazon Q.

Getting Web Application URL from Amazon Q Business Application page

  1. When prompted, authenticate with Entra ID/Azure AD.

After you’re authenticated, you can access Amazon Q. You can ask Amazon Q a question and get a response based on the permissions of the logged-in user.

References

param(
  [Parameter(Mandatory=$true,
  HelpMessage="The friendly name of the app registration")]
  [String]
  $AppName,

  [Parameter(Mandatory=$true,
  HelpMessage="The file path to your public key file")]
  [String]
  $CertPath,

  [Parameter(Mandatory=$false,
  HelpMessage="Your Azure Active Directory tenant ID")]
  [String]
  $TenantId,

  [Parameter(Mandatory=$false)]
  [Switch]
  $StayConnected = $false
)

# Display the options for permission
$validOptions = @('F', 'S')
Write-Host "Select the permissions: [F]-Sites.FullControl.All [S]-Sites.Selected"

# Loop to prompt the user until a valid option is selected
do {
    foreach ($option in $validOptions) {
        Write-Host "[$option]"
    }
    $selectedPermission = Read-Host "Enter your choice (F or S)"
} while ($selectedPermission -notin $validOptions)

# Map user input to corresponding permissions
$permissionMapping = @{
    'F' = '678536fe-1083-478a-9c59-b99265e6b0d3'
    'S' = '20d37865-089c-4dee-8c41-6967602d4ac8'
}

$selectedPermissionValue = $permissionMapping[$selectedPermission]

# Requires an admin
if ($TenantId)
{
  Connect-MgGraph -Scopes "Application.ReadWrite.All User.Read AppRoleAssignment.ReadWrite.All" -TenantId $TenantId
}
else
{
  Connect-MgGraph -Scopes "Application.ReadWrite.All User.Read AppRoleAssignment.ReadWrite.All"
}

# Graph permissions constants
$sharePointResourceId = "00000003-0000-0ff1-ce00-000000000000"
$SitePermission = @{
  Id=$selectedPermissionValue
  Type="Role"
}

# Get context for access to tenant ID
$context = Get-MgContext

# Load cert
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2($CertPath)
Write-Host -ForegroundColor Cyan "Certificate loaded"

# Create app registration
$appRegistration = New-MgApplication -DisplayName $AppName -SignInAudience "AzureADMyOrg" `
 -Web @{ RedirectUris="http://localhost"; } `
 -RequiredResourceAccess @{ ResourceAppId=$sharePointResourceId; ResourceAccess=@($SitePermission) } `
 -AdditionalProperties @{} -KeyCredentials @(@{ Type="AsymmetricX509Cert"; Usage="Verify"; Key=$cert.RawData })
Write-Host -ForegroundColor Cyan "App registration created with app ID" $appRegistration.AppId

# Create corresponding service principal
$servicePrincipal = New-MgServicePrincipal -AppId $appRegistration.AppId -AdditionalProperties @{}
Write-Host -ForegroundColor Cyan "Service principal created"
Write-Host
Write-Host -ForegroundColor Green "Success"
Write-Host

# Providing admin consent
$scp = Get-MgServicePrincipal -Filter "DisplayName eq '$($AppName)'" 
$app = Get-MgServicePrincipal -Filter "AppId eq '$sharePointResourceId'" 
New-MgServicePrincipalAppRoleAssignment -ServicePrincipalId $scp.id -PrincipalId $scp.Id -ResourceId $app.Id -AppRoleId $selectedPermissionValue  

# Generate Connect-MgGraph command
$connectGraph = "Connect-MgGraph -ClientId """ + $appRegistration.AppId + """ -TenantId """`
 + $context.TenantId + """ -CertificateName """ + $cert.SubjectName.Name + """"
Write-Host $connectGraph

if ($StayConnected -eq $false)
{
  Disconnect-MgGraph
  Write-Host "Disconnected from Microsoft Graph"
}
else
{
  Write-Host
  Write-Host -ForegroundColor Yellow "The connection to Microsoft Graph is still active. To disconnect, use Disconnect-MgGraph"
}

  • You can test whether the Grant-PnPAzureADAppSitePermission cmdlet worked by connecting to the SharePoint site using the Azure AD app that has the Sites.Selected permission and running a few SharePoint API calls:
    1. Make a note of the certificate thumbprint as shown earlier.
    2. Install the certificate for the current user in the Windows Certificate Management Store.
    3. Run the following PowerShell cmdlet to connect to the SharePoint site collection using PnPOnline:
Connect-PnPOnline -Url "<SharePoint site collection url>" -ClientId "<client id>" -Thumbprint "<certificate thumbprint>" -Tenant "<tenant id>"
    1. Run Get-PnPList to list all the SharePoint lists in the site collection and confirm that the permissions are configured correctly:
Get-PnPList

Troubleshooting

For troubleshooting guidance, refer to Troubleshooting your SharePoint (Online) connector.

Clean up

Complete the following steps to clean up your resources:

  1. Open the Office 365 Admin Center using the account of a user member of the Tenant Global Admins group.
  2. Navigate to the Microsoft Azure Portal.
  3. Search for and choose App registrations.
  4. Select the app you created earlier, then choose Delete.
  5. On the Amazon Q Business console, choose Applications in the navigation pane.
  6. Select the application you created, and on the Actions menu, choose Delete.

Conclusion

In this post, we explored how Amazon Q Business can seamlessly integrate with SharePoint Online to help enterprises unlock the value of their data and knowledge. With the SharePoint Online connector, organizations can empower their employees to find answers quickly, accelerate research and analysis, streamline content creation, automate workflows, and enhance collaboration.

We walked you through the process of setting up the SharePoint Online connector, including configuring the necessary Azure AD integration and authentication mechanisms. With these foundations in place, you can start unlocking the full potential of your SharePoint investment and drive greater productivity, efficiency, and innovation across your business.

Now that you’ve learned how to integrate Amazon Q Business with your Microsoft SharePoint Online content, it’s time to unlock the full potential of your organization’s knowledge and data. To get started, sign up for an Amazon Q Business account and follow the steps in this post to set up the SharePoint Online connector. Then you can start asking Amazon Q natural language questions and watch as it surfaces the most relevant information from your company’s SharePoint sites and documents.

Don’t miss out on the transformative power of generative AI and the Amazon Q Business platform. Sign up today and experience the difference that Amazon Q can make for your organization’s SharePoint-powered knowledge and content management.


About the Authors

Vijai Gandikota is a Principal Product Manager on the Amazon Q and Amazon Kendra team of Amazon Web Services. He is responsible for the Amazon Q and Amazon Kendra connectors, ingestion, security, and other aspects of Amazon Q and Amazon Kendra.

Satveer Khurpa is a Senior Solutions Architect on the GenAI Labs team at Amazon Web Services. In this role, he uses his expertise in cloud-based architectures to develop innovative generative AI solutions for clients across diverse industries. Satveer’s deep understanding of generative AI technologies enables him to design scalable, secure, and responsible applications that unlock new business opportunities and drive tangible value.

Vijai Anand Ramalingam is a Senior Modernization Architect at Amazon Web Services, specialized in enabling and accelerating customers’ application modernization, transitioning from legacy monolith applications to microservices.

Ramesh Jatiya is a Senior Solutions Architect in the Independent Software Vendor (ISV) team at Amazon Web Services. He is passionate about working with ISV customers to design, deploy, and scale their applications in the cloud to derive business value. He is also pursuing an MBA in Machine Learning and Business Analytics from Babson College, Boston. Outside of work, he enjoys running, playing tennis, and cooking.

Neelam Rana is a Software Development Engineer on the Amazon Q and Amazon Kendra engineering team. She works on Amazon Q connector design, development, integration, and test operations.

Dipti Kulkarni is a Software Development Manager on the Amazon Q and Amazon Kendra engineering team of Amazon Web Services, where she manages the connector development and integration teams.

Read More

Evaluate conversational AI agents with Amazon Bedrock

Evaluate conversational AI agents with Amazon Bedrock

As conversational artificial intelligence (AI) agents gain traction across industries, providing reliability and consistency is crucial for delivering seamless and trustworthy user experiences. However, the dynamic and conversational nature of these interactions makes traditional testing and evaluation methods challenging. Conversational AI agents also encompass multiple layers, from Retrieval Augmented Generation (RAG) to function-calling mechanisms that interact with external knowledge sources and tools. Although existing large language model (LLM) benchmarks like MT-bench evaluate model capabilities, they lack the ability to validate the application layers. The following are some common pain points in developing conversational AI agents:

  • Testing an agent is often tedious and repetitive, requiring a human in the loop to validate the semantic meaning of the responses from the agent, as shown in the following figure.
  • Setting up proper test cases and automating the evaluation process can be difficult due to the conversational and dynamic nature of agent interactions.
  • Debugging and tracing how conversational AI agents route to the appropriate action or retrieve the desired results can be complex, especially when integrating with external knowledge sources and tools.

Testing an agent is repetitive work

Agent Evaluation, an open source solution using LLMs on Amazon Bedrock, addresses this gap by enabling comprehensive evaluation and validation of conversational AI agents at scale.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Agent Evaluation provides the following:

  • Built-in support for popular services, including Agents for Amazon Bedrock, Knowledge Bases for Amazon Bedrock, Amazon Q Business, and Amazon SageMaker endpoints
  • Orchestration of concurrent, multi-turn conversations with your agent while evaluating its responses
  • Configurable hooks to validate actions triggered by your agent
  • Integration into continuous integration and delivery (CI/CD) pipelines to automate agent testing
  • A generated test summary for performance insights including conversation history, test pass rate, and reasoning for pass/fail results
  • Detailed traces to enable step-by-step debugging of the agent interactions

In this post, we demonstrate how to streamline virtual agent testing at scale using Amazon Bedrock and Agent Evaluation.

Solution overview

To use Agent Evaluation, you need to create a test plan, which consists of three configurable components:

  • Target – A target represents the agent you want to test
  • Evaluator – An evaluator represents the workflow and logic to evaluate the target on a test
  • Test – A test defines the target’s functionality and how you want your end-user to interact with the target, which includes:
    • A series of steps representing the interactions between the agent and the end-user
    • Your expected results of the conversation

The following figure illustrates how Agent Evaluation works at a high level. The framework implements an LLM agent (evaluator) that will orchestrate conversations with your own agent (target) and evaluate the responses during the conversation.

How Agent Evaluation works on a high level

The following figure illustrates the evaluation workflow. It shows how the evaluator reasons and assesses responses based on the test plan. You can either provide an initial prompt or instruct the evaluator to generate one to initiate the conversation. At each turn, the evaluator engages the target agent and evaluates its response. This process continues until the expected results are observed or the maximum number of conversation turns is reached.

Agent Evaluation evaluator workflow

By understanding this workflow logic, you can create a test plan to thoroughly assess your agent’s capabilities.
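Conceptually, the evaluator drives a bounded multi-turn loop like the Python sketch below. This is not the Agent Evaluation implementation; invoke_target and evaluate are stand-ins for your agent and the LLM-backed evaluator.

MAX_TURNS = 5

def invoke_target(prompt):
    """Stand-in for calling your agent (for example, an Amazon Bedrock agent)."""
    return f"stub response to: {prompt}"

def evaluate(conversation, expected_results):
    """Stand-in for the LLM evaluator: decide pass, fail, or continue the conversation."""
    return "continue" if len(conversation) < 2 else "fail"

def run_test(steps, expected_results):
    conversation = []
    for step in steps[:MAX_TURNS]:
        response = invoke_target(step)           # engage the target agent
        conversation.append((step, response))
        verdict = evaluate(conversation, expected_results)
        if verdict != "continue":                # stop once a verdict is reached
            return verdict, conversation
    return "fail", conversation                  # max turns reached without success

print(run_test(
    steps=["Ask the agent which claims are open.", "Ask the agent for details on claim-006."],
    expected_results=["The agent returns the details on claim-006."],
))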

Use case overview

To illustrate how Agent Evaluation can accelerate the development and deployment of conversational AI agents at scale, let’s explore an example scenario: developing an insurance claim processing agent using Agents for Amazon Bedrock. This insurance claim processing agent is expected to handle various tasks, such as creating new claims, sending reminders for pending documents related to open claims, gathering evidence for claims, and searching for relevant information across existing claims and customer knowledge repositories.

For this use case, the goal is to test the agent’s capability to accurately search and retrieve relevant information from existing claims. You want to make sure the agent provides correct and reliable information about existing claims to end-users. Thoroughly evaluating this functionality is crucial before deployment.

Begin by creating and testing the agent in your development account. During this phase, you interact manually with the conversational AI agent using sample prompts to do the following:

  • Engage the agent in multi-turn conversations on the Amazon Bedrock console
  • Validate the responses from the agent
  • Validate all the actions invoked by the agent
  • Debug and check traces for any routing failures

With Agent Evaluation, the developer can streamline this process through the following steps:

  1. Configure a test plan:
    1. Choose an evaluator from the models provided by Amazon Bedrock.
    2. Configure the target, which should be a type that Agent Evaluation supports. For this post, we use an Amazon Bedrock agent.
    3. Define the test steps and expected results. In the following example test plan, you have a claim with the ID claim-006 in your test system. You want to confirm that your agent can accurately answer questions about this specific claim.
    evaluator:
      type: bedrock-claude
      model: claude-haiku
    target:
      type: bedrock-agent
      bedrock_agent_alias_id: xxxxxxx
      bedrock_agent_id: xxxxxxx
    tests:
      - name: GetOpenClaimsWithDetails
        steps:
          - Ask the agent which claims are open.
          - Ask the agent for details on claim-006.
        expected_results:
          - The agent returns a list of open claims.
          - The agent returns the details on claim-006.
    

  2. Run the test plan from the command line:
    agenteval run

    The Agent Evaluation test runner will automatically orchestrate the test based on the test plan, and use the evaluator to determine if the responses from the target match the expected results.

  3. View the result summary.
    A result summary will be provided in markdown format. In the following example, the summary indicates that the test failed because the agent was unable to provide accurate information about the existing claim claim-006.

    Agent Evaluation test summary
  4.  Debug with the trace files of the failed tests.
    Agent Evaluation provides detailed trace files for the tests. Each trace file meticulously records every prompt and interaction between the target and the evaluator. For instance, in the _invoke_target step, you can gain valuable insights into the rationale behind the Amazon Bedrock agent’s responses, allowing you to delve deeper into the decision-making process:

    {
        "timestamp": "2024-04-17 12:52:11.231906+00:00",
        "step_name": "_invoke_target",
        "preProcessingTrace": {
            "modelInvocationOutput": {
                "parsedResponse": {
                    "isValid": false,
                    "rationale": "Let's analyze the user's input step-by-step: 1. The user is asking for details on a specific claim with ID 'claim-006'. 2. This does not seem malicious or harmful in any way, so we can rule out Category A. 3. The user is not trying to get information about the agent's functions or manipulate its behavior. So we can rule out Category B. 4. We don't have enough information yet to determine if the agent can actually answer this question. We need to check the conversation history and available functions first. 5. Looking at the conversation history, the agent has access to a function called 'get: :send-reminder: :getAllOpenClaims' that returns a list of open claim IDs. 6. However, 'claim-006' is not in the list of open claim IDs returned. So the agent does not have direct access to details on this specific claim. 7. Therefore, this question falls into Category C - the agent will be unable to answer or assist with this question using only the functions it has access to. Given this analysis, I categorize this input as: <category>C</category>"
                }
            }
        }
    }

    The trace shows that after reviewing the conversation history, the evaluator concludes, “the agent will be unable to answer or assist with this question using only the functions it has access to.” Consequently, it ends the conversation with the target agent and proceeds to generate the test status.

    In the _generate_test_status step, the evaluator generates the test status with reasoning based on the responses from the target.

    { 
        "timestamp": "2024-04-17 12:52:12.976985+00:00", 
        "step_name": "_generate_test_status", 
        "system_prompt": "You are a quality assurance engineer evaluating a conversation between an USER and an AGENT. You will be given an ordered list of steps wrapped in <steps> tags. Each step represents a task that the USER wants to perform when interacting with the AGENT. Your job is analyze the running conversation in <conversation> tags and classify it into the following categories: - A: The USER has attempted all the steps. - B: The USER has not yet attempted all the steps. Please think hard about the response in <thinking> tags before providing only the category letter within <category> tags.", 
        "prompt": "Here are the steps and conversation: <steps> 1. Ask the agent which claims are open. 2. Ask the agent for details on claim-006. <steps> <conversation> USER: Which claims are currently open? AGENT: The open claims are: 2s34w-8x, 5t16u-7v, 3b45c-9d USER: Can you please provide me with the details on claim-006? AGENT: Sorry, I don't have enough information to answer that. </conversation>", 
        "test_status": "B", 
        "reasoning": "The user has attempted the first step of asking which claims are open, and the agent has provided a list of open claims. However, the user has not yet attempted the second step of asking for details on claim-006, as the agent has indicated that they do not have enough information to provide those details." 
    }

    The test plan defines the expected result as the target agent accurately providing details about the existing claim claim-006. However, after testing, the target agent’s response doesn’t meet the expected result, and the test fails.

  5. In this example, it’s evident that the target agent lacks access to claim-006, so you can continue investigating and verify whether claim-006 exists in your test system. After identifying and addressing the issue, rerun the test to validate the fix.

Integrate Agent Evaluation with CI/CD pipelines

After validating the functionality in the development account, you can commit the code to the repository and initiate the deployment process for the conversational AI agent to the next stage. Seamless integration with CI/CD pipelines is a crucial aspect of Agent Evaluation, enabling comprehensive integration testing to make sure no regressions are introduced during new feature development or updates. This rigorous testing approach is vital for maintaining the reliability and consistency of conversational AI agents as they progress through the software delivery lifecycle.

By incorporating Agent Evaluation into CI/CD workflows, organizations can automate the testing process, making sure every code change or update undergoes thorough evaluation before deployment. This proactive measure minimizes the risk of introducing bugs or inconsistencies that could compromise the conversational AI agent’s performance and the overall user experience.

A standard agent CI/CD pipeline includes the following steps:

  1. The source repository stores the agent configuration, including agent instructions, system prompts, and model configuration. Always commit your changes to maintain quality and reproducibility.
  2. When you commit your changes, a build step is invoked. This is where unit tests should run and validate the changes, including typo and syntax checks.
  3. When the changes are deployed to the staging environment, Agent Evaluation runs with a series of test cases for runtime validation.
  4. The runtime validation on the staging environment can help build confidence to deploy the fully tested agent to production.

The following figure illustrates this pipeline.

Conversational AI agent CI/CD pipeline

In the following sections, we provide step-by-step instructions to set up Agent Evaluation with GitHub Actions.

Prerequisites

Complete the following prerequisite steps:

  1. Follow the GitHub user guide to get started with GitHub.
  2. Follow the GitHub Actions user guide to understand GitHub workflows and Actions.
  3. Follow the insurance claim processing agent using Agents for Amazon Bedrock example to set up an agent.

Set up GitHub Actions

Complete the following steps to deploy the solution:

  1. Write a series of test cases following the agent-evaluation test plan syntax and store test plans in the GitHub repository. For example, a test plan to test an Amazon Bedrock agent target is written as follows, with BEDROCK_AGENT_ALIAS_ID and BEDROCK_AGENT_ID as placeholders:
    evaluator:
      model: claude-3
    target:
      bedrock_agent_alias_id: BEDROCK_AGENT_ALIAS_ID
      bedrock_agent_id: BEDROCK_AGENT_ID
      type: bedrock-agent
    tests:
      InsuranceClaimQuestions:
        ...

  2. Create an AWS Identity and Access Management (IAM) user with the proper permissions (an example policy sketch is shown after these steps):
    1. The principal must have InvokeModel permission on the model specified in the configuration.
    2. The principal must have permission to call the target agent. Depending on the target type, different permissions are required; refer to the agent-evaluation target documentation for details.
  3. Store the IAM credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) in GitHub Actions secrets.
  4. Configure a GitHub workflow as follows:
    name: Update Agents for Bedrock
    
    on:
      push:
        branches: [ "main" ]
    
    env:
      AWS_REGION: <Deployed AWS region>                   
      
    
    permissions:
      contents: read
    
    jobs:
      build:
        runs-on: ubuntu-latest
    
        steps:
        - name: Checkout
          uses: actions/checkout@v4
    
        - name: Configure AWS credentials
          uses: aws-actions/configure-aws-credentials@v4
          with:
            aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
            aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
            aws-region: ${{ env.AWS_REGION }}
    
        - name: Install agent-evaluation
          run: |
            pip install agent-evaluation
            agenteval --help
    
        - name: Test Bedrock Agent
          id: test-bedrock-agent
          env:
            BEDROCK_AGENT_ALIAS_ID: ${{ vars.BEDROCK_AGENT_ALIAS_ID }}
            BEDROCK_AGENT_ID: ${{ vars.BEDROCK_AGENT_ID }}
          run: |
            sed -e "s/BEDROCK_AGENT_ALIAS_ID/$BEDROCK_AGENT_ALIAS_ID/g" -e "s/BEDROCK_AGENT_ID/$BEDROCK_AGENT_ID/g" test_plans/agenteval.yml > agenteval.yml
            agenteval run
    
        - name: Test Summary
          if: always()
          id: test-summary
          run: |
            cat agenteval_summary.md >> $GITHUB_STEP_SUMMARY
    

    When you push new changes to the repository, the GitHub Actions workflow is invoked. An example workflow output is shown in the following screenshot.

    GitHub Action Agent Evaluation test output

    A test summary like the following screenshot will be posted to the GitHub workflow page with details on which tests have failed.

    GitHub Action Agent Evaluation test summary

    The summary also provides the reasons for the test failures.

    GitHub Action Agent Evaluation test details
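The exact permissions for the IAM principal in step 2 depend on your evaluator model and target type. The following is a minimal policy sketch for an Amazon Bedrock agent target, written to an illustrative local file name; the Region, account ID, model ID, agent ID, and alias ID are placeholders, and you should confirm the required actions against the agent-evaluation target documentation:

cat << EOF > agenteval-iam-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "InvokeEvaluatorModel",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:<aws-region>::foundation-model/<evaluator-model-id>"
        },
        {
            "Sid": "InvokeTargetAgent",
            "Effect": "Allow",
            "Action": "bedrock:InvokeAgent",
            "Resource": "arn:aws:bedrock:<aws-region>:<account-id>:agent-alias/<agent-id>/<agent-alias-id>"
        }
    ]
}
EOF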

Clean up

Complete the following steps to clean up your resources:

  1. Delete the IAM user you created for the GitHub Action.
  2. Follow the insurance claim processing agent using Agents for Amazon Bedrock example to delete the agent.

Evaluator considerations

By default, evaluators use the InvokeModel API with On-Demand mode, which will incur AWS charges based on input tokens processed and output tokens generated. For the latest pricing details for Amazon Bedrock, refer to Amazon Bedrock pricing.

The cost of running an evaluator for a single test is influenced by the following:

  • The number and length of the steps
  • The number and length of expected results
  • The length of the target agent’s responses

You can view the total number of input tokens processed and output tokens generated by the evaluator using the --verbose flag when you perform a run (agenteval run --verbose).
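For a rough estimate only, you can multiply those token counts by your evaluator model’s On-Demand prices. The token counts and prices in the following sketch are placeholders; substitute the numbers reported by --verbose and the current rates from the Amazon Bedrock pricing page:

# Back-of-the-envelope cost estimate for one evaluator run (all numbers are placeholders)
INPUT_TOKENS=12000            # input tokens reported by agenteval run --verbose
OUTPUT_TOKENS=1500            # output tokens reported by agenteval run --verbose
INPUT_PRICE_PER_1K=0.003      # USD per 1,000 input tokens for your evaluator model
OUTPUT_PRICE_PER_1K=0.015     # USD per 1,000 output tokens for your evaluator model

awk -v it="$INPUT_TOKENS" -v ot="$OUTPUT_TOKENS" \
    -v ip="$INPUT_PRICE_PER_1K" -v op="$OUTPUT_PRICE_PER_1K" \
    'BEGIN { printf "Estimated evaluator cost: $%.4f\n", (it / 1000) * ip + (ot / 1000) * op }'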

Conclusion

This post introduced Agent Evaluation, an open source solution that enables developers to seamlessly integrate agent evaluation into their existing CI/CD workflows. By taking advantage of the capabilities of LLMs on Amazon Bedrock, Agent Evaluation enables you to comprehensively evaluate and debug your agents, achieving reliable and consistent performance. With its user-friendly test plan configuration, Agent Evaluation simplifies the process of defining and orchestrating tests, allowing you to focus on refining your agents’ capabilities. The solution’s built-in support for popular services makes it a versatile tool for testing a wide range of conversational AI agents. Moreover, Agent Evaluation’s seamless integration with CI/CD pipelines empowers teams to automate the testing process, making sure every code change or update undergoes rigorous evaluation before deployment. This proactive approach minimizes the risk of introducing bugs or inconsistencies, ultimately enhancing the overall user experience.

The following are some recommendations to consider:

  • Don’t use the same model to evaluate the results that you use to power the agent. Doing so may introduce biases and lead to inaccurate evaluations.
  • Block your pipelines on accuracy failures. Implement strict quality gates to help prevent deploying agents that fail to meet the expected accuracy or performance thresholds.
  • Continuously expand and refine your test plans. As your agents evolve, regularly update your test plans to cover new scenarios and edge cases, and provide comprehensive coverage.
  • Use Agent Evaluation’s logging and tracing capabilities to gain insights into your agents’ decision-making processes, facilitating debugging and performance optimization.

Agent Evaluation unlocks a new level of confidence in your conversational AI agents’ performance by streamlining your development workflows, accelerating time-to-market, and delivering exceptional user experiences. To further explore the best practices of building and testing conversational AI agent evaluation at scale, get started by trying Agent Evaluation and provide your feedback.


About the Authors

Sharon Li is an AI/ML Specialist Solutions Architect at Amazon Web Services (AWS) based in Boston, Massachusetts. With a passion for leveraging cutting-edge technology, Sharon is at the forefront of developing and deploying innovative generative AI solutions on the AWS cloud platform.

Bobby Lindsey is a Machine Learning Specialist at Amazon Web Services. He’s been in technology for over a decade, spanning various technologies and multiple roles. He is currently focused on combining his background in software engineering, DevOps, and machine learning to help customers deliver machine learning workflows at scale. In his spare time, he enjoys reading, research, hiking, biking, and trail running.

Tony Chen is a Machine Learning Solutions Architect at Amazon Web Services, helping customers design scalable and robust machine learning capabilities in the cloud. As a former data scientist and data engineer, he leverages his experience to help tackle some of the most challenging problems organizations face with operationalizing machine learning.

Suyin Wang is an AI/ML Specialist Solutions Architect at AWS. She has an interdisciplinary education background in Machine Learning, Financial Information Service and Economics, along with years of experience in building Data Science and Machine Learning applications that solved real-world business problems. She enjoys helping customers identify the right business questions and building the right AI/ML solutions. In her spare time, she loves singing and cooking.

Curt Lockhart is an AI/ML Specialist Solutions Architect at AWS. He comes from a non-traditional background of working in the arts before his move to tech, and enjoys making machine learning approachable for each customer. Based in Seattle, you can find him venturing to local art museums, catching a concert, and wandering throughout the cities and outdoors of the Pacific Northwest.

Read More

Node problem detection and recovery for AWS Neuron nodes within Amazon EKS clusters

Node problem detection and recovery for AWS Neuron nodes within Amazon EKS clusters

Implementing hardware resiliency in your training infrastructure is crucial to mitigating risks and enabling uninterrupted model training. By implementing features such as proactive health monitoring and automated recovery mechanisms, organizations can create a fault-tolerant environment capable of handling hardware failures or other issues without compromising the integrity of the training process.

In this post, we introduce the AWS Neuron node problem detector and recovery DaemonSet for AWS Trainium and AWS Inferentia on Amazon Elastic Kubernetes Service (Amazon EKS). This component can quickly detect rare occurrences of issues when Neuron devices fail by tailing monitoring logs. It marks worker nodes with a defective Neuron device as unhealthy and promptly replaces them with new worker nodes. By accelerating the speed of issue detection and remediation, it increases the reliability of your ML training and reduces the wasted time and cost due to hardware failure.

This solution is applicable if you’re using managed nodes or self-managed node groups (which use Amazon EC2 Auto Scaling groups) on Amazon EKS. At the time of writing this post, automatic recovery of nodes provisioned by Karpenter is not yet supported.

Solution overview

The solution is based on the node problem detector and recovery DaemonSet, a powerful tool designed to automatically detect and report various node-level problems in a Kubernetes cluster.

The node problem detector component will continuously monitor the kernel message (kmsg) logs on the worker nodes. If it detects error messages specifically related to the Neuron device (which is the Trainium or AWS Inferentia chip), it will change NodeCondition to NeuronHasError on the Kubernetes API server.

The node recovery agent is a separate component that periodically checks the Prometheus metrics exposed by the node problem detector. When it finds a node condition indicating an issue with the Neuron device, it will take automated actions. First, it will mark the affected instance in the relevant Auto Scaling group as unhealthy, which will invoke the Auto Scaling group to stop the instance and launch a replacement. Additionally, the node recovery agent will publish Amazon CloudWatch metrics for users to monitor and alert on these events.
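To make this concrete, the following is a minimal sketch of the equivalent AWS CLI calls; the actual agent uses the AWS SDK, and the instance ID and metric dimension shown here are illustrative assumptions:

INSTANCE_ID=i-0123456789abcdef0   # hypothetical EC2 instance backing the unhealthy node

# Mark the instance unhealthy so its Auto Scaling group stops it and launches a replacement
aws autoscaling set-instance-health \
    --instance-id "$INSTANCE_ID" \
    --health-status Unhealthy

# Publish a CloudWatch metric in the NeuronHealthCheck namespace for monitoring and alerting
aws cloudwatch put-metric-data \
    --namespace NeuronHealthCheck \
    --metric-name NeuronHasError_DMA_ERROR \
    --value 1 \
    --dimensions InstanceId="$INSTANCE_ID"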

The following diagram illustrates the solution architecture and workflow.

In the following walkthrough, we create an EKS cluster with Trn1 worker nodes, deploy the Neuron plugin for the node problem detector, and inject an error message into the node. We then observe the failing node being stopped and replaced with a new one, and find a metric in CloudWatch indicating the error.

Prerequisites

Before you start, make sure you have installed the following tools on your machine:

Deploy the node problem detection and recovery plugin

Complete the following steps to configure the node problem detection and recovery plugin:

  1. Create an EKS cluster using the Data on EKS Terraform module:
    git clone https://github.com/awslabs/data-on-eks.git
    
    export TF_VAR_region=us-east-2
    export TF_VAR_trn1_32xl_desired_size=4
    export TF_VAR_trn1_32xl_min_size=4
    cd data-on-eks/ai-ml/trainium-inferentia/ && chmod +x install.sh
    ./install.sh
    
    aws eks --region us-east-2 describe-cluster --name trainium-inferentia
    
    # Creates k8s config file to authenticate with EKS
    aws eks --region us-east-2 update-kubeconfig --name trainium-inferentia
    
    kubectl get nodes
    NAME                                           STATUS   ROLES    AGE   VERSION
    ip-100-64-161-213.us-east-2.compute.internal   Ready    <none>   31d   v1.29.0-eks-5e0fdde
    ip-100-64-227-31.us-east-2.compute.internal    Ready    <none>   31d   v1.29.0-eks-5e0fdde
    ip-100-64-70-179.us-east-2.compute.internal    Ready    <none>   31d   v1.29.0-eks-5e0fdde

  2. Install the required AWS Identity and Access Management (IAM) role for the service account and the node problem detector plugin.
  3. Create a policy as shown below. Update the Resource key value to match the ARN of the Auto Scaling group that contains the Trainium and AWS Inferentia nodes, and update the ec2:ResourceTag/aws:autoscaling:groupName key value to match the Auto Scaling group name.

You can get these values from the Amazon EKS console. Choose Clusters in the navigation pane, open the trainium-inferentia cluster, choose Node groups, and locate your node group.
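If you prefer the AWS CLI, the following sketch retrieves the same values for a managed node group; the node group name is a placeholder you need to replace with your own:

# Find the Auto Scaling group name behind the managed node group
aws eks describe-nodegroup \
  --cluster-name trainium-inferentia \
  --nodegroup-name <your-trn1-node-group-name> \
  --region us-east-2 \
  --query 'nodegroup.resources.autoScalingGroups[].name' \
  --output text

# Retrieve the Auto Scaling group ARN to use as the Resource value in the policy
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names <auto-scaling-group-name-from-previous-command> \
  --region us-east-2 \
  --query 'AutoScalingGroups[].AutoScalingGroupARN' \
  --output text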

# Create npd-policy-trimmed.json
cat << EOF > npd-policy-trimmed.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "autoscaling:SetInstanceHealth",
                "autoscaling:DescribeAutoScalingInstances"
            ],
            "Effect": "Allow",
            "Resource": <arn of the Auto Scaling group corresponding to the Neuron nodes for the cluster>
        },
        {
            "Action": [
                "ec2:DescribeInstances"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Condition": {
                "ForAllValues:StringEquals": {
                    "ec2:ResourceTag/aws:autoscaling:groupName": <name of the Auto Scaling group corresponding to the Neuron nodes for the cluster>
                }
            }
        },
        {
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "cloudwatch:Namespace": "NeuronHealthCheck"
                }
            }
        }
    ]
}
EOF

This component will be installed as a DaemonSet in your EKS cluster.

# To create the policy, use the AWS CLI as shown below, where npd-policy-trimmed.json is the policy JSON constructed from the template above.

aws iam create-policy \
  --policy-name NeuronProblemDetectorPolicy \
  --policy-document file://npd-policy-trimmed.json

# Note the ARN

CLUSTER_NAME=trainium-inferentia # Your EKS Cluster Name 
AWS_REGION=us-east-2
ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
POLICY_ARN=arn:aws:iam::$ACCOUNT_ID:policy/NeuronProblemDetectorPolicy

eksctl create addon --cluster $CLUSTER_NAME --name eks-pod-identity-agent \
  --region $AWS_REGION

eksctl create podidentityassociation \
    --cluster $CLUSTER_NAME \
    --namespace neuron-healthcheck-system \
    --service-account-name node-problem-detector \
    --permission-policy-arns="$POLICY_ARN" \
    --region $AWS_REGION
    
# Install the Neuron NPD and recovery plugin 

kubectl create ns neuron-healthcheck-system
curl https://raw.githubusercontent.com/aws-neuron/aws-neuron-sdk/215b421ac448d85f89be056e27e29842a6b03c9c/src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery.yml | kubectl apply -f - 
curl https://raw.githubusercontent.com/aws-neuron/aws-neuron-sdk/215b421ac448d85f89be056e27e29842a6b03c9c/src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery-rbac.yml | kubectl apply -f - 
curl https://raw.githubusercontent.com/aws-neuron/aws-neuron-sdk/215b421ac448d85f89be056e27e29842a6b03c9c/src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery-config.yml | kubectl apply -f -

# Expected result (with 4 Neuron nodes in cluster):

kubectl get pod -n neuron-healthcheck-system
NAME READY STATUS RESTARTS AGE
node-problem-detector-49p6w 2/2 Running 0 31s
node-problem-detector-j7wct 2/2 Running 0 31s
node-problem-detector-qr6jm 2/2 Running 0 31s
node-problem-detector-vwq8x 2/2 Running 0 31s

The container images in the Kubernetes manifests are stored in public repositories such as registry.k8s.io and public.ecr.aws. For production environments, it’s recommended to limit these external dependencies by hosting the container images in a private registry and syncing them from the public repositories. For detailed implementation, refer to the blog post Announcing pull through cache for registry.k8s.io in Amazon Elastic Container Registry.

By default, the node problem detector will not take any action on a failed node. If you would like the agent to terminate the EC2 instance automatically, update the DaemonSet as follows:

kubectl edit -n neuron-healthcheck-system ds/node-problem-detector

...
   env:
   - name: ENABLE_RECOVERY
     value: "true"

Test the node problem detector and recovery solution

After the plugin is installed, you can see the Neuron conditions by running kubectl describe node. We simulate a device error by injecting error logs on the instance:

# Verify node conditions on any node. Neuron conditions should show up.

kubectl describe node ip-100-64-58-151.us-east-2.compute.internal | grep Conditions: -A7

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  NeuronHealth     False   Fri, 29 Mar 2024 15:52:08 +0800   Thu, 28 Mar 2024 13:59:19 +0800   NeuronHasNoError             Neuron has no error
  MemoryPressure   False   Fri, 29 Mar 2024 15:51:03 +0800   Thu, 28 Mar 2024 13:58:39 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 29 Mar 2024 15:51:03 +0800   Thu, 28 Mar 2024 13:58:39 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 29 Mar 2024 15:51:03 +0800   Thu, 28 Mar 2024 13:58:39 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 29 Mar 2024 15:51:03 +0800   Thu, 28 Mar 2024 13:59:08 +0800   KubeletReady                 kubelet is posting ready status
# To get provider id
kubectl describe node ip-100-64-58-151.us-east-2.compute.internal | grep -i provider | sed -E 's|.*/([^/]+)$|\1|'

i-0381404aa69eae3f6

# SSH into to the worker node and simulate the hardware error on the neuron device
aws ssm start-session --target i-0381404aa69eae3f6 --region us-east-2

Starting session with SessionId: lindarr-0069460593240662a

sh-4.2$
sh-4.2$ sudo bash
[root@ip-192-168-93-211 bin]# echo "test NEURON_HW_ERR=DMA_ERROR test" >> /dev/kmsg

Around 2 minutes later, you can see that the error has been identified:

kubectl describe node ip-100-64-58-151.us-east-2.compute.internal | grep 'Conditions:' -A7
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  NeuronHealth     True    Fri, 29 Mar 2024 17:42:43 +0800   Fri, 29 Mar 2024 17:42:38 +0800   NeuronHasError_DMA_ERROR     test NEURON_HW_ERR=DMA_ERROR test

...

Events:
  Type     Reason                    Age   From            Message
  ----     ------                    ----  ----            -------
  Warning  NeuronHasError_DMA_ERROR  36s   kernel-monitor  Node condition NeuronHealth is now: True, reason: NeuronHasError_DMA_ERROR, message: "test NEURON_HW_ERR=DMA_ERROR test"

Now that the node problem detector has detected the error and the recovery agent has automatically marked the node as unhealthy, Amazon EKS cordons the node and evicts the pods on it:

# Verify the Node scheduling is disabled.
kubectl get node 
NAME                                           STATUS                        ROLES    AGE    VERSION
ip-100-64-1-48.us-east-2.compute.internal      Ready                         <none>   156m   v1.29.0-eks-5e0fdde
ip-100-64-103-26.us-east-2.compute.internal    Ready                         <none>   94s    v1.29.0-eks-5e0fdde
ip-100-64-239-245.us-east-2.compute.internal   Ready                         <none>   154m   v1.29.0-eks-5e0fdde
ip-100-64-52-40.us-east-2.compute.internal     Ready                         <none>   156m   v1.29.0-eks-5e0fdde
ip-100-64-58-151.us-east-2.compute.internal    NotReady,SchedulingDisabled   <none>   27h    v1.29.0-eks-5e0fdde

Open the CloudWatch console and verify the metrics in the NeuronHealthCheck namespace. You can see the NeuronHasError_DMA_ERROR metric has a value of 1.
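You can also confirm this from the command line by listing the metrics published in the NeuronHealthCheck namespace:

aws cloudwatch list-metrics --namespace NeuronHealthCheck --region us-east-2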

After replacement, you can see a new worker node has been created:

# The node with age 28s is the newly launched replacement

kubectl get node 
NAME                                           STATUS   ROLES    AGE   VERSION
ip-192-168-65-77.us-east-2.compute.internal    Ready    <none>   28s   v1.29.0-eks-5e0fdde
ip-192-168-81-176.us-east-2.compute.internal   Ready    <none>   9d    v1.29.5-eks-5e0fdde
ip-192-168-91-218.us-east-2.compute.internal   Ready    <none>   9d    v1.29.0-eks-5e0fdde
ip-192-168-94-83.us-east-2.compute.internal    Ready    <none>   9d    v1.29.0-eks-5e0fdde

Let’s look at a real-world scenario in which you’re running a distributed training job using an MPI operator, as outlined in Llama-2 on Trainium, and an irrecoverable Neuron error occurs on one of the nodes. Without the plugin, the training job becomes stuck, resulting in wasted time and compute cost. With the plugin deployed, the node problem detector proactively removes the problem node from the cluster. Because the training script saves checkpoints periodically, training resumes from the previous checkpoint.

The following screenshot shows example logs from a distributed training job.

The training has been started. (You can ignore loss=nan for now; it’s a known issue and will be removed. For immediate use, refer to the reduced_train_loss metric.)

The following screenshot shows the checkpoint created at step 77.

Training stopped after one of the nodes encountered a problem at step 86. (The error was injected manually for testing.)

After the faulty node was detected and replaced by the Neuron node problem detection and recovery plugin, the training process resumed from step 77, the last checkpoint.

Although Auto Scaling groups will stop unhealthy nodes, they may encounter issues preventing the launch of replacement nodes. In such cases, training jobs will stall and require manual intervention. However, the stopped node will not incur further charges on the associated EC2 instance.
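If a replacement node doesn’t appear, the Auto Scaling group’s scaling activities usually explain why (for example, insufficient capacity). The following command lists the recent activities; the group name is a placeholder:

aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name <name-of-the-neuron-auto-scaling-group> \
  --region us-east-2 \
  --max-items 5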

If you want to take custom actions in addition to stopping instances, you can create CloudWatch alarms on the metrics NeuronHasError_DMA_ERROR, NeuronHasError_HANG_ON_COLLECTIVES, NeuronHasError_HBM_UNCORRECTABLE_ERROR, NeuronHasError_SRAM_UNCORRECTABLE_ERROR, and NeuronHasError_NC_UNCORRECTABLE_ERROR, and use a CloudWatch Metrics Insights query such as SELECT AVG(NeuronHasError_DMA_ERROR) FROM NeuronHealthCheck to aggregate these values and evaluate the alarms. The following screenshots show an example.
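In CLI form, one possible way to create such an alarm on the Metrics Insights query is the following sketch; the alarm name, period, and SNS topic are placeholders, and you should confirm the parameters against the CloudWatch alarm documentation:

aws cloudwatch put-metric-alarm \
  --alarm-name NeuronHasError-DMA-ERROR \
  --comparison-operator GreaterThanThreshold \
  --threshold 0 \
  --evaluation-periods 1 \
  --treat-missing-data notBreaching \
  --alarm-actions arn:aws:sns:us-east-2:<account-id>:<your-sns-topic> \
  --metrics '[{"Id":"q1","Expression":"SELECT AVG(NeuronHasError_DMA_ERROR) FROM NeuronHealthCheck","Period":60,"ReturnData":true}]'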

Clean up

To clean up all the provisioned resources for this post, run the cleanup script:

# neuron-problem-detector-role-$CLUSTER_NAME
eksctl delete podidentityassociation \
  --service-account-name node-problem-detector \
  --namespace neuron-healthcheck-system \
  --cluster $CLUSTER_NAME \
  --region $AWS_REGION

# delete the EKS Cluster
cd data-on-eks/ai-ml/trainium-inferentia
./cleanup.sh

Conclusion

In this post, we showed how the Neuron node problem detector and recovery DaemonSet for Amazon EKS works for EC2 instances powered by AWS Trainium and AWS Inferentia. If you’re running Neuron-based EC2 instances with managed node groups or self-managed node groups, you can deploy the detector and recovery DaemonSet in your EKS cluster and benefit from improved reliability and fault tolerance for your machine learning training workloads in the event of node failure.


About the authors

Harish Rao is a senior solutions architect at AWS, specializing in large-scale distributed AI training and inference. He empowers customers to harness the power of AI to drive innovation and solve complex challenges. Outside of work, Harish embraces an active lifestyle, enjoying the tranquility of hiking, the intensity of racquetball, and the mental clarity of mindfulness practices.

Ziwen Ning is a software development engineer at AWS. He currently focuses on enhancing the AI/ML experience through the integration of AWS Neuron with containerized environments and Kubernetes. In his free time, he enjoys challenging himself with badminton, swimming and other various sports, and immersing himself in music.

Geeta Gharpure is a senior software developer on the Annapurna ML engineering team. She is focused on running large scale AI/ML workloads on Kubernetes. She lives in Sunnyvale, CA and enjoys listening to Audible in her free time.

Darren Lin is a Cloud Native Specialist Solutions Architect at AWS who focuses on domains such as Linux, Kubernetes, Container, Observability, and Open Source Technologies. In his spare time, he likes to work out and have fun with his family.

Read More