Elevate workforce productivity through seamless personalization in Amazon Q Business

Personalization can improve the user experience of shopping, entertainment, and news sites by using our past behavior to recommend the products and content that best match our interests. You can also apply personalization to conversational interactions with an AI-powered assistant. For example, an AI assistant for employee onboarding could use what it knows about an employee’s work location, department, or job title to provide information that is more relevant to the employee. In this post, we explore how Amazon Q Business uses personalization to improve the relevance of responses and how you can align your use cases and end-user data to take full advantage of this capability.

Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and complete tasks based on the data and information that is spread across your enterprise systems. Amazon Q Business provides more than 40 built-in connectors that make it effortless to connect the most popular enterprise data sources and systems into a unified and powerful search index that the AI assistant can use to help answer natural language questions from your workforce. This allows end-users to find the information and answers they’re looking for quickly, which leads to increased productivity and job satisfaction. Amazon Q Business preserves the access permissions in the source systems so that users are only able to access the information through Amazon Q Business that they have access to directly within these systems.

Solution overview

Amazon Q Business personalizes responses by determining whether the user’s query could be enhanced by augmenting it with known attributes of the user, and then transparently using the personalized query to retrieve documents from its search index. User attributes, such as work location, department, and job title, are made available to Amazon Q Business by the system used to authenticate user identities that is configured with the Amazon Q Business application. Depending on the documents available in the index, the personalized query should improve the relevancy of the returned documents, which in turn can improve the relevancy of the generated response based on those documents. The process by which user attributes flow to an Amazon Q Business application varies based on the identity federation mechanism used to authenticate your workforce for the application:

  • AWS IAM Identity Center (recommended)
  • AWS Identity and Access Management (IAM) federation, using a SAML 2.0 or OIDC identity provider (IdP)

The following diagram illustrates the process by which user attributes flow to Amazon Q Business for both identity federation mechanisms.

The steps of the process are as follows:

  1. When a user accesses the Amazon Q Business web experience or a custom client that integrates with the Amazon Q Business API, they must be authenticated. If not already authenticated, the user is redirected to the IdP configured for the Amazon Q Business application.
  2. After the user authenticates with the IdP, they’re redirected back to the client with an authorization code. Then the Amazon Q Business web experience or custom client makes an API call to the IdP with the client secret to exchange the authorization code for an ID token. When an IAM IdP is configured for the Amazon Q Business application, the ID token includes the user attributes that are configured in the IdP. Otherwise, with IAM Identity Center, the user attributes are synchronized from the IdP to IAM Identity Center. This process only has to be performed once per user session, and again when the session expires.
  3. The user is now able to interact with the AI assistant by submitting a question.
  4. Before the Amazon Q Business web experience or custom client can send the user’s question to the Amazon Q Business ChatSync API, it must exchange the ID token for AWS credentials. If the Amazon Q Business application is configured with IAM Identity Center, the Amazon Q Business web experience or custom client calls the CreateTokenWithIAM API to exchange the ID token for an IAM Identity Center token. This token includes the user attributes synchronized from the IdP to IAM Identity Center as described earlier. If the Amazon Q Business application is configured with an IAM IdP, this step is skipped.
  5. The last step to obtain AWS credentials is to call AWS Secure Token Service (AWS STS). If the Amazon Q Business application is configured with IAM Identity Center, the AssumeRole API is called passing the IAM Identity Center token. For an Amazon Q Business application configured with an IAM IdP, the AssumeRoleWithSAML or AssumeRoleWithWebIdentity API is called depending on whether SAML 2.0 or OIDC is used for the provider. The credentials returned from AWS STS can be cached and reused until they expire.
  6. The Amazon Q Business web experience or custom client can now call the ChatSync API with the credentials obtained in the previous step using AWS Signature Version 4. Because the credentials include the user attributes configured in the IdP, they’re available to Amazon Q Business to personalize the user’s query.
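
The following is a minimal Python (boto3) sketch of steps 5 and 6 for an application configured with an IAM IdP using OIDC. The role ARN, ID token, and application ID are placeholders, and a SAML 2.0 provider would call AssumeRoleWithSAML instead of AssumeRoleWithWebIdentity.

import boto3

# Placeholders -- substitute values from your own IdP and Amazon Q Business setup.
ID_TOKEN = "<ID token returned by your IdP>"
ROLE_ARN = "arn:aws:iam::111122223333:role/QBusinessWebExperienceRole"  # hypothetical
APP_ID = "<Amazon Q Business application ID>"

# Step 5: exchange the ID token for temporary AWS credentials.
sts = boto3.client("sts")
creds = sts.assume_role_with_web_identity(
    RoleArn=ROLE_ARN,
    RoleSessionName="qbusiness-user-session",
    WebIdentityToken=ID_TOKEN,
)["Credentials"]

# Step 6: call the ChatSync API with the user-scoped credentials. Because the
# credentials carry the principal tags configured in the IdP, Amazon Q Business
# can use them to personalize the retrieval query.
qbusiness = boto3.client(
    "qbusiness",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
response = qbusiness.chat_sync(
    applicationId=APP_ID,
    userMessage="What training is available?",
)
print(response["systemMessage"])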

Amazon Q Business personalization use case

To demonstrate how personalization works in practice, let’s take an example of internal training made available to employees of a multi-national company. Imagine you lead the training department for an enterprise company and you’re tasked with improving access to the training opportunities offered to employees. You’ve done a great job documenting this information for all locations where training is provided and published it on your company’s Microsoft SharePoint site, but the feedback from employees is that they don’t know where to find the information. The confusion stems from the fact that your company also publishes internal company information and documentation on Confluence, Box, and a wiki. Additionally, your department uses ServiceNow for training support, which has developed into another source of valuable but under-utilized information.

The first challenge to solve is discoverability of the information spread across these disparate and disconnected systems. Through the connectors described earlier, Amazon Q Business can bring together the information in these systems and provide a conversational user interface that allows employees to ask questions in natural language, such as, “What training is available?”

With the discoverability challenge solved, there is still an opportunity to further optimize the user experience. This is where personalization comes in. Consider the basic question, “What training is available?” from a user who works out of the San Francisco, CA, office. Based on this question, Amazon Q Business can find documents that describe the training classes available across all corporate locations, but it lacks the knowledge of the user’s home office location needed to be more precise in its answer. Providing an answer based on a different location, or a blend of multiple locations, isn’t as accurate as an answer based on where the employee actually works. The employee could be more explicit in their question by including their location, but the goal of AI assistants is to better understand the user’s intent and context to be able to provide the most accurate information possible for even the most basic questions. Knowing key information about the user allows Amazon Q Business to seamlessly personalize the retrieval of documents and therefore lead to a more accurate response. Let’s see how it works in more detail.

At the core of Amazon Q Business is a technique called Retrieval Augmented Generation (RAG). At a high level, RAG involves taking a user’s request and finding passages from a set of documents in a searchable index that are most similar to the request and then asking a large language model (LLM) to generate a response that provides an answer using the retrieved passages. Given the question, “What training is available?” and the number of locations for the company, the top document passages returned from the index and provided to the LLM may not even include the user’s location. Therefore, the more precise the query to the retrieval layer, the more accurate and relevant the ultimate response will be. For example, modifying the query to include details on the user’s location should result in document passages specific to the user being returned at or near the top of the list rather than buried further down the list.
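
To make the idea concrete, the following is a purely illustrative Python sketch of what query personalization amounts to conceptually; it is not how Amazon Q Business is implemented, and the attribute names are examples only.

def personalize_query(query: str, user_attributes: dict) -> str:
    """Illustrative only: enrich the retrieval query with known user attributes
    so that location- or role-specific passages rank higher."""
    context = ", ".join(f"{key}: {value}" for key, value in user_attributes.items() if value)
    return f"{query} ({context})" if context else query

# "What training is available?" becomes
# "What training is available? (city: San Francisco, title: Software Programmer)"
print(personalize_query(
    "What training is available?",
    {"city": "San Francisco", "title": "Software Programmer"},
))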

Configure user attributes in your IdP

Let’s look at how you would configure your IdP to pass along the attributes of your users to your Amazon Q Business application. Regardless of the identity federation mechanism configured for your Amazon Q Business application, attributes for your users need to be maintained in your IdP’s directory. The following is a partial screenshot of some of the location-related fields available in the profile editor for the Okta IdP.

Besides the administrative UI for editing individual profiles, Okta also provides mechanisms for updating profiles in bulk or through APIs. These tools make it straightforward to keep your user profiles synchronized with source systems such as employee directories.

After your user profiles are updated in your IdP, the process for making user attributes available to your Amazon Q Business application varies based on the identity federation configuration.

Federation with IAM Identity Center

If you configure your Amazon Q Business application with IAM Identity Center (recommended) and you use an external IdP such as Okta or Entra ID to manage your workforce, you simply need to maintain user attributes in your IdP. Because IAM Identity Center supports the SCIM standard, you can set up user profiles and their attributes to be automatically synchronized with IAM Identity Center. After the users and attributes are synchronized to IAM Identity Center, they can be accessed by Amazon Q Business from either the web experience or through a custom client integration as described earlier.

A less common variation of using IAM Identity Center with Amazon Q Business that is suitable for basic testing is to use IAM Identity Center as the identity source (without an external IdP). In this case, you would add users and manage their attributes directly in IAM Identity Center through the AWS Management Console or the CreateUser and UpdateUser APIs.
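
For that testing variation, the following boto3 sketch creates a test user with location and title attributes through the Identity Store CreateUser API. The identity store ID and profile values are placeholders.

import boto3

identitystore = boto3.client("identitystore")

# The identity store ID is shown on the IAM Identity Center settings page.
IDENTITY_STORE_ID = "d-1234567890"  # placeholder

identitystore.create_user(
    IdentityStoreId=IDENTITY_STORE_ID,
    UserName="jdoe",
    DisplayName="Jane Doe",
    Name={"GivenName": "Jane", "FamilyName": "Doe"},
    Emails=[{"Value": "jdoe@example.com", "Primary": True}],
    Title="Software Programmer",
    Addresses=[{"Locality": "San Francisco", "Country": "US", "Primary": True}],
)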

Federation with IAM

If you configure your Amazon Q Business application to use IAM federation, user attributes are also maintained in your IdP. However, the attributes are passed to your Amazon Q Business application from your IdP using either a SAML 2.0 assertion or an OIDC claim depending on the provider type that you set up as your IAM IdP. Your IdP must be configured to pass the specific attributes that you intend to expose for personalization. How this configuration is done depends again on whether you’re using SAML 2.0 or OIDC. For this post, we describe how this is done in Okta. The process should be similar with other IdPs.

SAML 2.0 provider type

When you create a SAML 2.0 application in Okta for authenticating your users, you have the option to create attribute statements. The attribute statements are included in the SAML 2.0 assertion that is provided by Okta when a user authenticates. The first three attribute statements shown in the following table are required for SAML 2.0 authentication to work with Amazon Q Business. The others are examples of how you would pass optional attributes that can be used for personalization.

| Name | Name format | Value |
| --- | --- | --- |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:Email | Unspecified | user.email |
| https://aws.amazon.com/SAML/Attributes/Role | Unspecified | [WebExpRoleArn],[IdentityProviderArn] |
| https://aws.amazon.com/SAML/Attributes/RoleSessionName | Unspecified | user.email |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:countryCode | Unspecified | user.countryCode != null ? user.countryCode : "" |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:city | Unspecified | user.city != null ? user.city : "" |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:title | Unspecified | user.title != null ? user.title : "" |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:department | Unspecified | user.department != null ? user.department : "" |

Where the attribute statement value uses the Okta Expression Language, Okta resolves the value expression with the actual value for the user. For example, user.email resolves to the user’s email address, and user.city != null ? user.city : "" resolves to the user’s city (as specified in their user profile) or an empty string if not specified. Because these values are passed in the SAML assertion, you can also include custom user attributes that are specific to your business or domain and may be relevant to personalization.

For [WebExpRoleArn],[IdentityProviderArn], replace [WebExpRoleArn] with the web experience role ARN of your Amazon Q Business application and [IdentityProviderArn] with the ARN of the IAM IdP that you created in IAM for this SAML provider.

OIDC provider type

When you create an OIDC application in Okta for authenticating your users, the location where you configure the user attributes to include in the OIDC claim is a bit different. For OIDC, you must add the user attributes you want to expose for personalization to the claim for the authorization server. AWS STS supports an access token or ID token type. In this post, we demonstrate the ID token type. For more details, see Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.

Complete the following steps:

  1. In Okta, choose Security, API in the navigation pane.
  2. Choose the authorization server (which may be default) and then Claims.
  3. If you don’t see a claim type of ID, choose Add Claim to create one.
  4. For Claim name, enter https://aws.amazon.com/tags.
  5. For Include in token type, choose Access Token or ID Token (we use ID Token in this post).
  6. For Value type, choose Expression.
  7. For Value, enter a JSON document that uses the Okta Expression Language to resolve attributes for the user. The full expression is as follows:
    {
       "principal_tags": {
          "Email": {user.email},
          "countryCode": {user.countryCode != null ? user.countryCode : ""},
          "city": {user.city != null ? user.city : ""},
          "title": {user.title != null ? user.title : ""},
          "department": {user.department != null ? user.department : ""}
       }
    }

  8. Choose Create.

Again, you are not limited to just these fields. You can also include custom fields that apply to your use case and documents in the expression.
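
To confirm the claim is configured correctly, you can decode (without verifying) the ID token returned by Okta and check that the https://aws.amazon.com/tags claim is present. The following is a minimal Python sketch; the token string is a placeholder, and this approach is for inspection during setup only, not for authentication.

import base64
import json

def decode_jwt_payload(jwt: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

id_token = "<ID token returned by Okta>"  # placeholder
claims = decode_jwt_payload(id_token)
print(claims.get("https://aws.amazon.com/tags"))  # should contain your principal_tags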

Enable personalization in Amazon Q Business

After you have your preferred authentication mechanism configured in your IdP, IAM, and Amazon Q Business, you’re ready to see how it impacts responses in your Amazon Q Business application. Although personalization is enabled by default for Amazon Q Business applications, you can control whether personalization is enabled on the Update Global Controls settings page for your Amazon Q Business application. If necessary, select Enable response personalization and choose Save.

Amazon Q Business personalization in action

Now you’re ready to see how Amazon Q Business personalizes responses for each user. We continue with the same use case of asking Amazon Q Business “What training is available?” The documents added to the Amazon Q Business index include internal training schedules available to all employees as Word documents for two corporate offices: San Francisco and London. In addition, two users were created in the IdP, where one user is based in the San Francisco office and the other is based in the London office. The city and country fields were populated as well as each user’s title. The San Francisco employee is a software programmer and the London employee is the Director of Marketing.

When signed in to the application using an incognito (private) window as the San Francisco employee, the question “What training is available?” produces the following response.

The response includes content on the training classes being held at the San Francisco office. The citation in the Sources section also confirms that the “September Training Curriculum at San Francisco” document was used to generate the response.

We can close the incognito window, open a new incognito window, sign in as the London employee, and ask the same question: “What training is available?” This time, the response provides information on the training classes being held at the London office and the citation refers to the London curriculum document.

For one final test, we disable personalization for the Amazon Q Business application on the Update Global Controls settings page for the Amazon Q Business application, wait a few minutes for the change to take effect, and then ask the same question in a new conversation.

This time, Amazon Q Business includes information on classes being held at both offices, which is confirmed by the citations pulling in both documents. Although the question is still answered, the user must parse through the response to pick out the portions that are most relevant to them based on their location.

Use cases for Amazon Q Business personalization

Amazon Q Business can be very effective in supporting a wide variety of use cases. However, not all of these use cases can be enhanced with personalization. For example, asking Amazon Q Business to summarize a request for proposal (RFP) submission or compare credit card offers in a customer support use case are not likely to be improved based on attributes of the user. Fortunately, Amazon Q Business will automatically determine if a given user’s question would benefit from personalizing the retrieval query based on the attributes known for the user. When thinking about enabling and optimizing personalization for your use case, consider the availability of user attributes and the composition of data in your Amazon Q Business index.

Working backward from the personalization effect you want to implement, you first need to determine if the required user attributes for your use case exist in your IdP. This may require importing and synchronizing this data into your IdP from another system, such as an employee directory or payroll system. Then you should consider the documents and data in your Amazon Q Business index to determine if they are optimized for personalized retrieval. That is, determine whether the documents in your index have content that will be readily found by the retrieval step given the user attributes in your IdP. For example, the documents used for the training class example in this post have the city mentioned in the document title as well as the document body. Because Amazon Q Business boosts matches against the document title by default, we are taking advantage of built-in relevance tuning to further influence the documents that match the user’s city.

In this post, we focused on the user’s work location and location-specific information to add value through personalization. In other words, we used the user’s work location to transparently find what’s most relevant to them nearby. Another useful area to explore is using the user’s job title or job level to find content that is specific to their role. As you explore the possibilities, the intersection of user information and the composition of the corpus of documents in your enterprise data stores is the best place to start.

Conclusion

In this post, we demonstrated how to use personalization to improve the relevancy and usefulness of the responses provided by an AI-powered assistant. Personalization is not going to dramatically improve every interaction with Amazon Q Business, but when it’s thoughtfully applied to use cases and data sources where it can deliver value, it can build trust with end-users by providing responses that are more relevant and meaningful.

What use cases do you have where attributes for your users and the information in your data sources can allow Amazon Q Business to deliver a more personalized user experience? Try out the solution for yourself, and leave your feedback and questions in the comments.


About the Authors

James Jory is a Principal Solutions Architect for Amazon Q Business. He has interests in generative AI, personalization, and recommender systems and has a background in ecommerce, marketing technology, and customer data analytics. In his spare time, he enjoys camping and motor sports.

Nihal Harish is a Software Development Engineer at AWS AI. He is passionate about generative AI and reinforcement learning. Outside of work, he enjoys playing tennis, tending to his garden, and exploring new culinary recipes.

Pranesh Anubhav is a Software Development Manager for Amazon Personalize. He is passionate about designing machine learning systems to serve customers at scale. Outside of his work, he loves playing soccer and is an avid follower of Real Madrid.

Gaurush Hiranandani is an Applied Scientist at AWS AI, where his research spans the fields of statistical machine learning, with a particular focus on preference elicitation and recommender systems. He is deeply passionate about advancing the personalization of generative AI services at AWS AI, aiming to enhance user experiences through tailored, data-driven insights.

Harsh Singh is a Principal Product Manager Technical at AWS AI. Harsh enjoys building products that bring AI to software developers and everyday users to improve their productivity.

Read More

Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 1

Building intelligent agents that can accurately understand and respond to user queries is a complex undertaking that requires careful planning and execution across multiple stages. Whether you are developing a customer service chatbot or a virtual assistant, there are numerous considerations to keep in mind, from defining the agent’s scope and capabilities to architecting a robust and scalable infrastructure.

This two-part series explores best practices for building generative AI applications using Amazon Bedrock Agents. Amazon Bedrock Agents helps you accelerate generative AI application development by orchestrating multistep tasks. Agents use the reasoning capability of foundation models (FMs) to break down user-requested tasks into multiple steps. In addition, they use developer-provided instructions to create an orchestration plan and then carry out the plan by invoking company APIs and accessing knowledge bases using Retrieval Augmented Generation (RAG) to provide an answer to the user’s request.

In Part 1, we focus on creating accurate and reliable agents. Part 2 discusses architectural considerations and development lifecycle practices.

Laying the groundwork: Collecting ground truth data

The foundation of any successful agent is high-quality ground truth data—the accurate, real-world observations used as reference for benchmarks and evaluating the performance of a model, algorithm, or system. For an agent application, before you start building, it’s crucial to collect a set of ground truth interactions or conversations that will drive the entire agent lifecycle. This data provides a benchmark for expected agent behavior, including the interaction with existing APIs, knowledge bases, and guardrails connected with the agent. This enables accurate testing and evaluation and helps identify edge cases and potential pitfalls.

To build a robust ground truth dataset, focus on gathering diverse examples that cover various user intents and scenarios. Your dataset should include the input and expected output for both simple and complex interactions. It’s important to regularly update and expand your dataset as you learn more about user behavior. Ground your data in real customer interactions that reflect actual use cases but be sure to de-identify and anonymize the data.

The following table shows a subset of the ground truth collected for a banking assistant agent that knows the check_account_balance and book_appointments APIs, uses the knowledge base bankingFAQ, and is connected with the guardrail BlockInvestmentAdvice. This example presents only a few of the queries used to test the agent, including session information that comes from other systems and is passed to the agent using sessionAttributes. It’s also a best practice to collect any extra information that would be shared with the agent in a production system. That includes information shared only with the action group execution function (session attributes), information used to augment the prompt (prompt session attributes), and information used by the knowledge base (knowledge base configuration).

| User Query | Session Attributes | Prompt Session Attributes | Expected Response | API, Knowledge Bases, and Guardrails Invoked |
| --- | --- | --- | --- | --- |
| What is my account balance? | None | None | Could you please provide the number of the account that you would like to check the balance for? | None |
| What is the balance for the account 1234? | user id 111 | None | Your balance is X | Action group: check_account_balance(111, 1234) |
| How can I open an account? | None | None | To open a new banking account with ANY_BANK Corporation, you need to gather the following information and documentation: 1. Valid government-issued ID (driver’s license, passport, etc.) 2. Proof of address (utility bill, lease agreement, etc.) 3. Social Security number or Individual Taxpayer Identification Number (ITIN) 4. Initial deposit amount (if required). You can then fill out an application online or in person at an ANY_BANK Corporation branch. Once approved, you can make your initial deposit and start using your new account. | Knowledge base: bankingFAQ |
| Should I invest in bitcoins? | None | None | Sorry, your query violates our usage policies. We do not provide investment advice. To discuss the best investment advice for your current situation, please contact us on (XXX) XXX-XXXX and we will be happy to support you. | Guardrail: BlockInvestmentAdvice |
| Could you make an appointment for tomorrow at 2pm? | user id 111 | Today: 09/03/2024 | Certainly! We’ve booked an appointment for you tomorrow, September 4th, 2024, at 2pm. Your appointment ID is XXXX. | Action group: book_appointment(111, 09/04/2024) |
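
One way to capture ground truth like the preceding table is as structured records that a test harness can replay against the agent. The following is a sketch; the field names are our own, not a required schema.

ground_truth = [
    {
        "query": "What is the balance for the account 1234?",
        "session_attributes": {"user_id": "111"},
        "prompt_session_attributes": {},
        "expected_response": "Your balance is X",
        "expected_invocations": ["action_group:check_account_balance(111, 1234)"],
    },
    {
        "query": "Should I invest in bitcoins?",
        "session_attributes": {},
        "prompt_session_attributes": {},
        "expected_response": "Sorry, your query violates our usage policies.",
        "expected_invocations": ["guardrail:BlockInvestmentAdvice"],
    },
]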

Defining scope and sample interactions

Now that you have your ground truth data, the next step is to clearly define the scope of each agent, including tasks it should and shouldn’t handle, and outline clear expected sample user interactions. This process involves identifying primary functions and capabilities, limitations and out-of-scope tasks, expected input formats and types, and desired output formats and styles.

For instance, when considering an HR assistant agent, a possible scope would be the following:

Primary functions:

– Provide information on company HR policies

– Assist with vacation requests and time-off management

– Answer basic payroll questions

Out of scope:

– Handling sensitive employee data

– Making hiring or firing decisions

– Providing legal advice

Expected inputs:

– Natural language queries about HR policies

– Requests for time-off or vacation information

– Basic payroll inquiries

Desired outputs:

– Clear and concise responses to policy questions

– Step-by-step guidance for vacation requests

– Completion of tasks to book a new vacation and to retrieve, edit, or delete an existing request

– Referrals to appropriate HR personnel for complex issues

– Creation of an HR ticket for questions where the agent is not able to respond

By clearly defining your agent’s scope, you set clear boundaries and expectations, which will guide your development process and help create a focused, reliable AI agent.

Architecting your solution: Building small and focused agents that interact with each other

When it comes to agent architecture, the principle “divide and conquer” holds true. In our experience, it has proven to be more effective to build small, focused agents that interact with each other rather than a single large monolithic agent. This approach offers improved modularity and maintainability, straightforward testing and debugging, flexibility to use different FMs for specific tasks, and enhanced scalability and extensibility.

For example, consider an HR assistant that helps internal employees in an organization and a payroll team assistant that supports the employees of the payroll team. Both agents have common functionality, such as answering payroll policy questions and scheduling meetings between employees. Although the functionalities are similar, they differ in scope and permissions. For instance, the HR assistant can only reply to questions based on internally available knowledge, whereas the payroll agent can also handle confidential information that is only available to payroll employees. Additionally, the HR assistant can schedule meetings between employees and their assigned HR representative, whereas the payroll agent schedules meetings between the employees on its team. In a single-agent approach, those functionalities are handled in the agent itself, resulting in the duplication of the action groups available to each agent, as shown in the following figure.

In this scenario, when something changes in the meetings action group, the change needs to be propagated to the different agents. When applying the multi-agent collaboration best practice, the HR and payroll agents orchestrate smaller, task-focused agents, each with its own scope and instructions. Meetings are now handled by a dedicated meeting assistant agent that is reused by the two supervisor agents, as shown in the following figure.

When a new functionality is added to the meeting assistant agent, the HR agent and payroll agent only need to be updated to handle those functionalities. This approach can also be automated in your applications to increase the scalability of your agentic solutions. The supervisor agents (HR and payroll agents) can set the tone of your application as well as define how each functionality (knowledge base or sub-agent) of the agent should be used. That includes enforcing knowledge base filters and parameter constraints as part of the agentic application.

Crafting the user experience: Planning agent tone and greetings

The personality of your agent sets the tone for the entire user interaction. Carefully planning the tone and greetings of your agent is crucial for creating a consistent and engaging user experience. Consider factors such as brand voice and personality, target audience preferences, formality level, and cultural sensitivity.

For instance, a formal HR assistant might be instructed to address users formally, using titles and last names, while maintaining a professional and courteous tone throughout the conversation. In contrast, a friendly IT support agent could use a casual, upbeat tone, addressing users by their first names and even incorporating appropriate emojis and tech-related jokes to keep the conversation light and engaging.

The following is an example prompt for a formal HR assistant:

You are an HR AI Assistant, helping employees understand company policies and manage 
their benefits. Always address users formally, using titles (Mr., Ms., Dr., etc.) and last names. 
Maintain a professional and courteous tone throughout the conversation.

The following is an example prompt for a friendly IT support agent:

You're the IT Buddy, here to help with tech issues. 
Use a casual, upbeat tone and address users by their first names. 
Feel free to use appropriate emojis and tech-related jokes to keep the conversation light and engaging.

Make sure your agent’s tone aligns with your brand identity and remains constant across different interactions. When collaborating between multiple agents, you should set the tone across the application and enforce it over the different sub-agents.

Maintaining clarity: Providing unambiguous instructions and definitions

Clear communication is the cornerstone of effective AI agents. When defining instructions, functions, and knowledge base interactions, strive for unambiguous language that leaves no room for misinterpretation. Use simple, direct language and provide specific examples for complex concepts. Define clear boundaries between similar functions and implement confirmation mechanisms for critical actions. Consider the following example of clear vs. ambiguous instructions.

The following is an example of an ambiguous prompt:

Check if the user has time off available and book it if possible.

The following is a clearer prompt:

1. Verify the user's available time-off balance using the `checkTimeOffBalance` function. 
2. If the requested time off is available, use the `bookTimeOff` function to reserve it. 
3. If the time off is not available, inform the user and suggest alternative dates. 
4. Always confirm with the user before finalizing any time-off bookings.

By providing clear instructions, you reduce the chances of errors and make sure your agent behaves predictably and reliably.

The same advice is valid when defining the functions of your action groups. Avoid ambiguous function names and definitions, and set clear descriptions for their parameters. The following figure shows how to rename two functions in an action group and refine their descriptions and parameters so they reflect what the functions actually return and the expected value format for the user ID.
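
To illustrate, the following boto3 sketch registers such a function with an unambiguous name, description, and parameter definition using the CreateAgentActionGroup API. The agent ID and Lambda ARN are placeholders, and the function itself is hypothetical.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

bedrock_agent.create_agent_action_group(
    agentId="<agent ID>",  # placeholder
    agentVersion="DRAFT",
    actionGroupName="user-details",
    actionGroupExecutor={
        # Hypothetical Lambda function that implements the action group.
        "lambda": "arn:aws:lambda:us-east-1:111122223333:function:user-details"
    },
    functionSchema={
        "functions": [
            {
                "name": "get_user_details",
                "description": "Returns the employee's full name, email, and department for a given 6-digit numeric user ID.",
                "parameters": {
                    "user_id": {
                        "type": "string",
                        "description": "6-digit numeric employee ID, for example 000123",
                        "required": True,
                    }
                },
            }
        ]
    },
)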

Finally, the knowledge base instructions should clearly state what is available in the knowledge base and when to use it to answer user queries.

The following is an ambiguous prompt:

Knowledge Base 1: use this knowledge base to get information from documents

The following is a clearer prompt:

Knowledge Base 1: Knowledge base containing insurance policies and internal documents. Use this knowledge base when the user asks about a policy term or regarding an internal system

Using organizational knowledge: Integrating knowledge bases

To make sure you provide your agents with enterprise knowledge, integrate them with your organization’s existing knowledge bases. This allows your agents to use vast amounts of information and provide more accurate, context-aware responses. By accessing up-to-date organizational data, your agents can improve response accuracy and relevance, cite authoritative sources, and reduce the need for frequent model updates.

Complete the following steps when integrating a knowledge base with Amazon Bedrock:

  1. Index your documents into a vector database using Amazon Bedrock Knowledge Bases.
  2. Configure your agent to access the knowledge base during interactions.
  3. Implement citation mechanisms to reference source documents in responses.

Regularly update your knowledge base to make sure your agent has consistent access to the most current information. This can be achieved by implementing event-based synchronization of your knowledge base data sources using the StartIngestionJob API and an Amazon EventBridge rule that is invoked periodically or based on updates of files in the knowledge base Amazon Simple Storage Service (Amazon S3) bucket.
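
For example, a small AWS Lambda function triggered by an EventBridge rule (scheduled, or on S3 object changes) could start the sync. The following is a sketch; the knowledge base and data source IDs are placeholders.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

def handler(event, context):
    """Re-sync the knowledge base data source when the EventBridge rule fires."""
    response = bedrock_agent.start_ingestion_job(
        knowledgeBaseId="<knowledge base ID>",  # placeholder
        dataSourceId="<data source ID>",        # placeholder
    )
    return {"ingestionJobId": response["ingestionJob"]["ingestionJobId"]}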

Integrating Amazon Bedrock Knowledge Bases with your agent will allow you to add semantic search capabilities to your application. By using the knowledgeBaseConfigurations field in your agent’s SessionState during the InvokeAgent request, you can control how your agent interacts with your knowledge base by setting the desired number of results and any necessary filters.
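
The following sketch shows an InvokeAgent request that limits knowledge base retrieval to five results and applies a metadata filter. The agent, alias, and knowledge base IDs are placeholders, and the department attribute in the filter is a hypothetical metadata field on the indexed documents.

import boto3

runtime = boto3.client("bedrock-agent-runtime")

response = runtime.invoke_agent(
    agentId="<agent ID>",             # placeholder
    agentAliasId="<agent alias ID>",  # placeholder
    sessionId="session-001",
    inputText="How many vacation days do I have left?",
    sessionState={
        "knowledgeBaseConfigurations": [
            {
                "knowledgeBaseId": "<knowledge base ID>",  # placeholder
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 5,
                        "filter": {"equals": {"key": "department", "value": "HR"}},
                    }
                },
            }
        ]
    },
)

# The response is a stream of events; collect the generated text chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)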

Defining success: Establishing evaluation criteria

To measure the effectiveness of your AI agent, it’s essential to define specific evaluation criteria. These metrics will help you assess performance, identify areas for improvement, and track progress over time.

Consider the following key evaluation metrics:

  • Response accuracy – This metric measures how the agent’s responses compare to your ground truth data, indicating whether the answers are correct and of the expected quality.
  • Task completion rate – This measures the success rate of the agent. The core idea of this metric is to measure the percentage or proportion of the conversations or user interactions where the agent was able to successfully complete the requested tasks and fulfill the user’s intent.
  • Latency or response time – This metric measures how long a task took to run and the response time. Essentially, it measures how quickly the agent can provide a response or output after receiving an input or query. You can also set intermediate metrics that measure how long each step of the agent trace takes to run to identify the steps that need to be optimized in your system.
  • Conversation efficiency – This measures how efficiently the conversation collects the required information.
  • Engagement – This measures how well the agent understands the user’s intent, provides relevant and natural responses, and maintains an engaging back-and-forth conversational flow.
  • Conversation coherence – This metric measures the logical progression and continuity between the responses. It checks if the context and relevance are kept during the session and if the appropriate pronouns and references are used.

Furthermore, you should define your use case-specific evaluation metrics that determine how well the agent is fulfilling the tasks for your use case. For instance, for the HR use case, a possible custom metric could be the number of tickets created, because those are created when the agent can’t answer the question by itself.

Implementing a robust evaluation process involves creating a comprehensive test dataset based on your ground truth data, developing automated evaluation scripts to measure quantitative metrics, implementing A/B testing to compare different agent versions or configurations, and establishing a regular cadence for human evaluation of qualitative factors. Evaluation is an ongoing process, so you should continuously refine your criteria and measurement methods as you learn more about your agent’s performance and user needs.
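
The following is a minimal sketch of such an automated evaluation script. The record fields and the substring-based accuracy check are simplifications; a production harness would typically score accuracy with semantic similarity or an LLM judge.

def evaluate(agent_responses: list, ground_truth: list) -> dict:
    """Compare agent outputs against ground truth and compute simple aggregate metrics."""
    total = len(ground_truth)
    completed = sum(1 for result in agent_responses if result["task_completed"])
    accurate = sum(
        1
        for result, expected in zip(agent_responses, ground_truth)
        if expected["expected_response"].lower() in result["response"].lower()
    )
    avg_latency_ms = sum(result["latency_ms"] for result in agent_responses) / total
    return {
        "task_completion_rate": completed / total,
        "response_accuracy": accurate / total,
        "avg_latency_ms": avg_latency_ms,
    }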

Using human evaluation

Although automated metrics are valuable, human evaluation plays a crucial role in assessing and improving your AI agent’s performance. Human evaluators can provide nuanced feedback on aspects that are difficult to quantify automatically, such as assessing natural language understanding and generation, evaluating the appropriateness of responses in context, identifying potential biases or ethical concerns, and providing insights into user experience and satisfaction.

To effectively use human evaluation, consider the following best practices:

  • Create a diverse panel of evaluators representing different perspectives
  • Develop clear evaluation guidelines and rubrics
  • Use a mix of expert evaluators (such as subject matter experts) and representative end-users
  • Collect quantitative ratings and qualitative feedback
  • Regularly analyze evaluation results to identify trends and areas for improvement

Continuous improvement: Testing, iterating, and refining

Building an effective AI agent is an iterative process. Now that you have a working prototype, it’s crucial to test extensively, gather feedback, and continuously refine your agent’s performance. This process should include comprehensive testing using your ground truth dataset; real-world user testing with a beta group; analysis of agent logs and conversation traces; regular updates to instructions, function definitions, and prompts; and performance comparison across different FMs.

To achieve thorough testing, consider using AI to generate diverse test cases. The following is an example prompt for generating HR assistant test scenarios:

Generate 10 diverse conversation scenarios between an employee and an HR AI assistant. Include a mix of common requests (e.g., vacation booking, policy questions) and edge cases (e.g., complex situations, out-of-scope queries). For each scenario, provide:
1. The initial user query
2. Expected agent responses
3. Potential follow-up questions
4. Desired final outcomes

One of the best tools of the testing phase is the agent trace. The trace provides you with the prompts used by the agent in each step taken during the agent’s orchestration. It gives insights on the agent’s chain of thought and reasoning process. You can enable the trace in your InvokeAgent call during the test process and disable it after your agent has been validated.
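
The following is a sketch of enabling the trace during testing and printing each orchestration step alongside the final answer; the agent and alias IDs are placeholders.

import json

import boto3

runtime = boto3.client("bedrock-agent-runtime")

response = runtime.invoke_agent(
    agentId="<agent ID>",             # placeholder
    agentAliasId="<agent alias ID>",  # placeholder
    sessionId="test-session-001",
    inputText="Book two days of vacation starting next Monday",
    enableTrace=True,  # turn this off again once the agent is validated
)

for event in response["completion"]:
    if "trace" in event:
        # Each trace event exposes one step of the agent's orchestration,
        # such as pre-processing, orchestration, or knowledge base lookups.
        print(json.dumps(event["trace"]["trace"], indent=2, default=str))
    elif "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"))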

The next step after collecting a ground truth dataset is to evaluate the agent’s behavior. You first need to define the evaluation criteria for the assessment. For the HR assistant example, you can create a test dataset that compares the results provided by your agent with the results obtained by directly querying the vacations database. You can then evaluate the agent’s behavior manually using human evaluation, or automate the evaluation using frameworks such as Agent Evaluation. If model invocation logging is enabled, Amazon Bedrock Agents also produces Amazon CloudWatch logs. You can use those logs to validate your agent’s behavior, debug unexpected outputs, and adjust the agent accordingly.

The last step of the agent testing phase is to plan for A/B testing groups during the deployment stage. You should define different aspects of agent behavior, such as formal or informal HR assistant tone, that can be tested with a smaller set of your user group. You can then make different agent versions available for each group during initial deployments and evaluate the agent behavior for each group. Amazon Bedrock Agents has built-in versioning capabilities to help you with this key part of testing.

Conclusions

Following these best practices and continuously refining your approach can significantly contribute to your success in developing powerful, accurate, and user-oriented AI agents using Amazon Bedrock. In Part 2 of this series, we explore architectural considerations, security best practices, and strategies for scaling your AI agents in production environments.

By following these best practices, you can build secure, accurate, scalable, and responsible generative AI applications using Amazon Bedrock. For examples to get started, check out the Amazon Bedrock Agents GitHub repository.

To learn more about Amazon Bedrock Agents, you can get started with the Amazon Bedrock Workshop and the standalone Amazon Bedrock Agents Workshop, which provides a deeper dive. Additionally, check out the service introduction video from AWS re:Invent 2023.


About the Authors

Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, the flagship generative AI offering from AWS for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.

Navneet Sabbineni is a Software Development Manager at AWS Bedrock. With over 9 years of industry experience as a software developer and manager, he has worked on building and maintaining scalable distributed services for AWS, including generative AI services like Amazon Bedrock Agents and conversational AI services like Amazon Lex. Outside of work, he enjoys traveling and exploring the Pacific Northwest with his family and friends.

Monica Sunkara is a Senior Applied Scientist at AWS, where she works on Amazon Bedrock Agents. With over 10 years of industry experience, including 6 years at AWS, Monica has contributed to various AI and ML initiatives such as Alexa Speech Recognition, Amazon Transcribe, and Amazon Lex ASR. Her work spans speech recognition, natural language processing, and large language models. Recently, she worked on adding function calling capabilities to Amazon Titan text models. Monica holds a degree from Cornell University, where she conducted research on object localization under the supervision of Prof. Andrew Gordon Wilson before joining Amazon in 2018.

Read More

AWS recognized as a first-time Leader in the 2024 Gartner Magic Quadrant for Data Science and Machine Learning Platforms

Over the last 18 months, AWS has released into general availability more than twice as many machine learning (ML) and generative artificial intelligence (AI) features as the other major cloud providers combined. This accelerated innovation is enabling organizations of all sizes, from disruptive AI startups like Hugging Face, AI21 Labs, and Articul8 AI to industry leaders such as NASDAQ and United Airlines, to unlock the transformative potential of generative AI. By providing a secure, high-performance, and scalable set of data science and machine learning services and capabilities, AWS empowers businesses to drive innovation through the power of AI.

At the heart of this innovation are Amazon Bedrock and Amazon SageMaker, both of which were mentioned in the recent Gartner Data Science and Machine Learning (DSML) Magic Quadrant evaluation. These services play a pivotal role in addressing diverse customer needs across the generative AI journey.

Amazon SageMaker, the foundational service for ML and generative AI model development, provides the fine-tuning and flexibility that makes it simple for data scientists and machine learning engineers to build, train, and deploy machine learning and foundation models (FMs) at scale. For application developers, Amazon Bedrock is the simplest way to build and scale generative AI applications with FMs for a wide variety of use cases. Whether leveraging the best FMs out there or importing custom models from SageMaker, Bedrock equips development teams with the tools they need to accelerate innovation.

We believe the continued innovation in both services and our positioning as a Leader in the 2024 Gartner Data Science and Machine Learning (DSML) Magic Quadrant reflect our commitment to meeting evolving customer needs, particularly in data science and ML. In our opinion, this recognition, coupled with our recent recognition in the Cloud AI Developer Services (CAIDS) Magic Quadrant, solidifies AWS as a provider of innovative AI solutions that drive business value and competitive advantage.

Review the Gartner Magic Quadrant and Methodology

For Gartner, the DSML Magic Quadrant research methodology provides a graphical competitive positioning of four types of technology providers in fast-growing markets: Leaders, Visionaries, Niche Players and Challengers. As companion research, Gartner Critical Capabilities notes provide deeper insight into the capability and suitability of providers’ IT products and services based on specific or customized use cases.

The following figure highlights where AWS lands in the DSML Magic Quadrant.

Access a complimentary copy of the full report to see why Gartner positioned AWS as a Leader, and dive deep into the strengths and cautions of AWS.

Further detail on Amazon Bedrock and Amazon SageMaker

Amazon Bedrock provides a straightforward way to build and scale applications with large language models (LLMs) and foundation models (FMs), empowering you to build generative AI applications with security and privacy. With Amazon Bedrock, you can experiment with and evaluate high performing FMs for your use case, import custom models, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources. Tens of thousands of customers across multiple industries are deploying new generative AI experiences for diverse use cases.

Amazon SageMaker is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost ML for any use case. You can access a wide-ranging choice of ML tools; fully managed and scalable infrastructure; repeatable and responsible ML workflows; and the power of human feedback across the ML lifecycle, including sophisticated tools that make it straightforward to work with data, such as Amazon SageMaker Canvas and Amazon SageMaker Data Wrangler.

In addition, Amazon SageMaker helps data scientists and ML engineers build FMs from scratch, evaluate and customize FMs with advanced techniques, and deploy FMs with fine-grained controls for generative AI use cases that have stringent requirements on accuracy, latency, and cost. Hundreds of thousands of customers from Perplexity to Thomson Reuters to Workday use SageMaker to build, train, and deploy ML models, including LLMs and other FMs.

Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from AWS.

GARTNER is a registered trademark and service mark of Gartner and Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.


About the author

Susanne Seitinger leads AI and ML product marketing at Amazon Web Services (AWS), including the introduction of critical generative AI services like Amazon Bedrock as well as coordinating generative AI marketing activities across AWS. Prior to AWS, Susanne was the director of public sector marketing at Verizon Business Group, and previously drove public sector marketing in the United States for Signify, after holding various positions in R&D, innovation, and segment management and marketing. She holds a BA from Princeton University, as well as a master’s in city planning and a PhD from MIT.

Read More

Build a serverless voice-based contextual chatbot for people with disabilities using Amazon Bedrock

At Amazon and AWS, we are always finding innovative ways to build inclusive technology. With voice assistants like Amazon Alexa, we are enabling more people to ask questions and get answers on the spot without having to type. Whether you’re a person with a motor disability, juggling multiple tasks, or simply away from your computer, getting search results without typing is a valuable feature. With modern voice assistants, you can now ask your questions conversationally and get verbal answers instantly.

In this post, we discuss voice-guided applications. Specifically, we focus on chatbots. Chatbots are no longer a niche technology. They are now ubiquitous on customer service websites, providing around-the-clock automated assistance. Although AI chatbots have been around for years, recent advances in large language models (LLMs) and generative AI have enabled more natural conversations. Chatbots are proving useful across industries, handling both general and industry-specific questions. Voice-based assistants like Alexa demonstrate how we are entering an era of conversational interfaces. Typing questions already feels cumbersome to many who prefer the simplicity and ease of speaking with their devices.

We explore how to build a fully serverless, voice-based contextual chatbot tailored for people with disabilities. We also provide a sample chatbot application, which is available in the accompanying GitHub repository. We create an intelligent conversational assistant that can understand and respond to voice inputs in a contextually relevant manner. The AI assistant is powered by Amazon Bedrock. This chatbot is designed to assist users with various tasks, provide information, and offer personalized support based on their unique requirements. For our LLM, we use Anthropic Claude on Amazon Bedrock.

We demonstrate the process of integrating Anthropic Claude’s advanced natural language processing capabilities with the serverless architecture of Amazon Bedrock, enabling the deployment of a highly scalable and cost-effective solution. Additionally, we discuss techniques for enhancing the chatbot’s accessibility and usability for people with motor disabilities. The aim of this post is to provide a comprehensive understanding of how to build a voice-based, contextual chatbot that uses the latest advancements in AI and serverless computing.

We hope that this solution can help people with certain mobility disabilities. A limited level of interaction is still required, because the user needs a way to indicate when to start and stop talking. In our sample application, we address this with a dedicated Talk button that performs the transcription process while it is pressed.

For people with significant motor disabilities, the same operation can be implemented with a dedicated physical button that can be pressed by a single finger or another body part. Alternatively, a special keyword can be said to indicate the beginning of the command. This approach is used when you communicate with Alexa. The user always starts the conversation with “Alexa.”

Solution overview

The following diagram illustrates the architecture of the solution.

Architecture of serverless components of the solution

To deploy this architecture, we need managed compute that can host the web application, authentication mechanisms, and relevant permissions. We discuss this later in the post.

All the services that we use are serverless and fully managed by AWS. You don’t need to provision the compute resources. You only consume the services through their API. All the calls to the services are made directly from the client application.

The application is a simple React application that we create using the Vite build tool. We use the AWS SDK for JavaScript to call the services. The solution uses the following major services:

  • Amazon Polly is a service that turns text into lifelike speech.
  • Amazon Transcribe is an AWS AI service that makes it straightforward to convert speech to text.
  • Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities that you need to build generative AI applications.
  • Amazon Cognito is an identity service for web and mobile apps. It’s a user directory, an authentication server, and an authorization service for OAuth 2.0 access tokens and AWS credentials.

To consume AWS services, the user needs to obtain temporary credentials from AWS Identity and Access Management (IAM). This is possible due to the Amazon Cognito identity pool, which acts as a mediator between your application user and IAM services. The identity pool holds the information about the IAM roles with all permissions necessary to run the solution.

Amazon Polly and Amazon Transcribe don’t require additional setup from the client aside from what we have described. However, Amazon Bedrock requires named user authentication. This means that having an Amazon Cognito identity pool is not enough; you also need an Amazon Cognito user pool, which allows you to define users and bind them to the Amazon Cognito identity pool. To understand better how Amazon Cognito allows external applications to invoke AWS services, refer to Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway.
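
The sample application performs these calls with the AWS SDK for JavaScript in the browser; the following Python (boto3) sketch shows the equivalent credential exchange for clarity. The Region, pool IDs, and ID token are placeholders.

import boto3

REGION = "us-east-1"                         # placeholder
IDENTITY_POOL_ID = "us-east-1:example-pool"  # placeholder
USER_POOL_ID = "us-east-1_EXAMPLE"           # placeholder
ID_TOKEN = "<ID token from the Amazon Cognito user pool sign-in>"

cognito_identity = boto3.client("cognito-identity", region_name=REGION)
login_key = f"cognito-idp.{REGION}.amazonaws.com/{USER_POOL_ID}"

# Map the authenticated user pool identity to an identity pool identity ...
identity_id = cognito_identity.get_id(
    IdentityPoolId=IDENTITY_POOL_ID,
    Logins={login_key: ID_TOKEN},
)["IdentityId"]

# ... and obtain temporary IAM credentials scoped by the identity pool's role.
creds = cognito_identity.get_credentials_for_identity(
    IdentityId=identity_id,
    Logins={login_key: ID_TOKEN},
)["Credentials"]

# These credentials are then used to sign calls to Amazon Polly,
# Amazon Transcribe, and Amazon Bedrock.
polly = boto3.client(
    "polly",
    region_name=REGION,
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretKey"],
    aws_session_token=creds["SessionToken"],
)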

The heavy lifting of provisioning the Amazon Cognito user pool and identity pool, including generating the sign-in UI for the React application, is done by AWS Amplify. Amplify consists of a set of tools (open source framework, visual development environment, console) and services (web application and static website hosting) to accelerate the development of mobile and web applications on AWS. We cover the steps of setting up Amplify in the next sections.

Prerequisites

Before you begin, complete the following prerequisites:

  1. Make sure you have the following installed:
  2. Create an IAM role to use in the Amazon Cognito identity pool. Follow the principle of least privilege to provide only the minimum set of permissions needed to run the application.
    • To invoke Amazon Bedrock, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "*"
          }
        ]
      }

    • To invoke Amazon Polly, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "polly:SynthesizeSpeech",
            "Resource": "*"
          }
        ]
      }

    • To invoke Amazon Transcribe, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
          }
        ]
      }

The full policy JSON should look as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "polly:SynthesizeSpeech",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor3",
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }
  ]
}
  3. Run the following command to clone the GitHub repository:
    git clone https://github.com/aws-samples/serverless-conversational-chatbot.git

  4. To use Amplify, refer to Set up Amplify CLI to complete the initial setup.
  5. To be consistent with the values that you use later in the instructions, call your AWS profile amplify when you see the following prompt.
    Creation of the AWS profile "amplify"
  6. Create the role amplifyconsole-backend-role with the AdministratorAccess-Amplify managed policy, which allows Amplify to create the necessary resources.
    IAM Role with "AdministratorAccess-Amplify" policy
  7. For this post, we use the Anthropic Claude 3 Haiku LLM. To enable the LLM in Amazon Bedrock, refer to Access Amazon Bedrock foundation models.

Deploy the solution

There are two options to deploy the solution:

  • Use Amplify to deploy the application automatically
  • Deploy the application manually

We provide the steps for both options in this section.

Deploy the application automatically using Amplify

Amplify can deploy the application automatically if it’s stored in GitHub, Bitbucket, GitLab, or AWS CodeCommit. Upload the application that you downloaded earlier to your preferred repository (from the aforementioned options). For instructions, see Getting started with deploying an app to Amplify Hosting.

You can now continue to the next section of this post to set up IAM permissions.

Deploy the application manually

If you don’t have access to one of the storage options that we mentioned, you can deploy the application manually. This can also be useful if you want to modify the application to better fit your use case.

We tested the deployment on AWS Cloud9, a cloud integrated development environment (IDE) for writing, running, and debugging code, with Ubuntu Server 22.04 and Amazon Linux 2023.

We use the Visual Studio Code IDE and run all the following commands directly in the terminal window inside the IDE, but you can also run the commands in the terminal of your choice.

  1. From the directory where you checked out the application on GitHub, run the following command:
    cd serverless-conversational-chatbot

  2. Run the following commands:
    npm i
    
    amplify init

  3. Follow the prompts as shown in the following screenshot.
    • For authentication, choose the AWS profile amplify that you created as part of the prerequisite steps.
      Initial AWS Amplify setup in React application: 1. Do you want to use an existing environment? No 2. Enter a name for the environment: sampleenv 3. Select the authentication method you want to use: AWS Profile 4. Please choose the profile you want to use: amplify
    • Two new files will appear in the project under the src folder:
      • amplifyconfiguration.json
      • aws-exports.js

      New objects created by AWS Amplify: 1. aws-exports.js 2. amplifyconfiguration.json

  4. Next, run the following command:
    amplify configure project

Then select “Project Information”

Project Configuration of AWS Amplify in React Applications

  5. Enter the following information:
    Which setting do you want to configure? Project information
    
    Enter a name for the project: servrlsconvchat
    
    Choose your default editor: Visual Studio Code
    
    Choose the type of app that you're building: javascript
    
    What javascript framework are you using: react
    
    Source Directory Path: src
    
    Distribution Directory Path: dist
    
    Build Command: npm run-script build
    
    Start Command: npm run-script start

You can use an existing Amazon Cognito identity pool and user pool or create new objects.

  6. For our application, run the following command:
    amplify add auth

If you get the following message, you can ignore it:

Auth has already been added to this project. To update run amplify update auth
  7. Choose Default configuration.
    Selecting "default configuration" when adding authentication objects
  8. Accept all options proposed by the prompt.
  9. Run the following command:
    amplify add hosting

  10. Choose your hosting option.

You have two options to host the application. The application can be hosted to the Amplify console or to Amazon Simple Storage Service (Amazon S3) and then exposed through Amazon CloudFront.

Hosting with the Amplify console differs from CloudFront and Amazon S3. The Amplify console is a managed service providing continuous integration and delivery (CI/CD) and SSL certificates, prioritizing swift deployment of serverless web applications and backend APIs. In contrast, CloudFront and Amazon S3 offer greater flexibility and customization options, particularly for hosting static websites and assets with features like caching and distribution. CloudFront and Amazon S3 are preferable for intricate, high-traffic web applications with specific performance and security needs.

For this post, we use the Amplify console. To learn more about deployment with Amazon S3 and Amazon CloudFront, refer to the documentation.
Selecting the deployment option for the React application on the Amplify Console. Selected option: Hosting with Amplify Console

Now you’re ready to publish the application. There is an option to publish the application to GitHub to support CI/CD pipelines. Amplify has built-in integration with GitHub and can redeploy the application automatically when you push the changes. For simplicity, we use manual deployment.

  11. Choose Manual deployment.
    Selecting "Manual Deployment" when publishing the project
  12. Run the following command:
    amplify publish

After the application is published, you will see the following output. Note down this URL to use in a later step.
Result of the Deployment of the React Application on the Amplify Console. The URL that the user should use to enter the Amplify application

  13. Log in to the Amplify console, navigate to the servrlsconvchat application, and choose General under App settings in the navigation pane.
    Service Role attachment to the deployed application. First step: select the deployed application and choose the General option.
  14. Edit the app settings and enter amplifyconsole-backend-role for Service role (you created this role in the prerequisites section).
    Service Role attachment to the deployed application. Second step: set amplifyconsole-backend-role in the Service role field.

Now you can proceed to the next section to set up IAM permissions.

Configure IAM permissions

As part of the publishing method you completed, you provisioned a new identity pool. You can view this on the Amazon Cognito console, along with a new user pool. The names will be different from those presented in this post.

As we explained earlier, you need to attach policies to this role to allow interaction with Amazon Bedrock, Amazon Polly, and Amazon Transcribe. To set up IAM permissions, complete the following steps:

  1. On the Amazon Cognito console, choose Identity pools in the navigation pane.
  2. Navigate to your identity pool.
  3. On the User access tab, choose the link for the authenticated role.
    Identifying the IAM authenticated role in the Amazon Cognito identity pool. Select the Identity pools option in the console, select the User access tab, and choose the link under Authenticated role.
  4. Attach the policies that you defined in the prerequisites section.
    IAM policies attached to the Cognito identity pool authenticated role. The policies are presented in the Prerequisites section, item 2.

Amazon Bedrock can only be used with a named user, so we create a sample user in the Amazon Cognito user pool that was provisioned as part of the application publishing process.

  5. On the user pool details page, on the Users tab, choose Create user.
    User Creation in the Cognito User Pool. Select relevant user pool in “User pools” section. Select “Users” tab. Click on “Create user” button
  6. Provide your user information.
    Sample user definition in the Cognito User Pool. Enter email address and temporary password.

You’re now ready to run the application.

Use the sample serverless application

To access the application, navigate to the URL you saved from the output at the end of the application publishing process. Sign in to the application with the user you created in the previous step. You might be asked to change the password the first time you sign in.
Application Login Page. Enter user name and password

Use the Talk button and hold it while you’re asking the question. (We use this approach for the simplicity of demonstrating the abilities of the tool. For people with motor disabilities, we propose using a dedicated button that can be operated with different body parts, or a special keyword to initiate the conversation.)

When you release the button, the application sends your voice to Amazon Transcribe and returns the transcription text. This text is used as an input for an Amazon Bedrock LLM. For this example, we use Anthropic Claude 3 Haiku, but you can modify the code and use another model.

The response from Amazon Bedrock is displayed as text and is also spoken by Amazon Polly.
Instructions on how to invoke the "Talk" operation by using the Talk button
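For illustration, the following Python (boto3) sketch shows the same round trip that the JavaScript client performs: the transcribed question is sent to Anthropic Claude 3 Haiku on Amazon Bedrock, and the answer is converted to speech with Amazon Polly. The model ID reflects what was available at the time of writing, and the voice choice is arbitrary.

import json
import boto3

REGION = "us-east-1"
bedrock_runtime = boto3.client("bedrock-runtime", region_name=REGION)
polly = boto3.client("polly", region_name=REGION)

transcribed_question = "What is the most famous tower in Paris?"

# Anthropic Claude 3 models on Amazon Bedrock use the Messages API request format.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": transcribed_question}]}
    ],
}
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(body),
)
answer = json.loads(response["body"].read())["content"][0]["text"]

# Convert the answer to speech so it can be played back to the user.
speech = polly.synthesize_speech(Text=answer, OutputFormat="mp3", VoiceId="Joanna")
with open("answer.mp3", "wb") as f:
    f.write(speech["AudioStream"].read())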

The conversation history is also stored. This means that you can ask follow-up questions, and the context of the conversation is preserved. For example, we asked, “What is the most famous tower there?” without specifying the location, and our chatbot was able to understand that the context of the question is Paris based on our previous question.
Demonstration of context preservation during conversation. Continues question-answer conversation with chatbot.

We store the conversation history inside a JavaScript variable, which means that if you refresh the page, the context will be lost. We discuss how to preserve the conversation context in a persistent way later in this post.
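In the sample app this history lives in a JavaScript variable on the client; the following Python sketch shows the same idea, assuming the messages follow the Anthropic Messages format shown earlier.

# In-memory conversation history; refreshing the page (or restarting the process) discards it.
history = []

def ask(question, invoke_model):
    # Append the user turn, send the full history so the model keeps the context,
    # then record the assistant turn for the next round.
    history.append({"role": "user", "content": [{"type": "text", "text": question}]})
    answer = invoke_model(history)  # for example, a wrapper around bedrock_runtime.invoke_model
    history.append({"role": "assistant", "content": [{"type": "text", "text": answer}]})
    return answer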

To identify that the transcription process is happening, choose and hold the Talk button. The color of the button changes and a microphone icon appears.

"Talk" operation indicator: the Talk button changes color to ochre

Clean up

To clean up your resources, run the following command from the same directory where you ran the Amplify commands:

amplify delete

Result of the "Cleanup" operation after running “amplify delete” command

This command removes the Amplify settings from the React application, Amplify resources, and all Amazon Cognito objects, including the IAM role and Amazon Cognito user pool’s user.

Conclusion

In this post, we presented how to create a fully serverless voice-based contextual chatbot using Amazon Bedrock with Anthropic Claude.

This serves as a starting point for a serverless and cost-effective solution. For example, you could extend the solution to add persistent conversational memory for your chats by using a data store such as Amazon DynamoDB. If you want to use a Retrieval Augmented Generation (RAG) approach, you can use Amazon Bedrock Knowledge Bases to securely connect FMs in Amazon Bedrock to your company data.

Another approach is to customize the model you use in Amazon Bedrock with your own data using fine-tuning or continued pre-training to build applications that are specific to your domain, organization, and use case. With custom models, you can create unique user experiences that reflect your company’s style, voice, and services.

For additional resources, refer to the following:


About the Author

Michael Shapira is a Senior Solution Architect covering general topics in AWS and part of the AWS Machine Learning community. He has 16 years’ experience in Software Development. He finds it fascinating to work with cloud technologies and help others on their cloud journey.

Eitan Sela is a Machine Learning Specialist Solutions Architect with Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them build and operate machine learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Read More

Maintain access and consider alternatives for Amazon Monitron

Maintain access and consider alternatives for Amazon Monitron

Amazon Monitron, the Amazon Web Services (AWS) machine learning (ML) service for industrial equipment condition monitoring, will no longer be available to new customers effective October 31, 2024. Existing customers of Amazon Monitron will be able to purchase devices and use the service as normal. We will continue to sell devices until July 2025 and will honor the 5-year device warranty, including service support. AWS continues to invest in security, availability, and performance improvements for Amazon Monitron, but we do not plan to introduce new features to Amazon Monitron.

This post discusses how customers can maintain access to Amazon Monitron after it is closed to new customers and what some alternatives are to Amazon Monitron.

Maintaining access to Amazon Monitron

Customers are considered existing customers if they have commissioned an Amazon Monitron sensor through a project at any time in the 30 days prior to October 31, 2024. To maintain access to the service after October 31, 2024, customers should create a project and commission at least one sensor.

For any questions or support needed, you can contact your assigned account manager or solutions architect, or create a case from the AWS Management Console.

Ordering Amazon Monitron hardware

For existing Amazon business customers, we will allowlist your account with the existing Amazon Monitron devices. For existing Amazon.com retail customers, the Amazon Monitron team will provide specific ordering instructions according to individual request.

Alternatives to Amazon Monitron

For customers interested in an alternative for their condition monitoring needs, we recommend exploring solutions provided by our AWS Partners: Tactical Edge, IndustrAI, and Factory AI.

Summary

While new customers will no longer have access to Amazon Monitron after October 31, 2024, AWS offers a range of partner solutions through the AWS Partner Network finder. Customers should explore these options to understand what works best for their specific needs.

More details can be found in the following resources at AWS Partner Network.


About the author

Stuart Gillen is a Sr. Product Manager for Monitron, at AWS. Stuart has held a variety of roles in engineering management, business development, product management, and consulting. Most of his career has been focused on industrial applications specifically in reliability practices, maintenance systems, and manufacturing.

Read More

Import a question answering fine-tuned model into Amazon Bedrock as a custom model

Import a question answering fine-tuned model into Amazon Bedrock as a custom model

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Common generative AI use cases, including but not limited to chatbots, virtual assistants, conversational search, and agent assistants, use FMs to provide responses. Retrieval Augmented Generation (RAG) is a technique to optimize the output of FMs by providing context around the questions for these use cases. Fine-tuning the FM is recommended to further optimize the output to follow the brand and industry voice or vocabulary.

Custom Model Import for Amazon Bedrock, in preview now, allows you to import customized FMs created in other environments, such as Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises, into Amazon Bedrock. This post is part of a series that demonstrates various architecture patterns for importing fine-tuned FMs into Amazon Bedrock.

In this post, we provide a step-by-step approach of fine-tuning a Mistral model using SageMaker and import it into Amazon Bedrock using the Custom Import Model feature. We use the OpenOrca dataset to fine-tune the Mistral model and use the SageMaker FMEval library to evaluate the fine-tuned model imported into Amazon Bedrock.

Key Features

Some of the key features of Custom Model Import for Amazon Bedrock are:

  1. You can bring your fine-tuned models and take advantage of the fully managed serverless capabilities of Amazon Bedrock.
  2. The feature currently supports the Llama 2, Llama 3, Flan, and Mistral model architectures with FP32, FP16, and BF16 precisions, with further quantizations coming soon.
  3. To use the feature, you run the import process (covered later in this post) with your model weights stored in Amazon Simple Storage Service (Amazon S3).
  4. You can also use models created with Amazon SageMaker by referencing the SageMaker model Amazon Resource Name (ARN), which provides seamless integration with SageMaker.
  5. Amazon Bedrock automatically scales your model as traffic increases and, when the model is not in use, scales it down to zero, reducing your costs.

Let’s dive into a use case and see how easy it is to use this feature.

Solution overview

At the time of writing, the Custom Model Import feature in Amazon Bedrock supports models following the architectures and patterns in the following figure.

In this post, we walk through the following high-level steps:

  1. Fine-tune the model using SageMaker.
  2. Import the fine-tuned model into Amazon Bedrock.
  3. Test the imported model.
  4. Evaluate the imported model using the FMEval library.

The following diagram illustrates the solution architecture.

The process includes the following steps:

  1. We use a SageMaker training job to fine-tune the model using a SageMaker JupyterLab notebook. This training job reads the dataset from Amazon Simple Storage Service (Amazon S3) and writes the model back into Amazon S3. This model will then be imported into Amazon Bedrock.
  2. To import the fine-tuned model, you can use the Amazon Bedrock console, the Boto3 library, or APIs.
  3. An import job orchestrates the process to import the model and make the model available from the customer account.
    1. The import job copies all the model artifacts from the user’s account into an AWS managed S3 bucket.
  4. When the import job is complete, the fine-tuned model is made available for invocation from your AWS account.
  5. We use the SageMaker FMEval library in a SageMaker notebook to evaluate the imported model.

The copied model artifacts will remain in the Amazon Bedrock account until the custom imported model is deleted from Amazon Bedrock. Deleting model artifacts in your AWS account S3 bucket doesn’t delete the model or the related artifacts in the Amazon Bedrock managed account. You can delete an imported model from Amazon Bedrock along with all the copied artifacts using either the Amazon Bedrock console, Boto3 library, or APIs.

Additionally, all data (including the model) remains within the selected AWS Region. The model artifacts are imported into the AWS operated deployment account using a virtual private cloud (VPC) endpoint, and you can encrypt your model data using an AWS Key Management Service (AWS KMS) customer managed key.

In the following sections, we dive deep into each of these steps to deploy, test, and evaluate the model.

Prerequisites

We use Mistral-7B-v0.3 in this post because it uses an extended vocabulary compared to its prior version produced by Mistral AI. This model is straightforward to fine-tune, and Mistral AI has provided example fine-tuned models. We use Mistral for this use case because this model supports a 32,000-token context capacity and is fluent in English, French, Italian, German, Spanish, and coding languages, which helps it achieve high accuracy for customer support use cases.

Mistral-7B-v0.3 is a gated model on the Hugging Face model repository. You need to review the terms and conditions and request access to the model by submitting your details.

We use Amazon SageMaker Studio to preprocess the data and fine-tune the Mistral model using a SageMaker training job. To set up SageMaker Studio, refer to Launch Amazon SageMaker Studio. Refer to the SageMaker JupyterLab documentation to set up and launch a JupyterLab notebook. You will submit a SageMaker training job to fine-tune the Mistral model from the SageMaker JupyterLab notebook, which can be found in the GitHub repo.

Fine-tune the model using QLoRA

To fine-tune the Mistral model, we apply QLoRA and Parameter-Efficient Fine-Tuning (PEFT) optimization techniques. In the provided notebook, you use the Fully Sharded Data Parallel (FSDP) PyTorch API to perform distributed model tuning. You use supervised fine-tuning (SFT) to fine-tune the Mistral model.
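The notebook submits the training script described below as a SageMaker training job. The exact estimator configuration lives in the sample notebook; the following is a minimal sketch using the SageMaker Python SDK, in which the instance type, container versions, and hyperparameter names are assumptions that you should align with the notebook. The variables role and training_input_path are defined in the notebook (the latter is the S3 prefix prepared in the next section).

from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="run_fsdp_qlora.py",       # training script discussed below
    source_dir="scripts",                   # assumed location of the script
    instance_type="ml.g5.12xlarge",         # assumption; choose a GPU instance you have quota for
    instance_count=1,
    transformers_version="4.36",            # assumption; match the versions used in the notebook
    pytorch_version="2.1",
    py_version="py310",
    role=role,                              # SageMaker execution role defined in the notebook
    hyperparameters={
        "model_id": "mistralai/Mistral-7B-v0.3",
        "dataset_path": "/opt/ml/input/data/training",
        "use_qlora": True,
        "max_seq_length": 2048,
    },
    distribution={"torch_distributed": {"enabled": True}},  # launch with torchrun for FSDP
)

# training_input_path is the S3 prefix where the JSONL dataset is uploaded (next section)
estimator.fit({"training": training_input_path})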

Prepare the dataset

The first step in the fine-tuning process is to prepare and format the dataset. After you transform the dataset into the Mistral Default Instruct format, you upload it as a JSONL file into the S3 bucket used by the SageMaker session, as shown in the following code:

from datasets import load_dataset

# Load the OpenOrca dataset from the Hugging Face Hub and keep only the FLAN subset
dataset = load_dataset("Open-Orca/OpenOrca")
flan_dataset = dataset.filter(lambda example, indice: "flan" in example["id"], with_indices=True)
flan_dataset = flan_dataset["train"].train_test_split(test_size=0.01, train_size=0.035)

# Convert each record to the chat-message format and drop the original columns
columns_to_remove = list(dataset["train"].features)
flan_dataset = flan_dataset.map(create_conversation, remove_columns=columns_to_remove, batched=False)

# Save the train and test splits to Amazon S3 as JSON Lines
flan_dataset["train"].to_json(f"{training_input_path}/train_dataset.json", orient="records", force_ascii=False)
flan_dataset["test"].to_json(f"{training_input_path}/test_dataset.json", orient="records", force_ascii=False)
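The create_conversation helper used above is defined in the sample notebook. A sketch of what it might look like for the OpenOrca schema (id, system_prompt, question, response) follows; the actual implementation in the repository may differ.

def create_conversation(example):
    # Map one OpenOrca record to the messages structure that the chat template
    # in the training script expects.
    user_text = example["question"]
    if example.get("system_prompt"):
        # Fold the system prompt into the user turn, since the default Mistral
        # instruct format has no separate system role.
        user_text = f"{example['system_prompt']}\n\n{user_text}"
    return {
        "messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": example["response"]},
        ]
    }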

You transform the dataset into Mistral Default Instruct format within the SageMaker training job as instructed in the training script (run_fsdp_qlora.py):

    ################
    # Dataset
    ################
    
    train_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "train_dataset.json"),
        split="train",
    )
    test_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "test_dataset.json"),
        split="train",
    )

    ################
    # Model & Tokenizer
    ################

    # Tokenizer        
    tokenizer = AutoTokenizer.from_pretrained(script_args.model_id, use_fast=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.chat_template = MISTRAL_CHAT_TEMPLATE
    
    # template dataset
    def template_dataset(examples):
        return{"text":  tokenizer.apply_chat_template(examples["messages"], tokenize=False)}
    
    train_dataset = train_dataset.map(template_dataset, remove_columns=["messages"])
    test_dataset = test_dataset.map(template_dataset, remove_columns=["messages"])

Optimize fine-tuning using QLoRA

You optimize your fine-tuning using QLoRA and with the precision provided as input into the training script as SageMaker training job parameters. QLoRA is an efficient fine-tuning approach that reduces memory usage to fine-tune a 65-billion-parameter model on a single 48 GB GPU, preserving the full 16-bit fine-tuning task performance. In this notebook, you use the bitsandbytes library to set up quantization configurations, as shown in the following code:

    # Model    
    torch_dtype = torch.bfloat16 if training_args.bf16 else torch.float32
    quant_storage_dtype = torch.bfloat16

    if script_args.use_qlora:
        print(f"Using QLoRA - {torch_dtype}")
        quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch_dtype,
                bnb_4bit_quant_storage=quant_storage_dtype,
            )
    else:
        quantization_config = None

You use the LoRA config based on the QLoRA paper and Sebastian Raschka's experiment, as shown in the following code. Two key points to consider from the Raschka experiment are that QLoRA offers 33% memory savings at the cost of a 39% increase in runtime, and that LoRA should be applied to all layers to maximize model performance.

################
# PEFT
################
# LoRA config based on QLoRA paper & Sebastian Raschka experiment
peft_config = LoraConfig(
    lora_alpha=8,
    lora_dropout=0.05,
    r=16,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
    )

You use SFTTrainer to fine-tune the Mistral model:

    ################
    # Training
    ################
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        dataset_text_field="text",
        eval_dataset=test_dataset,
        peft_config=peft_config,
        max_seq_length=script_args.max_seq_length,
        tokenizer=tokenizer,
        packing=True,
        dataset_kwargs={
            "add_special_tokens": False,  # We template with special tokens
            "append_concat_token": False,  # No need to add additional separator token
        },
    )

At the time of writing, only merged adapters are supported using the Custom Model Import feature for Amazon Bedrock. Let’s look at how to merge the adapter with the base model next.

Merge the adapters

Adapters are new modules added between layers of a pre-trained network. Creation of these new modules is possible by back-propagating gradients through a frozen, 4-bit quantized pre-trained language model into low-rank adapters in the fine-tuning process. To import the Mistral model into Amazon Bedrock, the adapters need to be merged with the base model and saved in Safetensors format. Use the following code to merge the model adapters and save them in Safetensors format:

        # load PEFT model in fp16
        model = AutoPeftModelForCausalLM.from_pretrained(
            training_args.output_dir,
            low_cpu_mem_usage=True,
            torch_dtype=torch.float16
        )
        # Merge LoRA and base model and save
        model = model.merge_and_unload()
        model.save_pretrained(
            sagemaker_save_dir, safe_serialization=True, max_shard_size="2GB"
        )

To import the Mistral model into Amazon Bedrock, the model needs to be in an uncompressed directory within an S3 bucket accessible by the Amazon Bedrock service role used in the import job.
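For example, if you merged the adapters locally rather than as part of the training job, you could upload the uncompressed directory to Amazon S3 with a sketch like the following; the bucket and prefix names are placeholders.

import os
import boto3

s3 = boto3.client("s3")
local_dir = "merged-model"             # directory containing *.safetensors, config.json, tokenizer files
bucket = "my-custom-model-bucket"      # placeholder bucket name
prefix = "mistral-7b-openorca-merged"  # placeholder key prefix

# Upload every file in the directory tree, preserving relative paths (no archive or compression).
for root, _, files in os.walk(local_dir):
    for name in files:
        path = os.path.join(root, name)
        key = f"{prefix}/{os.path.relpath(path, local_dir)}"
        s3.upload_file(path, bucket, key)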

Import the fine-tuned model into Amazon Bedrock

Now that you have fine-tuned the model, you can import the model into Amazon Bedrock. In this section, we demonstrate how to import the model using the Amazon Bedrock console or the SDK.

Import the model using the Amazon Bedrock console

To import the model using the Amazon Bedrock console, see Import a model with Custom Model Import. You use the Import model page as shown in the following screenshot to import the model from the S3 bucket.

After you successfully import the fine-tuned model, you can see the model listed on the Amazon Bedrock console.

Import the model using the SDK

The AWS Boto3 library supports importing custom models into Amazon Bedrock. You can use the following code to import a fine-tuned model from within the notebook into Amazon Bedrock. This is an asynchronous method.

import boto3
import datetime

br_client = boto3.client('bedrock', region_name='<aws-region-name>')
pt_model_nm = "<bedrock-custom-model-name>"
# Append a timestamp so the import job name is unique
pt_imp_jb_nm = f"{pt_model_nm}-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"
role_arn = "<<bedrock_role_with_custom_model_import_policy>>"
# S3 URI of the uncompressed, merged model artifacts
pt_model_src = {"s3DataSource": {"s3Uri": f"{pt_pubmed_model_s3_path}"}}
resp = br_client.create_model_import_job(jobName=pt_imp_jb_nm,
                                         importedModelName=pt_model_nm,
                                         roleArn=role_arn,
                                         modelDataSource=pt_model_src)
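Because the import job runs asynchronously, you can poll its status until it completes. The following sketch uses the GetModelImportJob API; field names follow the Amazon Bedrock API at the time of writing.

import time

# resp comes from create_model_import_job in the previous snippet
job_arn = resp["jobArn"]

while True:
    job = br_client.get_model_import_job(jobIdentifier=job_arn)
    status = job["status"]
    print(f"Import job status: {status}")
    if status in ("Completed", "Failed"):
        break
    time.sleep(60)

# When the job completes, the imported model ARN can be used with bedrock-runtime
print(job.get("importedModelArn"))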

Test the imported model

Now that you have imported the fine-tuned model into Amazon Bedrock, you can test the model. In this section, we demonstrate how to test the model using the Amazon Bedrock console or the SDK.

Test the model on the Amazon Bedrock console

You can test the imported model using an Amazon Bedrock playground, as illustrated in the following screenshot.

Test the model using the SDK

You can also use the Amazon Bedrock Invoke Model API to run the fine-tuned imported model, as shown in the following code:

import json

import boto3
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-west-2")
model_id = "<<replace with the imported bedrock model arn>>"


def call_invoke_model_and_print(native_request):
    request = json.dumps(native_request)

    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())

        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)


# Wrap the prompt in the Mistral instruction format used during fine-tuning
prompt = "will there be a season 5 of shadowhunters"
formatted_prompt = f"[INST] {prompt} [/INST]</s>"
native_request = {
    "prompt": formatted_prompt,
    "max_tokens": 64,
    "top_p": 0.9,
    "temperature": 0.91,
}
call_invoke_model_and_print(native_request)

The custom Mistral model that you imported using Amazon Bedrock supports temperature, top_p, and max_gen_len parameters when invoking the model for inferencing. The inference parameters top_k, max_seq_len, max_batch_size, and max_new_tokens are not supported for a custom Mistral fine-tuned model.

Evaluate the imported model

Now that you have imported and tested the model, let’s evaluate the imported model using the SageMaker FMEval library. For more details, refer to Evaluate Bedrock Imported Models. To evaluate the question answering task, we use the metrics F1 Score, Exact Match Score, Quasi Exact Match Score, Precision Over Words, and Recall Over Words. The key metrics for the question answering tasks are Exact Match, Quasi-Exact Match, and F1 over words evaluated by comparing the model predicted answers against the ground truth answers. The FMEval library supports out-of-the-box evaluation algorithms for metrics such as accuracy, QA Accuracy, and others detailed in the FMEval documentation. Because you fine-tuned the Mistral model for question answering, you can use the QA Accuracy algorithm, as shown in the following code. The FMEval library supports these metrics for the QA Accuracy algorithm.

from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner

config = DataConfig(
    dataset_name="trex_sample",
    dataset_uri="data/test_dataset.json",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer",
)
bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output='outputs[0].text',
    content_template='{"prompt": $prompt, "max_tokens": 500}',
)

eval_algo = QAAccuracy()
eval_output = eval_algo.evaluate(model=bedrock_model_runner, dataset_config=config,
                                 prompt_template="[INST]$model_input[/INST]", save=True)

You can get the consolidated metrics for the imported model as follows:

for op in eval_output:
    print(f"Eval Name: {op.eval_name}")
    for score in op.dataset_scores:
        print(f"{score.name} : {score.value}")

Clean up

To delete the imported model from Amazon Bedrock, navigate to the model on the Amazon Bedrock console. On the options menu (three dots), choose Delete.
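You can also delete the imported model programmatically with the Amazon Bedrock DeleteImportedModel API; a minimal Boto3 sketch follows.

import boto3

br_client = boto3.client("bedrock", region_name="<aws-region-name>")

# Use the imported model's name or ARN as shown on the Amazon Bedrock console
br_client.delete_imported_model(modelIdentifier="<imported-model-name-or-arn>")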

To delete the SageMaker domain along with the SageMaker JupyterLab space, refer to Delete an Amazon SageMaker domain. You may also want to delete the S3 buckets where the data and model are stored. For instructions, see Deleting a bucket.

Conclusion

In this post, we explained the different aspects of fine-tuning a Mistral model using SageMaker, importing the model into Amazon Bedrock, invoking the model using both an Amazon Bedrock playground and Boto3, and then evaluating the imported model using the FMEval library. You can use this feature to import base FMs or FMs fine-tuned either on premises, on SageMaker, or on Amazon EC2 into Amazon Bedrock and use the models without any heavy lifting in your generative AI applications. Explore the Custom Model Import feature for Amazon Bedrock to deploy FMs fine-tuned for code generation tasks in a secure and scalable manner. Visit our GitHub repository to explore samples prepared for fine-tuning and importing models from various families.


About the Authors

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.

Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Evandro Franco is a Sr. AI/ML Specialist Solutions Architect at Amazon Web Services. He helps AWS customers overcome business challenges related to AI/ML on top of AWS. He has more than 18 years of experience working with technology, from software development, infrastructure, serverless, to machine learning.

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, artificial intelligence, machine learning, and system design. He is passionate about developing state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Ragha Prasad is a Principal Engineer and a founding member of Amazon Bedrock, where he has had the privilege to listen to customer needs first-hand and understands what it takes to build and launch scalable and secure Gen AI products. Prior to Bedrock, he worked on numerous products in Amazon, ranging from devices to Ads to Robotics.

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.

Read More

Using task-specific models from AI21 Labs on AWS

Using task-specific models from AI21 Labs on AWS

In this blog post, we will show you how to leverage AI21 Labs’ Task-Specific Models (TSMs) on AWS to enhance your business operations. You will learn the steps to subscribe to AI21 Labs in the AWS Marketplace, set up a domain in Amazon SageMaker, and utilize AI21 TSMs via SageMaker JumpStart.

AI21 Labs is a foundation model (FM) provider focusing on building state-of-the-art language models. AI21 Task-Specific Models (TSMs) are built for specific tasks such as answering questions, summarizing and condensing lengthy texts, paraphrasing, and correcting grammar. AI21 TSMs are available in Amazon SageMaker JumpStart.

Here are the AI21 TSMs that can be accessed and customized in SageMaker JumpStart: AI21 Contextual Answers, AI21 Summarize, AI21 Paraphrase, and AI21 Grammatical Error Correction.

AI21 FMs (Jamba-Instruct, AI21 Jurassic-2 Ultra, AI21 Jurassic-2 Mid) are available in Amazon Bedrock and can be used for large language model (LLM) use cases. We used AI21 TSMs available in SageMaker JumpStart for this post. SageMaker JumpStart enables you to select, compare, and evaluate available AI21 TSMs.

AI21’s TSMs

Foundation models can solve many tasks, but not every task is unique. Some commercial tasks are common across many applications. AI21 Labs’ TSMs are specialized models built to solve a particular problem. They’re built to deliver out-of-the-box value, cost effectiveness, and higher accuracy for the common tasks behind many commercial use cases. In this post, we will explore three of AI21 Labs’ TSMs and their unique capabilities.

Foundation models are built and trained on massive datasets to perform a variety of tasks. Unlike FMs, TSMs are trained to perform unique tasks.

When your use case is supported by a TSM, you quickly realize benefits such as improved refusal rates when you don’t want the model to provide answers unless they’re grounded in actual document content.

  • Paraphrase: This model is used to enhance content creation and communication by generating varied versions of text while maintaining a consistent tone and style. This model is ideal for creating multiple product descriptions, marketing materials, and customer support responses, improving clarity and engagement. It also simplifies complex documents, making information more accessible.
  • Summarize: This model is used to condense lengthy texts into concise summaries while preserving the original meaning. This model is particularly useful for processing large documents, such as financial reports, legal documents, and technical papers, making critical information more accessible and comprehensible.
  • Contextual answers: This model is used to significantly enhance information retrieval and customer support processes. This model excels at providing accurate and relevant answers based on specific document contexts, making it particularly useful in customer service, legal, finance, and educational sectors. It streamlines workflows by quickly accessing relevant information from extensive databases, reducing response times and improving customer satisfaction.

Prerequisites

To follow the steps in this post, you must have the following prerequisites in place:

AWS account setup

Completing the labs in this post requires an AWS account and SageMaker environments set up. If you don’t have an AWS account, see Complete your AWS registration for the steps to create one.

AWS Marketplace opt-in

AI21 TSMs can also be accessed through AWS Marketplace for subscription. Using AWS Marketplace, you can subscribe to AI21 TSMs and deploy SageMaker endpoints.

To do these exercises, you must subscribe to the following offerings in AWS Marketplace:

Service quota limits

To use some of the GPU instances required to run AI21’s task-specific models, you must have the required service quota limits. You can request a service quota limit increase in the AWS Management Console. Limits are account and resource specific.

To create a service request, search for Service Quotas in the console search bar. Select the service to go to its dashboard, then enter the name of the instance type (for example, ml.g5.48xlarge). Make sure the quota is for endpoint usage.
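If you prefer to script the request, you can look up the quota and request the increase with the Service Quotas API. The following sketch is illustrative; the quota name is an assumption based on how SageMaker endpoint quotas are listed in the console, so verify it against your account before using it.

import boto3

sq = boto3.client("service-quotas")

# Find the SageMaker quota whose name matches the instance type and endpoint usage
target_name = "ml.g5.48xlarge for endpoint usage"   # assumed quota name; check the console listing
quota = None
paginator = sq.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for q in page["Quotas"]:
        if q["QuotaName"] == target_name:
            quota = q

# Request an increase to 1 if the current value is lower
if quota and quota["Value"] < 1:
    sq.request_service_quota_increase(
        ServiceCode="sagemaker",
        QuotaCode=quota["QuotaCode"],
        DesiredValue=1,
    )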

Estimated cost

The following is the estimated cost to walk through the solution in this post.

Contextual answers:

  • We used an ml.g5.48xlarge instance.
    • By default, AWS accounts don’t have access to this GPU. You must request a service quota limit increase (see the previous section: Service Quota Limits).
  • The notebook runtime was approximately 15 minutes.
  • The cost was $20.41 (billed on an hourly basis).

Summarize notebook

  • We used an ml.g4dn.12xlarge GPU.
    • You must request a service quota limit increase (see the previous section: Service Quota Limits).
  • The notebook runtime was approximately 10 minutes.
  • The cost was $4.94 (billed on an hourly basis).

Paraphrase notebook

  • We used the ml.g4dn.12xlarge GPU.
    • You must request a service quota limit increase (see the previous section: Service Quota Limits).
  • The notebook runtime was approximately 10 minutes.
  • The cost was $4.94 (billed on an hourly basis).

Total cost: $30.29 (1 hour charge for each deployed endpoint)

Using AI21 models on AWS

Getting started

In this section, you will access AI21 TSMs in SageMaker JumpStart. These interactive notebooks contain code to deploy TSM endpoints and also provide example code blocks to run inference. The first few steps are prerequisites to deploying these notebooks. If you already have a SageMaker domain and username set up, you may skip to Step 7.

  1. Use the search bar in the AWS Management Console to navigate to Amazon SageMaker, as shown in the following figure.


If you don’t already have one set up, you must create a SageMaker domain. A domain consists of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.

Users within a domain can share notebook files and other artifacts with each other. For more information, see Learn about Amazon SageMaker domain entities and statuses. For today’s exercises, you will use Quick Set-Up to deploy an environment.

  1. Choose Create a SageMaker domain as shown in the following figure.
  2. Select Quick setup. After you choose Set up, the domain will begin creation.
  3. After a moment, your domain will be created.
  4. Choose Add user.
  5. You can keep the default user profile values.
  6. Launch Studio by choosing the Launch button and then selecting Studio.
  7. Choose JumpStart in the navigation pane as shown in the following figure.

Here you can see the model providers for the JumpStart notebooks.

  1. Select AI21 Labs to see their available models.

Each of AI21’s models has an associated model card. A model card provides key information about the model such as its intended use cases, training, and evaluation details. For this example, you will use the Summarize, Paraphrase, and Contextual Answers TSMs.

  1. Start with Contextual Answers. Select the AI21 Contextual Answers model card.

A sample notebook is included as part of the model. Jupyter Notebooks are a popular way to interact with code and LLMs.

  1. Choose Notebooks to explore the notebook.
  2. To run the notebook’s code blocks, choose Open in JupyterLab.
  3. If you do not already have an existing space, choose Create new space and enter an appropriate name. When ready, choose Create space and open notebook.

It can take up to 5 minutes to open your notebook.
SageMaker Spaces are used to manage the storage and resource needs of some SageMaker Studio applications. Each space has a 1:1 relationship with an instance of an application.

  1. After the notebook opens, you will be prompted to select a kernel. Ensure Python 3 is selected and choose Select.

Navigating the notebook exercises

Repeat the preceding process to import the remaining notebooks.

Each AI21 notebook demonstrates the required code imports, version checks, model selection, endpoint creation, and inference examples that showcase the TSM’s unique strengths through code blocks and example prompts.

Each notebook will have a clean up step at the end to delete your deployed endpoints. It’s important to terminate any running endpoints to avoid additional costs.
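In the code snippets that follow, the client object is an AI21 client bound to the deployed SageMaker endpoint. The setup looks roughly like the following sketch, based on the AI21 Labs Python SDK; the endpoint name is whatever you chose when deploying the model, and you should follow the notebook for the exact setup.

# pip install ai21  (the AI21 Labs Python SDK)
from ai21 import AI21SageMakerClient

# Bind the client to the SageMaker endpoint created by the JumpStart notebook
client = AI21SageMakerClient(endpoint_name="<your-ai21-endpoint-name>")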

Contextual Answers JumpStart Notebook

AWS customers and partners can use AI21 Labs’s Contextual Answers model to significantly enhance their information retrieval and customer support processes. This model excels at providing accurate and relevant answers based on specific context, making it useful in customer service, legal, finance, and educational sectors.

The following are code snippets from AI21’s Contextual Answers TSM through JumpStart. Notice that there is no prompt engineering required. The only input is the question and the context provided.

Input:

financial_context = """In 2020 and 2021, enormous QE — approximately $4.4 trillion, or 18%, of 2021 gross domestic product (GDP) — and enormous fiscal stimulus (which has been and always will be inflationary) — approximately $5 trillion, or 21%, of 2021 GDP — stabilized markets and allowed companies to raise enormous amounts of capital. In addition, this infusion of capital saved many small businesses and put more than $2.5 trillion in the hands of consumers and almost $1 trillion into state and local coffers. These actions led to a rapid decline in unemployment, dropping from 15% to under 4% in 20 months — the magnitude and speed of which were both unprecedented. Additionally, the economy grew 7% in 2021 despite the arrival of the Delta and Omicron variants and the global supply chain shortages, which were largely fueled by the dramatic upswing in consumer spending and the shift in that spend from services to goods. Fortunately, during these two years, vaccines for COVID-19 were also rapidly developed and distributed.
In today's economy, the consumer is in excellent financial shape (on average), with leverage among the lowest on record, excellent mortgage underwriting (even though we've had home price appreciation), plentiful jobs with wage increases and more than $2 trillion in excess savings, mostly due to government stimulus. Most consumers and companies (and states) are still flush with the money generated in 2020 and 2021, with consumer spending over the last several months 12% above pre-COVID-19 levels. (But we must recognize that the account balances in lower-income households, smaller to begin with, are going down faster and that income for those households is not keeping pace with rising inflation.)
Today's economic landscape is completely different from the 2008 financial crisis when the consumer was extraordinarily overleveraged, as was the financial system as a whole — from banks and investment banks to shadow banks, hedge funds, private equity, Fannie Mae and many other entities. In addition, home price appreciation, fed by bad underwriting and leverage in the mortgage system, led to excessive speculation, which was missed by virtually everyone — eventually leading to nearly $1 trillion in actual losses.
"""
question = "Did the economy shrink after the Omicron variant arrived?"
response = client.answer.create(
    context=financial_context,
    question=question,
)

print(response.answer)

Output:

No, the economy did not shrink after the Omicron variant arrived. In fact, the economy grew 7% in 2021, despite the arrival of the Delta and Omicron variants and the global supply chain shortages, which were largely fueled by the dramatic upswing in consumer spending and the shift in that spend from services to goods.

As mentioned in our introduction, AI21’s Contextual Answers model does not provide answers to questions outside of the context provided. If the prompt includes a question unrelated to the 2020/2021 economy, you will get a response as shown in the following example.

Input:

irrelevant_question = "How did COVID-19 affect the financial crisis of 2008?"

response = client.answer.create(
    context=financial_context,
    question=irrelevant_question,
)

print(response.answer)

Output:

None

When finished, you can delete your deployed endpoint by running the final two cells of the notebook.

model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)
model.delete_model()

You can import the other notebooks by navigating to SageMaker JumpStart and repeating the same process you used to import this first notebook.

Summarize JumpStart Notebook

AWS customers and partners can use AI21 Labs’ Summarize model to condense lengthy texts into concise summaries while preserving the original meaning. This model is particularly useful for processing large documents, such as financial reports, legal documents, and technical papers, making critical information more accessible and comprehensible.

The following are highlight code snippets from AI21’s Summarize TSM using JumpStart. Notice that the input must include the full text that the user wants to summarize.

Input:

text = """The error affected a number of international flights leaving the terminal on Wednesday, with some airlines urging passengers to travel only with hand luggage.
Virgin Atlantic said all airlines flying out of the terminal had been affected.
Passengers have been warned it may be days before they are reunited with luggage.
An airport spokesperson apologised and said the fault had now been fixed.
Virgin Atlantic said it would ensure all bags were sent out as soon as possible.
It added customers should retain receipts for anything they had bought and make a claim to be reimbursed.
Passengers, who were informed by e-mail of the problem, took to social media to vent their frustrations.
One branded the situation "ludicrous" and said he was only told 12 hours before his flight.
The airport said it could not confirm what the problem was, what had caused it or how many people had been affected."""

response = client.summarize.create(
    source=text,
    source_type=DocumentType.TEXT,
)

print("Original text:")
print(text)
print("================")
print("Summary:")
print(response.summary)

Output:
Original text:
The error affected a number of international flights leaving the terminal on Wednesday, with some airlines urging passengers to travel only with hand luggage.
Virgin Atlantic said all airlines flying out of the terminal had been affected.
Passengers have been warned it may be days before they are reunited with luggage.
An airport spokesperson apologised and said the fault had now been fixed.
Virgin Atlantic said it would ensure all bags were sent out as soon as possible.
It added customers should retain receipts for anything they had bought and make a claim to be reimbursed.
Passengers, who were informed by e-mail of the problem, took to social media to vent their frustrations.
One branded the situation "ludicrous" and said he was only told 12 hours before his flight.
The airport said it could not confirm what the problem was, what had caused it or how many people had been affected.
================
Summary:
A number of international flights leaving the terminal were affected by the error on Wednesday, with some airlines urging passengers to travel only with hand luggage. Passengers were warned it may be days before they are reunited with their luggage.

Paraphrase JumpStart Notebook

AWS customers and partners can use AI21 Labs’s Paraphrase TSM through JumpStart to enhance content creation and communication by generating varied versions of text.

The following are highlight code snippets from AI21’s Paraphrase TSM using JumpStart. Notice that there is no extensive prompt engineering required. The only input required is the full text that the user wants to paraphrase and a chosen style, for example casual, formal, and so on.

Input:

text = "Throughout this page, we will explore the advantages and features of the Paraphrase model."

response = client.paraphrase.create(
    text=text,
    style="formal",
)

print(response.suggestions)

Output:

[Suggestion(text='We will examine the advantages and features of the Paraphrase model throughout this page.'), Suggestion(text='The purpose of this page is to examine the advantages and features of the Paraphrase model.'), Suggestion(text='On this page, we will discuss the advantages and features of the Paraphrase model.'), Suggestion(text='This page will provide an overview of the advantages and features of the Paraphrase model.'), Suggestion(text='In this article, we will examine the advantages and features of the Paraphrase model.'), Suggestion(text='Here we will explore the advantages and features of the Paraphrase model.'), Suggestion(text='The purpose of this page is to describe the advantages and features of the Paraphrase model.'), Suggestion(text='In this page, we will examine the advantages and features of the Paraphrase model.'), Suggestion(text='The Paraphrase model will be reviewed on this page with an emphasis on its advantages and features.'), Suggestion(text='Our goal on this page will be to explore the benefits and features of the Paraphrase model.')]

Input:

print("Original sentence:")
print(text)
print("============================")
print("Suggestions:")
print("\n".join(["- " + x.text for x in response.suggestions]))

Output:

Original sentence:
Throughout this page, we will explore the advantages and features of the Paraphrase model.
============================
Suggestions:
- We will examine the advantages and features of the Paraphrase model throughout this page.
- The purpose of this page is to examine the advantages and features of the Paraphrase model.
- On this page, we will discuss the advantages and features of the Paraphrase model.
- This page will provide an overview of the advantages and features of the Paraphrase model.
- In this article, we will examine the advantages and features of the Paraphrase model.
- Here we will explore the advantages and features of the Paraphrase model.
- The purpose of this page is to describe the advantages and features of the Paraphrase model.
- In this page, we will examine the advantages and features of the Paraphrase model.
- The Paraphrase model will be reviewed on this page with an emphasis on its advantages and features.
- Our goal on this page will be to explore the benefits and features of the Paraphrase model.

Less prompt engineering

A key advantage of AI21’s task-specific models is the reduced need for complex prompt engineering compared to foundation models. Let’s consider how you might approach a summarization task using a foundation model compared to using AI21’s specialized Summarize TSM.

For a foundation model, you might need to craft an elaborate prompt template with detailed instructions:

prompt_template = "You are a highly capable summarization assistant. Concisely summarize the given text while preserving key details and overall meaning. Use clear language tailored for human readers.\n\nText: [INPUT_TEXT]\n\nSummary:"

To summarize text with this foundation model, you'd populate the template and pass the full prompt:

input_text = "Insert text to summarize here..."
prompt = prompt_template.replace("[INPUT_TEXT]", input_text)
summary = model(prompt)

In contrast, using AI21's Summarize TSM is more straightforward:

input_text = "Insert text to summarize here..."
summary = summarize_model(input_text)

That’s it! With the Summarize TSM, you pass the input text directly to the model; there’s no need for an intricate prompt template.
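For completeness, the following is a hypothetical sketch of what that call could look like with the same AI21 client object used in the Paraphrase example. It assumes the Summarize TSM exposes a matching create method; the exact method name and parameters may differ, so check the AI21 SDK documentation.

# Hypothetical sketch: assumes the Summarize TSM follows the same client pattern
# as the Paraphrase example; verify the method and parameters in the AI21 SDK docs.
input_text = "Insert text to summarize here..."

response = client.summarize.create(
    source=input_text,
    source_type="TEXT",
)

print(response.summary)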

Lower cost and higher accuracy

By using TSMs, you can achieve lower costs and higher accuracy. As demonstrated previously in the Contextual Answers notebook, TSMs refuse to answer questions that fall outside their context more readily than most general-purpose models, which can lead to higher accuracy. This characteristic of TSMs is beneficial in use cases where a wrong answer is more costly than no answer.

Conclusion

In this post, we explored AI21 Labs’s approach to generative AI using task-specific models (TSMs). Through guided exercises, you walked through the process of setting up a SageMaker domain and importing sample JumpStart Notebooks to experiment with AI21’s TSMs, including Contextual Answers, Paraphrase, and Summarize.

Throughout the exercises, you saw the potential benefits of task-specific models compared to foundation models. When asking questions outside the context of the intended use case, the AI21 TSMs refused to answer, making them less prone to hallucinating or generating nonsensical outputs beyond their intended domain—a critical factor for applications that require precision and safety. Lastly, we highlighted how task-specific models are designed from the outset to excel at specific tasks, streamlining development and reducing the need for extensive prompt engineering and fine-tuning, which can make them a more cost-effective solution.

Whether you’re a data scientist, machine learning practitioner, or someone curious about AI advancements, we hope this post has provided valuable insights into the advantages of AI21 Labs’s task-specific approach. As the field of generative AI continues to evolve rapidly, we encourage you to stay curious, experiment with various approaches, and ultimately choose the one that best aligns with your project’s unique requirements and goals. Visit the AWS GitHub repo for other example use cases and code to experiment with in your own environment.



About the Authors

Joe Wilson is a Solutions Architect at Amazon Web Services supporting nonprofit organizations. He has core competencies in data analytics, AI/ML, and generative AI. Joe’s background is in data science and international development. He is passionate about leveraging data and technology for social good.

Pat Wilson is a Solutions Architect at Amazon Web Services with a focus on AI/ML workloads and security. He currently supports Federal Partners. Outside of work Pat enjoys learning, working out, and spending time with family/friends.

Josh Famestad is a Solutions Architect at Amazon Web Services. Josh works with public sector customers to build and execute cloud based approaches to deliver on business priorities.

Read More

Architecture to AWS CloudFormation code using Anthropic’s Claude 3 on Amazon Bedrock


The Anthropic’s Claude 3 family of models, available on Amazon Bedrock, offers multimodal capabilities that enable the processing of images and text. This capability opens up innovative avenues for image understanding, wherein Anthropic’s Claude 3 models can analyze visual information in conjunction with textual data, facilitating more comprehensive and contextual interpretations. By taking advantage of its multimodal prowess, we can ask the model questions like “What objects are in the image, and how are they relatively positioned to each other?” We can also gain an understanding of data presented in charts and graphs by asking questions related to business intelligence (BI) tasks, such as “What is the sales trend for 2023 for company A in the enterprise market?” These are just some examples of the additional richness Anthropic’s Claude 3 brings to generative artificial intelligence (AI) interactions.

Architecting specific AWS Cloud solutions involves creating diagrams that show relationships and interactions between different services. Instead of building the code manually, you can use Anthropic’s Claude 3’s image analysis capabilities to generate AWS CloudFormation templates by passing an architecture diagram as input.

In this post, we explore some ways you can use Anthropic’s Claude 3 Sonnet’s vision capabilities to accelerate the process of moving from architecture to the prototype stage of a solution.

Use cases for architecture to code

The following are relevant use cases for this solution:

  • Converting whiteboarding sessions to AWS infrastructure – To quickly prototype your designs, you can take the architecture diagrams created during whiteboarding sessions and generate the first draft of a CloudFormation template. You can also iterate over the CloudFormation template to develop a well-architected solution that meets all your requirements.
  • Fast deployment of architecture diagrams – You can generate boilerplate CloudFormation templates by using architecture diagrams you find on the web. This allows you to experiment quickly with new designs.
  • Streamlined AWS infrastructure design through collaborative diagramming – You might draw architecture diagrams on a diagramming tool during an all-hands meeting. These raw diagrams can generate boilerplate CloudFormation templates, quickly leading to actionable steps while speeding up collaboration and increasing meeting value.

Solution overview

To demonstrate the solution, we use Streamlit to provide an interface for diagrams and prompts. Amazon Bedrock invokes the Anthropic’s Claude 3 Sonnet model, which provides multimodal capabilities. AWS Fargate is the compute engine for the web application. The following diagram illustrates the step-by-step process.

Architecture Overview

The workflow consists of the following steps:

  1. The user uploads an architecture image (JPEG or PNG) on the Streamlit application, invoking the Amazon Bedrock API to generate a step-by-step explanation of the architecture using the Anthropic’s Claude 3 Sonnet model.
  2. The Anthropic’s Claude 3 Sonnet model is invoked using a step-by-step explanation and few-shot learning examples to generate the initial CloudFormation code. The few-shot learning example consists of three CloudFormation templates; this helps the model understand writing practices associated with CloudFormation code.
  3. The user manually provides instructions using the chat interface to update the initial CloudFormation code.

*Steps 1 and 2 run once, when the architecture diagram is uploaded. To trigger changes to the CloudFormation code (step 3), provide update instructions from the Streamlit app. A minimal sketch of the multimodal invocation used in steps 1 and 2 follows.
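The following sketch shows how steps 1 and 2 could invoke Anthropic’s Claude 3 Sonnet on Amazon Bedrock with an architecture image; the file name and prompt text are illustrative, and the actual application code is in the GitHub repo.

import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# Illustrative file name; the Streamlit app receives the upload from the user.
with open("architecture.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 4096,
    "temperature": 0,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
                {"type": "text", "text": "Explain this AWS architecture diagram step by step."},
            ],
        }
    ],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(body),
)
explanation = json.loads(response["body"].read())["content"][0]["text"]

The returned explanation is then combined with the few-shot examples to request the initial CloudFormation template in a second model invocation (step 2).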

The CloudFormation templates generated by the web application are intended for inspiration purposes and not for production-level applications. It is the responsibility of a developer to test and verify the CloudFormation template according to security guidelines.

Few-shot prompting

To help Anthropic’s Claude 3 Sonnet understand the practices of writing CloudFormation code, we use few-shot prompting by providing three CloudFormation templates as reference examples in the prompt. Exposing Anthropic’s Claude 3 Sonnet to multiple CloudFormation templates will allow it to analyze and learn from the structure, resource definitions, parameter configurations, and other essential elements consistently implemented across your organization’s templates. This enables Anthropic’s Claude 3 Sonnet to grasp your team’s coding conventions, naming conventions, and organizational patterns when generating CloudFormation templates. The following examples used for few-shot learning can be found in the GitHub repo.
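As a minimal sketch of how the three reference templates could be stitched into the prompt, consider the following; the file paths and prompt wording are illustrative, and the actual examples and prompts are in the GitHub repo.

from pathlib import Path

# Illustrative paths; the real example templates are in the GitHub repo.
EXAMPLE_TEMPLATES = [
    "examples/template_1.yaml",
    "examples/template_2.yaml",
    "examples/template_3.yaml",
]

def build_few_shot_prompt(step_by_step_explanation: str) -> str:
    examples = "\n\n".join(
        f"<example>\n{Path(path).read_text()}\n</example>" for path in EXAMPLE_TEMPLATES
    )
    return (
        "You write AWS CloudFormation templates that follow our team's conventions.\n"
        "Here are reference templates showing those conventions:\n\n"
        f"{examples}\n\n"
        "Using the same style, generate a CloudFormation template for this architecture:\n\n"
        f"{step_by_step_explanation}"
    )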

Few-shot prompting example 1


Few-shot prompting example 2


Few-shot prompting example 3


Furthermore, Anthropic’s Claude 3 Sonnet can observe how different resources and services are configured and integrated within the CloudFormation templates through few-shot prompting. It will gain insights into how to automate the deployment and management of various AWS resources, such as Amazon Simple Storage Service (Amazon S3), AWS Lambda, Amazon DynamoDB, and AWS Step Functions.

Inference parameters are preset, but they can be changed from the web application if desired. We recommend experimenting with various combinations of these parameters. By default, we set the temperature to zero to reduce the variability of outputs and create focused, syntactically correct code.
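The following is a small sketch of how preset inference parameters might be kept as defaults and overridden from the web application. Only the temperature of zero comes from this post; the other values are illustrative.

# Default inference parameters; temperature is zero for focused, syntactically correct code.
DEFAULT_INFERENCE_PARAMS = {"temperature": 0, "top_p": 0.9, "max_tokens": 4096}

def inference_params(overrides=None):
    # Values chosen in the web application override the defaults.
    return {**DEFAULT_INFERENCE_PARAMS, **(overrides or {})}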

Prerequisites

To access the Anthropic’s Claude 3 Sonnet foundation model (FM), you must request access through the Amazon Bedrock console. For instructions, see Manage access to Amazon Bedrock foundation models. After requesting access to Anthropic’s Claude 3 Sonnet, you can deploy the following development.yaml CloudFormation template to provision the infrastructure for the demo. For instructions on how to deploy this sample, refer to the GitHub repo. Use the following table to launch the CloudFormation template to quickly deploy the sample in either us-east-1 or us-west-2.

Region Stack
us-east-1 development.yaml
us-west-2 development.yaml

When deploying the template, you have the option to specify the Amazon Bedrock model ID you want to use for inference. This flexibility allows you to choose the model that best suits your needs. By default, the template uses the Anthropic’s Claude 3 Sonnet model, renowned for its exceptional performance. However, if you prefer to use a different model, you can seamlessly pass its Amazon Bedrock model ID as a parameter during deployment. Verify that you have requested access to the desired model beforehand and that the model possesses the necessary vision capabilities required for your specific use case.

After you launch the CloudFormation stack, navigate to the stack’s Outputs tab on the AWS CloudFormation console and collect the Amazon CloudFront URL. Enter the URL in your browser to view the web application.

Web application screenshot

In this post, we discuss CloudFormation template generation for three different samples. You can find the sample architecture diagrams in the GitHub repo. These samples are similar to the few-shot learning examples, which is intentional. As an enhancement to this architecture, you can employ a Retrieval Augmented Generation (RAG)-based approach to retrieve relevant CloudFormation templates from a knowledge base to dynamically augment the prompt.
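A hedged sketch of what that retrieval step could look like with Amazon Bedrock Knowledge Bases follows; the knowledge base ID is a placeholder and assumes you have already indexed your CloudFormation templates.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

def retrieve_similar_templates(explanation, knowledge_base_id, top_k=3):
    # knowledge_base_id is a placeholder for a knowledge base that indexes your templates.
    response = agent_runtime.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": explanation},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": top_k}},
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]

The retrieved templates would then replace or augment the static few-shot examples in the prompt.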

Due to the non-deterministic behavior of the large language model (LLM), you might not get the same response as shown in this post.

Let’s generate CloudFormation templates for the following sample architecture diagram.

Sample Architecture for CloudFormation generation

Uploading the preceding architecture diagram to the web application generates a step-by-step explanation of the diagram using Anthropic’s Claude 3 Sonnet’s vision capabilities.

Web application output screenshot

Let’s analyze the step-by-step explanation. The generated response is divided into three parts:

  1. The context explains what the architecture diagram depicts.
  2. The architecture diagram’s flow gives the order in which AWS services are invoked and their relationship with each other.
  3. A summary recaps the entire generated response.

In the following step-by-step explanation, we see a few highlighted errors.

Step-by-step explanation errors

The step-by-step explanation is augmented with few-shot learning examples to develop an initial CloudFormation template. Let’s analyze the initial CloudFormation template:

AWSTemplateFormatVersion: '2010-09-09'
Description: >
  This CloudFormation stack sets up a serverless data processing pipeline triggered by file uploads to an S3 bucket.
  It uses AWS Lambda to process the uploaded files, and Amazon SNS to send notifications upon successful processing.
  This template is not production ready and should only be used for inspiration
Parameters:
  S3BucketName:
    Type: String
    Description: Name of the S3 bucket for file uploads
    AllowedPattern: ^[a-z0-9][a-z0-9-]*[a-z0-9]$
    MinLength: 1
    MaxLength: 63

  EmailAddress:
    Type: String
    Description: Email address to receive notifications
    AllowedPattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Resources:

  # S3 Bucket
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref S3BucketName
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: 's3:ObjectCreated:*'
            Function: !GetAtt ProcessingLambda.Arn

  # SNS Topic
  SNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Endpoint: !Ref EmailAddress
          Protocol: email

  # Lambda Function
  ProcessingLambda:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: ProcessingLambda
      Runtime: python3.9
      Handler: index.lambda_handler
      Role: !GetAtt LambdaRole.Arn
      Code:
        ZipFile: |
          import boto3

          def lambda_handler(event, context):
              s3 = boto3.client('s3')
              sns = boto3.client('sns')

              # Process the uploaded file
              for record in event['Records']:
                  bucket_name = record['s3']['bucket']['name']
                  object_key = record['s3']['object']['key']

                  # Process the file data
                  # ...

              # Send notification upon successful processing
              sns.publish(
                  TopicArn=!Ref SNSTopic,
                  Message='File processing completed successfully',
                  Subject='Data Processing Notification'
              )

              return {
                  'statusCode': 200,
                  'body': 'File processing completed successfully'
              }

  # Lambda Role
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: S3Access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 's3:GetObject'
                Resource: !Join ['', ['arn:aws:s3:::', !Ref S3BucketName, '/*']]
        - PolicyName: SNSPublish
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'sns:Publish'
                Resource: !Ref SNSTopic

  # Lambda Permissions
  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt ProcessingLambda.Arn
      Action: 'lambda:InvokeFunction'
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      SourceArn: !Join ['', ['arn:aws:s3:::', !Ref S3BucketName]]

Outputs:

  S3BucketName:
    Description: Name of the S3 bucket for file uploads
    Value: !Ref S3Bucket
    Export:
      Name: !Sub '${AWS::StackName}-S3BucketName'

  LambdaFunctionArn:
    Description: ARN of the Lambda function
    Value: !GetAtt ProcessingLambda.Arn
    Export:
      Name: !Sub '${AWS::StackName}-LambdaFunctionArn'

  SNSTopicArn:
    Description: ARN of the SNS topic for notifications
    Value: !Ref SNSTopic
    Export:
      Name: !Sub '${AWS::StackName}-SNSTopicArn'

After analyzing the CloudFormation template, we see that the Lambda code refers to an Amazon Simple Notification Service (Amazon SNS) topic using !Ref SNSTopic, which is not valid. We also want to add additional functionality to the template. First, we want to filter the Amazon S3 notification configuration to invoke Lambda only when *.csv files are uploaded. Second, we want to add metadata to the CloudFormation template. To do this, we use the chat interface to give the following update instructions to the web application:

Make the following updates:

Use environment variables for AWS Lambda to access SNS Topic ARN.

Add filter to S3 notification configuration to only invoke AWS lambda when *.csv files are uploaded

Add metadata to CloudFormation template

Chat interface web application screenshot

The updated CloudFormation template is as follows:

AWSTemplateFormatVersion: '2010-09-09'
Description: >
  This CloudFormation stack sets up a serverless data processing pipeline triggered by file uploads to an S3 bucket.
  It uses AWS Lambda to process the uploaded files, and Amazon SNS to send notifications upon successful processing.
  This template is not production ready and should only be used for inspiration.
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: 'S3 Bucket Configuration'
        Parameters:
          - S3BucketName
      - Label:
          default: 'Notification Configuration'
        Parameters:
          - EmailAddress

Parameters:
  S3BucketName:
    Type: String
    Description: Name of the S3 bucket for file uploads
    AllowedPattern: ^[a-z0-9][a-z0-9-]*[a-z0-9]$
    MinLength: 1
    MaxLength: 63

  EmailAddress:
    Type: String
    Description: Email address to receive notifications
    AllowedPattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Resources:

  # S3 Bucket
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref S3BucketName
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: 's3:ObjectCreated:*'
            Function: !GetAtt ProcessingLambda.Arn
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: .csv

  # SNS Topic
  SNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Endpoint: !Ref EmailAddress
          Protocol: email

  # Lambda Function
  ProcessingLambda:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: ProcessingLambda
      Runtime: python3.9
      Handler: index.lambda_handler
      Role: !GetAtt LambdaRole.Arn
      Environment:
        Variables:
          SNS_TOPIC_ARN: !Ref SNSTopic
      Code:
        ZipFile: |
          import boto3
          import os

          def lambda_handler(event, context):
              s3 = boto3.client('s3')
              sns = boto3.client('sns')
              sns_topic_arn = os.environ['SNS_TOPIC_ARN']

              # Process the uploaded file
              for record in event['Records']:
                  bucket_name = record['s3']['bucket']['name']
                  object_key = record['s3']['object']['key']

                  # Process the file data
                  # ...

              # Send notification upon successful processing
              sns.publish(
                  TopicArn=sns_topic_arn,
                  Message='File processing completed successfully',
                  Subject='Data Processing Notification'
              )

              return {
                  'statusCode': 200,
                  'body': 'File processing completed successfully'
              }

  # Lambda Role
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: S3Access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 's3:GetObject'
                Resource: !Join ['', ['arn:aws:s3:::', !Ref S3BucketName, '/*']]
        - PolicyName: SNSPublish
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'sns:Publish'
                Resource: !Ref SNSTopic

  # Lambda Permissions
  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt ProcessingLambda.Arn
      Action: 'lambda:InvokeFunction'
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      SourceArn: !Join ['', ['arn:aws:s3:::', !Ref S3BucketName]]

Outputs:

  S3BucketName:
    Description: Name of the S3 bucket for file uploads
    Value: !Ref S3Bucket
    Export:
      Name: !Sub '${AWS::StackName}-S3BucketName'

  LambdaFunctionArn:
    Description: ARN of the Lambda function
    Value: !GetAtt ProcessingLambda.Arn
    Export:
      Name: !Sub '${AWS::StackName}-LambdaFunctionArn'

  SNSTopicArn:
    Description: ARN of the SNS topic for notifications
    Value: !Ref SNSTopic
    Export:
      Name: !Sub '${AWS::StackName}-SNSTopicArn'

Additional examples

We have provided two more sample diagrams, their associated CloudFormation code generated by Anthropic’s Claude 3 Sonnet, and the prompts used to create them. You can see how diagrams in various forms, from digital to hand-drawn, or some combination, can be used. The end-to-end analysis of these samples can be found at sample 2 and sample 3 on the GitHub repo.

Best practices for architecture to code

In the demonstrated use case, you can observe how well Anthropic’s Claude 3 Sonnet model can pull details and relationships between services from an architecture image. The following are some ways you can improve the performance of Anthropic’s Claude in this use case:

  • Implement a multimodal RAG approach to enhance the application’s ability to handle a wider variety of complex architecture diagrams, because the current implementation is limited to diagrams similar to the provided static examples.
  • Enhance the architecture diagrams by incorporating visual cues and features, such as labeling services, indicating orchestration hierarchy levels, grouping related services at the same level, enclosing services within clear boxes, and labeling arrows to represent the flow between services. These additions will aid in better understanding and interpreting the diagrams.
  • If the application generates an invalid CloudFormation template, provide the error as update instructions. This will help the model understand the mistake and make a correction (a minimal sketch of this feedback loop follows this list).
  • Use Anthropic’s Claude 3 Opus or Anthropic’s Claude 3.5 Sonnet for better performance on long contexts that require near-perfect recall.
  • With careful design and management, orchestrate agentic workflows by using Agents for Amazon Bedrock. This enables you to incorporate self-reflection, tool use, and planning within your workflow to generate more relevant CloudFormation templates.
  • Use Amazon Bedrock Prompt Flows to accelerate the creation, testing, and deployment of workflows through an intuitive visual interface. This can reduce development effort and accelerate workflow testing.
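The following is a minimal sketch of the feedback loop mentioned in the preceding best practices; it validates the generated template and turns any error into an update instruction for the model.

import boto3
from botocore.exceptions import ClientError

cfn = boto3.client("cloudformation")

def validation_feedback(template_body):
    """Return an update instruction if the template fails validation, otherwise None."""
    try:
        cfn.validate_template(TemplateBody=template_body)
        return None
    except ClientError as error:
        # Feed the validation error back to the model through the chat interface.
        return (
            "The template failed validation with this error, please fix it: "
            + error.response["Error"]["Message"]
        )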

Clean up

To clean up the resources used in this demo, complete the following steps:

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Select the deployed development.yaml stack and choose Delete. (A scripted alternative is sketched below.)
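If you prefer to script the cleanup, a minimal sketch with boto3 follows; the stack name is a placeholder, so replace it with the name you chose when deploying development.yaml.

import boto3

cfn = boto3.client("cloudformation")

STACK_NAME = "architecture-to-code-demo"  # placeholder; use your actual stack name

cfn.delete_stack(StackName=STACK_NAME)
cfn.get_waiter("stack_delete_complete").wait(StackName=STACK_NAME)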

Conclusion

With the pattern demonstrated with Anthropic’s Claude 3 Sonnet, developers can effortlessly translate their architectural visions into reality by simply sketching their desired cloud solutions. Anthropic’s Claude 3 Sonnet’s advanced image understanding capabilities will analyze these diagrams and generate boilerplate CloudFormation code, minimizing the need for initial complex coding tasks. This visually driven approach empowers developers from a variety of skill levels, fostering collaboration, rapid prototyping, and accelerated innovation.

You can investigate other patterns, such as including RAG and agentic workflows, to improve the accuracy of code generation. You can also explore customizing the LLM by fine-tuning it to write CloudFormation code with greater flexibility.

Now that you have seen Anthropic’s Claude 3 Sonnet in action, try designing your own architecture diagrams using some of the best practices to take your prototyping to the next level.

For additional resources, refer to the GitHub repo.


About the Authors

Eashan Kaushik is an Associate Solutions Architect at Amazon Web Services. He is driven by creating cutting-edge generative AI solutions while prioritizing a customer-centric approach to his work. Before this role, he obtained an MS in Computer Science from NYU Tandon School of Engineering. Outside of work, he enjoys sports, lifting, and running marathons.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focusing on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Read More

How Northpower used computer vision with AWS to automate safety inspection risk assessments


This post is co-written with Andreas Astrom from Northpower.

Northpower provides reliable and affordable electricity and fiber internet services to customers in the Northland region of New Zealand. As an electricity distributor, Northpower aims to improve access, opportunity, and prosperity for its communities by investing in infrastructure, developing new products and services, and giving back to shareholders. Additionally, Northpower is one of New Zealand’s largest infrastructure contractors, serving clients in transmission, distribution, generation, and telecommunications. With over 1,400 staff working across 14 locations, Northpower plays a crucial role in maintaining essential services for customers driven by a purpose of connecting communities and building futures for Northland.

The energy industry is at a critical turning point. There is a strong push from policymakers and the public to decarbonize the industry, while at the same time balancing energy resilience with health, safety, and environmental risk. Recent events including Tropical Cyclone Gabrielle have highlighted the susceptibility of the grid to extreme weather and emphasized the need for climate adaptation with resilient infrastructure. Electricity Distribution Businesses (EDBs) are also facing new demands with the integration of decentralized energy resources like rooftop solar as well as larger-scale renewable energy projects like solar and wind farms. These changes call for innovative solutions to ensure operational efficiency and continued resilience.

In this post, we share how Northpower has worked with their technology partner Sculpt to reduce the effort and carbon required to identify and remediate public safety risks. Specifically, we cover the computer vision and artificial intelligence (AI) techniques used to combine datasets into a list of prioritized tasks for field teams to investigate and mitigate. The resulting dashboard highlighted that 141 power pole assets required action, out of a network of 57,230 poles.

Northpower challenge

Utility poles have stay wires that anchor the pole to the ground for extra stability. These stay wires are meant to have an inline insulator to prevent the stay wire from becoming live, which would create a safety risk for any person or animal in the area.

Northpower faced a significant challenge in determining how many of their 57,230 power poles have stay wires without insulators. Without reliable historical data, manually inspecting such a vast and predominantly rural network is labor-intensive and costly. Alternatives, like helicopter surveys or field technicians requiring access to private properties for safety inspections, are expensive. Moreover, the travel required for technicians to physically visit each pole across such a large network posed a considerable logistical challenge, emphasizing the need for a more efficient solution.

Thankfully, some asset datasets were available in digital format, and historical paper-based inspection reports, dating back 20 years, were available in scanned format. This archive, along with 765,933 varied-quality inspection photographs, some over 15 years old, presented a significant data processing challenge. Processing these images and scanned documents is not a cost- or time-efficient task for humans, and requires highly performant infrastructure that can reduce the time to value.

Solution overview

Amazon SageMaker is a fully managed service that helps developers and data scientists build, train, and deploy machine learning (ML) models. In this solution, the team used Amazon SageMaker Studio to launch an object detection model available in Amazon SageMaker JumpStart using the PyTorch framework.

The following diagram illustrates the high-level workflow.

Northpower chose SageMaker for a number of reasons:

  • SageMaker Studio is a managed service with ready-to-go development environments, saving time otherwise used for setting up environments manually
  • SageMaker JumpStart took care of the setup and deployed the required ML jobs involved in the project with minimal configuration, further saving development time
  • The integrated labeling solution with Amazon SageMaker Ground Truth was suitable for large-scale image annotations and simplified the collaboration with a Northpower labeling workforce

In the following sections, we discuss the key components of the solution as illustrated in the preceding diagram.

Data preparation

SageMaker Ground Truth was used with a human workforce made up of Northpower volunteers to annotate a set of 10,000 images. The workforce created bounding boxes around stay wires and insulators, and the output was subsequently used to train an ML model.

Model training, validation, and storage

This component uses the following services:

  • SageMaker Studio is used to access and deploy a pre-trained object detection model and develop code on managed Jupyter notebooks. The model was then fine-tuned with training data from the data preparation stage. For a step-by-step guide to set up SageMaker Studio, refer to Amazon SageMaker simplifies the Amazon SageMaker Studio setup for individual users.
  • SageMaker Studio runs custom Python code to augment the training data and transform the metadata output from SageMaker Ground Truth into a format supported by the computer vision model training job. The model is then trained using a fully managed infrastructure, validated, and published to the Amazon SageMaker Model Registry (a minimal training sketch follows this list).
  • Amazon Simple Storage Service (Amazon S3) stores the model artifacts and creates a data lake to host the inference output, document analysis output, and other datasets in CSV format.
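The following is a minimal sketch of that training and deployment flow with the SageMaker Python SDK; the model ID, instance types, and S3 path are placeholders, so pick a fine-tunable object detection model from the SageMaker JumpStart catalog.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Placeholder model ID; choose a fine-tunable object detection model from SageMaker JumpStart.
estimator = JumpStartEstimator(
    model_id="pytorch-od1-fasterrcnn-resnet50-fpn",
    instance_type="ml.g4dn.xlarge",
)

# The training channel holds the images and annotations transformed from the Ground Truth output.
estimator.fit({"training": "s3://your-bucket/training-data/"})

# Deploy the fine-tuned model to a real-time endpoint for inference.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")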

Model deployment and inference

In this step, SageMaker hosts the ML model on an endpoint used to run inferences.

A SageMaker Studio notebook was used again post-inference to run custom Python code that simplified the datasets and rendered bounding boxes on objects that met the criteria. This step also applied a custom scoring system that was rendered onto the final image, which allowed for an additional human QA step for low-confidence images.
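A simplified sketch of that rendering step is shown below; the detection format and confidence threshold are assumptions for illustration.

from PIL import Image, ImageDraw

def render_detections(image_path, detections, output_path, threshold=0.5):
    """Draw bounding boxes and confidence scores onto an inspection photo.

    Each detection is assumed to look like
    {"label": "insulator", "score": 0.92, "box": (x_min, y_min, x_max, y_max)}.
    """
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for detection in detections:
        if detection["score"] < threshold:
            continue  # low-confidence detections are routed to the human QA step instead
        x_min, y_min, x_max, y_max = detection["box"]
        draw.rectangle((x_min, y_min, x_max, y_max), outline="red", width=3)
        draw.text((x_min, max(y_min - 12, 0)), f'{detection["label"]} {detection["score"]:.2f}', fill="red")
    image.save(output_path)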

Data analytics and visualization

This component includes the following services:

  • An AWS Glue crawler is used to understand the dataset structures stored in the data lake so that it can be queried by Amazon Athena
  • Athena allows the use of SQL to combine the inference output and asset datasets to find the highest-risk items (an example query is sketched after this list)
  • Amazon QuickSight was used as the tool for both the human QA process and for determining which assets needed a field technician to be sent for physical inspection
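The following is a sketch of what such a query could look like when run through Athena with boto3; the database, table, and column names are illustrative and depend on what the AWS Glue crawler discovered.

import boto3

athena = boto3.client("athena")

# Illustrative schema; the real tables come from the AWS Glue crawler.
query = """
SELECT a.pole_id, a.location, i.confidence
FROM asset_register a
JOIN inference_output i ON a.pole_id = i.pole_id
WHERE i.insulator_detected = false
ORDER BY i.confidence DESC
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "inspection_data_lake"},  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://your-bucket/athena-results/"},
)
print(execution["QueryExecutionId"])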

Document understanding

In the final step, Amazon Textract digitizes historical paper-based asset assessments and stores the output in CSV format.
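A minimal sketch of that digitization step with the synchronous Textract API follows; multi-page scans would instead use the asynchronous start_document_text_detection operation, and the bucket and key below are placeholders.

import boto3

textract = boto3.client("textract")

def extract_lines(bucket, key):
    """Extract text lines from a scanned inspection report image stored in Amazon S3."""
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return [block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"]

# The extracted lines can then be written out in CSV format to the data lake.
lines = extract_lines("your-inspection-archive-bucket", "reports/pole-12345.png")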

Results

The trained PyTorch object detection model enabled the detection of stay wires and insulators on utility poles, and a SageMaker postprocessing job calculated a risk score using an m5.24xlarge Amazon Elastic Compute Cloud (EC2) instance with 200 concurrent Python threads. This instance was also responsible for rendering the score information along with an object bounding box onto an output image, as shown in the following example.

Writing the confidence scores into the S3 data lake alongside the historical inspection results allowed Northpower to run analytics using Athena to understand each classification of image. The sunburst graph below is a visualization of this classification.

Northpower categorized 1,853 poles as high priority risks, 3,922 as medium priority, 36,260 as low priority, and 15,195 as the lowest priority. These were viewable in the QuickSight dashboard and used as an input for humans to review the highest risk assets first.

At the conclusion of the analysis, Northpower found that 31 poles needed stay wire insulators installed and a further 110 poles needed investigation in the field. This significantly reduced the cost and carbon usage involved in manually checking every asset.

Conclusion

Remote asset inspection remains a challenge for regional EDBs, but using computer vision and AI to uncover new value from data that was previously unused was key to Northpower’s success in this project. SageMaker JumpStart provided deployable models that could be trained for object detection use cases with minimal data science knowledge and overhead.

Discover the publicly available foundation models offered by SageMaker JumpStart and fast-track your own ML project with a step-by-step tutorial.


About the authors

Scott Patterson is a Senior Solutions Architect at AWS.

Andreas Astrom is the Head of Technology and Innovation at Northpower.

Read More