How SailPoint uses Anthropic’s Claude on Amazon Bedrock to automatically generate TypeScript code for SaaS connectors

Generative AI is revolutionizing the way developers approach programming by providing intelligent assistance and automation throughout the coding process. With the power of advanced language models and machine learning (ML) algorithms, generative AI can understand the context and intent behind a programmer’s code, offering valuable suggestions, completing code snippets, and even generating entire functions or modules based on high-level descriptions. This technology empowers developers to focus on higher-level problem-solving and architecture, while the AI handles the tedious and repetitive aspects of coding. One of the key advantages of large language models (LLMs) in programming is their ability to learn from the vast amounts of existing code and programming patterns they were trained on. This knowledge allows them to generate context-aware code, detect potential bugs or vulnerabilities, and offer optimizations to improve code quality and performance.

In this post, we highlight how the AWS Generative AI Innovation Center collaborated with SailPoint Technologies to build a generative AI-based coding assistant that uses Anthropic’s Claude Sonnet on Amazon Bedrock to help accelerate the development of software as a service (SaaS) connectors.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

SailPoint specializes in enterprise identity security solutions. Over 3,000 enterprises worldwide use SailPoint to help defend against today’s dynamic, identity-centric cyber threats while enhancing productivity and efficiency. Their products are designed to manage and secure access to applications and data through the lens of identity, at speed and scale, for users inside an organization and for external parties such as non-employees. SailPoint’s unified, intelligent, and extensible environment provides comprehensive identity governance capabilities, including access certifications, policy management, access request and provisioning, password management, and data access governance. This helps organizations make sure the right individuals have the right access to the right resources at the right times, thereby enforcing security policies and compliance requirements. Founded in 2005, SailPoint has grown to be a key player in identity security, serving customers globally across various industries.

SailPoint connectors and SaaS connectivity

SailPoint’s identity security solutions interface with various software as a service (SaaS) applications to retrieve the necessary information, such as account and access information, from an identity security standpoint. Each SaaS application implements these functionalities in slightly different ways and might expose their implementation through REST-based web APIs that are typically supported by OpenAPI specifications. SailPoint connectors are TypeScript modules that interface with a SaaS application and map the relevant identity security information (such as accounts and entitlements) to a standardized format understood by SailPoint. Based on the APIs exposed by the application, SailPoint connectors can create, update, and delete access on those accounts. SailPoint connectors help manage user identities and their access rights across different environments within an organization, supporting the organization’s compliance and security efforts.

Although a typical connector exposes several functions, for this post, we focus on developing the list user function of a connector that connects to an API endpoint for listing users, retrieving all the users, and transforming them into the format required by SailPoint.

In the following sections, we detail how we used Anthropic’s Claude Sonnet on Amazon Bedrock to automatically create the list user connector, a critical component of SailPoint’s broader SaaS connectivity.

Understanding the list user connector

Connectors are modules that can connect to an external service and retrieve and update relevant information from a SaaS application. To better understand how connectors are built, we walk through an example connector function that connects to the getUsers endpoint of DocuSign’s REST API. The following TypeScript code defines an asynchronous function listUsers that retrieves a list of user accounts from an external API, constructs a structured output for each user, and sends it back through the response handler:

// imports
import { Context, Response, StdAccountListHandler, StdAccountListOutput } from '@sailpoint/connector-sdk';

const listUsers: StdAccountListHandler = async (context: Context, input: undefined, res: Response<StdAccountListOutput>) => {
    // retrieve api key and host url from context
    let apiKey = context.apiKey;
    let hostUrl = context.hostUrl;
    let hasMore = true;
    
    // url construction
    let url = `https://${hostUrl}/Management/v2/organizations/${context.organizationId}/users`;
    
    // loop through pages
    while (hasMore) {
        // fetch response from the endpoint
        let response = await fetch(url, {
          headers: {
            'Authorization': `Bearer ${apiKey}`
          }
        });
        let results = await response.json();
  
        // processing users from response
        let users = results.users;
        for (const user of users) {
            const output: StdAccountListOutput = {
                identity: user.id,
                attributes: {
                    user_name: user.user_name,
                    first_name: user.first_name,
                    last_name: user.last_name,
                    user_status: user.user_status, 
                    membership_status: user.membership_status, 
                    email: user.email,
                    created_on: user.created_on, 
                    membership_created_on: user.membership_created_on,                     
                    ds_group_id: user.company_groups.map(group => group.ds_group_id),
                    ds_group_account_id: user.company_groups.map(group => group.ds_group_account_id)                   
                }
            };
            // send each mapped account to the platform through the response handler
            res.send(output);
        }
        // pagination
        if (results.paging.next) {
          url = results.paging.next;
        } else {
          hasMore = false;
        }
    }
}

The following is a breakdown of what each part of the code does:

  • Imports – The code imports several types and interfaces from @sailpoint/connector-sdk. These include Context, Response, StdAccountListHandler, and StdAccountListOutput, which are used to handle the input and output of the function in a standardized way within a SailPoint environment.
  • Function definition – listUsers is defined as an asynchronous function compatible with the StdAccountListHandler type. It uses the Context to access configuration details like API keys and the base URL, and a Response to structure the output.
  • Retrieve API key and host URL – These are extracted from the context parameter. They are used to authenticate and construct the request URL.
  • URL construction – The function constructs the initial URL using the hostUrl and organizationId from the context. This URL points to an endpoint that returns users associated with a specific organization.
  • Loop through pages – The while loop continues as long as there are more pages of data (hasMore is true). It serves the following functions:
    • Fetch data – Inside the while loop, a fetch request is made to the API endpoint. The request includes an Authorization header that uses the apiKey. The API’s response is converted to JSON format.
    • Process users – Inside the while loop, it extracts user data from the API response. The process loops through each user, constructing an StdAccountListOutput object for each one. This object includes user identifiers and attributes like user names, names, status, email, and group IDs, and is sent to the platform through the res response handler.
    • Pagination – Inside the while loop, it checks if there is a next page URL in the pagination information (results.paging.next). If it exists, it updates the url for the next iteration of the loop. If not, it sets hasMore to false to stop the loop.

This example illustrates the step-by-step process of building the function in a connector. We aim to reproduce this process using an LLM with a prompt chaining strategy.

Generate a TypeScript connector using an LLM prompt chain

There are several approaches to using pre-trained LLMs for code generation, with varying levels of complexity:

  • Single prompt – You can use models like Anthropic’s Claude to generate code by direct prompting. These models can generate code in a variety of languages, including TypeScript, but they don’t inherently possess domain-specific knowledge relevant to the task of building a connector. All the required information, including API specifications and formatting instructions, must be provided in the prompt, similar to the instructions that would be given to a developer. However, LLMs tend to struggle when given a long list of complex instructions. It’s also difficult for the prompt engineer to understand which steps are challenging for the LLM.
  • Agentic frameworks with LLMs – Agents are sophisticated frameworks that can use tools to perform a sequence of complex tasks. In this case, the agent starts by breaking down the user request into steps, searches for necessary information using tools (a knowledge base or web browser), and autonomously generates code from start to finish. Although they’re powerful, these frameworks are complex to implement, often unstable in their behavior, and less controllable compared to other methods. Agents also require many LLM calls to perform a task, which makes them rather slow in practice. When the logic to perform a task is a fixed sequence of steps, agents are not an efficient option.
  • Prompt chain – A solution that finds a good trade-off between the two previous approaches involves using a prompt chaining technique. This method breaks the complex problem into a series of more manageable steps and integrates them to craft the final code. Each step has clear instructions that are easier for the LLM to follow, and a human in the loop can control the output of each step and correct the LLM if needed. This approach strikes a balance between flexibility and control, avoiding the extremes of the other two methods.

We initially tested the LLM’s ability to generate connector code based on a single prompt and realized that it struggles to generate code that addresses all aspects of the problem, such as pagination or nested data structures. To make sure the LLM would cover all the necessary components of the connector functions, and because creating a connector follows a fixed sequence of steps, prompt chaining was the most natural approach to improve the generated code.

The chain we used for connector generation consists of the following high-level steps:

  1. Parse the data model of the API response into prescribed TypeScript classes.
  2. Generate the function for user flattening in the format expected by the connector interface.
  3. Understand the pagination of the API specs and formulate a high-level solution.
  4. Generate the code for the ListUsers function by combining all the intermediate steps.

Step 1 is used as an input to Step 2, but Step 3 is separate. Both Step 2 and Step 3 results are fed to Step 4 for the final result. The following diagram illustrates this workflow.

In the following sections, we will dive into the prompting techniques we used for each of these steps.
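
Before diving into the individual prompts, the following is a minimal TypeScript sketch of how such a chain could be orchestrated. Everything here is illustrative rather than SailPoint’s production code: invokeClaude is a hypothetical helper that calls Claude on Amazon Bedrock (sketched at the end of the next section), and the declared prompt-builder functions stand in for the prompt templates shown later in this post.

// Illustrative orchestration of the four-step prompt chain (not SailPoint's production code)
import { invokeClaude } from './bedrock'; // hypothetical helper module, sketched below

// Extract the text between <tag> and </tag> in an LLM response
function extractTag(response: string, tag: string): string {
    const match = response.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
    if (!match) {
        throw new Error(`Tag <${tag}> not found in model response`);
    }
    return match[1].trim();
}

// Hypothetical builders that fill the prompt templates shown later in this post
// ({api_spec}, {data_model}, {pagination_logic}, {flatten_user_function})
declare function dataModelPrompt(apiSpec: string): string;
declare function flattenUserPrompt(dataModel: string): string;
declare function paginationPrompt(apiSpec: string, apiInfo: string): string;
declare function listUsersPrompt(
    apiSpec: string, apiInfo: string, paginationLogic: string, flattenUserFn: string): string;

const SYSTEM_PROMPT = '...'; // the system prompt shown in the next section

async function generateListUsersCode(apiSpec: string, apiInfo: string): Promise<string> {
    // Step 1: parse the API response into prescribed TypeScript interfaces
    const dataModel = extractTag(
        await invokeClaude(SYSTEM_PROMPT, dataModelPrompt(apiSpec)), 'data-model');

    // Step 2: generate the user flattening function from the data model
    const flattenUserFn = extractTag(
        await invokeClaude(SYSTEM_PROMPT, flattenUserPrompt(dataModel)), 'user-mapping-function');

    // Step 3: reason about the pagination scheme in natural language, separately from code generation
    const paginationLogic = extractTag(
        await invokeClaude(SYSTEM_PROMPT, paginationPrompt(apiSpec, apiInfo)), 'pagination-logic');

    // Step 4: combine the intermediate results into the final listUsers function
    return extractTag(
        await invokeClaude(SYSTEM_PROMPT, listUsersPrompt(apiSpec, apiInfo, paginationLogic, flattenUserFn)),
        'list-users-code');
}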

System prompt

The system prompt is an essential component of LLM prompting that typically provides the initial context to guide the model’s response. For all the prompts in the chain, we used the following system prompt:

"""
You are an expert web developer who has an excellent understanding of REST APIs and backend
API development using TypeScript. Your goal is to understand API specification provided in 
OpenAPI specification or Swagger specification as YAML files and to write TypeScript code,
XML, API descriptions etc. as requested in the task description. Do not deviate from the 
provided specification and documentation and if something is unclear in the specification then 
mention your uncertainty in the comments do not make up information that is not available in 
the specs or description.

When writing TypeScript code, provide minimal but meaningful comments.
"""

More specifically, the system prompt is used to establish the role of the LLM (expert web developer), give it a general goal (understand API specs and write TypeScript code), give high-level instructions (add comments in the code) and set boundaries (do not make up information).
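
As a concrete reference, the following is a minimal sketch of the kind of invokeClaude helper assumed in the chain sketch above (it could live in the hypothetical ./bedrock module imported there), using the AWS SDK for JavaScript v3 and the Anthropic Messages format on Amazon Bedrock. The region, model ID, and token limit are placeholders, not values taken from SailPoint’s implementation.

import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime';

const client = new BedrockRuntimeClient({ region: 'us-east-1' }); // placeholder region

// Send one prompt of the chain to Claude on Amazon Bedrock and return the raw text response
export async function invokeClaude(systemPrompt: string, userPrompt: string): Promise<string> {
    const command = new InvokeModelCommand({
        modelId: 'anthropic.claude-3-sonnet-20240229-v1:0', // placeholder model ID
        contentType: 'application/json',
        accept: 'application/json',
        body: JSON.stringify({
            anthropic_version: 'bedrock-2023-05-31',
            max_tokens: 4096,
            temperature: 0, // low temperature keeps code generation deterministic
            system: systemPrompt, // the system prompt shown above
            messages: [{ role: 'user', content: [{ type: 'text', text: userPrompt }] }]
        })
    });
    const response = await client.send(command);
    const body = JSON.parse(new TextDecoder().decode(response.body));
    return body.content[0].text;
}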

Data model parsing

In this step, we prompt the LLM to understand the structure of the API response and create TypeScript classes corresponding to the objects in the response. Although this step isn’t strictly necessary for generating the response, it can help the LLM immensely in generating a correct connector. Similar to chain-of-thought reasoning for arithmetic problems, it forces the LLM to “think” before responding.

This step offers two primary benefits:

  • Verbose API response simplification – API responses specified in the documentation can be quite verbose. By converting the response structure into TypeScript classes, we compress the information into fewer lines of code, making it more concise and less complicated for the LLM to comprehend. This step helps ensure that the essential information is prominently displayed at the start.
  • Handling fragmented user responses – In some APIs, the user response is composed of several fragments because of the reuse of data structures. The OpenAPI specification uses the $ref tag to reference these reusable components. By converting the user response into TypeScript classes, we can consolidate all the relevant information into a single location. This consolidation simplifies the downstream steps by providing a centralized source of information.

We use the following task prompt to convert the API response into prescribed TypeScript classes:

"""
You are given an API spec of OpenAPI or Swagger for a REST API endpoint
that serves a list of users for Software as a Service (SaaS) application. You will 
be asked to extract the structure of the User in the JSON response from the API endpoint. 

Here is the API spec provided between the XML tags <api-spec> </api-spec>. 
Understand and remember the API spec well.

<api-spec>
{api_spec}
</api-spec>

Use the following instructions to create TypeScript interfaces based on the structure 
of the User. 
<instructions>
- Provide the code in between <data-model> </data-model> XML tags. 
- If there are any nested objects, expand them into their own interfaces.
- Be comprehensive and include all attributes.
- Retrieve all attributes, including those marked as not mandatory, not required, or nullable.
- The attributes are listed under `properties` section.
- Output only one User interface that includes all the attributes from any interfaces it extends.
</instructions>

The expected format of the output is as follows: 
<data-model> 
// User
interface User {{
  id: number;
  first_name: string;
  last_name: string;
  email: string;
  is_active: boolean;
  company_groups_ids: number[];
  other_attribute: string;
  nested_type: NestedType[];
}}

// Some complex type
interface NestedType {{
    type_id: string;
    some_string_attribute: string;
    some_number_attribute: number;
}}

</data-model>
"""

In the preceding prompt template, the variable {api_spec} is replaced with the API specification of the endpoint. A specific example for a DocuSign ListUsers endpoint is provided in the appendix.
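
Filling these placeholders is plain string templating. A small, hypothetical helper might look like the following; fillTemplate and apiSpecYaml are illustrative names, not part of the actual implementation.

// Fill {placeholder} variables in a prompt template; unknown placeholders are left as-is
function fillTemplate(template: string, values: Record<string, string>): string {
    return template.replace(/\{(\w+)\}/g, (placeholder, key: string) => values[key] ?? placeholder);
}

// Example: inject the OpenAPI spec (such as the YAML in the appendix) into the data model parsing prompt
const apiSpecYaml = '...';
const dataModelTaskPrompt = fillTemplate('... <api-spec>\n{api_spec}\n</api-spec> ...', { api_spec: apiSpecYaml });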

The following code is an example of the LLM-generated classes when applied to the DocuSign API specs. This has been parsed out of the <data-model> tags.

// User
interface User {
    id: string; // format: uuid
    user_name: string;
    first_name: string;
    last_name: string;
    user_status: string; // One of: 'active' | 'created' | 'closed'
    membership_status: string; // One of: 'activation_required' | 'activation_sent' | 'active' | 'closed' | 'disabled'
    email: string;
    created_on: string; // format: date-time
    membership_created_on: string; // format: date-time
    ds_groups: DsGroup[];
}
  
// DsGroup
interface DsGroup {
    ds_group_id: string; // format: uuid
    account_id: string; // format: uuid
    source_product_name: string;
    group_id: string;
    group_name: string;
    description: string;
    is_admin: boolean;
    last_modified_on: string; // format: date-time
    user_count: number; // format: int32
    external_account_id: number; // format: int64
    account_name: string;
    membership_id: string; // format: uuid
}

User flattening function generation

The expected structure for each user is an object consisting of two properties: an identifier and a dictionary of attributes. The attributes dictionary is a map that associates string keys with either primitive attributes (number, Boolean, or string) or an array of primitive attributes. Because of the potential for arbitrarily nested JSON object structures in the response, we use the capabilities of an LLM to generate a user flattening and conversion function. Both the user ID and the attributes are extracted from the response. By employing this approach, we isolate the intricate task of converting the user structure from the REST API response into the format required by the SailPoint connector SDK (hereafter referred to as the connector SDK).
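
For illustration, and assuming the StdAccountListOutput shape used throughout this post, a flattened user might look like the following (the values are hypothetical):

import { StdAccountListOutput } from '@sailpoint/connector-sdk';

// Hypothetical flattened user: an identity plus a map of primitive or primitive-array attributes
const flattenedExample: StdAccountListOutput = {
    identity: 'user-1',
    attributes: {
        first_name: 'Jane',
        is_active: true,
        // an array of nested objects becomes parallel arrays of primitives
        ds_group_id: ['group-1', 'group-2'],
        ds_group_name: ['Admins', 'Sales']
    }
};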

The benefits of this approach are twofold. First, it allows for a cleaner and more modular code design, because the complex conversion process is abstracted away from the main code base. Second, it enables greater flexibility and adaptability, because the conversion function can be modified or regenerated to accommodate changes in the API response structure or the connector SDK requirements, without necessitating extensive modifications to the surrounding code base.

We use the following prompt to generate the conversion function, which takes as input the data model generated in the previous step:

"""
Understand and remember the following data model provided 
as a set of TypeScript class definitions.
You will be required to generate a TypeScript function based on the 
data model provided between <data-model> </data-model> XML tags.

<data-model>
{data_model}
</data-model>

DO NOT use the TypeScript interfaces defined above in the code you write, 
as they will be unavailable. Substitute them with the type `any` where required 
to write code that works with strict TypeScript.

Given the TypeScript class definitions and data model above, your goals is to
write a TypeScript function that transforms the user into an object containing two
keys - an `identity` and `attributes`. The attributes is a list of attributes
other than the identifier attribute.

Here are the rules for mapping attributes from the user object to attributes field:

<instructions-for-mapping>
- The function takes in a user and return StdAccountListOutput.
- Extract all attributes specified in the user schema above in the <data-model>
- attributes can only contain either primitives values or array of primitive values.
- Determine the type of the attribute from the <data-model> above. Do not guess it from the
names of the attributes. E.g. if the name is plural don't assume it is an array.
- All primitive attributes such as string and numbers are copied without transformation.
- All arrays of primitive attributes are copied without transformation.
- All objects are flattened out and their attributes are copied as simple attributes.
- All arrays of objects are converted into multiple attributes where each attribute is an array of primitive type. 
See further instruction below between the XML tags <object-array-instructions>.
- Use `any` type in functions and arrow function definitions so that it works with TypeScript.
</instructions-for-mapping>


<object-array-instructions>

Consider the following structure of the response where `obj_array` is an attribute that is array of objects of type `MyObj`.
Then in the flattening process, for the response, you will need to convert the `obj_array` attribute 
into multiple attributes like `obj_array_attribute1`, `obj_array_attribute2` each of which is an array of strings in this case.

interface User {{
  id: string; // format: uuid
  user_name: string;
  first_name: string;
  obj_array: MyObject[];
}}

interface MyObj {{
  attribute1: string; // format: uuid
  attribute2: string; // format: uuid
  attribute3: string;
}}
 </object-array-instructions>


Output the code according to the template program below enclosed between the <user-mapping-function> XML tags.

This function is part of a larger program so keep the StdAccountListOutput in the function as it is.
Do not overcomplicate and write simple, well readable code. 

<user-mapping-function>

function flattenUser(user: any): StdAccountListOutput {{
  return {{
    identity: user.id,
    attributes: {{
      first_name: user.first_name,
      last_name: user.last_name,
      user_type: user.type,
      login: user.login,
      email: user.email,
      location: user.location,
      employee_id: user.employee_id,
      reviewer_id: user.reviewer_id,
      is_active: user.is_active,
      struct_array_attribute: user.struct_array.map( (val: any) => val.struct_attribute)
    }}
  }}
}};

</user-mapping-function>
"""

In the preceding prompt template, we replace the {data_model} variable with the data model of TypeScript classes extracted in the previous generation step of parsing the data model.

The following code is an example of the LLM-generated user flattening function when applied to the DocuSign API:

function flattenUser(user: any): StdAccountListOutput {
    const identity = user.id;
    const attributes: { [key: string]: any } = {};
  
    // Copy primitive attributes
    attributes.user_name = user.user_name;
    attributes.first_name = user.first_name;
    attributes.last_name = user.last_name;
    attributes.user_status = user.user_status;
    attributes.membership_status = user.membership_status;
    attributes.email = user.email;
    attributes.created_on = user.created_on;
    attributes.membership_created_on = user.membership_created_on;
  
    // Convert array of objects to multiple attributes
    const dsGroupAttributes: { [key: string]: any[] } = {};
    user.ds_groups.forEach((dsGroup: any) => {
      dsGroupAttributes.ds_group_id = dsGroupAttributes.ds_group_id || [];
      dsGroupAttributes.ds_group_id.push(dsGroup.ds_group_id);
  
      dsGroupAttributes.account_id = dsGroupAttributes.account_id || [];
      dsGroupAttributes.account_id.push(dsGroup.account_id);
  
      dsGroupAttributes.source_product_name = dsGroupAttributes.source_product_name || [];
      dsGroupAttributes.source_product_name.push(dsGroup.source_product_name);
  
      dsGroupAttributes.group_id = dsGroupAttributes.group_id || [];
      dsGroupAttributes.group_id.push(dsGroup.group_id);
  
      dsGroupAttributes.group_name = dsGroupAttributes.group_name || [];
      dsGroupAttributes.group_name.push(dsGroup.group_name);
  
      dsGroupAttributes.description = dsGroupAttributes.description || [];
      dsGroupAttributes.description.push(dsGroup.description);
  
      dsGroupAttributes.is_admin = dsGroupAttributes.is_admin || [];
      dsGroupAttributes.is_admin.push(dsGroup.is_admin);
  
      dsGroupAttributes.last_modified_on = dsGroupAttributes.last_modified_on || [];
      dsGroupAttributes.last_modified_on.push(dsGroup.last_modified_on);
  
      dsGroupAttributes.user_count = dsGroupAttributes.user_count || [];
      dsGroupAttributes.user_count.push(dsGroup.user_count);
  
      dsGroupAttributes.external_account_id = dsGroupAttributes.external_account_id || [];
      dsGroupAttributes.external_account_id.push(dsGroup.external_account_id);
  
      dsGroupAttributes.account_name = dsGroupAttributes.account_name || [];
      dsGroupAttributes.account_name.push(dsGroup.account_name);
  
      dsGroupAttributes.membership_id = dsGroupAttributes.membership_id || [];
      dsGroupAttributes.membership_id.push(dsGroup.membership_id);
    });
  
    Object.assign(attributes, dsGroupAttributes);
  
    return {
      identity,
      attributes
    };
}
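
To illustrate what this generated function produces, the following applies it to a hypothetical, abbreviated DocuSign-style user. The values are made up and some optional group fields are omitted for brevity.

// Hypothetical input following the User data model above (abbreviated; values are made up)
const sampleUser = {
    id: 'user-1',
    user_name: 'Jane Doe',
    first_name: 'Jane',
    last_name: 'Doe',
    user_status: 'active',
    membership_status: 'active',
    email: 'jane.doe@example.com',
    created_on: '2024-01-15T10:00:00Z',
    membership_created_on: '2024-01-15T10:00:00Z',
    ds_groups: [
        { ds_group_id: 'group-1', account_id: 'acct-1', group_name: 'Admins', is_admin: true },
        { ds_group_id: 'group-2', account_id: 'acct-2', group_name: 'Sales', is_admin: false }
    ]
};

const flattened = flattenUser(sampleUser);
// flattened.identity is 'user-1'
// flattened.attributes.ds_group_id is ['group-1', 'group-2']
// flattened.attributes.is_admin is [true, false]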

Pagination understanding

As mentioned earlier, the REST API can implement one or more pagination schemes. Often, the pagination details aren’t explicitly mentioned. During the development of the chain, we found that when there are multiple pagination schemes, the LLM would mix up elements of different pagination schemes and output code that isn’t coherent and sometimes also contains errors. Because looping over the paged results is a crucial step, we separate out this step in the code generation to let the LLM understand the pagination scheme implemented by the API and formulate its response at a high level before outputting the code. This allows the LLM to think step by step in formulating the response. This step generates the intermediate reasoning, which is fed into the next and final step: generating the list users function code.

We use the following prompt to get the pagination logic. Because we’re using Anthropic’s Claude Sonnet on Amazon Bedrock, we ask the LLM to output the logic in XML format, which is known to be an efficient way to structure information for that model.

"""
Understand and remember the following OpenAPI specification provided between the
<api-spec> </api-spec> XML tags. You will answer questions based on this specification,
which describes an endpoint for listing users from a SaaS application. 

<api-spec>
{api_spec}
</api-spec>

In addition to the specification, use the following information about the API to 
understand the details that are not available in the spec. The details
are included in between XML tags <api-info> </api-info>.

<api-info>
{api_info}
</api_info>

The list users API is used to obtain a list of users from the API. This API
may implement one or more pagination schemes to retrieve the list of users. 
Pagination is used to retrieve pages of results one at a time instead of 
all at once.

Your goal is to understand the multiple pagination schemes implemented 
in the API end point and use the simplest scheme to loop over the pages
to retrieve all the users. 

First describe in details how the pagination is implemented by the API 
in a step by step manner. Include the details of how to get the first 
page and then how to get the subsequent pages. Provide the output in between
the XML tags <pagination-logic> </pagination-logic>. If the API does not 
implement pagination then output "No pagination implemented".

If the API implements multiple pagination schemes: 
- Select the easiest scheme and implement using only that one scheme. 
- If the API response contains a next URL either in the response body or the 
headers, prefer that pagination scheme over more complex offset-based schemes.

If there is insufficient or missing information in the specs then do not
make up or guess it. Specify it in the output.

Output the results in the following format

<pagination-logic>
    <pagination-schemes> list all the pagination schemes <pagination-schemes>
    <selected-scheme> select one scheme to implement looping </selected-scheme>
    <steps-first-page> steps to get first page </steps-first-page>
    <steps-next> steps to get next page </steps-next>
    <termination-logic> condition for exiting loop </termination-logic>
    <information-complete> yes/no </information-complete>
</pagination-logic>
"""

In the preceding prompt template, the variable {api_spec} is replaced with the API specification. An example of the DocuSign API is provided in the appendix at the end of this post. The variable {api_info} can be replaced with additional API documentation in natural language, which is left as an empty string in the DocuSign example.

The following is the LLM’s response for the pagination logic extraction in the case of the DocuSign API, parsed out of the <pagination-logic> tags:

"""
<pagination-schemes>
1. Offset-based pagination using the start and take query parameters. 
2. Index-based pagination using the start and end query parameters. 
3. Link-based pagination using the next and previous links in the response.
</pagination-schemes>

<selected-scheme>
Link-based pagination using the next and previous links in the response.
</selected-scheme>

<steps-first-page>
1. Make a GET request to the /v2/organizations/{organizationId}/users endpoint without any pagination parameters. 
2. The response will contain the first page of results and a next link for the next page (if there are more results).
</steps-first-page>

<steps-next>
1. Check if the response contains a next link. 
2. If the next link is present, make a GET request to the URL provided in the next link. 
3. The response will contain the next page of results and a next link for the subsequent page (if there are more results).
</steps-next>

<termination-logic>
Continue making requests to the next link until the response no longer contains a next link, indicating that there are no more pages.
</termination-logic>

<information-complete>
yes
</information-complete>
"""

ListUsers function generation

This final step in the chain combines the pagination logic extracted in the previous step with the user flattening function generated earlier to formulate the final response: the TypeScript function that retrieves a list of users from the provided API.

We use the following prompt to generate the complete TypeScript function:

"""
Your task is to write a TypeScript program to retrieve a list of users from REST API. Detailed
instructions for the task are provided below. The API typically returns a set of users at a
time, you will have to paginate through the results to retrieve all users while there are more 
results remaining.

Understand and remember the following OpenAPI YAML specification provided between the
<api-spec> </api-spec> XML tags you will be asked to write a TypeScript program based on it.

<api-spec>
{api_spec}
</api-spec>

In addition to the specs the following documentation and implementation details about the
API are provided in between the XML tags <api-info> </api-info>.

<api-info>
{api_info}
</api_info>

The following pagination logic specified in between <pagination-logic> </pagination-logic> XML 
tags provides high level details on  how to implement looping over paginated results to retrieve 
all the users. Select the pagination according to the preference mentions in the pagination logic below.

<pagination-logic>
{pagination_logic}
</pagination-logic>


Now, implement a TypeScript function that retrieves all the users following the instructions below
<instructions>
- Do not modify the `flattenUser` function and reproduce it as it is.
- Change only the BODY of `listUsers` function but do not modify the function signature as it is
part of the large program that expects this function signature.
- Do not create or invent any new types.
- For pagination adhere to the instructions provided in <pagination-logic> and <api-spec>
- If pagination requires number of results, then fetch 50 users at a time.
- Assume that `context.hostUrl` from the input context will provide the API hostname and port in the 
format "hostname:port_number" or "host_ip_address:port_number".
- If the API uses a Link based pagination then you typically do not need to reconstruct the url using
`hostUrl` as the complete link will be provided in the next url or in the link headers.
- replace the `endpoint_url` with the endpoint url provided in the specs.
- You can assume that any required values such as `apiKey` will be provided in the `context` variable.
- Change the authorization headers to match the information in the headers.
- Assume that any parameters that are required to construct the path are supplied through `context` variable. The names of such variable 
will match exactly with the variable specified in the templated path in the API specs.
</instructions>

Here is the example output

<example-template>
import {{ Context, Response, StdAccountListHandler, StdAccountListOutput }} from '@sailpoint/connector-sdk';

{flatten_user_function}

const listUsers: StdAccountListHandler = async (context: Context, input: undefined, res: Response<StdAccountListOutput>) => {{
    let apiKey = context.apiKey;
    let hostUrl = context.hostUrl;
    let hasMore = true;

    let url = 'http://${{hostUrl}}/{{endpoint_path}}';
    while (hasMore) {{

        // fetch response from the endpoint
        let response = await fetch(url, {{
          headers: {{
            'Authorization': `${{apiKey}}`
          }}
        }});
        let results = await response.json();

        // get the users from the returned results
        let users = results.users;
        for (let user of users) {{
            flattened_user = flattenUser(user)
            res.send(flattened_user);
        }}

        // Set the url to the "next" URL from the response if it is available
        if (results.next) {{
          url = results.next;
        }} else {{
          hasMore = false;
        }}
    }}
}}
</example-template>

Output the generated code between the XML tags <list-users-code> </list-users-code> and 
only output the code without any additional formatting like triple quotes.
"""

In this prompt, we replace {flatten_user_function} with the flattenUser function generated earlier and {pagination_logic} with the pagination logic extracted in the previous step. We provide a template for the listUsers function to make sure the final output meets the requirements for the connector function. The resulting output is the following listUsers function, which uses the flattenUser function from earlier:

const listUsers: StdAccountListHandler = async (context: Context, input: undefined, res: Response<StdAccountListOutput>) => {
    let apiKey = context.apiKey;
    let hostUrl = context.hostUrl;
    let organizationId = context.organizationId;
    let hasMore = true;

    let url = `https://${hostUrl}/Management/v2/organizations/${organizationId}/users`;
    while (hasMore) {

        // fetch response from the endpoint
        let response = await fetch(url, {
          headers: {
            'Authorization': `Bearer ${apiKey}`
          }
        });
        let results = await response.json();

        // get the users from the returned results
        let users = results.users;
        for (let user of users) {
            let flattened_user = flattenUser(user)
            res.send(flattened_user);
        }

        // Set the url to the "next" URL from the response if it is available
        if (results.paging.next) {
          url = results.paging.next;
        } else {
          hasMore = false;
        }
    }
}
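
One way to sanity-check the generated handler without a live DocuSign tenant is to stub the global fetch call and the connector context, then invoke the handler directly. The following sketch assumes the generated listUsers and flattenUser functions above are in scope; the mocked context and response shapes are simplified stand-ins, not SailPoint’s actual runtime objects.

import { StdAccountListOutput } from '@sailpoint/connector-sdk';

// Stub fetch to return a single page with one user and no "next" link,
// so the pagination loop exits after the first iteration
(globalThis as any).fetch = async (_url: string) => ({
    json: async () => ({
        users: [{
            id: 'user-1',
            user_name: 'Jane Doe',
            first_name: 'Jane',
            last_name: 'Doe',
            user_status: 'active',
            membership_status: 'active',
            email: 'jane.doe@example.com',
            created_on: '2024-01-15T10:00:00Z',
            membership_created_on: '2024-01-15T10:00:00Z',
            ds_groups: []
        }],
        paging: {} // no "next" URL, so hasMore becomes false
    })
});

async function main() {
    const sent: StdAccountListOutput[] = [];
    // Simplified stand-ins for the connector context and response objects
    const context: any = { apiKey: 'test-key', hostUrl: 'api.example.com', organizationId: 'org-1' };
    const res: any = { send: (output: StdAccountListOutput) => sent.push(output) };

    await listUsers(context, undefined, res);
    console.log(`Accounts returned: ${sent.length}`); // expect 1
}

main();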

Lessons learned

In this post, we demonstrated how LLMs can address complex code generation problems by employing various core prompting principles and the prompt chaining technique. Although LLMs excel at following clearly defined instructions and generating small code snippets, this use case involved a substantial amount of contextual information in the form of API specifications and user instructions. Our findings from this exercise are the following:

  • Decomposing complex problems – Breaking down a complex code generation problem into several intermediate steps of lower complexity enhances the LLM’s performance. Providing a single complex prompt can result in the LLM missing some instructions. The prompt chaining approach enhances the robustness of the generation, maintaining better adherence to instructions.
  • Iterative optimization – This method allows for iterative optimization of intermediate steps. Each part of the chain can be refined independently before moving to the next step. LLMs can be sensitive to minor changes in instructions, and adjusting one aspect can unintentionally affect other objectives. Prompt chaining offers a systematic way to optimize each step independently.
  • Handling complex decisions – In the section on understanding pagination, we illustrated how LLMs can reason through various options and make complex decisions before generating code. For instance, when the input API specification supports multiple pagination schemes, we prompted the LLM to decide on the pagination approach before implementing the code. With direct code generation, without using an intermediate reasoning step, the LLM tended to mix elements of different pagination schemes, resulting in inconsistent output. By forcing decision-making first, in natural language, we achieved more consistent and accurate code generation.

Through automated code generation, SailPoint was able to dramatically reduce connector development time from hours or days to mere minutes. The approach also democratizes code development, so you don’t need deep TypeScript expertise or intimate familiarity with SailPoint’s connector SDK. By accelerating connector generation, SailPoint significantly shortens the overall customer onboarding process. This streamlined workflow not only saves valuable developer time but also enables faster integration of diverse systems, ultimately allowing customers to use SailPoint’s identity security solutions more rapidly and effectively.

Conclusion

Our AI-powered solution for generating connector code opens up new possibilities for integrating with REST APIs. By automating the creation of connectors from API specifications, developers can rapidly build robust connections to any REST API, saving developer time and reducing the time to value for onboarding new customers. As demonstrated in this post, this technology can significantly streamline the process of working with diverse APIs, allowing teams to focus on using the data and functionality these APIs provide rather than getting overwhelmed by connector code details. Consider how such a solution could enhance your own API integration efforts—it could be the key to more efficient and effective use of the myriad APIs available in today’s interconnected digital landscape.


About the Authors

Erik Huckle is the product lead for AI at SailPoint, where he works to solve critical customer problems in the identity security ecosystem through generative AI and data technologies. Prior to SailPoint, Erik co-founded a startup in robotic automation and later joined AWS as the first product hire at Amazon One. Erik mentors local startups and serves as a board member and tech committee lead for an EdTech nonprofit organization.

Tyler McDonnell is the engineering head of AI at SailPoint, where he leads the development of AI solutions to drive innovation and impact in the identity security world. Prior to SailPoint, Tyler led machine learning research and engineering teams at several early- to late-stage startups and published work in domains spanning software maintenance, information retrieval, and deep learning. He’s passionate about building products that use AI to bring positive impact to real people and problems.

Anveshi Charuvaka is a Senior Applied Scientist at the Generative AI Innovation Center, where he helps customers adopt Generative AI by implementing solutions for their critical business challenges. With a PhD in Machine Learning and over a decade of experience, he specializes in applying innovative machine learning and generative AI techniques to address complex real-world problems.

Aude Genevay is a Senior Applied Scientist at the Generative AI Innovation Center, where she helps customers tackle critical business challenges and create value using generative AI. She holds a PhD in theoretical machine learning and enjoys turning cutting-edge research into real-world solutions.

Mofijul Islam is an Applied Scientist II at the AWS Generative AI Innovation Center, where he helps customers tackle complex, customer-centric research challenges using generative AI, large language models (LLM), multi-agent learning, and multimodal learning. He holds a PhD in machine learning from the University of Virginia, where his work focused on multimodal machine learning, multilingual NLP, and multitask learning. His research has been published in top-tier conferences like NeurIPS, ICLR, AISTATS, and AAAI, as well as IEEE and ACM Transactions.

Yasin Khatami is a Senior Applied Scientist at the Generative AI Innovation Center. With more than a decade of experience in artificial intelligence (AI), he implements state-of-the-art AI products for AWS customers to drive efficiency and value for customer platforms. His expertise is in generative AI, large language models (LLM), multi-agent techniques, and multimodal learning.

Karthik Ram is a Principal Solutions Architect with Amazon Web Services based in Columbus, Ohio. He works with Independent Software Vendors (ISVs) to build secure and innovative cloud solutions, including helping with their products and solving their business problems using data-driven approaches. Karthik’s area of depth is Cloud Security with a focus on Infrastructure Security and threat detection.


Appendix

The following API specifications were used for the experiments in this post:

Copyright (c) 2017- DocuSign, Inc. (https://www.docusign.com)
swagger: '2.0'
info:
  title: DocuSign API
  version: v2.1
host: api.docusign.net
basePath: "/Management"
schemes:
- https
consumes:
- application/json
produces:
- application/json
paths:
  "/v2/organizations/{organizationId}/users":
    get:
      tags:
      - Users
      summary: Returns information about the users in an organization.
      description: |-
        Returns information about the users in an organization.

        You must include at least one of the following query parameters in the request:

        - `account_id`: The ID of an account associated with the organization.
        - `organization_reserved_domain_id`: The ID of one of the organization's reserved domains.
        - `email`: An email address associated with the users that you want to return.
      operationId: OrganizationUser_OrganizationUsers_GetV2
      produces:
      - application/json
      parameters:
      - name: organizationId
        in: path
        description: The organization ID Guid
        required: true
        type: string
        format: uuid
      - name: start
        in: query
        description: Index of first item to include in the response. The default value
          is 0.
        required: false
        type: integer
        format: int32
      - name: take
        in: query
        description: Page size of the response. The default value is 20.
        required: false
        type: integer
        format: int32
      - name: end
        in: query
        description: Index of the last item to include in the response. Ignored if
          `take` parameter is specified.
        required: false
        type: integer
        format: int32
      - name: email
        in: query
        description: Email address of the desired user. At least one of `email`, `account_id`
          or `organization_reserved_domain_id` must be specified.
        required: false
        type: string
      - name: email_user_name_like
        in: query
        description: Selects users by pattern matching on the user's email address
        required: false
        type: string
      - name: status
        in: query
        description: Status.
        required: false
        type: string
      - name: membership_status
        in: query
        description: |
          The user's membership status. One of:

          - `activation_required`
          - `activation_sent`
          - `active`
          - `closed`
          - `disabled`
        required: false
        type: string
      - name: account_id
        in: query
        description: Select users that are members of the specified account. At least
          one of `email`, `account_id` or `organization_reserved_domain_id` must be
          specified.
        required: false
        type: string
        format: uuid
      - name: organization_reserved_domain_id
        in: query
        description: Select users that are in the specified domain. At least one of
          `email`, `account_id` or `organization_reserved_domain_id` must be specified.
        required: false
        type: string
        format: uuid
      - name: last_modified_since
        in: query
        description: Select users whose data have been modified since the date specified.
          `account_id` or `organization_reserved_domain_id` must be specified.
        required: false
        type: string
      responses:
        '200':
          description: OK
          schema:
            type: object
            properties:
              users:
                type: array
                items:
                  type: object
                  properties:
                    id:
                      format: uuid
                      type: string
                      example: 00000000-0000-0000-0000-000000000000
                      description: The user's unique ID.
                    user_name:
                      type: string
                      description: The full name of the user.
                    first_name:
                      type: string
                      description: The user's first name.
                    last_name:
                      type: string
                      description: The user's last name.
                    user_status:
                      type: string
                      description: |
                        The user's status. One of:

                        - `active`
                        - `created`
                        - `closed`
                    membership_status:
                      type: string
                      description: |
                        The user's membership status. One of:

                        - `activation_required`
                        - `activation_sent`
                        - `active`
                        - `closed`
                        - `disabled`
                    email:
                      type: string
                      description: The email address.
                    created_on:
                      format: date-time
                      type: string
                      description: The date the user's account was created.
                    membership_created_on:
                      format: date-time
                      type: string
                      description: The date on which the user became a member of the organization.
                    ds_groups:
                      type: array
                      items:
                        type: object
                        properties:
                          ds_group_id:
                            format: uuid
                            type: string
                            example: 00000000-0000-0000-0000-000000000000
                            description: ''
                          account_id:
                            format: uuid
                            type: string
                            example: 00000000-0000-0000-0000-000000000000
                            description: Select users that are members of the specified account. At least
                              one of `email`, `account_id` or `organization_reserved_domain_id` must be
                              specified.
                          source_product_name:
                            type: string
                          group_id:
                            type: string
                          group_name:
                            type: string
                          description:
                            type: string
                          is_admin:
                            type: boolean
                          last_modified_on:
                            format: date-time
                            type: string
                          user_count:
                            format: int32
                            type: integer
                          external_account_id:
                            format: int64
                            type: integer
                          account_name:
                            type: string
                          membership_id:
                            format: uuid
                            type: string
                            example: 00000000-0000-0000-0000-000000000000
                  description: Information about a user.
                description: A list of users.
              paging:
                  type: object
                  properties:
                    result_set_size:
                      format: int32
                      type: integer
                      description: The number of items in a result set (page).
                    result_set_start_position:
                      format: int32
                      type: integer
                      description: The index position of the first result in this set.
                    result_set_end_position:
                      format: int32
                      type: integer
                      description: The index position of the last result in this set.
                    total_set_size:
                      format: int32
                      type: integer
                      description: The total number of results.
                    next:
                      type: string
                      description: 'A URL to the next set of results. '
                    previous:
                      type: string
                      description: 'A URL to the previous set of results. '
                  description: Contains information about paging through the results.
            description: A response containing information about users.

Read More

Map Earth’s vegetation in under 20 minutes with Amazon SageMaker

In today’s rapidly changing world, monitoring the health of our planet’s vegetation is more critical than ever. Vegetation plays a crucial role in maintaining an ecological balance, providing sustenance, and acting as a carbon sink. Traditionally, monitoring vegetation health has been a daunting task. Methods such as field surveys and manual satellite data analysis are not only time-consuming, but also require significant resources and domain expertise. These traditional approaches are cumbersome, often leading to delays in data collection and analysis and making it difficult to track and respond swiftly to environmental changes. Furthermore, the high costs associated with these methods limit their accessibility and frequency, hindering comprehensive and ongoing global vegetation monitoring efforts at a planetary scale. In light of these challenges, we have developed an innovative solution to streamline and enhance the efficiency of vegetation monitoring processes on a global scale.

Transitioning from the traditional, labor-intensive methods of monitoring vegetation health, Amazon SageMaker geospatial capabilities offer a streamlined, cost-effective solution. Amazon SageMaker supports geospatial machine learning (ML) capabilities, allowing data scientists and ML engineers to build, train, and deploy ML models using geospatial data. These geospatial capabilities open up a new world of possibilities for environmental monitoring. With SageMaker, users can access a wide array of geospatial datasets, efficiently process and enrich this data, and accelerate their development timelines. Tasks that previously took days or even weeks to accomplish can now be done in a fraction of the time.

In this post, we demonstrate the power of SageMaker geospatial capabilities by mapping the world’s vegetation in under 20 minutes. This example not only highlights the efficiency of SageMaker, but also illustrates how geospatial ML can be used to monitor the environment for sustainability and conservation purposes.

Identify areas of interest

We begin by illustrating how SageMaker can be applied to analyze geospatial data at a global scale. To get started, we follow the steps outlined in Getting Started with Amazon SageMaker geospatial capabilities. We start with the specification of the geographical coordinates that define a bounding box covering the areas of interest. This bounding box acts as a filter to select only the relevant satellite images that cover the Earth’s land masses.

import os
import json
import time
import boto3
import geopandas
from shapely.geometry import Polygon
import leafmap.foliumap as leafmap
import sagemaker
import sagemaker_geospatial_map

session = boto3.Session()
execution_role = sagemaker.get_execution_role()
sg_client = session.client(service_name="sagemaker-geospatial")
coordinates = [
    [-179.034845, -55.973798],
    [179.371094, -55.973798],
    [179.371094, 83.780085],
    [-179.034845, 83.780085],
    [-179.034845, -55.973798]
]
polygon = Polygon(coordinates)
world_gdf = geopandas.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[polygon])
m = leafmap.Map(center=[37, -119], zoom=4)
m.add_basemap('Esri.WorldImagery')
m.add_gdf(world_gdf, layer_name="AOI", style={"color": "red"})
m

Sentinel 2 coverage of Earth's land mass

Data acquisition

SageMaker geospatial capabilities provide access to a wide range of public geospatial datasets, including Sentinel-2, Landsat 8, Copernicus DEM, and NAIP. For our vegetation mapping project, we’ve selected Sentinel-2 for its global coverage and update frequency. The Sentinel-2 satellite captures images of Earth’s land surface at a resolution of 10 meters every 5 days. We pick the first week of December 2023 in this example. To make sure we cover most of the visible earth surface, we filter for images with less than 10% cloud coverage. This way, our analysis is based on clear and reliable imagery.

search_rdc_args = {
    "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8", # sentinel-2 L2A
    "RasterDataCollectionQuery": {
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [
                            [-179.034845, -55.973798],
                            [179.371094, -55.973798],
                            [179.371094, 83.780085],
                            [-179.034845, 83.780085],
                            [-179.034845, -55.973798]
                        ]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2023-12-01T00:00:00Z",
            "EndTime": "2023-12-07T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 10}}}],
            "LogicalOperator": "AND",
        },
    }
}

s2_items = []
s2_tile_ids = []
s2_geometries = {
    'id': [],
    'geometry': [],
}
while search_rdc_args.get("NextToken", True):
    search_result = sg_client.search_raster_data_collection(**search_rdc_args)
    for item in search_result["Items"]:
        s2_id = item['Id']
        s2_tile_id = s2_id.split('_')[1]
        # filter out tiles that cover the same area
        if s2_tile_id not in s2_tile_ids:
            s2_tile_ids.append(s2_tile_id)
            s2_geometries['id'].append(s2_id)
            s2_geometries['geometry'].append(Polygon(item['Geometry']['Coordinates'][0]))
            del item['DateTime']
            s2_items.append(item)  

    search_rdc_args["NextToken"] = search_result.get("NextToken")

print(f"{len(s2_items)} unique Sentinel-2 images found.")

Using the search_raster_data_collection function from SageMaker geospatial, we identified 8,581 unique Sentinel-2 images taken in the first week of December 2023. To validate our selection, we plotted the footprints of these images on a map, confirming that we had the correct images for our analysis.

s2_gdf = geopandas.GeoDataFrame(s2_geometries)
m = leafmap.Map(center=[37, -119], zoom=4)
m.add_basemap('OpenStreetMap')
m.add_gdf(s2_gdf, layer_name="Sentinel-2 Tiles", style={"color": "blue"})
m

Sentinel 2 image footprints

SageMaker geospatial processing jobs

When querying data with SageMaker geospatial capabilities, we received comprehensive details about our target images, including the data footprint, properties of the spectral bands, and hyperlinks for direct access. With these hyperlinks, we can bypass traditional memory- and storage-intensive methods of first downloading and then processing images locally, a task made even more daunting by the size and scale of our dataset, which spans over 4 TB. Each of the more than 8,000 images has multiple spectral channels and is approximately 500 MB in size. Processing multiple terabytes of data on a single machine would be time-prohibitive. Although setting up a processing cluster is an alternative, it introduces its own set of complexities, from data distribution to infrastructure management. SageMaker geospatial streamlines this with Amazon SageMaker Processing. We use the purpose-built geospatial container with SageMaker Processing jobs for a simplified, managed experience to create and run a cluster. With just a few lines of code, you can scale out your geospatial workloads with SageMaker Processing jobs. You simply specify a script that defines your workload, the location of your geospatial data on Amazon Simple Storage Service (Amazon S3), and the geospatial container. SageMaker Processing provisions cluster resources for you to run city-, country-, or continent-scale geospatial ML workloads.

For our project, we’re using 25 clusters, with each cluster comprising 20 instances, to scale out our geospatial workload. Next, we divided the 8,581 images into 25 batches for efficient processing. Each batch contains approximately 340 images. These batches are then evenly distributed across the machines in a cluster. All batch manifests are uploaded to Amazon S3, ready for the processing job, so each segment is processed swiftly and efficiently.

def s2_item_to_relative_metadata_url(item):
    parts = item["Assets"]["visual"]["Href"].split("/")
    tile_prefix = parts[4:-1]
    return "{}/{}.json".format("/".join(tile_prefix), item["Id"])


num_jobs = 25
num_instances_per_job = 20 # maximum 20

manifest_list = {}
for idx in range(num_jobs):
    manifest = [{"prefix": "s3://sentinel-cogs/sentinel-s2-l2a-cogs/"}]
    manifest_list[idx] = manifest
# split the manifest for N processing jobs
for idx, item in enumerate(s2_items):
    job_idx = idx%num_jobs
    manifest_list[job_idx].append(s2_item_to_relative_metadata_url(item))
    
# upload the manifest to S3
sagemaker_session = sagemaker.Session()
s3_bucket_name = sagemaker_session.default_bucket()
s3_prefix = 'processing_job_demo'
s3_client = boto3.client("s3")
s3 = boto3.resource("s3")

manifest_dir = "manifests"
os.makedirs(manifest_dir, exist_ok=True)

for job_idx, manifest in manifest_list.items():
    manifest_file = f"{manifest_dir}/manifest{job_idx}.json"
    s3_manifest_key = s3_prefix + "/" + manifest_file
    with open(manifest_file, "w") as f:
        json.dump(manifest, f)

    s3_client.upload_file(manifest_file, s3_bucket_name, s3_manifest_key)
    print("Uploaded {} to {}".format(manifest_file, s3_manifest_key))

With our input data ready, we now turn to the core analysis that will reveal insights into vegetation health through the Normalized Difference Vegetation Index (NDVI). NDVI is calculated from the difference between Near-infrared (NIR) and Red reflectances, normalized by their sum, yielding values that range from -1 to 1. Higher NDVI values signal dense, healthy vegetation, a value of zero indicates no vegetation, and negative values usually point to water bodies. This index serves as a critical tool for assessing vegetation health and distribution. The following is an example of what NDVI looks like.

Sentinel 2 true color image and NDVI
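
Expressed as a formula, the index computed per pixel by the processing script that follows is:

NDVI = (NIR - Red) / (NIR + Red)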

%%writefile scripts/compute_vi.py

import os
import rioxarray
import json
import gc
import warnings

warnings.filterwarnings("ignore")

if __name__ == "__main__":
    print("Starting processing")

    input_path = "/opt/ml/processing/input"
    output_path = "/opt/ml/processing/output"
    input_files = []
    items = []
    for current_path, sub_dirs, files in os.walk(input_path):
        for file in files:
            if file.endswith(".json"):
                full_file_path = os.path.join(current_path, file)
                input_files.append(full_file_path)
                with open(full_file_path, "r") as f:
                    items.append(json.load(f))

    print("Received {} input files".format(len(input_files)))

    for item in items:
        print("Computing NDVI for {}".format(item["id"]))
        red_band_url = item["assets"]["red"]["href"]
        nir_band_url = item["assets"]["nir"]["href"]
        scl_mask_url = item["assets"]["scl"]["href"]
        red = rioxarray.open_rasterio(red_band_url, masked=True)
        nir = rioxarray.open_rasterio(nir_band_url, masked=True)
        scl = rioxarray.open_rasterio(scl_mask_url, masked=True)
        scl_interp = scl.interp(
            x=red["x"], y=red["y"]
        )  # interpolate SCL to the same resolution as Red and NIR bands

        # mask out cloudy pixels using SCL (https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm-overview)
        # class 8: cloud medium probability
        # class 9: cloud high probability
        # class 10: thin cirrus
        red_cloud_masked = red.where((scl_interp != 8) & (scl_interp != 9) & (scl_interp != 10))
        nir_cloud_masked = nir.where((scl_interp != 8) & (scl_interp != 9) & (scl_interp != 10))

        ndvi = (nir_cloud_masked - red_cloud_masked) / (nir_cloud_masked + red_cloud_masked)
        # save the ndvi as geotiff
        s2_tile_id = red_band_url.split("/")[-2]
        file_name = f"{s2_tile_id}_ndvi.tif"
        output_file_path = f"{output_path}/{file_name}"
        ndvi.rio.to_raster(output_file_path)
        print("Written output: {}".format(output_file_path))

        # keep memory usage low
        del red
        del nir
        del scl
        del scl_interp
        del red_cloud_masked
        del nir_cloud_masked
        del ndvi

        gc.collect()

Now that we have the compute logic defined, we’re ready to start the geospatial SageMaker Processing job. This involves a straightforward three-step process: setting up the compute cluster, defining the computation specifics, and organizing the input and output details.

First, to set up the cluster, we decide on the number and type of instances required for the job, making sure they’re well-suited for geospatial data processing. The compute environment itself is prepared by selecting a geospatial image that comes with all commonly used packages for processing geospatial data.

Next, for the input, we use the previously created manifest that lists all image hyperlinks. We also designate an S3 location to save our results.

With these elements configured, we’re able to initiate multiple processing jobs at once, allowing them to operate concurrently for efficiency.

from multiprocessing import Process
import time

import boto3
from botocore.config import Config
import sagemaker
from sagemaker import get_execution_role
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

role = get_execution_role()
geospatial_image_uri = '081189585635.dkr.ecr.us-west-2.amazonaws.com/sagemaker-geospatial-v1-0:latest'
# use the retry behaviour of boto3 to avoid throttling issue
sm_boto = boto3.client('sagemaker', config=Config(connect_timeout=5, read_timeout=60, retries={'max_attempts': 20}))
sagemaker_session = sagemaker.Session(sagemaker_client = sm_boto)

def run_job(job_idx):
    s3_manifest = f"s3://{s3_bucket_name}/{s3_prefix}/{manifest_dir}/manifest{job_idx}.json"
    s3_output = f"s3://{s3_bucket_name}/{s3_prefix}/output"
    script_processor = ScriptProcessor(
        command=['python3'],
        image_uri=geospatial_image_uri,
        role=role,
        instance_count=num_instances_per_job,
        instance_type='ml.m5.xlarge',
        base_job_name=f'ca-s2-nvdi-{job_idx}',
        sagemaker_session=sagemaker_session,
    )

    script_processor.run(
        code='scripts/compute_vi.py',
        inputs=[
            ProcessingInput(
                source=s3_manifest,
                destination='/opt/ml/processing/input/',
                s3_data_type='ManifestFile',
                s3_data_distribution_type="ShardedByS3Key"
            ),
        ],
        outputs=[
            ProcessingOutput(
                source='/opt/ml/processing/output/',
                destination=s3_output,
                s3_upload_mode='Continuous'
            )
        ],
    )
    time.sleep(2)

processes = []
for idx in range(num_jobs):
    p = Process(target=run_job, args=(idx,))
    processes.append(p)
    p.start()
    
for p in processes:
    p.join()

After you launch the job, SageMaker automatically spins up the required instances and configures the cluster to process the images listed in your input manifest. This entire setup operates seamlessly, without needing your hands-on management. To monitor and manage the processing jobs, you can use the SageMaker console. It offers real-time updates on the status and completion of your processing tasks. In our example, it took under 20 minutes to process all 8,581 images with 500 instances. The scalability of SageMaker allows for faster processing times if needed, simply by increasing the number of instances.
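
If you prefer to monitor progress from the notebook rather than the console, the following is a minimal sketch that polls the job statuses with the boto3 SageMaker client. It assumes the ca-s2-nvdi base job name prefix used in the launcher code above.

import time

import boto3

# Poll the status of the NDVI processing jobs launched above.
sm = boto3.client("sagemaker")

while True:
    summaries = sm.list_processing_jobs(
        NameContains="ca-s2-nvdi", MaxResults=100
    )["ProcessingJobSummaries"]
    statuses = [job["ProcessingJobStatus"] for job in summaries]
    print({status: statuses.count(status) for status in set(statuses)})
    if summaries and all(s in ("Completed", "Failed", "Stopped") for s in statuses):
        break
    time.sleep(60)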

Sagemaker processing job portal

Conclusion

The power and efficiency of SageMaker geospatial capabilities have opened new doors for environmental monitoring, particularly in the realm of vegetation mapping. Through this example, we showcased how to process over 8,500 satellite images in less than 20 minutes. We not only demonstrated the technical feasibility, but also highlighted the efficiency gains from using the cloud for environmental analysis. This approach illustrates a significant leap from traditional, resource-intensive methods to a more agile, scalable, and cost-effective one. The flexibility to scale processing resources up or down as needed, combined with the ease of accessing and analyzing vast datasets, positions SageMaker as a transformative tool in the field of geospatial analysis. By simplifying the complexities associated with large-scale data processing, SageMaker enables scientists, researchers, and business stakeholders to focus more on deriving insights and less on infrastructure and data management.

As we look to the future, the integration of ML and geospatial analytics promises to further enhance our understanding of the planet’s ecological systems. The potential to monitor changes in real time, predict future trends, and respond with more informed decisions can significantly contribute to global conservation efforts. This example of vegetation mapping is just the beginning for running planetary-scale ML. See Amazon SageMaker geospatial capabilities to learn more.


About the Author

Xiong Zhou is a Senior Applied Scientist at AWS. He leads the science team for Amazon SageMaker geospatial capabilities. His current area of research includes LLM evaluation and data generation. In his spare time, he enjoys running, playing basketball and spending time with his family.

Anirudh Viswanathan is a Sr Product Manager, Technical – External Services with the SageMaker geospatial ML team. He holds a Masters in Robotics from Carnegie Mellon University, an MBA from the Wharton School of Business, and is named inventor on over 40 patents. He enjoys long-distance running, visiting art galleries and Broadway shows.

Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions and building ML platforms on AWS. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in domains such as autonomous driving.

Li Erran Li is the applied science manager at human-in-the-loop services, AWS AI, Amazon. His research interests are 3D deep learning, and vision and language representation learning. Previously he was a senior scientist at Alexa AI, the head of machine learning at Scale AI and the chief scientist at Pony.ai. Before that, he was with the perception team at Uber ATG and the machine learning platform team at Uber working on machine learning for autonomous driving, machine learning systems and strategic initiatives of AI. He started his career at Bell Labs and was adjunct professor at Columbia University. He co-taught tutorials at ICML’17 and ICCV’19, and co-organized several workshops at NeurIPS, ICML, CVPR, ICCV on machine learning for autonomous driving, 3D vision and robotics, machine learning systems and adversarial machine learning. He has a PhD in computer science from Cornell University. He is an ACM Fellow and IEEE Fellow.

Amit Modi is the product leader for SageMaker MLOps, ML Governance, and Responsible AI at AWS. With over a decade of B2B experience, he builds scalable products and teams that drive innovation and deliver value to customers globally.

Kris Efland is a visionary technology leader with a successful track record in driving product innovation and growth for over 20 years. Kris has helped create new products including consumer electronics and enterprise software across many industries, at both startups and large companies. In his current role at Amazon Web Services (AWS), Kris leads the Geospatial AI/ML category. He works at the forefront of Amazon’s fastest-growing ML service, Amazon SageMaker, which serves over 100,000 customers worldwide. He recently led the launch of Amazon SageMaker’s new geospatial capabilities, a powerful set of tools that allow data scientists and machine learning engineers to build, train, and deploy ML models using satellite imagery, maps, and location data. Before joining AWS, Kris was the Head of Autonomous Vehicle (AV) Tools and AV Maps for Lyft, where he led the company’s autonomous mapping efforts and toolchain used to build and operate Lyft’s fleet of autonomous vehicles. He also served as the Director of Engineering at HERE Technologies and Nokia and has co-founded several startups.

Read More

Unlocking insights and enhancing customer service: Intact’s transformative AI journey with AWS

Intact Financial Corporation is the leading provider of property and casualty insurance in Canada, a leading provider of global specialty insurance, and a leader in commercial lines in the UK and Ireland. Intact faced a challenge in managing its vast network of customer support call centers and required a workable solution within 6 months and a long-term solution within 1 year. With up to 20,000 calls per day, the manual auditing process was inefficient and struggled to keep up with increasing call traffic and rising customer service expectations. Quality control agents had to manually pick calls to audit, which was not a scalable solution. To address this, Intact turned to AI and speech-to-text technology to unlock insights from calls and improve customer service. The company developed an automated solution called Call Quality (CQ) using AI services from Amazon Web Services (AWS). The implementation of CQ allowed Intact to handle 1,500% more calls (15 times more calls per auditor), reduce agent handling time by 10%, and generate valuable insights about agent behavior, leading to improved customer service.

Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that helps developers add speech-to-text capabilities to applications. It uses deep learning to convert audio to text quickly and accurately. In this post, we demonstrate how the CQ solution used Amazon Transcribe and other AWS services to improve critical KPIs with AI-powered contact center call auditing and analytics.

This allowed Intact to transcribe customer calls accurately, train custom language models, simplify the call auditing process, and extract valuable customer insights more efficiently.

Solution overview

Intact aimed to develop a cost-effective and efficient call analytics platform for their contact centers by using speech-to-text and machine learning technologies. The goal was to refine customer service scripts, provide coaching opportunities for agents, and improve call handling processes. By doing so, Intact hoped to improve agent efficiency, identify business opportunities, and analyze customer satisfaction, potential product issues, and training gaps. The following figure shows the architecture of the solution, which is described in the following sections.

Intact selected Amazon Transcribe as their speech-to-text AI solution for its accuracy in handling both English and Canadian French. This was a key factor in Intact’s decision, because the company sought a versatile platform capable of adapting to their diverse business needs. Amazon Transcribe’s deep learning capabilities, which can handle a wide range of speech and acoustic characteristics, and its scalability to process anywhere from a few hundred to tens of thousands of calls daily also played a pivotal role. Additionally, Intact was impressed that Amazon Transcribe could adapt to various post-call analytics use cases across their organization.

Call processing and model serving

Intact has on-premises contact centers and cloud contact centers, so they built a call acquisition process to ingest calls from both sources. The architecture incorporates a fully automated workflow, powered by Amazon EventBridge, which triggers an AWS Step Functions workflow when an audio file is uploaded to a designated Amazon Simple Storage Service (Amazon S3) bucket. This serverless processing pipeline is built around Amazon Transcribe, which processes the call recordings and converts them from speech to text. Notifications of processed transcriptions are sent to an Amazon Simple Queue Service (Amazon SQS) queue, which aids in decoupling the architecture and resuming the Step Functions state machine workflow. AWS Lambda is used in this architecture as a transcription processor to store the processed transcriptions in an Amazon OpenSearch Service index.
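
As an illustration of the transcription step, the following is a minimal sketch of starting an Amazon Transcribe job for a recording that has landed in Amazon S3. The bucket and key handling, job naming, media format, and language options shown here are assumptions for illustration, not Intact’s actual configuration.

import boto3

transcribe = boto3.client("transcribe")

def start_call_transcription(bucket: str, key: str) -> None:
    # Derive a job name from the object key; Transcribe job names must be unique.
    job_name = key.replace("/", "-")
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="wav",                      # assumed recording format
        IdentifyLanguage=True,                  # calls may be in English or Canadian French
        LanguageOptions=["en-US", "fr-CA"],
        OutputBucketName=bucket,
        OutputKey=f"transcripts/{job_name}.json",
    )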

The call processing workflow uses custom machine learning (ML) models built by Intact that run on Amazon Fargate and Amazon Elastic Compute Cloud (Amazon EC2). The transcriptions in OpenSearch are then further enriched with these custom ML models to perform component identification and provide valuable insights such as named entity recognition, speaker role identification, sentiment analysis, and personally identifiable information (PII) redaction. Regular improvements to existing models and the addition of new ones enabled further insights to be extracted, such as reason for call, script adherence, call outcome, and sentiment analysis across various business departments, from claims to personal lines. Amazon DynamoDB is used in this architecture to control the limits of the queues. The call recordings are then compressed from WAV to MP3 format to optimize storage costs on Amazon S3.

Machine learning operations (MLOps)

Intact also built an automated MLOps pipeline that uses Step Functions, Lambda, and Amazon S3. This pipeline provides self-service capabilities for data scientists to track ML experiments and push new models to an S3 bucket. It offers flexibility for data scientists to conduct shadow deployments and capacity planning, enabling them to seamlessly switch between models for both production and experimentation purposes. Additionally, the application offers backend dashboards tailored to MLOps functionalities, ensuring smooth monitoring and optimization of machine learning models.

Frontend and API

The CQ application offers a robust search interface specially crafted for call quality agents, equipping them with powerful auditing capabilities for call analysis. The application’s backend is powered by Amazon OpenSearch Service for the search functionality. The application also uses Amazon Cognito to provide single sign-on for secure access. Lastly, Lambda functions are used for orchestration to fetch dynamic content from OpenSearch.
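
To give a sense of how such a search might look, the following is a minimal sketch of a full-text query with the opensearch-py client. The endpoint, credentials, index, and field names are placeholders, not Intact’s actual schema.

from opensearchpy import OpenSearch

# Placeholder endpoint and credentials; in practice these would come from
# configuration or a secrets store.
client = OpenSearch(
    hosts=[{"host": "search-call-quality.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

# Full-text search over stored call transcriptions (hypothetical index and fields).
results = client.search(
    index="call-transcriptions",
    body={
        "query": {"match": {"transcript": "billing dispute"}},
        "size": 10,
    },
)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("call_id"))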

The application offers trend dashboards customized to deliver actionable business insights, aiding in identifying key areas where agents allocate their time. Using data from sources like Amazon S3 and Snowflake, Intact builds comprehensive business intelligence dashboards showcasing key performance metrics such as periods of silence and call handle time. This capability enables call quality agents to delve deeper into call components, facilitating targeted agent coaching opportunities.

Call Quality Trend Dashboard

The following figure is an example of the Call Quality Trend Dashboard, showing the information available to agents. This includes the ability to filter on multiple criteria including Dates and Languages, Average Handle Time per Components and Unit Managers, and Speech time vs. Silence Time.

Results

The implementation of the new system has led to a significant increase in efficiency and productivity. There has been a 1,500% increase in auditing speed and a 1,500% increase in the number of calls reviewed. Additionally, by building the MLOps on AWS alongside the CQ solution, the team has reduced the delivery of new ML models for providing analytics from days to mere hours, making auditors 65% more efficient. This has also resulted in a 10% reduction in agents’ time per call and a 10% reduction of average hold time as they receive targeted coaching to improve their customer conversations. This efficiency has allowed for more effective use of auditors’ time in devising coaching strategies, improving scripts, and agent training.

Additionally, the solution has provided intangible benefits such as extremely high availability with no major downtime since 2020 and high cost predictability. The solution’s modular design has also led to robust deployments, which significantly reduced the time for new releases to less than an hour. This has also contributed to a near-zero failure rate during deployment.

Conclusion

In conclusion, Intact Financial Corporation’s implementation of the CQ solution, powered by AWS AI services, has revolutionized their customer service approach. This case study serves as a testament to the transformative power of AI and speech-to-text technology in enhancing customer service efficiency and effectiveness. The solution’s design and capabilities position Intact well to use generative AI for future transcription projects. As next steps, Intact plans to further use this technology by processing calls using Amazon Transcribe streaming for real-time transcription and deploying a virtual agent to provide human agents with relevant information and recommended responses.

The journey of Intact Financial Corporation is one example of how embracing AI can lead to significant improvements in service delivery and customer satisfaction. For customers looking to quickly get started on their call analytics journey, explore Amazon Transcribe Call Analytics for live call analytics and agent assist and post call analytics.


About the Authors

Étienne Brouillard is an AWS AI Principal Architect at Intact Financial Corporation, Canada’s largest provider of property and casualty insurance.

Ami Dani is a Senior Technical Program Manager at AWS focusing on AI/ML services. During her career, she has focused on delivering transformative software development projects for the federal government and large companies in industries as diverse as advertising, entertainment, and finance. Ami has experience driving business growth, implementing innovative training programs and successfully managing complex, high-impact projects.

Prabir Sekhri is a Senior Solutions Architect at AWS in the enterprise financial services sector. During his career, he has focused on digital transformation projects within large companies in industries as diverse as finance, multimedia, telecommunications as well as the energy and gas sectors. His background includes DevOps, security, and designing and architecting enterprise storage solutions. Besides technology, Prabir has always been passionate about playing music. He leads a jazz ensemble in Montreal as a pianist, composer and arranger.

Read More

‘We Would Like to Achieve Superhuman Productivity,’ NVIDIA CEO Says as Lenovo Brings Smarter AI to Enterprises

Moving to accelerate enterprise AI innovation, NVIDIA founder and CEO Jensen Huang joined Lenovo CEO Yuanqing Yang on stage Tuesday during the keynote at Lenovo Tech World 2024.

Together, they introduced the Lenovo Hybrid AI Advantage with NVIDIA, a full-stack platform for building and deploying AI capabilities across the enterprise that drive speed, innovation and productivity.

“We would like to achieve essentially superhuman productivity,” Huang told a crowd gathered in-person and online for Lenovo’s Seattle event. “And these AI agents are helping employees across industries to be more efficient and productive.”

They also unveiled a new high-performance AI server featuring Lenovo’s Neptune liquid-cooling technology and NVIDIA Blackwell, marking a leap forward in sustainability and energy efficiency for AI systems.

“This is going to be the largest of industrial revolutions we’ve ever seen,” Huang noted, highlighting the profound impact AI is having on industries worldwide. “And we’re seeing, in the last 12 months or so, just an extraordinary awakening in every single industry, every single company, every single country.”

Lenovo Unveils Hybrid AI Advantage With NVIDIA

The Lenovo Hybrid AI Advantage with NVIDIA is built on Lenovo’s services and infrastructure capabilities with NVIDIA AI software and accelerated computing. It enables organizations to create agentic AI and physical AI that transform data into actionable business outcomes more efficiently.

“Our strategy is to combine modularization with customization so that we can respond quickly to customer needs while tailoring our solutions for them,” Yang said.

Introducing Lenovo AI Fast Start and Hybrid AI Solutions

As part of the Lenovo Hybrid AI Advantage, Lenovo has introduced Lenovo AI Fast Start, a service designed to help organizations rapidly build generative AI solutions.

Leveraging the NVIDIA AI Enterprise software platform, which includes NVIDIA NIM microservices and NVIDIA NeMo for building AI agents, Lenovo AI Fast Start enables customers to prove the business value of AI use cases across personal, enterprise, and public AI platforms within weeks.

By giving organizations access to AI assets, experts, and partners, the service helps tailor solutions to meet the needs of each business, speeding up deployment at scale. This platform also includes the Lenovo AI Service Library and uses NVIDIA AI Enterprise software, including NVIDIA NIM, NVIDIA NeMo and NVIDIA NIM Agent Blueprints for agentic AI, as well as support for NVIDIA Omniverse for physical AI.

The AI Service Library offers a collection of preconfigured AI solutions that can be customized for different needs.

When these offerings are combined with NIM Agent Blueprints, businesses can rapidly develop and deploy AI agents tailored to their specific needs, accelerating AI adoption across industries.

With the addition of NeMo for large language model optimization and Omniverse for digital twin simulations, enterprises can use cutting-edge AI technologies for both agentic and physical AI applications.

Energy Efficiency and AI Infrastructure

Yang and Huang emphasized the critical need for energy-efficient AI infrastructure.

“Speed is sustainability. Speed is performance. Speed is energy efficiency,” Huang said, stressing how performance improvements directly contribute to reducing energy consumption and increasing efficiency.

“Lenovo’s 6th Generation Neptune Liquid Cooling solution supports AI computing and high-performance computing while delivering better energy efficiency,” Yang said.

By reducing data center power consumption by up to 40%, Neptune allows businesses to efficiently run accelerated AI workloads while lowering operational costs and environmental impact.

In line with this, Lenovo’s TruScale infrastructure services offer a scalable cloud-based model that gives organizations access to AI computing power without the need for large upfront investments in physical infrastructure, ensuring businesses can scale deployments as needed.

Introducing Lenovo ThinkSystem SC777 V4 Neptune With NVIDIA Blackwell

The CEOs revealed the ThinkSystem SC777 V4 Neptune server, featuring NVIDIA GB200 Grace Blackwell.

This 100% liquid-cooled system requires no fans or specialized data center air conditioning. It fits into a standard rack and runs on standard power.

“To an engineer, this is sexy,” Huang said, referring to the ThinkSystem SC777 V4 Neptune server he and Yang had just unveiled.

The SC777 includes next-gen NVIDIA NVLink interconnect, supporting NVIDIA Quantum-2 InfiniBand or Spectrum-X Ethernet networking. It also supports NVIDIA AI Enterprise software with NIM microservices.

“Our partnership spans from infrastructure to software and to service level,” Yang said. “Together, we deploy enterprise AI agents to our customers.”

Read More

Scalable Private Search with Wally

This paper presents Wally, a private search system that supports efficient semantic and keyword search queries against large databases. When sufficiently many clients are making queries, Wally’s performance is significantly better than previous systems. In previous private search systems, for each client query, the server must perform at least one expensive cryptographic operation per database entry. As a result, performance degraded proportionally with the number of entries in the database. In Wally, we remove this limitation. Specifically, for each query the server performs cryptographic…Apple Machine Learning Research

Accelerate migration portfolio assessment using Amazon Bedrock

Conducting assessments on application portfolios that need to be migrated to the cloud can be a lengthy endeavor. Despite the existence of AWS Application Discovery Service or the presence of some form of configuration management database (CMDB), customers still face many challenges. These include time taken for follow-up discussions with application teams to review outputs and understand dependencies (approximately 2 hours per application), cycles needed to generate a cloud architecture design that meets security and compliance requirements, and the effort needed to provide cost estimates by selecting the right AWS services and configurations for optimal application performance in the cloud. Typically, it takes 6–8 weeks to carry out these tasks before actual application migrations begin.

In this blog post, we will harness the power of generative AI and Amazon Bedrock to help organizations simplify, accelerate, and scale migration assessments. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. By using Amazon Bedrock Agents, action groups, and Amazon Bedrock Knowledge Bases, we demonstrate how to build a migration assistant application that rapidly generates migration plans, R-dispositions, and cost estimates for applications migrating to AWS. This approach enables you to scale your application portfolio discovery and significantly accelerate your planning phase.

General requirements for a migration assistant

The following are some key requirements that you should consider when building a migration assistant.

Accuracy and consistency

Is your migration assistant application able to render accurate and consistent responses?

Guidance: To ensure accurate and consistent responses from your migration assistant, implement Amazon Bedrock Knowledge Bases. The knowledge base should contain contextual information based on your company’s private data sources. This enables the migration assistant to use Retrieval-Augmented Generation (RAG), which enhances the accuracy and consistency of responses. Your knowledge base should comprise multiple relevant data sources.
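
As an example, once the knowledge base is populated, you can test that it returns grounded answers by querying it directly. The following is a minimal sketch using the boto3 bedrock-agent-runtime client; the knowledge base ID, model ARN, and question are placeholders you would replace with your own values.

import boto3

client = boto3.client("bedrock-agent-runtime")

# Placeholder identifiers; substitute your knowledge base ID and the ARN of a
# foundation model you have enabled in Amazon Bedrock.
KNOWLEDGE_BASE_ID = "XXXXXXXXXX"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

response = client.retrieve_and_generate(
    input={"text": "What are the compliance requirements for application A1-CRM?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KNOWLEDGE_BASE_ID,
            "modelArn": MODEL_ARN,
        },
    },
)
print(response["output"]["text"])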

Handle hallucinations

How are you reducing the hallucinations from the large language model (LLM) for your migration assistant application?

Guidance: Reducing hallucinations in LLMs involves implementation of several key strategies. Implement customized prompts based on your requirements and incorporate advanced prompting techniques to guide the model’s reasoning and provide examples for more accurate responses. These techniques include chain-of-thought prompting, zero-shot prompting, multishot prompting, few-shot prompting, and model-specific prompt engineering guidelines (see Anthropic Claude on Amazon Bedrock prompt engineering guidelines). RAG combines information retrieval with generative capabilities to enhance contextual relevance and reduce hallucinations. Finally, a feedback loop or human-in-the-loop when fine-tuning LLMs on specific datasets will help align the responses with accurate and relevant information, mitigating errors and outdated content.

Modular design

Is the design of your migration assistant modular?

Guidance: Building a migration assistant application using Amazon Bedrock action groups, which have a modular design, offers three key benefits.

  • Customization and adaptability: Action groups allow users to customize migration workflows to suit specific AWS environments and requirements. For instance, if a user is migrating a web application to AWS, they can customize the migration workflow to include specific actions tailored to web server setup, database migration, and network configuration. This customization ensures that the migration process aligns with the unique needs of the application being migrated.
  • Maintenance and troubleshooting: Simplifies maintenance and troubleshooting tasks by isolating issues to individual components. For example, if there’s an issue with the database migration action within the migration workflow, it can be addressed independently without affecting other components. This isolation streamlines the troubleshooting process and minimizes the impact on the overall migration operation, ensuring a smoother migration and faster resolution of issues.
  • Scalability and reusability: Promote scalability and reusability across different AWS migration projects. For instance, if a user successfully migrates an application to AWS using a set of modular action groups, they can reuse those same action groups to migrate other applications with similar requirements. This reusability saves time and effort when developing new migration workflows and ensures consistency across multiple migration projects. Additionally, modular design facilitates scalability by allowing users to scale the migration operation up or down based on workload demands. For example, if they need to migrate a larger application with higher resource requirements, they can easily scale up the migration workflow by adding more instances of relevant action groups, without needing to redesign the entire workflow from scratch.

Overview of solution

Before we dive deep into the deployment, let’s walk through the key steps of the architecture that will be established, as shown in Figure 1.

  1. Users interact with the migration assistant through the Amazon Bedrock chat console to input their requests. For example, a user might request to Generate R-disposition with cost estimates or Generate Migration plan for specific application IDs (for example, A1-CRM or A2-CMDB).
  2. The migration assistant, which uses Amazon Bedrock agents, is configured with instructions, action groups, and knowledge bases. When processing the user’s request, the migration assistant invokes relevant action groups such as R Dispositions and Migration Plan, which in turn invoke specific AWS Lambda functions.
  3. The Lambda functions process the request using RAG to produce the required output.
  4. The resulting output documents (R-Dispositions with cost estimates and Migration Plan) are then uploaded to a designated Amazon Simple Storage Service (Amazon S3) bucket.

The following image is a screenshot of a sample user interaction with the migration assistant.

Prerequisites

You should have the following:

Deployment steps

  1. Configure a knowledge base:
    • Open the AWS Management Console for Amazon Bedrock and navigate to Amazon Bedrock Knowledge Bases.
    • Choose Create knowledge base and enter a name and optional description.
    • Select the vector database (for example, Amazon OpenSearch Serverless).
    • Select the embedding model (for example, Amazon Titan Embedding G1 – Text).
    • Add data sources:
      • For Amazon S3: Specify the S3 bucket and prefix, file types, and chunking configuration.
      • For custom data: Use the API to ingest data programmatically.
    • Review and create the knowledge base.
  2. Set up Amazon Bedrock Agents:
    • In the Amazon Bedrock console, go to the Agents section and choose Create agent.
    • Enter a name and optional description for the agent.
    • Select the foundation model (for example, Anthropic Claude V3).
    • Configure the agent’s AWS Identity and Access Management (IAM) role to grant necessary permissions.
    • Add instructions to guide the agent’s behavior.
    • Optionally, add the previously created Amazon Bedrock Knowledge Base to enhance the agent’s responses.
    • Configure additional settings such as maximum tokens and temperature.
    • Review and create the agent.
  3. Configure action groups for the agent:
    • On the agent’s configuration page, navigate to the Action groups section.
    • Choose Add action group for each required group (for example, Create R-disposition Assessment and Create Migration Plan).
    • For each action group:
    • After adding all action groups, review the entire agent configuration and deploy the agent.
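
Once deployed, you can exercise the agent programmatically as well as through the chat console. The following is a minimal sketch using the boto3 bedrock-agent-runtime client; the agent ID, alias ID, and prompt are placeholders.

import uuid

import boto3

client = boto3.client("bedrock-agent-runtime")

# Placeholder identifiers; substitute the agent ID and alias ID shown in the
# Amazon Bedrock console after deployment.
AGENT_ID = "XXXXXXXXXX"
AGENT_ALIAS_ID = "TSTALIASID"

response = client.invoke_agent(
    agentId=AGENT_ID,
    agentAliasId=AGENT_ALIAS_ID,
    sessionId=str(uuid.uuid4()),
    inputText="Generate R-disposition with cost estimates for application A1-CRM",
)

# The agent response is streamed; concatenate the chunks into the final text.
completion = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk:
        completion += chunk["bytes"].decode("utf-8")
print(completion)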

Clean up

To avoid unnecessary charges, delete the resources created during testing. Use the following steps to clean up the resources:

  1. Delete the Amazon Bedrock knowledge base: Open the Amazon Bedrock console.
    Delete the knowledge base from any agents that it’s associated with.

    • From the left navigation pane, choose Agents.
    • Select the Name of the agent that you want to delete the knowledge base from.
    • A red banner appears to warn you to delete the reference to the knowledge base, which no longer exists, from the agent.
    • Select the radio button next to the knowledge base that you want to remove. Choose More and then choose Delete.
    • From the left navigation pane, choose Knowledge base.
    • To delete a source, either choose the radio button next to the source and select Delete or select the Name of the source and then choose Delete in the top right corner of the details page.
    • Review the warnings for deleting a knowledge base. If you accept these conditions, enter delete in the input box and choose Delete to confirm.
  2. Delete the Agent
    • In the Amazon Bedrock console, choose Agents from the left navigation pane.
    • Select the radio button next to the agent to delete.
    • A modal appears warning you about the consequences of deletion. Enter delete in the input box and choose Delete to confirm.
    • A blue banner appears to inform you that the agent is being deleted. When deletion is complete, a green success banner appears.
  3. Delete all the other resources including the Lambda functions and any AWS services used for account customization.

Conclusion

Conducting assessments on application portfolios for AWS cloud migration can be a time-consuming process, involving analyzing data from various sources, discovery and design discussions to develop an AWS Cloud architecture design, and cost estimates.

In this blog post, we demonstrated how you can simplify, accelerate, and scale migration assessments by using generative AI and Amazon Bedrock. We showcased using Amazon Bedrock Agents, action groups, and Amazon Bedrock Knowledge Bases for a migration assistant application that renders migration plans, R-dispositions, and cost estimates. This approach significantly reduces the time and effort required for portfolio assessments, helping organizations to scale and expedite their journey to the AWS Cloud.

Ready to improve your cloud migration process with generative AI in Amazon Bedrock? Begin by exploring the Amazon Bedrock User Guide to understand how it can streamline your organization’s cloud journey. For further assistance and expertise, consider using AWS Professional Services (contact sales) to help you streamline your cloud migration journey and maximize the benefits of Amazon Bedrock.


About the Authors

Ebbey Thomas is a Senior Cloud Architect at AWS, with a strong focus on leveraging generative AI to enhance cloud infrastructure automation and accelerate migrations. In his role at AWS Professional Services, Ebbey designs and implements solutions that improve cloud adoption speed and efficiency while ensuring secure and scalable operations for AWS users. He is known for solving complex cloud challenges and driving tangible results for clients. Ebbey holds a BS in Computer Engineering and an MS in Information Systems from Syracuse University.

Shiva Vaidyanathan is a Principal Cloud Architect at AWS. He provides technical guidance and design support and leads implementation projects for customers, ensuring their success on AWS. He works towards making cloud networking simpler for everyone. Prior to joining AWS, he worked on several NSF-funded research initiatives on secure computing in public cloud infrastructures. He holds an MS in Computer Science from Rutgers University and an MS in Electrical Engineering from New York University.

Read More

Improve public speaking skills using a generative AI-based virtual assistant with Amazon Bedrock

Public speaking is a critical skill in today’s world, whether it’s for professional presentations, academic settings, or personal growth. By practicing it regularly, individuals can build confidence, manage anxiety in a healthy way, and develop effective communication skills leading to successful public speaking engagements. Now, with the advent of large language models (LLMs), you can use generative AI-powered virtual assistants to provide real-time analysis of speech, identification of areas for improvement, and suggestions for enhancing speech delivery.

In this post, we present an Amazon Bedrock powered virtual assistant that can transcribe presentation audio and examine it for language use, grammatical errors, filler words, and repetition of words and sentences to provide recommendations as well as suggest a curated version of the speech to elevate the presentation. This solution helps refine communication skills and empower individuals to become more effective and impactful public speakers. Organizations across various sectors, including corporations, educational institutions, government entities, and social media personalities, can use this solution to provide automated coaching for their employees, students, and public speaking engagements.

In the following sections, we walk you through constructing a scalable, serverless, end-to-end Public Speaking Mentor AI Assistant with Amazon Bedrock, Amazon Transcribe, and AWS Step Functions using provided sample code. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Overview of solution

The solution consists of four main components:

  • An Amazon Cognito user pool for user authentication. Authenticated users are granted access to the Public Speaking Mentor AI Assistant web portal to upload audio and video recordings.
  • A simple web portal created using Streamlit to upload audio and video recordings. The uploaded files are stored in an Amazon Simple Storage Service (Amazon S3) bucket for later processing, retrieval, and analysis.
  • A Step Functions standard workflow to orchestrate converting the audio to text using Amazon Transcribe and then invoking Amazon Bedrock with AI prompt chaining to generate speech recommendations and rewrite suggestions.
  • Amazon Simple Notification Service (Amazon SNS) to send an email notification to the user with Amazon Bedrock generated recommendations.

This solution uses Amazon Transcribe for speech-to-text conversion. When an audio or video file is uploaded, Amazon Transcribe transcribes the speech into text. This text is passed as an input to Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock. The solution sends two prompts to Amazon Bedrock: one to generate feedback and recommendations on language usage, grammar, filler words, repetition, and more, and another to obtain a curated version of the original speech. Prompt chaining is performed with Amazon Bedrock for these prompts. The solution then consolidates the outputs, displays recommendations on the user’s webpage, and emails the results.
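
The following is a minimal sketch of this two-step prompt chain using the boto3 bedrock-runtime client. In the deployed solution, the chaining is orchestrated by the Step Functions workflow described in the walkthrough that follows; the model identifier, prompts, and token limit here are illustrative assumptions.

import json

import boto3

bedrock = boto3.client("bedrock-runtime")
# Identifier for Anthropic's Claude 3.5 Sonnet; verify the model ID available in your Region.
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def invoke_claude(system_prompt: str, user_text: str, max_tokens: int = 2000) -> str:
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_text}],
    }
    response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    return json.loads(response["body"].read())["content"][0]["text"]

transcript = "..."  # text produced by Amazon Transcribe

# Prompt 1: analyze the speech and generate recommendations.
feedback = invoke_claude(
    "You are a public speaking coach. Identify grammatical errors, filler words, "
    "repetition, and other issues, and provide recommendations.",
    transcript,
)

# Prompt 2 (chained): feed the transcript and the feedback back to the model and
# ask for a rewritten version of the speech.
rewrite = invoke_claude(
    "Rewrite the speech, applying the recommendations provided.",
    f"Original speech:\n{transcript}\n\nRecommendations:\n{feedback}",
)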

The generative AI capabilities of Amazon Bedrock efficiently process user speech inputs. It uses natural language processing to analyze the speech and provides tailored recommendations. Using LLMs trained on extensive data, Amazon Bedrock generates curated speech outputs to enhance the presentation delivery.

The following diagram shows our solution architecture.

Scope of solution

Let’s explore the architecture step by step:

  1. The user authenticates to the Public Speaking Mentor AI Assistant web portal (a Streamlit application hosted on the user’s local desktop) using the Amazon Cognito user pool authentication mechanism.
  2. The user uploads an audio or video file to the web portal, which is stored in an S3 bucket encrypted using server-side encryption with Amazon S3 managed keys (SSE-S3).
  3. The S3 service triggers an s3:ObjectCreated event for each file that is saved to the bucket.
  4. Amazon EventBridge invokes the Step Functions state machine based on this event. Because the state machine execution could exceed 5 minutes, we use a standard workflow. Step Functions state machine logs are sent to Amazon CloudWatch for logging and troubleshooting purposes.
  5. The Step Functions workflow uses AWS SDK integrations to invoke Amazon Transcribe and initiates a StartTranscriptionJob, passing the S3 bucket, prefix path, and object name in the MediaFileUri parameter. The workflow waits for the transcription job to complete and saves the transcript in another S3 bucket prefix path.
  6. The Step Functions workflow uses the optimized integrations to invoke the Amazon Bedrock InvokeModel API, which specifies the Anthropic Claude 3.5 Sonnet model, the system prompt, maximum tokens, and the transcribed speech text as inputs to the API. The system prompt instructs the Anthropic Claude 3.5 Sonnet model to provide suggestions on how to improve the speech by identifying incorrect grammar, repetitions of words or content, use of filler words, and other recommendations.
  7. After receiving a response from Amazon Bedrock, the Step Functions workflow uses prompt chaining to craft another input for Amazon Bedrock, incorporating the previous transcribed speech and the model’s previous response, and requesting the model to provide suggestions for rewriting the speech.
  8. The workflow combines these outputs from Amazon Bedrock and crafts a message that is displayed on the logged-in user’s webpage.
  9. The Step Functions workflow invokes the Amazon SNS Publish optimized integration to send an email to the user with the Amazon Bedrock generated message.
  10. The Streamlit application queries Step Functions to display output results on the Amazon Cognito user’s webpage.

Prerequisites

For implementing the Public Speaking Mentor AI Assistant solution, you should have the following prerequisites:

  1. An AWS account with sufficient AWS Identity and Access Management (IAM) permissions for the following AWS services to deploy the solution and run the Streamlit application web portal:
    • Amazon Bedrock
    • AWS CloudFormation
    • Amazon CloudWatch
    • Amazon Cognito
    • Amazon EventBridge
    • Amazon Transcribe
    • Amazon SNS
    • Amazon S3
    • AWS Step Functions
  2. Model access enabled for Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock in your desired AWS Region.
  3. A local desktop environment with the AWS Command Line Interface (AWS CLI), Python 3.8 or above, the AWS Cloud Development Kit (AWS CDK) for Python, and Git installed.
  4. The AWS CLI set up with necessary AWS credentials and desired Region.

Deploy the Public Speaking Mentor AI Assistant solution

Complete the following steps to deploy the Public Speaking Mentor AI Assistant AWS infrastructure:

  1. Clone the repository to your local desktop environment with the following command:
    git clone https://github.com/aws-samples/improve_public_speaking_skills_using_a_genai_based_virtual_assistant_with_amazon_bedrock.git

  2. Change to the app directory in the cloned repository:
    cd improve_public_speaking_skills_using_a_genai_based_virtual_assistant_with_amazon_bedrock/app

  3. Create a Python virtual environment:
    python3 -m venv .venv

  4. Activate your virtual environment:
    source .venv/bin/activate

  5. Install the required dependencies:
    pip install -r requirements.txt

  6. Optionally, synthesize the CloudFormation template using the AWS CDK:
    cdk synth

You may need to perform a one-time AWS CDK bootstrapping using the following command. See AWS CDK bootstrapping for more details.

cdk bootstrap aws://<ACCOUNT-NUMBER-1>/<REGION-1>
  7. Deploy the CloudFormation template in your AWS account and selected Region:
    cdk deploy

After the AWS CDK is deployed successfully, you can follow the steps in the next section to create an Amazon Cognito user.

Create an Amazon Cognito user for authentication

Complete the following steps to create a user in the Amazon Cognito user pool to access the web portal. The user created doesn’t need AWS permissions.

  1. Sign in to the AWS Management Console of your account and select the Region for your deployment.
  2. On the Amazon Cognito console, choose User pools in the navigation pane.
  3. Choose the user pool created by the CloudFormation template. (The user pool name should have the prefix PSMBUserPool followed by a string of random characters as one word.)
  4. Choose Create user.

Cognito Create User

  5. Enter a user name and password, then choose Create user.

Cognito User Information

Subscribe to an SNS topic for email notifications

Complete the following steps to subscribe to an SNS topic to receive speech recommendation email notifications:

  1. Sign in to the console of your account and select the Region for your deployment.
  2. On the Amazon SNS console, choose Topics in the navigation pane.
  3. Choose the topic created by the CloudFormation template. (The name of the topic should look like InfraStack-PublicSpeakingMentorAIAssistantTopic followed by a string of random characters as one word.)
  4. Choose Create subscription.

SNS Create Subscription

  5. For Protocol, choose Email.
  6. For Endpoint, enter your email address.
  7. Choose Create subscription.

SNS Subscription Information

Run the Streamlit application to access the web portal

Complete the following steps to run the Streamlit application to access the Public Speaking Mentor AI Assistant web portal:

  1. Change the directory to webapp inside the app directory:
    cd webapp

  2. Launch the Streamlit server on port 8080:
    streamlit run webapp.py --server.port 8080

  3. Make note of the Streamlit application URL for further use. Depending on your environment setup, you can choose one of the three URLs (Local, Network, or External) provided by the Streamlit server’s running process.
  4. Make sure incoming traffic on port 8080 is allowed on your local machine to access the Streamlit application URL.

Use the Public Speaking Mentor AI Assistant

Complete the following steps to use the Public Speaking Mentor AI Assistant to improve your speech:

  1. Open the Streamlit application URL in your browser (Google Chrome, preferably) that you noted in the previous steps.
  2. Log in to the web portal using the Amazon Cognito user name and password created earlier for authentication.

Public Speaking Mentor AI Assistant Login Page

  3. Choose Browse files to locate and choose your recording.
  4. Choose Upload File to upload your file to an S3 bucket.

Public Speaking Mentor AI Assistant Upload File

As soon as the file upload finishes, the Public Speaking Mentor AI Assistant processes the audio transcription and prompt engineering steps to generate speech recommendations and rewrite results.

Public Speaking Mentor AI Assistant Processing

When the processing is complete, you can see the Speech Recommendations and Speech Rewrite sections on the webpage as well as in your email through Amazon SNS notifications.

On the right pane of the webpage, you can review the processing steps performed by the Public Speaking Mentor AI Assistant solution to get your speech results.

Public Speaking Mentor AI Assistant Results Page

Clean up

Complete the following steps to clean up your resources:

  1. Shut down your Streamlit application server process running in your environment using Ctrl+C.
  2. Change to the app directory in your repository.
  3. Destroy the resources created with AWS CloudFormation using the AWS CDK:
    cdk destroy

Optimize for functionality, accuracy, and cost

Let’s conduct an analysis of this proposed solution architecture to identify opportunities for functionality enhancements, accuracy improvements, and cost optimization.

Starting with prompt engineering, our approach involves analyzing users’ speech based on several criteria, such as language usage, grammatical errors, filler words, and repetition of words and sentences. Individuals and organizations have the flexibility to customize the prompt by including additional analysis parameters or adjusting existing ones to align with their requirements and company policies. Furthermore, you can set the inference parameters to control the response from the LLM deployed on Amazon Bedrock.

To create a lean architecture, we have primarily chosen serverless technologies, such as Amazon Bedrock for prompt engineering and natural language generation, Amazon Transcribe for speech-to-text conversion, Amazon S3 for storage, Step Functions for orchestration, EventBridge for scalable event handling to process audio files, and Amazon SNS for email notifications. Serverless technologies enable you to run the solution without provisioning or managing servers, allowing for automatic scaling and pay-per-use billing, which can lead to cost savings and increased agility.

For the web portal component, we are currently deploying the Streamlit application in a local desktop environment. Alternatively, you have the option to use Amazon S3 Website Hosting, which would further contribute to a serverless architecture.

To enhance the accuracy of audio-to-text translation, it’s recommended to record your presentation audio in a quiet environment, away from noise and distractions.

In cases where your media contains domain-specific or non-standard terms, such as brand names, acronyms, and technical words, Amazon Transcribe might not accurately capture these terms in your transcription output. To address transcription inaccuracies and customize your output for your specific use case, you can create custom vocabularies and custom language models.
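
For example, a custom vocabulary can be created ahead of time and referenced when a transcription job starts. The following is a minimal, illustrative sketch with boto3; the vocabulary name, phrases, and job settings are placeholders for your own domain terms, not part of the deployed solution.

import boto3

transcribe = boto3.client("transcribe")

# Hypothetical vocabulary of brand names and acronyms that appear in presentations.
transcribe.create_vocabulary(
    VocabularyName="presentation-terms",
    LanguageCode="en-US",
    Phrases=["SageMaker", "GenAI", "EventBridge"],
)

# Reference the vocabulary through the Settings parameter when starting a job, for example:
# transcribe.start_transcription_job(
#     TranscriptionJobName="my-presentation",
#     Media={"MediaFileUri": "s3://my-bucket/my-presentation.mp3"},
#     LanguageCode="en-US",
#     Settings={"VocabularyName": "presentation-terms"},
# )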

At the time of writing, our solution analyzes only the audio component. Uploading audio files alone can optimize storage costs. You may consider converting your video files into audio using third-party tools prior to uploading them to the Public Speaking Mentor AI Assistant web portal.

Our solution currently uses the standard tier of Amazon S3. However, you have the option to choose the S3 One Zone-IA storage class for storing files that don’t require high availability. Additionally, configuring an Amazon S3 lifecycle policy can further help reduce costs.
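For instance, here is a minimal sketch of a lifecycle configuration applied with boto3; the bucket name, prefix, and retention periods are assumptions you would tune to your own requirements:

import boto3

s3 = boto3.client("s3")

# Move uploaded recordings to S3 One Zone-IA after 30 days and expire them after 180 days
s3.put_bucket_lifecycle_configuration(
    Bucket="my-speech-uploads-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "audio-cost-optimization",
            "Status": "Enabled",
            "Filter": {"Prefix": "uploads/"},
            "Transitions": [{"Days": 30, "StorageClass": "ONEZONE_IA"}],
            "Expiration": {"Days": 180},
        }]
    },
)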

You can configure Amazon SNS to send speech recommendations to other destinations, such as email, webhook, and Slack. Refer to Configure Amazon SNS to send messages for alerts to other destinations for more information.
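The following sketch shows how additional subscriptions could be added to the solution's SNS topic with boto3; the topic ARN and endpoints are placeholders:

import boto3

sns = boto3.client("sns")
topic_arn = "arn:aws:sns:us-east-1:111122223333:speech-recommendations"  # placeholder topic ARN

# Add another email subscriber
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="coach@example.com")

# Deliver notifications to an HTTPS endpoint (for example, a webhook that posts to Slack)
sns.subscribe(TopicArn=topic_arn, Protocol="https", Endpoint="https://example.com/sns-webhook")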

To estimate the cost of implementing the solution, you can use the AWS Pricing Calculator. For larger workloads, additional volume discounts may be available. We recommend contacting AWS pricing specialists or your account manager for more detailed pricing information.

Security best practices

Security and compliance is a shared responsibility between AWS and the customer, as outlined in the Shared Responsibility Model. We encourage you to review this model for a comprehensive understanding of the respective responsibilities. Refer to Security in Amazon Bedrock and Build generative AI applications on Amazon Bedrock to learn more about building secure, compliant, and responsible generative AI applications on Amazon Bedrock. OWASP Top 10 For LLMs outlines the most common vulnerabilities. We encourage you to enable Amazon Bedrock Guardrails to implement safeguards for your generative AI applications based on your use cases and responsible AI policies.

With AWS, you manage the privacy controls of your data, control how your data is used, who has access to it, and how it is encrypted. Refer to Data Protection in Amazon Bedrock and Data Protection in Amazon Transcribe for more information. Similarly, we strongly recommend referring to the data protection guidelines for each AWS service used in our solution architecture. Furthermore, we advise applying the principle of least privilege when granting permissions, because this practice enhances the overall security of your implementation.

Conclusion

By harnessing the capabilities of LLMs in Amazon Bedrock, our Public Speaking Mentor AI Assistant offers a revolutionary approach to enhancing public speaking abilities. With its personalized feedback and constructive recommendations, individuals can develop effective communication skills in a supportive and non-judgmental environment.

Unlock your potential as a captivating public speaker. Embrace the power of our Public Speaking Mentor AI Assistant and embark on a transformative journey towards mastering the art of public speaking. Try out our solution today by cloning the GitHub repository and experience the difference our cutting-edge technology can make in your personal and professional growth.


About the Authors

Nehal Sangoi is a Sr. Technical Account Manager at Amazon Web Services. She provides strategic technical guidance to help independent software vendors plan and build solutions using AWS best practices. Connect with Nehal on LinkedIn.

Akshay Singhal is a Sr. Technical Account Manager at Amazon Web Services supporting Enterprise Support customers focusing on the Security ISV segment. He provides technical guidance for customers to implement AWS solutions, with expertise spanning serverless architectures and cost optimization. Outside of work, Akshay enjoys traveling, Formula 1, making short movies, and exploring new cuisines. Connect with him on LinkedIn.

Read More

Bria 2.3, Bria 2.2 HD, and Bria 2.3 Fast are now available in Amazon SageMaker JumpStart

Bria 2.3, Bria 2.2 HD, and Bria 2.3 Fast are now available in Amazon SageMaker JumpStart

This post is co-written with Bar Fingerman from Bria.

We are thrilled to announce that Bria 2.3, 2.2 HD, and 2.3 Fast text-to-image foundation models (FMs) from Bria AI are now available in Amazon SageMaker JumpStart. Bria models are trained exclusively on commercial-grade licensed data, providing high standards of safety and compliance with full legal indemnity.

These advanced models from Bria AI generate high-quality and contextually relevant visual content that is ready to use in marketing, design, and image generation use cases across industries from ecommerce, media and entertainment, and gaming to consumer-packaged goods and retail.

In this post, we discuss Bria’s family of models, explain the Amazon SageMaker platform, and walk through how to discover, deploy, and run inference on a Bria 2.3 model using SageMaker JumpStart.

Overview of Bria 2.3, Bria 2.2 HD, and Bria 2.3 Fast

Bria AI offers a family of high-quality visual content models. These advanced models represent the cutting edge of generative AI technology for image creation:

  • Bria 2.3 – The core model delivers high-quality visual content with exceptional photorealism and detail, capable of generating stunning images with complex concepts in various art styles, including photorealism.
  • Bria 2.2 HD – Optimized for high-definition, Bria 2.2 HD offers high-definition visual content that meets the demanding needs of high-resolution applications, making sure every detail is crisp and clear.
  • Bria 2.3 Fast – Optimized for speed, Bria 2.3 Fast generates high-quality visuals at a faster rate, perfect for applications requiring quick turnaround times without compromising on quality. Running the model on SageMaker g5 instance types gives fast latency and throughput (compared to Bria 2.3 and Bria 2.2 HD), and the p4d instance type offers roughly twice the performance of the g5 instance.

Overview of SageMaker JumpStart

With SageMaker JumpStart, you can choose from a broad selection of publicly available FMs. ML practitioners can deploy FMs to dedicated SageMaker instances from a network-isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Bria models in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. Doing so enables you to derive model performance and machine learning operations (MLOps) controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs.

The model is deployed in a secure AWS environment and under your virtual private cloud (VPC) controls, helping provide data security. Bria models are available today for deployment and inferencing in SageMaker Studio in 22 AWS Regions where SageMaker JumpStart is available. Bria models require g5 or p4 instance types.

Prerequisites

To try out the Bria models using SageMaker JumpStart, you need the following prerequisites:

Discover Bria models in SageMaker JumpStart

You can access the FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we show how to discover the models in SageMaker Studio.

SageMaker Studio is an IDE that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane or by choosing JumpStart on the Home page.

On the SageMaker JumpStart landing page, you can find pre-trained models from popular model hubs. You can search for Bria, and the search results will list all the Bria model variants available. For this post, we use the Bria 2.3 Commercial Text-to-image model.

You can choose the model card to view details about the model such as license, data used to train, and how to use the model. You also have two options, Deploy and Preview notebooks, to deploy the model and create an endpoint.

Subscribe to Bria models in AWS Marketplace

When you choose Deploy, if the model wasn’t already subscribed, you first have to subscribe before you can deploy the model. We demonstrate the subscription process for the Bria 2.3 Commercial Text-to-image model. You can repeat the same steps for subscribing to other Bria models.

After you choose Subscribe, you’re redirected to the model overview page, where you can read the model details, pricing, usage, and other information. Choose Continue to Subscribe and accept the offer on the following page to complete the subscription.

Configure and deploy Bria models using AWS Marketplace

The configuration page gives three different launch methods to choose from. For this post, we showcase how you can use the SageMaker console:

  1. For Available launch method, select SageMaker console.
  2. For Region, choose your preferred Region.
  3. Choose View in Amazon SageMaker.
  4. For Model name, enter a name (for example, Model-Bria-v2-3).
  5. For IAM role, choose an existing IAM role or create a new role that has the SageMaker full access IAM policy attached.
  6. Choose Next. The recommended instance types for this model endpoint are ml.g5.2xlarge, ml.g5.12xlarge, ml.g5.48xlarge, ml.p4d.24xlarge, and ml.p4de.24xlarge. Make sure you have the account-level service limit for one or more of these instance types to deploy this model. For more information, refer to Requesting a quota increase.
  7. In the Variants section, select any of the recommended instance types provided by Bria (for example, ml.g5.2xlarge).
  8. Choose Create endpoint configuration.

    A success message should appear after the endpoint configuration is successfully created.
  9. Choose Next to create an endpoint.
  10. In the Create endpoint section, enter the endpoint name (for example, Endpoint-Bria-v2-3-Model) and choose Submit. After you successfully create the endpoint, it’s displayed on the SageMaker endpoints page on the SageMaker console.

Configure and deploy Bria models using SageMaker JumpStart

If the Bria models are already subscribed in AWS Marketplace, you can choose Deploy in the model card page to configure the endpoint.

On the endpoint configuration page, SageMaker pre-populates the endpoint name, recommended instance type, instance count, and other details for you. You can modify them based on your requirements and then choose Deploy to create an endpoint.

After you successfully create the endpoint, the status will show as In service.

Run inference in SageMaker Studio

You can test the endpoint by passing a sample inference request payload in SageMaker Studio, or you can use a SageMaker notebook. In this section, we demonstrate using SageMaker Studio:

  1. In SageMaker Studio, in the navigation pane, choose Endpoints under Deployments.
  2. Choose the Bria endpoint you just created.
  3. On the Test inference tab, test the endpoint by sending a sample request.
    You can see the response on the same page, as shown in the following screenshot.

Text-to-image generation using a SageMaker notebook

You can also use a SageMaker notebook to run inference against the deployed endpoint using the SageMaker Python SDK.

The following code initializes a predictor for the endpoint you created using SageMaker JumpStart:

from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Use the existing endpoint name
endpoint_name = "XXXXXXXX"  # Replace with your endpoint name

# Create a SageMaker predictor object
bria_predictor = Predictor(
    endpoint_name=endpoint_name,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

bria_predictor.endpoint_name

The model responses are in base64 encoded format. The following function helps decode the base64 encoded image and displays it as an image:

import base64
from PIL import Image
import io

def display_base64_image(base64_string):
    image_bytes = base64.b64decode(base64_string)
    image_stream = io.BytesIO(image_bytes)
    image = Image.open(image_stream)

    # Display the image
    image.show()

The following is a sample payload with a text prompt to generate an image using the Bria model:

payload = {
  "prompt": "a baby riding a bicycle in a field of flowers",
  "num_results": 1,
  "sync": True
}

response = bria_predictor.predict(payload)
artifacts = response['artifacts'][0]

encoded_image = artifacts['image_base64']

display_base64_image(encoded_image)

Example prompts

You can interact with the Bria 2.3 text-to-image model like any standard image generation model, where the model processes an input sequence and outputs a response. In this section, we provide some example prompts and sample output.

We use the following prompts:

  • Photography, dynamic, in the city, professional male skateboarder, sunglasses, teal and orange hue
  • Young woman with flowing curly hair stands on a subway platform, illuminated by the vibrant lights of a speeding train, purple and cyan colors
  • Close up of vibrant blue and green parrot perched on a wooden branch inside a cozy, well-lit room
  • Light speed motion with blue and purple neon colors and building in the background

The model generates the following images.

The following is an example request that generates an image using the first prompt in the preceding list:

payload = {
  "prompt": "Photography, dynamic, in the city, professional male skateboarder, sunglasses, teal and orange hue",
  "num_results": 1,
  "sync": True
}

response = bria_predictor.predict(payload)
artifacts = response['artifacts'][0]

encoded_image = artifacts['image_base64']

display_base64_image(encoded_image)

Clean up

After you’re done running the notebook, delete all resources that you created in the process so your billing is stopped. Use the following code:

bria_predictor.delete_model()
bria_predictor.delete_endpoint()

Conclusion

With the availability of Bria 2.3, 2.2 HD, and 2.3 Fast in SageMaker JumpStart and AWS Marketplace, enterprises can now use advanced generative AI capabilities to enhance their visual content creation processes. These models provide a balance of quality, speed, and compliance, making them an invaluable asset for any organization looking to stay ahead in the competitive landscape.

Bria’s commitment to responsible AI and the robust security framework of SageMaker provide enterprises with the full package for data privacy, regulatory compliance, and responsible AI models for commercial use. In addition, the integrated experience takes advantage of the capabilities of both platforms to simplify MLOps, data storage, and real-time processing.

For more information about using FMs in SageMaker JumpStart, refer to Train, deploy, and evaluate pretrained models with SageMaker JumpStart, JumpStart Foundation Models, and Getting started with Amazon SageMaker JumpStart.

Explore Bria models in SageMaker JumpStart today and revolutionize your visual content creation process!


About the Authors

Bar Fingerman is the Head of AI/ML Engineering at Bria. He leads the development and optimization of core infrastructure, enabling the company to scale cutting-edge generative AI technologies. With a focus on designing high-performance supercomputers for large-scale AI training, Bar leads the engineering group in deploying, managing, and securing scalable AI/ML cloud solutions. He works closely with leadership and cross-functional teams to align business goals while driving innovation and cost-efficiency.

Supriya Puragundla is a Senior Solutions Architect at AWS. She has over 15 years of IT experience in software development, design, and architecture. She helps key customer accounts on their data, generative AI, and AI/ML journeys. She is passionate about data-driven AI and the area of depth in ML and generative AI.

Rodrigo Merino is a Generative AI Solutions Architect Manager at AWS. With over a decade of experience deploying emerging technologies, ranging from generative AI to IoT, Rodrigo guides customers across various industries to accelerate their AI/ML and generative AI journeys. He specializes in helping organizations train and build models on AWS, as well as operationalize end-to-end ML solutions. Rodrigo’s expertise lies in bridging the gap between cutting-edge technology and practical business applications, enabling companies to harness the full potential of AI and drive innovation in their respective fields.

Eliad Maimon is a Senior Startup Solutions Architect at AWS, focusing on generative AI startups. He helps startups accelerate and scale their AI/ML journeys by guiding them through deep-learning model training and deployment on AWS. With a passion for AI and entrepreneurship, Eliad is committed to driving innovation and growth in the startup ecosystem.

Read More

Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker

Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker

We’re excited to announce the release of SageMaker Core, a new Python SDK from Amazon SageMaker designed to offer an object-oriented approach for managing the machine learning (ML) lifecycle. This new SDK streamlines data processing, training, and inference and features resource chaining, intelligent defaults, and enhanced logging capabilities. With SageMaker Core, managing ML workloads on SageMaker becomes simpler and more efficient. The SageMaker Core SDK comes bundled as part of the SageMaker Python SDK version 2.231.0 and above.

In this post, we show how the SageMaker Core SDK simplifies the developer experience while providing APIs for seamlessly executing the various steps of a typical ML lifecycle. We also discuss the main benefits of using this SDK and share relevant resources to learn more about it.

Traditionally, developers have had two options when working with SageMaker: the AWS SDK for Python, also known as boto3, or the SageMaker Python SDK. Although both provide comprehensive APIs for ML lifecycle management, they often rely on loosely typed constructs such as hard-coded constants and JSON dictionaries, mimicking a REST interface. For instance, to create a training job, Boto3 offers a create_training_job API, but retrieving job details requires the describe_training_job API.

While using boto3, developers face the challenge of remembering and crafting lengthy JSON dictionaries, ensuring that all keys are accurately placed. Let’s take a closer look at the create_training_job method from boto3:

response = client.create_training_job(
    TrainingJobName='string',
    HyperParameters={
        'string': 'string'
    },
    AlgorithmSpecification={
            .
            .
            .
    },
    RoleArn='string',
    InputDataConfig=[
        {
            .
            .
            .
        },
    ],
    OutputDataConfig={
            .
            .
            .
    },
    ResourceConfig={
            .
            .
            .    
    },
    VpcConfig={
            .
            .
            .
    },
    .
    .
    .
    .
# Not all arguments/fields are shown, for brevity.

)

If we observe carefully, for arguments such as AlgorithmSpecification, InputDataConfig, OutputDataConfig, ResourceConfig, or VpcConfig, we need to write verbose JSON dictionaries. Because the call contains many string keys nested inside long dictionary fields, it’s easy to introduce a typo or miss a required key. There is no type checking possible; to the interpreter, everything is just a string.

Similarly, the SageMaker Python SDK requires us to create an estimator object and invoke the fit() method on it. Although these constructs work well, they aren’t intuitive to the developer experience. It’s hard for developers to map the meaning of an estimator to something that can be used to train a model.
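For contrast, the following is a minimal sketch of that estimator pattern in the existing SageMaker Python SDK; the image URI, role, and S3 paths are placeholders:

from sagemaker.estimator import Estimator

# Classic SageMaker Python SDK pattern: configure an estimator, then call fit()
estimator = Estimator(
    image_uri="<training-image-uri>",    # placeholder
    role="<execution-role-arn>",         # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/training-output",
    hyperparameters={"epochs": "3", "batch_size": "8"},
)

# Starts a SageMaker training job behind the scenes
estimator.fit({"train": "s3://my-bucket/training-data"})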

Introducing SageMaker Core SDK

The SageMaker Core SDK solves this problem by replacing long dictionaries with object-oriented interfaces: developers work with object-oriented abstractions, and SageMaker Core takes care of converting those objects to dictionaries and executing the actions on the developer’s behalf.

The following are the key features of SageMaker Core:

  • Object-oriented interface – It provides object-oriented classes for tasks such as processing, training, or deployment. Providing such an interface enforces strong type checking, makes the code more manageable, and promotes reusability. Developers can benefit from all features of object-oriented programming.
  • Resource chaining – Developers can seamlessly pass SageMaker resources as objects by supplying them as arguments to different resources. For example, we can create a model object and pass that model object as an argument while setting up the endpoint. In contrast, while using Boto3, we need to supply ModelName as a string argument.
  • Abstraction of low-level details – It automatically handles resource state transitions and polling logic, freeing developers from managing these intricacies and allowing them to focus on higher-value tasks.
  • Support for intelligent defaults – It supports SageMaker intelligent defaults, allowing developers to set default values for parameters such as AWS Identity and Access Management (IAM) roles and virtual private cloud (VPC) configurations. This streamlines the setup process, and the SageMaker Core API picks the default settings automatically from the environment.
  • Auto code completion – It enhances the developer experience by offering real-time suggestions and completions in popular integrated development environments (IDEs), reducing chances of syntax errors and speeding up the coding process.
  • Full parity with SageMaker APIs, including generative AI – It provides access to the SageMaker capabilities, including generative AI, through the core SDK, so developers can seamlessly use SageMaker Core without worrying about feature parity with Boto3.
  • Comprehensive documentation and type hints – It provides robust and comprehensive documentation and type hints so developers can understand the functionalities of the APIs and objects, write code faster, and reduce errors.

For this walkthrough, we use a straightforward generative AI lifecycle involving data preparation, fine-tuning, and a deployment of Meta’s Llama-3-8B LLM. We use the SageMaker Core SDK to execute all the steps.

Prerequisites

To get started with SageMaker Core, make sure Python 3.8 or greater is installed in the environment. There are two ways to get started with SageMaker Core:

  1. If not using SageMaker Python SDK, install the sagemaker-core SDK using the following code example.
    %pip install sagemaker-core

  2. If you’re already using the SageMaker Python SDK, upgrade it to version 2.231.0 or later, which includes SageMaker Core preinstalled. The following code example shows the command for upgrading the SageMaker Python SDK.
    %pip install --upgrade "sagemaker>=2.231.0"

Solution walkthrough

To manage your ML workloads on SageMaker using SageMaker Core, use the steps in the following sections.

Data preparation

In this phase, you prepare the training and test data for the LLM. Here, you use the publicly available Stanford Question Answering Dataset (SQuAD). The following code creates a ProcessingJob object using the static method create, specifying the script path, instance type, and instance count. Intelligent default settings fetch the SageMaker execution role, which simplifies the developer experience further. You don’t need to provide the input and output data locations because those are also supplied through intelligent defaults. For information on how to set up intelligent defaults, check out Configuring and using defaults with the SageMaker Python SDK.

from sagemaker_core.resources import ProcessingJob

# Initialize a ProcessingJob resource
processing_job = ProcessingJob.create(
    processing_job_name="llm-data-prep",
    script_path="s3://my-bucket/data-prep-script.py",
    role_arn=<<Execution Role ARN>>, # Intelligent default for execution role
    instance_type="ml.m5.xlarge",
    instance_count=1
)

# Wait for the ProcessingJob to complete
processing_job.wait()

Training

In this step, you use the pre-trained Llama-3-8B model and fine-tune it on the prepared data from the previous step. The following code snippet shows the training API. You create a TrainingJob object using the create method, specifying the training script, source directory, instance type, instance count, output path, and hyper-parameters.

from sagemaker_core.resources import TrainingJob
from sagemaker_core.shapes import HyperParameters

# Initialize a TrainingJob resource
training_job = TrainingJob.create(
    training_job_name="llm-fine-tune",
    estimator_entry_point="train.py",
    source_dir="s3://my-bucket/training-code",
    instance_type="ml.g5.12xlarge",
    instance_count=1,
    output_path="s3://my-bucket/training-output",
    hyperparameters=HyperParameters(
        learning_rate=0.00001,
        batch_size=8,
        epochs=3
    ),
    role_arn=<<Execution Role ARN>>, # Intelligent default for execution role
    input_data=processing_job.output # Resource chaining
)

# Wait for the TrainingJob to complete
training_job.wait()

For hyperparameters, you create an object, instead of supplying a dictionary. Use resource chaining by passing the output of the ProcessingJob resource as the input data for the TrainingJob.

You also use the intelligent defaults to get the SageMaker execution role. Wait for the training job to finish, and it will produce a model artifact, wrapped in a tar.gz, and store it in the output_path provided in the preceding training API.

Model creation and deployment

Deploying a model on a SageMaker endpoint consists of three steps:

  1. Create a SageMaker model object
  2. Create the endpoint configuration
  3. Create the endpoint

SageMaker Core provides an object-oriented interface for all three steps.

  1. Create a SageMaker model object

The following code snippet shows the model creation experience in SageMaker Core.

from sagemaker_core.shapes import ContainerDefinition
from sagemaker_core.resources import Model

# Create a Model resource
model = Model.create(
    model_name="llm-model",
    primary_container=ContainerDefinition(
        image="763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124",
        environment={"HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B"}
    ),
    execution_role_arn=<<Execution Role ARN>>, # Intelligent default for execution role
    input_data=training_job.output # Resource chaining
)

Similar to the processing and training steps, you have a create method from the Model class. The container definition is now an object, specifying the large model inference (LMI) container image and the Hugging Face model ID. You can also observe resource chaining in action where you pass the output of the TrainingJob as input data to the Model.

  2. Create the endpoint configuration

Create the endpoint configuration. The following code snippet shows the experience in SageMaker Core.

from sagemaker_core.shapes import ProductionVariant
from sagemaker_core.resources import Model, EndpointConfig, Endpoint

# Create an EndpointConfig resource
endpoint_config = EndpointConfig.create(
    endpoint_config_name="llm-endpoint-config",
    production_variants=[
        ProductionVariant(
            variant_name="llm-variant",
            initial_instance_count=1,
            instance_type="ml.g5.12xlarge",
            model_name=model
        )
    ]
)

ProductionVariant is now an object in its own right.

  3. Create the endpoint

Create the endpoint using the following code snippet.

endpoint = Endpoint.create(
    endpoint_name="llm-endpoint",
    endpoint_config_name=endpoint_config,  # Pass the `EndpointConfig` object created above
)

This also uses resource chaining. Instead of supplying just the endpoint_config_name (in Boto3), you pass the whole endpoint_config object.
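Once the endpoint is in service, you can sanity-check it with the standard SageMaker Runtime client. The following is a minimal sketch; the endpoint name and payload shape are assumptions that depend on the serving container you deployed:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="llm-endpoint",          # match the name used when creating the endpoint
    ContentType="application/json",
    Body=json.dumps({"inputs": "What is Amazon SageMaker?"}),  # payload format depends on the serving container
)

print(response["Body"].read().decode("utf-8"))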

As we have shown in these steps, SageMaker Core simplifies the development experience by providing an object-oriented interface for interacting with SageMaker resources. The use of intelligent defaults and resource chaining reduces the amount of boilerplate code and manual parameter specification, resulting in more readable and maintainable code.

Cleanup

Any endpoint created using the code in this post will incur charges. Shut down any unused endpoints by using the delete() method.
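The following is a minimal sketch of that cleanup, assuming the resource objects created earlier in this walkthrough are still in scope:

# Delete the resources created in this walkthrough to stop incurring charges
endpoint.delete()
endpoint_config.delete()
model.delete()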

A note on existing SageMaker Python SDK

The SageMaker Python SDK will use SageMaker Core as its foundation and will benefit from the object-oriented interfaces created as part of SageMaker Core. Customers can choose to use the object-oriented approach with the SageMaker Python SDK going forward.

Benefits

The SageMaker Core SDK offers several benefits:

  • Simplified development – By abstracting low-level details and providing intelligent defaults, developers can focus on building and deploying ML models without getting slowed down by repetitive tasks. It also relieves developers of the cognitive overload of having to remember long and complex multilevel dictionaries; they can instead work in the object-oriented paradigm they are most comfortable with.
  • Increased productivity – Features like automatic code completion and type hints help developers write code faster and with fewer errors.
  • Enhanced readability – Dedicated resource classes and resource chaining result in more readable and maintainable code.
  • Lightweight integration with AWS Lambda – Because this SDK is lightweight (about 8 MB when unzipped), it is straightforward to build an AWS Lambda layer for SageMaker Core and use it for executing various steps in the ML lifecycle through Lambda functions.

Conclusion

SageMaker Core is a powerful addition to Amazon SageMaker, providing a streamlined and efficient development experience for ML practitioners. With its object-oriented interface, resource chaining, and intelligent defaults, SageMaker Core empowers developers to focus on building and deploying ML models without getting slowed down by complex orchestration of JSON structures. Check out the following resources to get started today on SageMaker Core:


About the authors

Vikesh Pandey is a Principal GenAI/ML Specialist Solutions Architect at AWS, helping customers in the financial services industry design, build, and scale their GenAI/ML workloads on AWS. He has more than a decade and a half of experience across the ML and software engineering stack. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Shweta Singh is a Senior Product Manager on the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.

Read More