Build a custom UI for Amazon Q Business

Amazon Q is a new generative artificial intelligence (AI)-powered assistant designed for work that can be tailored to your business. Amazon Q can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take actions using the data and expertise found in your company’s information repositories and enterprise systems. When you chat with Amazon Q, it provides immediate, relevant information and advice to help streamline tasks, speed up decision-making, and spark creativity and innovation at work. For more information, see Amazon Q Business, now generally available, helps boost workforce productivity with generative AI.

This post demonstrates how to build a custom UI for Amazon Q Business. The customized UI allows you to implement special features like handling feedback, using company brand colors and templates, and using a custom login. It also enables conversing with Amazon Q through an interface personalized to your use case.

Solution overview

In this solution, we deploy a custom web experience for Amazon Q to deliver quick, accurate, and relevant answers to your business questions on top of an enterprise knowledge base. The following diagram illustrates the solution architecture.

The workflow includes the following steps:

  1. The user accesses the chatbot application, which is hosted behind an Application Load Balancer.
  2. After the user logs in, they’re redirected to the Amazon Cognito login page for authentication.
    • This solution uses an Amazon Cognito user pool as an OAuth-compatible identity provider (IdP), which is required in order to exchange a token with AWS IAM Identity Center and later on interact with the Amazon Q Business APIs. For more information about trusted token issuers and how token exchanges are performed, see Using applications with a trusted token issuer. If you already have an OAuth-compatible IdP, you can use it instead of setting an Amazon Cognito user pool.
    • Provisioning local users in the user pool and reconciling them with IAM Identity Center can be error-prone. You can streamline the integration of IAM Identity Center users into the user pool by using a federated IdP and creating a second custom application (SAML) in IAM Identity Center. For instructions, refer to How do I integrate IAM Identity Center with an Amazon Cognito user pool and the associated demo video.
  3. The UI application, deployed on an Amazon Elastic Compute Cloud (Amazon EC2) instance, authenticates the user with Amazon Cognito and obtains an authentication token. It then exchanges this Amazon Cognito identity token for an IAM Identity Center token that grants the application permissions to access Amazon Q.
  4. The UI application assumes an AWS Identity and Access Management (IAM) role and retrieves an AWS session token from the AWS Security Token Service (AWS STS). This session token is augmented with the IAM Identity Center token, enabling the application to interact with Amazon Q. For more information about the token exchange flow between IAM Identity Center and the IdP, refer to How to develop a user-facing data application with IAM Identity Center and S3 Access Grants (Part 1) and Part 2.
  5. Amazon Q uses the chat_sync API to carry out the conversation.
    1. The request uses the following mandatory parameters:
      1. applicationId – The identifier of the Amazon Q application linked to the Amazon Q conversation.
      2. userMessage – An end-user message in a conversation.
    2. Amazon Q returns the response as a JSON object (detailed in the Amazon Q documentation). The following are a few core attributes from the response payload:
      1. systemMessage – An AI-generated message in a conversation.
      2. sourceAttributions – The source documents used to generate the conversation response. In Retrieval Augmented Generation (RAG), this always refers to one or more documents from enterprise knowledge bases that are indexed in Amazon Q.
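The ChatSync exchange in step 5 can be sketched in Python. The helper functions below are illustrative and only handle the attributes listed above; the commented-out boto3 call assumes AWS credentials carrying the IAM Identity Center token obtained in the earlier steps.

```python
# Sketch of the ChatSync exchange from step 5. The helpers are plain Python;
# the boto3 call at the bottom is illustrative and assumes credentials
# obtained through the token exchange described above.

def build_chat_sync_request(application_id, user_message, conversation_id=None):
    """Assemble the mandatory ChatSync parameters (plus the optional
    conversationId used to continue an existing conversation)."""
    request = {"applicationId": application_id, "userMessage": user_message}
    if conversation_id:
        request["conversationId"] = conversation_id
    return request

def parse_chat_sync_response(response):
    """Extract the AI-generated answer and the titles of the source
    documents (RAG attributions) from the response payload."""
    answer = response.get("systemMessage", "")
    sources = [s.get("title", "") for s in response.get("sourceAttributions", [])]
    return answer, sources

# Illustrative call (not run here):
#   import boto3
#   client = boto3.client("qbusiness")
#   response = client.chat_sync(
#       **build_chat_sync_request(app_id, "What is our expense policy?"))
#   answer, sources = parse_chat_sync_response(response)
```

In the UI application, the returned sourceAttributions can be rendered alongside the answer so users can verify which enterprise documents the response was grounded in.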

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account set up.
  • A VPC where you will deploy the solution.
  • An IAM role in the account with sufficient permissions to create the necessary resources. If you have administrator access to the account, no additional action is required.
  • An existing, working Amazon Q application, integrated with IAM Identity Center. If you haven’t set one up yet, see Creating an Amazon Q application.
  • Access to IAM Identity Center to create a customer managed application.
  • An SSL certificate created and imported into AWS Certificate Manager (ACM). For more details, refer to Importing a certificate. If you don’t have a public SSL certificate, follow the steps in the next section to generate a private certificate.

Generate a private certificate

If you already have an SSL certificate, you can skip this section.

You will receive a warning from your browser when accessing the UI if you didn’t provide a custom SSL certificate when launching the AWS CloudFormation stack. The instructions in this section show you how to create a self-signed certificate. This is not recommended for production use cases. You should obtain an SSL certificate that has been validated by a certificate authority, import it into ACM, and reference this when launching the CloudFormation stack. If you want to continue with the self-signed certificate (for development purposes), you should be able to proceed past the browser warning page. With Chrome, you will see the Your connection is not private error message (NET::ERR_CERT_AUTHORITY_INVALID), but by choosing Advanced, you should then see a link to proceed.

The following command generates a sample self-signed certificate (for development purposes) and uploads the certificate to ACM. You can also find the script on the GitHub repo.

openssl req \
  -x509 -nodes -days 365 -sha256 \
  -subj '/C=US/ST=Oregon/L=Portland/CN=sampleexample.com' \
  -newkey rsa:2048 -keyout key.pem -out cert.pem

aws acm import-certificate --certificate fileb://cert.pem --private-key fileb://key.pem

Note down the CertificateARN from the command output to use later when provisioning the CloudFormation template.

Provision resources with the CloudFormation template

The full source of the solution is in the GitHub repository and is deployed with AWS CloudFormation.

Choose Launch Stack to launch a CloudFormation stack in your account and deploy the template:

This template creates separate IAM roles for the Application Load Balancer, Amazon Cognito, and the EC2 instance. Additionally, it creates and configures those services to run the end-to-end demonstration.

Provide the following parameters for the stack:

  • Stack name – The name of the CloudFormation stack (for example, AmazonQ-UI-Demo).
  • AuthName – A globally unique name to assign to the Amazon Cognito user pool. Make sure your domain name doesn’t include any reserved words, such as cognito, aws, or amazon.
  • CertificateARN – The CertificateARN generated from the previous step.
  • IdcApplicationArn – The Amazon Resource Name (ARN) of the IAM Identity Center customer managed application. Leave it blank on the first run, because you need to create the Amazon Cognito user pool as part of this stack. This will create an IAM Identity Center application with an Amazon Cognito user pool as the trusted token issuer.
  • LatestAMIId – The ID of the AMI to use for the EC2 instance. We suggest keeping the default value.
  • PublicSubnetIds – The ID of the public subnet that can be used to deploy the EC2 instance and the Application Load Balancer.
  • QApplicationId – The existing application ID of Amazon Q.
  • VPCId – The ID of the existing VPC that can be used to deploy the demo.

After the CloudFormation stack deploys successfully, copy the following values on the stack’s Outputs tab:

  • Audience – Audience to set up the customer application in IAM Identity Center
  • RoleArn – ARN of the IAM role required to set up the token exchange in IAM Identity Center
  • TrustedIssuerUrl – Endpoint of the trusted issuer to set up IAM Identity Center
  • URL – The load balancer URL to access the UI application

Create an IAM Identity Center application

The actions described in this section are one-time actions. The goal is to configure an application in IAM Identity Center to represent the application you are building. Specifically, in this step, you configure IAM Identity Center to be able to trust the identity tokens by which your application will represent its authenticated users. Complete the following steps:

  1. On the IAM Identity Center console, add a new custom managed application.
  2. For Application type, select OAuth 2.0, then choose Next.
  3. Enter an application name and description.
  4. Set Application visibility as Not visible, then choose Next.
  5. On the Trusted token issuers tab, choose Create trusted token issuer.
  6. For Issuer URL, provide the TrustedIssuerUrl you copied from the CloudFormation stack output.
  7. Enter an issuer name and keep the map attributes as Email.
  8. In the IAM Identity Center application authentication settings, select the trusted token issuer created in the previous step and add the Aud claim, providing the audience you copied from the CloudFormation stack output, then choose Next.
  9. On the Specify application credentials tab, choose Enter one or more IAM roles and provide the value for RoleArn you copied from the CloudFormation stack output.
  10. Review all the steps and create the application.
  11. After the application is created, go to the application, choose Assign users and groups, and add the users who will have access to the UI application.
  12. On the Select setup type page, choose All applications for service with same access, choose Amazon Q from the Services list, and choose Trust applications.
  13. After the IAM Identity Center application is created, copy the application ARN.
  14. On the AWS CloudFormation console, update the stack and provide the IAM Identity Center application ARN for the parameter IdcApplicationArn, then run the stack.
  15. When the update process is complete, go to the CloudFormation stack’s Outputs tab and copy the URL provided there.

Custom UI

The CloudFormation stack deploys and starts the Streamlit application on an EC2 instance on port 8080. To view the health of the application running behind the Application Load Balancer, open the Amazon EC2 console and choose Target groups under Load Balancing in the navigation pane. For debugging purposes, you can also connect to the EC2 instance through Session Manager, a capability of AWS Systems Manager.

To access the custom UI, use the URL that you copied from the CloudFormation stack output. Choose Sign up and use the same email address for the users that were registered in IAM Identity Center.

After successful authentication, you’re redirected to the custom UI. You can enhance it by implementing custom features like handling feedback, using your company’s brand colors and templates, and personalizing it to your specific use case.

Clean up

To avoid future charges in your account, delete the resources you created in this walkthrough. The EC2 instance with the custom UI will incur charges as long as the instance is active, so stop it when you’re done.

  1. On the CloudFormation console, in the navigation pane, choose Stacks.
  2. Select the stack you launched (AmazonQ-UI-Demo), then choose Delete.

Conclusion

In this post, you learned how to integrate a custom UI with Amazon Q Business. Using a custom UI tailored to your specific needs and requirements makes Amazon Q more efficient and straightforward to use for your business. You can include your company branding and design, and have control and ownership over the user experience. For example, you could introduce custom feedback handling features.

The sample custom UI for Amazon Q discussed in this post is provided as open source—you can use it as a starting point for your own solution, and help improve it by contributing bug fixes and new features using GitHub pull requests. Explore the code, choose Watch in the GitHub repo to receive notifications about new releases, and check back for the latest updates. We welcome your suggestions for improvements and new features.

For more information on Amazon Q Business, refer to the Amazon Q Business Developer Guide.


About the Authors

Ennio Emanuele Pastore is a Senior Architect on the AWS GenAI Labs team. He is an enthusiast of everything related to new technologies that have a positive impact on businesses and general livelihood. He helps organizations in achieving specific business outcomes by using data and AI, and accelerating their AWS Cloud adoption journey.

Deba is a Senior Architect on the AWS GenAI Labs team. He has extensive experience across big data, data science, and IoT, spanning consulting and industrial domains. He is an advocate of cloud-centered data and ML platforms and the value they can drive for customers across industries.

Joseph de Clerck is a senior Cloud Infrastructure Architect at AWS. He leverages his expertise to help enterprises solve their business challenges by effectively utilizing AWS services. His broad understanding of cloud technologies enables him to devise tailored solutions on topics such as analytics, security, infrastructure, and automation.

Scalable intelligent document processing using Amazon Bedrock

In today’s data-driven business landscape, the ability to efficiently extract and process information from a wide range of documents is crucial for informed decision-making and maintaining a competitive edge. However, traditional document processing workflows often involve complex and time-consuming manual tasks, hindering productivity and scalability.

In this post, we discuss an approach that uses the Anthropic Claude 3 Haiku model on Amazon Bedrock to enhance document processing capabilities. Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading artificial intelligence (AI) startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the AWS tools without having to manage any infrastructure.

At the heart of this solution lies the Anthropic Claude 3 Haiku model, the fastest and most affordable model in its intelligence class. With state-of-the-art vision capabilities and strong performance on industry benchmarks, Anthropic Claude 3 Haiku is a versatile solution for a wide range of enterprise applications. By using the advanced natural language processing (NLP) capabilities of Anthropic Claude 3 Haiku, our intelligent document processing (IDP) solution can extract valuable data directly from images, eliminating the need for complex postprocessing.

Scalable and efficient data extraction

Our solution overcomes the traditional limitations of document processing by addressing the following key challenges:

  • Simple prompt-based extraction – This solution allows you to define the specific data you need to extract from the documents through intuitive prompts. The Anthropic Claude 3 Haiku model then processes the documents and returns the desired information, streamlining the entire workflow.
  • Handling larger file sizes and multipage documents – To provide scalability and flexibility, this solution integrates additional AWS services to handle file sizes beyond the 5 MB limit of Anthropic Claude 3 Haiku. The solution can process both PDFs and image files, including multipage documents, providing comprehensive processing for unparalleled efficiency.

With the advanced NLP capabilities of the Anthropic Claude 3 Haiku model, our solution can directly extract the specific data you need without requiring complex postprocessing or parsing the output. This approach simplifies the workflow and enables more targeted and efficient document processing than traditional OCR-based solutions.
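As an illustration of this prompt-based approach, the following sketch builds a prompt that asks the model to reply with a single JSON object and then tolerantly parses that reply. The field names and prompt wording are examples for this post, not the solution’s actual prompts.

```python
import json

# Illustrative only: the field names and prompt wording are examples,
# not the prompts used by the deployed solution.
FIELDS = ["applicant_name", "date_of_birth", "city"]

def build_extraction_prompt(fields):
    """Ask the model to answer with a single JSON object so no
    downstream parsing of free text is needed."""
    return ("Extract the following fields from the attached document image "
            "and respond with only a JSON object, no extra text: "
            + ", ".join(fields))

def parse_model_output(text):
    """Tolerantly pull the first JSON object out of the model's reply,
    ignoring any leading or trailing conversational text."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        return {}
    return json.loads(text[start:end + 1])
```

Because the model is instructed to emit structured JSON directly, the only postprocessing left is locating and loading that object.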

Confidence scores and human review

Maintaining data accuracy and quality is paramount in any document processing solution. This solution incorporates customizable rules, allowing you to define the criteria for invoking a human review. This provides a seamless collaboration between the automated extraction and human expertise, delivering high-quality results that meet your specific requirements.

In this post, we show how you can use Amazon Bedrock and Amazon Augmented AI (Amazon A2I) to build a workflow that enables multipage PDF document processing with a human reviewer loop.

Solution overview

The following architecture shows how you can have a serverless architecture to process multipage PDF documents or images with a human review. To implement this architecture, we take advantage of AWS Step Functions to build the overall workflow. As the workflow starts, it extracts individual pages from the multipage PDF document. It then uses the Map state to process multiple pages concurrently using the Amazon Bedrock API. After the data is extracted from the document, it validates against the business rules and sends the document to Amazon A2I for a human to review if any business rules fail. Reviewers use the Amazon A2I UI (a customizable website) to verify the extraction result. When the human review is complete, the callback task token is used to resume the state machine and store the output in an Amazon DynamoDB table.
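Conceptually, the Map state’s fan-out works like the local sketch below, where extract_page is a stand-in for the Lambda function that calls the Amazon Bedrock API for a single page.

```python
from concurrent.futures import ThreadPoolExecutor

# Local sketch of what the Step Functions Map state provides: process each
# page concurrently, then collect the results in page order. extract_page
# stands in for the per-page Lambda call to the Amazon Bedrock API.

def process_document(pages, extract_page, max_workers=4):
    """Fan out one extraction task per page and gather ordered results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_page, pages))
```

In the actual solution this concurrency is managed by Step Functions itself, which also handles retries and the human-review callback token; the sketch only shows the fan-out/fan-in shape.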

You can deploy this solution following the steps in this post.

Prerequisites

For this walkthrough, you need the following:

Create an AWS Cloud9 IDE

We use an AWS Cloud9 integrated development environment (IDE) to deploy the solution. It provides a convenient way to access a full development and build environment. Complete the following steps:

  1. Sign in to the AWS Management Console through your AWS account.
  2. Select the AWS Region in which you want to deploy the solution.
  3. On the AWS Cloud9 console, choose Create environment.
  4. Name your environment mycloud9.
  5. Choose the t3.small instance type on the Amazon Linux 2 platform.
  6. Choose Create.

AWS Cloud9 automatically creates and sets up a new Amazon Elastic Compute Cloud (Amazon EC2) instance in your account.

  1. When the environment is ready, select it and choose Open.

The AWS Cloud9 instance opens in a new terminal tab, as shown in the following screenshot.

Clone the source code to deploy the solution

Now that your AWS Cloud9 IDE is set up, you can proceed with the following steps to deploy the solution.

Confirm the Node.js version

AWS Cloud9 preinstalls Node.js. You can confirm the installed version by running the following command:

node --version

You should see output like the following:

v20.13.1

If you’re on v20.x or higher, you can skip to the “Install the AWS CDK” section. If you’re on a different version of Node.js, complete the following steps:

  1. In an AWS Cloud9 terminal, run the following command to confirm you have the latest version of Node Version Manager (nvm):
    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash

  2. Install Node.js 20:
    nvm install 20

  3. Confirm the current Node.js version by running the following command:
    node --version

Install the AWS CDK

Confirm whether you already have the AWS Cloud Development Kit (AWS CDK) installed. To do this, with the terminal session still open in the IDE, run the following command:

cdk --version

If the AWS CDK is installed, the output contains the AWS CDK version and build numbers. In this case, you can skip to the “Download the source code” section. Otherwise, complete the following steps:

  1. Install the AWS CDK by running the npm command along with the install action, the name of the AWS CDK package to install, and the -g option to install the package globally in the environment:
    npm install -g aws-cdk

  2. To confirm that the AWS CDK is installed and correctly referenced, run the cdk command with the --version option:
    cdk --version

If successful, the AWS CDK version and build numbers are displayed.

Download the source code from the GitHub repo

Complete the following steps to download the source code:

  1. In an AWS Cloud9 terminal, clone the GitHub repo:
    git clone https://github.com/aws-samples/aws-generative-ai-document-processing-solution

  2. Run the following commands to create the Sharp npm package and copy the package to the source code:
    mkdir -p ~/environment/sharplayer/nodejs && cd ~/environment/sharplayer/nodejs
    npm init -y && npm install --arch=x64 --platform=linux sharp
    cd .. && zip -r sharplayer.zip .
    cp sharplayer.zip ~/environment/aws-generative-ai-document-processing-solution/deploy_code/multipagepdfa2i_imageresize/
    cd .. && rm -r sharplayer
    

  3. Change to the repository directory:
    cd aws-generative-ai-document-processing-solution
    

  4. Run the following command:
    pip install -r requirements.txt

The first time you deploy an AWS CDK app into an environment for a specific AWS account and Region combination, you must install a bootstrap stack. This stack includes various resources that the AWS CDK needs to complete its operations. For example, this stack includes an Amazon Simple Storage Service (Amazon S3) bucket that the AWS CDK uses to store templates and assets during its deployment processes.

  1. To install the bootstrap stack, run the following command:
    cdk bootstrap

  2. From the project’s root directory, run the following command to deploy the stack:
    cdk deploy

If successful, the output displays that the stack deployed without errors.

The last step is to update the cross-origin resource sharing (CORS) for the S3 bucket.

  1. On the Amazon S3 console, choose Buckets in the navigation pane.
  2. Choose the name of the bucket that was created in the AWS CDK deployment step. It should have a name format like multipagepdfa2i-multipagepdf-xxxxxxxxx.
  3. Choose Permissions.
  4. In the Cross-origin resource sharing (CORS) section, choose Edit.
  5. In the CORS configuration editor text box, enter the following CORS configuration:
    [
         {
             "AllowedHeaders": [
                 "Authorization"
             ],
             "AllowedMethods": [
                 "GET",
                 "HEAD"
             ],
             "AllowedOrigins": [
                 "*"
             ],
             "ExposeHeaders": [
                 "Access-Control-Allow-Origin"
             ]
         }
     ]
    

  6. Choose Save changes.

Create a private work team

A work team is a group of people you select to review your documents. You can create a work team from a workforce, which is made up of Amazon Mechanical Turk workers, vendor-managed workers, or your own private workers that you invite to work on your tasks. Whichever workforce type you choose, Amazon A2I takes care of sending tasks to workers. For this solution, you create a work team using a private workforce and add yourself to the team to preview the Amazon A2I workflow.

To create and manage your private workforce, you can use the Amazon SageMaker console. You can create a private workforce by entering worker emails or importing a preexisting workforce from an Amazon Cognito user pool.

To create your private work team, complete the following steps:

  1. On the SageMaker console, choose Labeling workforces under Ground Truth in the navigation pane.
  2. On the Private tab, choose Create private team.
  3. Choose Invite new workers by email.
  4. In the Email addresses box, enter the email addresses for your work team (for this post, enter your email address).

You can enter a list of up to 50 email addresses, separated by commas.

  1. Enter an organization name and contact email.
  2. Choose Create private team.

After you create the private team, you get an email invitation. The following screenshot shows an example email.

After you choose the link and change your password, you will be registered as a verified worker for this team. The following screenshot shows the updated information on the Private tab.

Your one-person team is now ready, and you can create a human review workflow.

Create a human review workflow

You define the business conditions under which the Amazon Bedrock extracted content should go to a human for review. These business conditions are set in Parameter Store, a capability of AWS Systems Manager. For example, you can look for specific keys in the document. When the extraction is complete, in the AWS Lambda function, check for those keys and their values. If the key is not present or the value is blank, the form will go for human review.
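A minimal sketch of that check follows. The field names are illustrative, and in the solution the required-key list would be read from Parameter Store rather than inlined.

```python
# Sketch of the business-rule check described above: if a required key is
# missing or blank in the extracted data, the document goes to human review.
# REQUIRED_KEYS is inlined here; the solution reads it from Parameter Store.

REQUIRED_KEYS = ["applicant_name", "date_of_birth"]  # illustrative

def needs_human_review(extracted, required_keys=REQUIRED_KEYS):
    """Return True when any required key is absent or has a blank value."""
    for key in required_keys:
        value = extracted.get(key)
        if value is None or str(value).strip() == "":
            return True
    return False
```

When this function returns True, the Lambda function would start the Amazon A2I human loop instead of writing the result directly to DynamoDB.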

Complete the following steps to create a worker task template for your document review task:

  1. On the SageMaker console, choose Worker task templates under Augmented AI in the navigation pane.
  2. Choose Create template.
  3. In the template properties section, enter a unique template name for Template name and select Custom for Template type.
  4. Copy the contents from the Custom template file you downloaded from the GitHub repo and replace the content in the Template editor section.
  5. Choose Create to create the template.

Next, you create instructions to help workers complete your document review task.

  1. Choose Human review workflows under Augmented AI in the navigation pane.
  2. Choose Create human review workflow.
  3. In the Workflow settings section, for Name, enter a unique workflow name.
  4. For S3 bucket, enter the S3 bucket that was created in the AWS CDK deployment step. It should have a name format like multipagepdfa2i-multipagepdf-xxxxxxxxx.

This bucket is where Amazon A2I will store the human review results.

  1. For IAM role, choose Create a new role for Amazon A2I to create a role automatically for you.
    • For S3 buckets you specify, select Specific S3 buckets.
    • Enter the S3 bucket you specified earlier; for example, multipagepdfa2i-multipagepdf-xxxxxxxxxx.
    • Choose Create.

You see a confirmation when role creation is complete, and your role is now pre-populated on the IAM role dropdown menu.

  1. For Task type, select Custom.
  2. In the worker task template section, choose the template that you previously created.
  3. For Task Description, enter “Review the extracted content from the document and make changes as needed”.
  4. For Worker types, select Private.
  5. For Private teams, choose the work team you created earlier.
  6. Choose Create.

You’re redirected to the Human review workflows page, where you will see a confirmation message.

In a few seconds, the status of the workflow will be changed to active. Record your new human review workflow ARN, which you use to configure your human loop in a later step.

Update the solution with the human review workflow

You’re now ready to add your human review workflow Amazon Resource Name (ARN):

  1. Within the code you downloaded from GitHub repo, open the file
    /aws-generative-ai-document-processing-solution/multipagepdfa2i/multipagepdfa2i_stack.py. 

  2. Update line 23 with the ARN that you copied earlier:
    SAGEMAKER_WORKFLOW_AUGMENTED_AI_ARN_EV = "arn:aws:sagemaker:us-east-1:XXXXXXXXXXX:....

  3. Save the changes you made.
  4. Deploy by entering the following command:
    cdk deploy

Test the solution without business rules validation

To test the solution without using a human review, create a folder called uploads in the S3 bucket multipagepdfa2i-multipagepdf-xxxxxxxxx and upload the sample PDF document provided. For example, uploads/Vital-records-birth-application.pdf.

The content will be extracted, and you will see the data in the DynamoDB table multipagepdfa2i-ddbtableVitalBirthDataXXXXX.

Test the solution with business rules validation

Complete the following steps to test the solution with a human review:

  1. On the Systems Manager console, choose Parameter Store in the navigation pane.
  2. Select the parameter /business_rules/validationrequied and update the value to yes.
  3. Upload the sample PDF document provided to the uploads folder that you created earlier in the S3 bucket multipagepdfa2i-multipagepdf-xxxxxxxxx.
  4. On the SageMaker console, choose Labeling workforces under Ground Truth in the navigation pane.
  5. On the Private tab, choose the link under Labeling portal sign-in URL.
  6. Sign in with the account you configured with Amazon Cognito.
  7. Select the job you want to complete and choose Start working.

In the reviewer UI, you will see instructions and the document to work on. You can use the toolbox to zoom in and out, fit image, and reposition the document.

This UI is specifically designed for document-processing tasks. On the right side of the preceding screenshot, the extracted data is automatically prefilled with the Amazon Bedrock response. As a worker, you can quickly refer to this sidebar to make sure the extracted information is identified correctly.

When you complete the human review, you will see the data in the DynamoDB table multipagepdfa2i-ddbtableVitalBirthDataXXXXX.

Conclusion

In this post, we showed you how to use the Anthropic Claude 3 Haiku model on Amazon Bedrock and Amazon A2I to automatically extract data from multipage PDF documents and images. We also demonstrated how to conduct a human review of the pages for given business criteria. By eliminating the need for complex postprocessing, handling larger file sizes, and integrating a flexible human review process, this solution can help your business unlock the true value of your documents, drive informed decision-making, and gain a competitive edge in the market.

Overall, this post provides a roadmap for building a scalable document processing workflow using Anthropic Claude models on Amazon Bedrock.

As next steps, check out What is Amazon Bedrock to start using the service. Follow Amazon Bedrock on the AWS Machine Learning Blog to keep up to date with new capabilities and use cases for Amazon Bedrock.


About the Authors

Venkata Kampana is a Senior Solutions Architect in the AWS Health and Human Services team and is based in Sacramento, CA. In that role, he helps public sector customers achieve their mission objectives with well-architected solutions on AWS.

Jim Daniel is the Public Health lead at Amazon Web Services. Previously, he held positions with the United States Department of Health and Human Services for nearly a decade, including Director of Public Health Innovation and Public Health Coordinator. Before his government service, Jim served as the Chief Information Officer for the Massachusetts Department of Public Health.

Use weather data to improve forecasts with Amazon SageMaker Canvas

Photo by Zbynek Burival on Unsplash

Time series forecasting is a specific machine learning (ML) discipline that enables organizations to make informed planning decisions. The main idea is to supply historic data to an ML algorithm that can identify patterns from the past and then use those patterns to estimate likely values about unseen periods in the future.

Amazon has a long heritage of using time series forecasting, dating back to the early days of having to meet mail-order book demand. Fast forward more than a quarter century and advanced forecasting using modern ML algorithms is offered to customers through Amazon SageMaker Canvas, a no-code workspace for all phases of ML. SageMaker Canvas enables you to prepare data using natural language, build and train highly accurate models, generate predictions, and deploy models to production—all without writing a single line of code.

In this post, we describe how to use weather data to build and implement a forecasting cycle that you can use to elevate your business’ planning capabilities.

Business use cases for time series forecasting

Today, companies of every size and industry that invest in forecasting capabilities can improve outcomes—whether measured financially or in customer satisfaction—compared to intuition-based estimation. Regardless of industry, every customer desires highly accurate models that can maximize their outcome. Here, accuracy means that future estimates produced by the ML model end up as close as possible to the actual future. If the ML model estimates either too high or too low, it can undermine the outcomes the business hoped to achieve.

To maximize accuracy, ML models benefit from rich, quality data that reflects demand patterns, including cycles of highs and lows, and periods of stability. The shape of these historic patterns may be driven by several factors. Examples include seasonality, marketing promotions, pricing, and in-stock availability for retail sales, or temperature, length of daylight, or special events for utility demand. Local, regional, and world factors such as commodity prices, financial markets, and events such as COVID-19 can also change demand trajectory.

Weather is a key factor that can influence forecasts in many domains, and comes in long-term and short-term varieties. The following are just a few examples of how weather can affect time series estimates:

  • Energy companies use temperature forecasts to predict energy demand and manage supply accordingly. Warmer weather and sunny days can drive up demand for air conditioning.
  • Agribusinesses forecast crop yields using weather data like rainfall, temperature, humidity, and more. This helps optimize planting, harvesting, and pricing decisions.
  • Outdoor events might be influenced by short-term weather forecasts such as rain, heat, or storms that could change attendance, fresh prepared food needs, staffing, and more.
  • Airlines use weather forecasts to schedule staff and equipment efficiently. Bad weather can cause flight delays and cancellations.

If weather has an influence on your business planning, it’s important to use weather signals from both the past and the future to help inform your planning. The remaining portion of this post discusses how you can source, prepare, and use weather data to help improve and inform your journey.

Find a weather data provider

First, if you have not already done so, you will need to find a weather data provider. There are many providers that offer a wide variety of capabilities. The following are just a few things to consider as you select a provider:

  • Price – Some providers offer free weather data, some offer subscriptions, and some offer meter-based packages.
  • Information capture method – Some providers allow you to download data in bulk, whereas others enable you to fetch data in real time through programmatic API calls.
  • Time resolution – Depending on your business, you might need weather at the hourly level, daily level, or another interval. Make sure the provider you choose offers data at the right level of granularity to support your business decisions.
  • Time coverage – It’s important to select a provider based on their ability to provide historic and future forecasts aligned with your data. If you have 3 years of your own history, then find a provider that has that amount of history too. If you’re an outdoor stadium manager who needs to know weather for several days ahead, select a provider that has a weather forecast out as far as you need to plan. If you’re a farmer, you might need a long-term seasonal forecast, so your data provider should have future-dated data in line with your forecast horizon.
  • Geography – Different providers have data coverage for different parts of the world, including both land and sea coverage. Providers may offer information at the GPS coordinate level, ZIP code level, or another granularity. Energy companies might seek to have weather by GPS coordinates, enabling them to personalize weather forecasts to their meter locations.
  • Weather features – There are many weather-related features available, including not only the temperature, but other key data points such as precipitation, solar index, pressure, lightning, air quality, and pollen, to name a few.

In making the provider choice, be sure to conduct your own independent search and perform due diligence. Selecting the right provider is crucial and can be a long-term decision. Ultimately, you will decide on one or more providers that are a best fit for your unique needs.

Build a weather ingestion process

After you have identified a weather data provider, you need to develop a process to harvest their data, which will be blended with your historic data. In addition to building a time series model, SageMaker Canvas is able to help build your weather data processing pipeline. The automated process might have the following steps, generally, though your use case might vary:

  1. Identify your locations – In your data, you will need to identify all the unique locations through time, whether by postal code, address, or GPS coordinates. In some cases, you may need to geocode your data, for example, converting a mailing address to GPS coordinates. You can use Amazon Location Service to assist with this conversion, as needed. Ideally, if you do geocode, you should only need to do this one time, and retain the GPS coordinates for your postal code or address.
  2. Acquire historic weather data – For each of your locations, acquire historic weather observations and persist this information so you only need to retrieve it one time.
  3. Acquire and store future weather data – For each of your locations, develop a process to harvest future-dated weather predictions as part of your pipeline to build an ML model. AWS has many databases to help store your data, including cost-effective data lakes on Amazon Simple Storage Service (Amazon S3).
  4. Normalize weather data – Prior to moving to the next step, it’s important to make all weather data relative to location and set on the same scale. Barometric pressure can have values in the 1000+ range; temperature exists on another scale. Pollen, ultraviolet light, and other weather measures also have independent scales. Within a geography, any measure is relative to that location’s own normal. In this post, we demonstrate how to normalize weather features for each location to help make sure no feature has bias over another, and to help maximize the effectiveness of weather data on a global basis.
  5. Combine internal business data with external weather data – As part of your time series pipeline, you will need to harvest historical business data to train a model. First, you will extract data, such as weekly sales data by product sold and by retail store for the last 4 years.

Don’t be surprised if your company needs several forecasts that are independent and concurrent. Each forecast can offer multiple perspectives to help navigate. For example, you may have a short-term weather forecast to make sure weather-volatile products are stocked. In addition, a medium-term forecast can help make replenishment decisions. Finally, you can use a long-term forecast to estimate growth of the company or make seasonal buying decisions that require long lead times.

At this point, you will combine weather and business data together by joining (or merging) them together using time and location. An example follows in the next section.

Example weather ingestion process

The following screenshot and code snippet show an example of using SageMaker Canvas to geocode location data using Amazon Location Service.

This process submits a location to Amazon Location Service and receives a response in the form of latitude and longitude. The example provides a city as input, but your use case might call for postal codes or specific street addresses, depending on your need for location precision. As guidance, take care to persist the responses in a data store, so you aren’t continuously geocoding the same locations each forecasting cycle. Instead, determine which locations you have not yet geocoded and only perform those. The latitude and longitude are important and are used in a later step to request weather data from your selected provider.

import boto3
from pyspark.sql.functions import col, udf
import pyspark.sql.types as types

def obtain_lat_long(place_search):
    location = boto3.client('location')
    response = location.search_place_index_for_text(IndexName='myplaceindex', Text=str(place_search))
    return response['Results'][0]['Place']['Geometry']['Point']

UDF = udf(lambda z: obtain_lat_long(z),
          types.StructType([types.StructField('longitude', types.DoubleType()),
                            types.StructField('latitude', types.DoubleType())]))

# use the UDF to create a struct column with lat and long
df = df.withColumn('lat_long', UDF(col('Location')))
# extract the lat and long from the struct column
df = df.withColumn("latitude", col("lat_long.latitude"))
df = df.withColumn("longitude", col("lat_long.longitude"))
df = df.drop('lat_long')
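To illustrate the guidance about persisting geocoded results, the following is a minimal sketch of a cache pattern in pure Python. The `geocode` stub and its hardcoded result are assumptions standing in for the Amazon Location Service call shown previously; in practice, the cache would be loaded from and saved to a durable store.

```python
# Hypothetical sketch: geocode each location only once across forecasting cycles.
def geocode(place_search):
    # stand-in for location.search_place_index_for_text(...); values are illustrative
    return {"Seattle, WA": (-122.33, 47.61)}.get(place_search)

def geocode_new_locations(locations, cache):
    """Geocode only the locations missing from the persisted cache."""
    for place in locations:
        if place not in cache:
            cache[place] = geocode(place)
    return cache

cache = {}  # in practice, load this from a durable store such as Amazon S3
geocode_new_locations(["Seattle, WA"], cache)
geocode_new_locations(["Seattle, WA"], cache)  # second call performs no geocoding
```

This keeps geocoding cost proportional to new locations only, rather than to every row in every cycle.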

In the following screenshots, we show an example of calling a weather provider using the latitude and longitude. Each provider will have differing capabilities, which is why selecting a provider is an important consideration. The example we show in this post could be used for historical weather capture as well as future-dated weather forecast capture.

The following screenshot shows an example of using SageMaker Canvas to connect to a weather provider and retrieve weather data.

The following code snippet illustrates how you might provide a latitude and longitude pair to a weather provider, along with parameters such as specific types of weather features, time periods, and time resolution. In this example, a request for temperature and barometric pressure is made. The data is requested at the hourly level for the next day ahead. Your use case will vary; consider this as an example.

import requests
from pyspark.sql.functions import col, udf

def get_weather_data(latitude, longitude):
    # weather_provider_url is a placeholder for your chosen provider's endpoint
    params = {
        "latitude": str(latitude),
        "longitude": str(longitude),
        "hourly": "temperature_2m,surface_pressure",
        "forecast_days": 1
    }
    response = requests.get(url=weather_provider_url, params=params)
    return response.content.decode('utf-8')

UDF = udf(lambda latitude, longitude: get_weather_data(latitude, longitude))
df = df.withColumn('weather_response', UDF(col('latitude'), col('longitude')))

After you retrieve the weather data, the next step is to convert structured weather provider data into a tabular set of data. As you can see in the following screenshot, temperature and pressure data are available at the hourly level by location. This will enable you to join the weather data alongside your historic demand data. It’s important you use future-dated weather data to train your model. Without future-dated data, there is no basis to use weather to help inform what might lie ahead.

The following code snippet is from the preceding screenshot. This code converts the weather provider nested JSON array into tabular features:

from pyspark.sql.functions import from_json, col, regexp_replace
from pyspark.sql.functions import explode, arrays_zip
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, ArrayType

json_schema = StructType([
    StructField("hourly", StructType([
        StructField("time", ArrayType(StringType()), True),
        StructField("temperature_2m", ArrayType(DoubleType()), True),
        StructField("surface_pressure", ArrayType(DoubleType()), True)
    ]), True)
])

# parse string into structure
df = df.withColumn("weather_response", from_json(col("weather_response"), json_schema))

# extract feature arrays
df = df.withColumn("time", col("weather_response.hourly.time"))
df = df.withColumn("temperature_2m", col("weather_response.hourly.temperature_2m"))
df = df.withColumn("surface_pressure", col("weather_response.hourly.surface_pressure"))

# explode all arrays together
df = (df.withColumn("zipped", arrays_zip("surface_pressure", "temperature_2m", "time"))
        .withColumn("exploded", explode("zipped"))
        .select("Location", "exploded.time", "exploded.surface_pressure", "exploded.temperature_2m"))

# clean up format of timestamp
df = df.withColumn("time", regexp_replace(col("time"), "T", " "))

In this next step, we demonstrate how to set all weather features on the same scale—a scale that is also sensitive to each location’s range of values. In the preceding screenshot, observe how pressure and temperature in Seattle are on different scales. Temperature in Celsius is single or double digits, and pressure exceeds 1,000. Seattle may also have different ranges than any other city, as the result of its unique climate, natural topology, and geographic position. In this normalization step, the goal is to bring all weather features on a same scale, so pressure doesn’t outweigh temperature. We also want to place Seattle on its own scale, Mumbai on its own scale, and so forth. In the following screenshot, the minimum and maximum values per location are obtained. These are important intermediate computations for scaling, where weather values are set based on their position in the observed range by geography.

With the extreme values computed per location, the data frame with row-level values can be joined to the data frame of minimum and maximum values, matching on location. The result is scaled data, according to a normalization formula that follows with example code.

First, this code example computes the minimum and maximum weather values per location. Next, the range is computed. Finally, a data frame is created with the location, range, and minimum per weather feature. Maximum is not needed because the range can be used as part of the normalization formula. See the following code:

from pyspark.sql.functions import min, max

df = (df.groupBy("Location")
        .agg(min("surface_pressure").alias("min_surface_pressure"),
             max("surface_pressure").alias("max_surface_pressure"),
             min("temperature_2m").alias("min_temperature_2m"),
             max("temperature_2m").alias("max_temperature_2m")))

df = df.withColumn("range_surface_pressure",
                   df.max_surface_pressure - df.min_surface_pressure)

df = df.withColumn("range_temperature_2m",
                   df.max_temperature_2m - df.min_temperature_2m)

df = df.select("Location",
               "range_surface_pressure", "min_surface_pressure",
               "range_temperature_2m", "min_temperature_2m")

In this code snippet, the scaled value is computed according to the normalization formula shown. The minimum value is subtracted from the actual value at each time interval. Next, the difference is divided by the range. In the previous screenshot, you can see values range on a 0–1 scale. Zero is the lowest observed value for the location; 1 is the highest observed value for the location, over all the time periods where data exists.

Here, we compute the scaled x, represented as x’ = (x - min) / range:

df = df.withColumnRenamed('Location_0', 'Location')

df = df.withColumn('scaled_temperature_2m',
                   (df.temperature_2m - df.min_temperature_2m) /
                   df.range_temperature_2m)

df = df.withColumn('scaled_surface_pressure',
                   (df.surface_pressure - df.min_surface_pressure) /
                   df.range_surface_pressure)

df = df.drop('Location_1', 'min_surface_pressure', 'range_surface_pressure',
             'min_temperature_2m', 'range_temperature_2m')

Build a forecasting workflow with SageMaker Canvas

With your historic data and weather data now available to you, the next step is to bring your business data and prepared weather data together to build your time series model. The following high-level steps are required:

  1. Combine weather data with your historic data on a point-in-time and location basis. Your actual data will end, but the weather data should extend out to the end of your horizon.

This is a crucial point—weather data can only help your forecast if it’s included in your future forecast horizon. The following screenshot illustrates weather data alongside business demand data. For each item and location, known historic unit demand and weather features are provided. The red boxes added to the screenshot highlight the concept of future data, where weather data is provided, yet future demand is not provided because it remains unknown.
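The point-in-time, per-location join described above can be sketched as follows. This is a pure-Python illustration with hypothetical field names and values; in the pipeline above it would be a PySpark join on Location and time.

```python
# Hypothetical sketch: join demand history with weather on (location, time).
# Future periods have weather but no demand yet, so demand stays None.
demand = {("Seattle", "2024-06-01"): 120}           # historic demand only
weather = {("Seattle", "2024-06-01"): 14.2,         # historic weather
           ("Seattle", "2024-06-02"): 16.8}         # future-dated weather

rows = [
    {"location": loc, "time": t, "temp": temp, "units": demand.get((loc, t))}
    for (loc, t), temp in sorted(weather.items())
]
# the first row has known demand; the second covers the future horizon with demand None
```

Keeping the future rows (with empty demand) in the training dataset is what lets the model use weather as a forward-looking signal.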

  2. After your data is prepared, you can use SageMaker Canvas to build a time series model with a few clicks—no coding required.

As you get started, you should build a time series model in Canvas with and without weather data. This will let you quickly quantify how much of an impact weather data has for your forecast. You may find that some items are more impacted by weather than others.
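One simple way to quantify that impact is to compare an error metric, such as weighted absolute percentage error (WAPE), between models trained with and without weather features. The following is an illustrative sketch with made-up numbers, not output from SageMaker Canvas:

```python
def wape(actuals, forecasts):
    """Weighted absolute percentage error: sum(|a - f|) / sum(|a|)."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / sum(abs(a) for a in actuals)

actuals = [100, 120, 90]
baseline = [110, 100, 100]       # hypothetical model without weather features
with_weather = [103, 116, 93]    # hypothetical model with weather features

# a positive improvement suggests weather features are adding accuracy
improvement = wape(actuals, baseline) - wape(actuals, with_weather)
```

A lower WAPE for the weather-aware model is evidence the weather features are worth keeping.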

  3. After you add the weather data, use SageMaker Canvas feature importance scores to quantify which weather features are important, and retain these going forward. For example, if the pollen value provides no lift in accuracy but barometric pressure does, you can eliminate the pollen feature to keep your process as simple as possible.

As an alternative to using a visual interface, we have also created a sample notebook on GitHub that demonstrates how to use SageMaker Canvas AutoML capabilities as an API. This method can be useful when your business prefers to orchestrate forecasting through programmatic APIs.

Clean up

To stop the consumption of SageMaker Canvas workspace instance hours, choose Log out in the left pane of the Amazon SageMaker Canvas application. This will release all resources used by the workspace instance.

Conclusion

In this post, we discussed the importance of time series forecasting to business, and focused on how you can use weather data to build a more accurate forecasting model in certain cases. This post described key factors you should consider when finding a weather data provider and how to build a pipeline that sources and stages the external data, so that it can be combined with your existing data, on a time-and-place basis. Next, we discussed how to use SageMaker Canvas to combine these datasets and train a time series ML model with no coding required. Finally, we suggested that you compare a model with and without weather data so you can quantify the impact and also learn which weather features drive your business decisions.

If you’re ready to start this journey, or improve on an existing forecast method, reach out to your AWS account team and ask for an Amazon SageMaker Canvas Immersion Day. You can gain hands-on experience and learn how to apply ML to improve forecasting outcomes in your business.


About the Author

Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS). Charles holds an MS in Supply Chain Management and a PhD in Data Science. Charles works in the Amazon SageMaker service team where he brings research and voice of the customer to inform the service roadmap. In his work, he collaborates daily with diverse AWS customers to help transform their businesses with cutting-edge AWS technologies and thought leadership.

Reimagining software development with the Amazon Q Developer Agent

Amazon Q Developer is an AI-powered assistant for software development that reimagines the experience across the entire software development lifecycle, making it faster to build, secure, manage, and optimize applications on or off of AWS. The Amazon Q Developer Agent includes an agent for feature development that automatically implements multi-file features, bug fixes, and unit tests in your integrated development environment (IDE) workspace using natural language input. After you enter your query, the software development agent analyzes your code base and formulates a plan to fulfill the request. You can accept the plan or ask the agent to iterate on it. After the plan is validated, the agent generates the code changes needed to implement the feature you requested. You can then review and accept the code changes or request a revision.

Amazon Q Developer uses generative artificial intelligence (AI) to deliver state-of-the-art accuracy for all developers, taking first place on the leaderboard for SWE-bench, a dataset that tests a system’s ability to automatically resolve GitHub issues. This post describes how to get started with the Amazon Q Developer Agent, gives an overview of the underlying mechanisms that make it a state-of-the-art feature development agent, and discusses its performance on public benchmarks.

Getting started

To get started, you need to have an AWS Builder ID or be part of an organization with an AWS IAM Identity Center instance set up that allows you to use Amazon Q. To use Amazon Q Developer Agent for feature development in Visual Studio Code, start by installing the Amazon Q extension. The extension is also available for JetBrains, Visual Studio (in preview), and in the Command Line on macOS. Find the latest version on the Amazon Q Developer page.

Amazon Q App card in VS Code

After authenticating, you can invoke the feature development agent by entering /dev in the chat field.

Invoking /dev in Amazon Q

The feature development agent is now ready for your requests. Let’s use the repository of Amazon’s Chronos forecasting model to demonstrate how the agent works. The code for Chronos is already of high quality, but unit test coverage could be improved in places. Let’s ask the software development agent to improve the unit test coverage of the file chronos.py. Stating your request as clearly and precisely as you can will help the agent deliver the best possible solution.

/dev initial prompt

The agent returns a detailed plan to add missing tests in the existing test suite test/test_chronos.py. To generate the plan (and later the code change), the agent has explored your code base to understand how to satisfy your request. The agent will work best if the names of files and functions are descriptive of their intent.

Plan generated by the agent

You are asked to review the plan. If the plan looks good and you want to proceed, choose Generate code. If you find that it can be improved in places, you can provide feedback and request an improved plan.

The agent asking for plan validation

After the code is generated, the software development agent will list the files for which it has created a diff (for this post, test/test_chronos.py). You can review the code changes and decide to either insert them in your code base or provide feedback on possible improvements and regenerate the code.

List of files changed by the agent.

Choosing a modified file opens a diff view in the IDE showing which lines have been added or modified. The agent has added multiple unit tests for parts of chronos.py that were not previously covered.

the diff generated by the agent.

After you review the code changes, you can decide to insert them, provide feedback to generate the code again, or discard them altogether. That’s it; there is nothing else for you to do. If you want to request another feature, invoke /dev again in Amazon Q Developer.

System overview

Now that we have shown you how to use Amazon Q Developer Agent for software development, let’s explore how it works. This is an overview of the system as of May 2024. The agent is continuously being improved. The logic described in this section will evolve and change.

When you submit a query, the agent generates a structured representation of the repository’s file system in XML. The following is an example output, truncated for brevity:

<tree>
  <directory name="requests">
    <file name="README.rst"/>
    <directory name="requests">
      <file name="adapters.py"/>
      <file name="api.py"/>
      <file name="models.py"/>
      <directory name="packages">
        <directory name="chardet">
          <file name="charsetprober.py"/>
          <file name="codingstatemachine.py"/>
        </directory>
        <file name="__init__.py"/>
        <file name="README.rst"/>
        <directory name="urllib3">
          <file name="connectionpool.py"/>
          <file name="connection.py"/>
          <file name="exceptions.py"/>
          <file name="fields.py"/>
          <file name="filepost.py"/>
          <file name="__init__.py"/>
        </directory>
      </directory>
    </directory>
    <file name="setup.cfg"/>
    <file name="setup.py"/>
  </directory>
</tree>

An LLM then uses this representation with your query to determine which files are relevant and should be retrieved. We use automated systems to check that the files identified by the LLM are all valid. The agent uses the retrieved files with your query to generate a plan for how it will resolve the task you have assigned to it. This plan is returned to you for validation or iteration. After you validate the plan, the agent moves to the next step, which ultimately ends with a proposed code change to resolve the issue.
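For intuition, a directory tree like the one above can be produced with a short recursive walk. This is an illustrative sketch, not the agent's actual implementation:

```python
import os

def tree_to_xml(path, indent=1):
    """Render a directory as nested <directory>/<file> elements."""
    pad = "  " * indent
    lines = []
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            lines.append(f'{pad}<directory name="{entry}">')
            lines.extend(tree_to_xml(full, indent + 1))
            lines.append(f"{pad}</directory>")
        else:
            lines.append(f'{pad}<file name="{entry}"/>')
    return lines

def repo_tree(path):
    # wrap the recursive listing in a root <tree> element
    return "\n".join(["<tree>"] + tree_to_xml(path) + ["</tree>"])
```

A compact structural view like this lets an LLM reason about which files to retrieve without seeing any file contents.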

The content of each retrieved code file is parsed with a syntax parser to obtain an XML syntax tree representation of the code, which the LLM is capable of using more efficiently than the source code itself while using far fewer tokens. The following is an example of that representation. Non-code files are encoded and chunked using a logic commonly used in Retrieval Augmented Generation (RAG) systems to allow for the efficient retrieval of chunks of documentation.

The following screenshot shows a chunk of Python code.

A snippet of Python code

The following is its syntax tree representation.

A syntax tree representation of python code
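For a sense of what a syntax-level view of code offers (the agent's exact tree format is not shown here), Python's built-in ast module produces a comparable structural summary of source code:

```python
import ast

source = """
def add(a, b):
    return a + b
"""

tree = ast.parse(source)
# list the top-level function definitions and their argument names
summary = [
    (node.name, [arg.arg for arg in node.args.args])
    for node in tree.body
    if isinstance(node, ast.FunctionDef)
]
# summary captures structure (names, signatures) in far fewer tokens than the source
```

A model consuming summaries like this can locate the relevant function and request only its line range, which is what makes the approach token-efficient.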

The LLM is prompted again with the problem statement, the plan, and the XML tree structure of each of the retrieved files to identify the line ranges that need updating in order to resolve the issue. This approach allows you to be more frugal with your usage of LLM bandwidth.

The software development agent is now ready to generate the code that will resolve your issue. The LLM directly rewrites sections of code, rather than attempting to generate a patch. This task is much closer to those that the LLM was optimized to perform compared to attempting to directly generate a patch. The agent proceeds to some syntactic validation of the generated code and attempts to fix issues before moving to the final step. The original and rewritten code are passed to a diff library to generate a patch programmatically. This creates the final output that is then shared with you to review and accept.
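The final patch-generation step can be illustrated with Python's standard difflib, which builds a unified diff from original and rewritten code so the model never has to emit patch syntax itself. The file name and contents below are hypothetical:

```python
import difflib

original = ["def greet(name):\n", "    print('hi')\n"]
rewritten = ["def greet(name):\n", "    print(f'hi {name}')\n"]

# generate a unified diff programmatically from the two versions
patch = "".join(difflib.unified_diff(original, rewritten,
                                     fromfile="a/greet.py", tofile="b/greet.py"))
# patch contains the familiar ---/+++/@@ unified-diff format
```

Because the patch is derived mechanically from two complete versions of the code, it is always syntactically valid as a diff, which is harder to guarantee when a model writes patch hunks directly.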

System accuracy

In the press release announcing the launch of Amazon Q Developer Agent for feature development, we shared that the model scored 13.82% on SWE-bench and 20.33% on SWE-bench lite, putting it at the top of the SWE-bench leaderboard as of May 2024. SWE-bench is a public dataset of over 2,000 tasks from 12 popular Python open source repositories. The key metric reported in the SWE-bench leaderboard is the pass rate: how often all the unit tests associated with a specific issue pass after the AI-generated code changes are applied. This is an important metric because our customers want to use the agent to solve real-world problems, and we are proud to report a state-of-the-art pass rate.

A single metric never tells the whole story. We look at the performance of our agent as a point on the Pareto front over multiple metrics. The Amazon Q Developer Agent for software development is not specifically optimized for SWE-bench. Our approach focuses on optimizing for a range of metrics and datasets. For instance, we aim to strike a balance between accuracy and resource efficiency, such as the number of LLM calls and input/output tokens used, because this directly impacts runtime and cost. In this regard, we take pride in our solution’s ability to consistently deliver results within minutes.

Limitations of public benchmarks

Public benchmarks such as SWE-bench are an incredibly useful contribution to the AI code generation community and present an interesting scientific challenge. We are grateful to the team releasing and maintaining this benchmark. We are proud to be able to share our state-of-the-art results on this benchmark. Nonetheless, we would like to call out a few limitations, which are not exclusive to SWE-bench.

The success metric for SWE-bench is binary. Either a code change passes all tests or it does not. We believe that this doesn’t capture the full value feature development agents can generate for developers. Agents save developers a lot of time even when they don’t implement the entirety of a feature at once. Latency, cost, number of LLM calls, and number of tokens are all highly correlated metrics that represent the computational complexity of a solution. This dimension is as important as accuracy for our customers.

The test cases included in the SWE-bench benchmark are publicly available on GitHub. As such, it’s possible that these test cases may have been used in the training data of various large language models. Although LLMs have the capability to memorize portions of their training data, it’s challenging to quantify the extent to which this memorization occurs and whether the models are inadvertently leaking this information during testing.

To investigate this potential concern, we have conducted multiple experiments to evaluate the possibility of data leakage across different popular models. One approach to testing memorization involves asking the models to predict the next line of an issue description given a very short context. This is a task that they should theoretically struggle with in the absence of memorization. Our findings indicate that recent models exhibit signs of having been trained on the SWE-bench dataset.

The following figure shows the distribution of rougeL scores when asking each model to complete the next sentence of an SWE-bench issue description given the preceding sentences.

RougeL scores to measure information leakage of SWE-bench on different models.
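RougeL, the score used in the figure above, is an F-measure built on the longest common subsequence (LCS) between a reference and a prediction. The following is a minimal pure-Python sketch of the metric, not the exact evaluation code used in these experiments:

```python
def rouge_l_f(reference, prediction):
    """LCS-based rougeL F-measure over whitespace tokens (illustrative sketch)."""
    ref, pred = reference.split(), prediction.split()
    # classic dynamic-programming longest common subsequence
    dp = [[0] * (len(pred) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref):
        for j, p in enumerate(pred):
            dp[i + 1][j + 1] = dp[i][j] + 1 if r == p else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A model that has memorized an issue description reproduces the true next sentence nearly verbatim, pushing its rougeL score toward 1.0; scores near 0 suggest no memorization.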

We have shared measurements of the performance of our software development agent on SWE-bench to offer a reference point. We recommend testing the agents on private code repositories that haven’t been used in the training of any LLMs and compare these results with the ones of publicly available baselines. We will continue benchmarking our system on SWE-bench while focusing our testing on private benchmarking datasets that haven’t been used to train models and that better represent the tasks submitted by our customers.

Conclusion

This post discussed how to get started with Amazon Q Developer Agent for software development. The agent automatically implements features that you describe with natural language in your IDE. We gave you an overview of how the agent works behind the scenes and discussed its state-of-the-art accuracy and position at the top of the SWE-bench leaderboard.

You are now ready to explore the capabilities of Amazon Q Developer Agent for software development and make it your personal AI coding assistant! Install the Amazon Q plugin in your IDE of choice and start using Amazon Q (including the software development agent) for free using your AWS Builder ID or subscribe to Amazon Q to unlock higher limits.


About the authors

Christian Bock is an applied scientist at Amazon Web Services working on AI for code.

Laurent Callot is a Principal Applied Scientist at Amazon Web Services leading teams creating AI solutions for developers.

Tim Esler is a Senior Applied Scientist at Amazon Web Services working on Generative AI and Coding Agents for building developer tools and foundational tooling for Amazon Q products.

Prabhu Teja is an Applied Scientist at Amazon Web Services. Prabhu works on LLM assisted code generation with a focus on natural language interaction.

Martin Wistuba is a senior applied scientist at Amazon Web Services. As part of Amazon Q Developer, he is helping developers to write more code in less time.

Giovanni Zappella is a Principal Applied Scientist working on the creations of intelligent agents for code generation. While at Amazon he also contributed to the creation of new algorithms for Continual Learning, AutoML and recommendations systems.


Get started quickly with AWS Trainium and AWS Inferentia using AWS Neuron DLAMI and AWS Neuron DLC

Starting with the AWS Neuron 2.18 release, you can now launch Neuron DLAMIs (AWS Deep Learning AMIs) and Neuron DLCs (AWS Deep Learning Containers) with the latest released Neuron packages on the same day as the Neuron SDK release. When a Neuron SDK is released, you’ll now be notified of the support for Neuron DLAMIs and Neuron DLCs in the Neuron SDK release notes, with a link to the AWS documentation containing the DLAMI and DLC release notes. In addition, this release introduces a number of features that help improve user experience for Neuron DLAMIs and DLCs. In this post, we walk through some of the support highlights with Neuron 2.18.

Neuron DLC and DLAMI overview and announcements

The DLAMI is a pre-configured AMI that comes with popular deep learning frameworks like TensorFlow, PyTorch, Apache MXNet, and others pre-installed. This allows machine learning (ML) practitioners to rapidly launch an Amazon Elastic Compute Cloud (Amazon EC2) instance with a ready-to-use deep learning environment, without having to spend time manually installing and configuring the required packages. The DLAMI supports various instance types, including AWS Trainium and AWS Inferentia powered instances, for accelerated training and inference.

AWS DLCs provide a set of Docker images that are pre-installed with deep learning frameworks. The containers are optimized for performance and available in Amazon Elastic Container Registry (Amazon ECR). DLCs make it straightforward to deploy custom ML environments in a containerized manner, while taking advantage of the portability and reproducibility benefits of containers.

Multi-Framework DLAMIs

The Neuron Multi-Framework DLAMI for Ubuntu 22 provides separate virtual environments for multiple ML frameworks: PyTorch 2.1, PyTorch 1.13, Transformers NeuronX, and TensorFlow 2.10. The DLAMI offers you the convenience of having all these popular frameworks readily available in a single AMI, simplifying their setup and reducing the need for multiple installations.

This new Neuron Multi-Framework DLAMI is now the default choice when launching Neuron instances for Ubuntu through the AWS Management Console, making it even faster for you to get started with the latest Neuron capabilities right from the Quick Start AMI list.

Existing Neuron DLAMI support

The existing Neuron DLAMIs for PyTorch 1.13 and TensorFlow 2.10 have been updated with the latest 2.18 Neuron SDK, making sure you have access to the latest performance optimizations and features for both Ubuntu 20 and Amazon Linux 2 distributions.

AWS Systems Manager Parameter Store support

Neuron 2.18 also introduces support in Parameter Store, a capability of AWS Systems Manager, for Neuron DLAMIs, allowing you to effortlessly find and query the DLAMI ID with the latest Neuron SDK release. This feature streamlines the process of launching new instances with the most up-to-date Neuron SDK, enabling you to automate your deployment workflows and make sure you’re always using the latest optimizations.

Availability of Neuron DLC Images in Amazon ECR

To provide customers with more deployment options, Neuron DLCs are now hosted both in the public Neuron ECR repository and as private images. Public images provide seamless integration with AWS ML deployment services such as Amazon EC2, Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Kubernetes Service (Amazon EKS); private images are required when using Neuron DLCs with Amazon SageMaker.

Updated Dockerfile locations

Prior to this release, Dockerfiles for Neuron DLCs were located within the AWS/Deep Learning Containers repository. Moving forward, Neuron containers can be found in the AWS-Neuron/Deep Learning Containers repository.

Improved documentation

The Neuron SDK documentation and AWS documentation sections for DLAMI and DLC now have up-to-date user guides about Neuron. The Neuron SDK documentation also includes a dedicated DLAMI section with guides on discovering, installing, and upgrading Neuron DLAMIs, along with links to release notes in AWS documentation.

Using the Neuron DLC and DLAMI with Trn and Inf instances

AWS Trainium and AWS Inferentia are custom ML chips designed by AWS to accelerate deep learning workloads in the cloud.

You can choose your desired Neuron DLAMI when launching Trn and Inf instances through the console or infrastructure automation tools like AWS Command Line Interface (AWS CLI). After a Trn or Inf instance is launched with the selected DLAMI, you can activate the virtual environment corresponding to your chosen framework and begin using the Neuron SDK. If you’re interested in using DLCs, refer to the DLC documentation section in the Neuron SDK documentation or the DLC release notes section in the AWS documentation to find the list of Neuron DLCs with the latest Neuron SDK release. Each DLC in the list includes a link to the corresponding container image in the Neuron container registry. After choosing a specific DLC, please refer to the DLC walkthrough in the next section to learn how to launch scalable training and inference workloads using AWS services like Kubernetes (Amazon EKS), Amazon ECS, Amazon EC2, and SageMaker. The following sections contain walkthroughs for both the Neuron DLC and DLAMI.

DLC walkthrough

In this section, we provide resources to help you use containers to accelerate your deep learning models on AWS Inferentia and Trainium enabled instances.

The section is organized based on the target deployment environment and use case. In most cases, it is recommended to use a preconfigured DLC from AWS. Each DLC is preconfigured to have all the Neuron components installed and is specific to the chosen ML framework.

Locate the Neuron DLC image

The PyTorch Neuron DLC images are published to ECR Public Gallery, which is the recommended URL to use for most cases. If you’re working within SageMaker, use the Amazon ECR URL instead of the Amazon ECR Public Gallery. TensorFlow DLCs are not updated with the latest release. For earlier releases, refer to Neuron Containers. In the following sections, we provide the recommended steps for running an inference or training job in Neuron DLCs.

Prerequisites

Prepare your infrastructure (Amazon EKS, Amazon ECS, Amazon EC2, and SageMaker) with AWS Inferentia or Trainium instances as worker nodes, making sure they have the necessary roles attached for Amazon ECR read access to retrieve container images from Amazon ECR: arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly.

When setting up hosts for Amazon EC2 and Amazon ECS, using the Deep Learning AMI (DLAMI) is recommended. For Amazon EKS, an Amazon EKS optimized GPU AMI is recommended.

You also need the ML job scripts ready with a command to invoke them. In the following steps, we use a single file, train.py, as the ML job script. The command to invoke it is torchrun --nproc_per_node=2 --nnodes=1 train.py.

Extend the Neuron DLC

Extend the Neuron DLC to include your ML job scripts and other necessary logic. As the simplest example, you can have the following Dockerfile:

FROM public.ecr.aws/neuron/pytorch-training-neuronx:2.1.2-neuronx-py310-sdk2.18.2-ubuntu20.04

COPY train.py /train.py

This Dockerfile uses the Neuron PyTorch training container as a base and adds your training script, train.py, to the container.

Build and push to Amazon ECR

Complete the following steps:

  1. Build your Docker image:
    docker build -t <your-repo-name>:<your-image-tag> .

  2. Authenticate your Docker client to your ECR registry:
    aws ecr get-login-password --region <your-region> | docker login --username AWS --password-stdin <your-aws-account-id>.dkr.ecr.<your-region>.amazonaws.com

  3. Tag your image to match your repository:
    docker tag <your-repo-name>:<your-image-tag> <your-aws-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-repo-name>:<your-image-tag>

  4. Push this image to Amazon ECR:
    docker push <your-aws-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-repo-name>:<your-image-tag>

You can now run the extended Neuron DLC in different AWS services.
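The tag-and-push convention from steps 3 and 4 produces the fully qualified image URI that the EKS, ECS, EC2, and SageMaker examples in the original walkthrough all reference. As a small sketch (the helper names here are our own, not part of any AWS SDK), the URI can be assembled consistently like this:

```python
def ecr_registry(account_id: str, region: str) -> str:
    # Private ECR registry hostname, used as the docker login target
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"

def ecr_image_uri(account_id: str, region: str, repo: str, tag: str) -> str:
    # Fully qualified image URI used in docker tag and docker push
    return f"{ecr_registry(account_id, region)}/{repo}:{tag}"

print(ecr_image_uri("123456789012", "us-east-1", "neuron-training", "latest"))
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/neuron-training:latest
```

Centralizing the URI construction in deployment scripts avoids the drift that occurs when the account ID, Region, repository, and tag are repeated by hand across pod specs, task definitions, and estimators.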

Amazon EKS configuration

For Amazon EKS, create a simple pod YAML file to use the extended Neuron DLC. For example:

apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  containers:
  - name: training-container
    image: <your-aws-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-repo-name>:<your-image-tag>
    command: ["torchrun"]
    args: ["--nproc_per_node=2", "--nnodes=1", "/train.py"]
    resources:
      limits:
        aws.amazon.com/neuron: 1
      requests:
        aws.amazon.com/neuron: 1

Use kubectl apply -f <pod-file-name>.yaml to deploy this pod in your Kubernetes cluster.

Amazon ECS configuration

For Amazon ECS, create a task definition that references your custom Docker image. The following is an example JSON task definition:

{
    "family": "training-task",
    "requiresCompatibilities": ["EC2"],
    "containerDefinitions": [
        {
            "command": [
                "torchrun",
                "--nproc_per_node=2",
                "--nnodes=1",
                "/train.py"
            ],
            "linuxParameters": {
                "devices": [
                    {
                        "containerPath": "/dev/neuron0",
                        "hostPath": "/dev/neuron0",
                        "permissions": [
                            "read",
                            "write"
                        ]
                    }
                ],
                "capabilities": {
                    "add": [
                        "IPC_LOCK"
                    ]
                }
            },
            "cpu": 0,
            "memoryReservation": 1000,
            "image": "<your-aws-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-repo-name>:<your-image-tag>",
            "essential": true,
            "name": "training-container"
        }
    ]
}

This definition sets up a task with the necessary configuration to run your containerized application in Amazon ECS.
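Note that ECS runs the container `command` in exec form, so each argument must be its own array element rather than one space-separated string. When generating task definitions programmatically, a small helper (our own sketch, using Python's standard `shlex`) can convert a shell-style command into that form:

```python
import shlex

def to_exec_form(command):
    """Convert a shell-style command string into an ECS exec-form token list."""
    if isinstance(command, str):
        return shlex.split(command)
    return list(command)

print(to_exec_form("torchrun --nproc_per_node=2 --nnodes=1 /train.py"))
# ['torchrun', '--nproc_per_node=2', '--nnodes=1', '/train.py']
```

The resulting list can be dropped directly into the `command` field of the container definition before registering the task.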

Amazon EC2 configuration

For Amazon EC2, you can directly run your Docker container:

docker run --name training-job --device=/dev/neuron0 <your-aws-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-repo-name>:<your-image-tag> torchrun --nproc_per_node=2 --nnodes=1 /train.py

SageMaker configuration

For SageMaker, create an estimator with your container image and specify the training job parameters using the SageMaker Python SDK:

import sagemaker
from sagemaker.pytorch import PyTorch
role = sagemaker.get_execution_role()
pytorch_estimator = PyTorch(entry_point='train.py',
                            role=role,
                            image_uri='<your-aws-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-repo-name>:<your-image-tag>',
                            instance_count=1,
                            instance_type='ml.trn1.2xlarge',
                            framework_version='2.1.2',
                            py_version='py3',
                            hyperparameters={"nproc-per-node": 2, "nnodes": 1},
                            script_mode=True)
pytorch_estimator.fit()

DLAMI walkthrough

This section walks through launching an Inf1, Inf2, or Trn1 instance using the Multi-Framework DLAMI in the Quick Start AMI list, and easily getting the latest DLAMI that supports the newest Neuron SDK release.

The Neuron DLAMI is a multi-framework DLAMI that supports multiple Neuron frameworks and libraries. Each DLAMI is pre-installed with Neuron drivers and supports all Neuron instance types. Each virtual environment that corresponds to a specific Neuron framework or library comes pre-installed with all the Neuron libraries, including the Neuron compiler and Neuron runtime needed for you to get started.

This release introduces a new Multi-Framework DLAMI for Ubuntu 22 that you can use to quickly get started with the latest Neuron SDK on multiple frameworks that Neuron supports. It also adds Systems Manager (SSM) parameter support for DLAMIs to automate the retrieval of the latest DLAMI ID in cloud automation flows.

For instructions on getting started with the multi-framework DLAMI through the console, refer to Get Started with Neuron on Ubuntu 22 with Neuron Multi-Framework DLAMI. If you want to use the Neuron DLAMI in your cloud automation flows, Neuron also supports SSM parameters to retrieve the latest DLAMI ID.

Launch the instance using Neuron DLAMI

Complete the following steps:

  1. On the Amazon EC2 console, choose your desired AWS Region and choose Launch Instance.
  2. On the Quick Start tab, choose Ubuntu.
  3. For Amazon Machine Image, choose Deep Learning AMI Neuron (Ubuntu 22.04).
  4. Specify your desired Neuron instance.
  5. Configure disk size and other criteria.
  6. Launch the instance.

Activate the virtual environment

Activate your desired virtual environment, as shown in the following screenshot.

After you have activated the virtual environment, you can try out one of the tutorials listed in the corresponding framework or library training and inference section.

Use SSM parameters to find specific Neuron DLAMIs

Neuron DLAMIs support SSM parameters to quickly find Neuron DLAMI IDs. As of this writing, SSM parameter support only covers finding the latest DLAMI ID that corresponds to the latest Neuron SDK release. In future releases, we will add support for finding the DLAMI ID of a specific Neuron release using SSM parameters.

You can find the DLAMI that supports the latest Neuron SDK by using the get-parameter command:

aws ssm get-parameter \
--region us-east-1 \
--name <dlami-ssm-parameter-prefix>/latest/image_id \
--query "Parameter.Value" \
--output text

For example, to find the latest DLAMI ID for the Multi-Framework DLAMI (Ubuntu 22), you can use the following code:

aws ssm get-parameter \
--region us-east-1 \
--name /aws/service/neuron/dlami/multi-framework/ubuntu-22.04/latest/image_id \
--query "Parameter.Value" \
--output text

You can find all available parameters supported in Neuron DLAMIs using the AWS CLI:

aws ssm get-parameters-by-path \
--region us-east-1 \
--path /aws/service/neuron \
--recursive

You can also view the SSM parameters supported in Neuron through Parameter Store by selecting the neuron service.
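In automation code, the same lookup can be done programmatically. The following sketch assembles the parameter path shown in the earlier examples and, if boto3 and AWS credentials are available, resolves it to an AMI ID; the helper names and default arguments are our own.

```python
def neuron_dlami_parameter(flavor: str = "multi-framework",
                           os_name: str = "ubuntu-22.04") -> str:
    # Parameter path convention from the AWS CLI examples above
    return f"/aws/service/neuron/dlami/{flavor}/{os_name}/latest/image_id"

def latest_neuron_dlami_id(region: str = "us-east-1", **kwargs) -> str:
    # Requires boto3 and AWS credentials; equivalent to `aws ssm get-parameter`
    import boto3
    ssm = boto3.client("ssm", region_name=region)
    response = ssm.get_parameter(Name=neuron_dlami_parameter(**kwargs))
    return response["Parameter"]["Value"]

print(neuron_dlami_parameter())
# /aws/service/neuron/dlami/multi-framework/ubuntu-22.04/latest/image_id
```

Because the `latest` segment is stable, deployment scripts can call this at launch time instead of hard-coding an AMI ID that goes stale with each Neuron SDK release.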

Use SSM parameters to launch an instance directly using the AWS CLI

You can use the AWS CLI to find the latest DLAMI ID and launch the instance in a single command. The following code snippet shows an example of launching an Inf2 instance using the PyTorch 1.13 Neuron DLAMI for Amazon Linux 2:

aws ec2 run-instances \
--region us-east-1 \
--image-id resolve:ssm:/aws/service/neuron/dlami/pytorch-1.13/amazon-linux-2/latest/image_id \
--count 1 \
--instance-type inf2.48xlarge \
--key-name <my-key-pair> \
--security-groups <my-security-group>

Use SSM parameters in EC2 launch templates

You can also use SSM parameters directly in launch templates. You can update your Auto Scaling groups to use new AMI IDs without needing to create new launch templates or new versions of launch templates each time an AMI ID changes.
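For example, a launch template can embed the `resolve:ssm:` prefix shown earlier directly in its `ImageId`, so every instance launched from the template resolves the current DLAMI at launch time. The following sketch builds the `LaunchTemplateData` payload for `aws ec2 create-launch-template`; the helper name and defaults are our own.

```python
import json

def neuron_launch_template_data(parameter_path: str, instance_type: str) -> str:
    # EC2 resolves the SSM parameter to the latest AMI ID at launch time,
    # so the template never needs a new version when the DLAMI is updated
    data = {
        "ImageId": f"resolve:ssm:{parameter_path}",
        "InstanceType": instance_type,
    }
    return json.dumps(data)

print(neuron_launch_template_data(
    "/aws/service/neuron/dlami/multi-framework/ubuntu-22.04/latest/image_id",
    "inf2.48xlarge"))
```

The resulting JSON can be passed to `aws ec2 create-launch-template --launch-template-data`, and Auto Scaling groups referencing the template pick up new DLAMI releases automatically.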

Clean up

When you’re done running the resources that you deployed as part of this post, make sure to stop or delete them so they don’t continue accruing charges:

  1. Stop your EC2 instance.
  2. Delete your ECS cluster.
  3. Delete your EKS cluster.
  4. Clean up your SageMaker resources.

Conclusion

In this post, we introduced several enhancements incorporated into Neuron 2.18 that improve the user experience and time-to-value for customers working with AWS Inferentia and Trainium instances. The availability of Neuron DLAMIs and DLCs with the latest Neuron SDK on the same day as the SDK release means you can immediately benefit from the latest performance optimizations and features, along with up-to-date documentation for installing and upgrading them.

Additionally, you can now use the Multi-Framework DLAMI, which simplifies the setup process by providing isolated virtual environments for multiple popular ML frameworks. Finally, we discussed Parameter Store support for Neuron DLAMIs that streamlines the process of launching new instances with the most up-to-date Neuron SDK, enabling you to automate your deployment workflows with ease.

Neuron DLCs are available in both private and public ECR repositories to help you deploy Neuron in your preferred AWS service. Refer to the following resources to get started:


About the Authors

Niithiyn Vijeaswaran is a Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.

Armando Diaz is a Solutions Architect at AWS. He focuses on generative AI, AI/ML, and data analytics. At AWS, Armando helps customers integrate cutting-edge generative AI capabilities into their systems, fostering innovation and competitive advantage. When he’s not at work, he enjoys spending time with his wife and family, hiking, and traveling the world.

Sebastian Bustillo is an Enterprise Solutions Architect at AWS. He focuses on AI/ML technologies and has a profound passion for generative AI and compute accelerators. At AWS, he helps customers unlock business value through generative AI, assisting with the overall process from ideation to production. When he’s not at work, he enjoys brewing a perfect cup of specialty coffee and exploring the outdoors with his wife.

Ziwen Ning is a software development engineer at AWS. He currently focuses on enhancing the AI/ML experience through the integration of AWS Neuron with containerized environments and Kubernetes. In his free time, he enjoys challenging himself with badminton, swimming and other various sports, and immersing himself in music.

Anant Sharma is a software engineer at AWS Annapurna Labs specializing in DevOps. His primary focus revolves around building, automating and refining the process of delivering software to AWS Trainium and Inferentia customers. Beyond work, he’s passionate about gaming, exploring new destinations and following latest tech developments.

Roopnath Grandhi is a Sr. Product Manager at AWS. He leads large-scale model inference and developer experiences for AWS Trainium and Inferentia AI accelerators. With over 15 years of experience in architecting and building AI based products and platforms, he holds multiple patents and publications in AI and eCommerce.

Marco Punio is a Solutions Architect focused on generative AI strategy, applied AI solutions and conducting research to help customers hyperscale on AWS. He is a qualified technologist with a passion for machine learning, artificial intelligence, and mergers & acquisitions. Marco is based in Seattle, WA and enjoys writing, reading, exercising, and building applications in his free time.

Rohit Talluri is a Generative AI GTM Specialist (Tech BD) at Amazon Web Services (AWS). He is partnering with top generative AI model builders, strategic customers, key AI/ML partners, and AWS Service Teams to enable the next generation of artificial intelligence, machine learning, and accelerated computing on AWS. He was previously an Enterprise Solutions Architect, and the Global Solutions Lead for AWS Mergers & Acquisitions Advisory.


Sprinklr improves performance by 20% and reduces cost by 25% for machine learning inference on AWS Graviton3

This is a guest post co-written with Ratnesh Jamidar and Vinayak Trivedi from Sprinklr.

Sprinklr’s mission is to unify silos, technology, and teams across large, complex companies. To achieve this, we provide four product suites, Sprinklr Service, Sprinklr Insights, Sprinklr Marketing, and Sprinklr Social, as well as several self-serve offerings.

Each of these products is infused with artificial intelligence (AI) capabilities to deliver an exceptional customer experience. Sprinklr’s specialized AI models streamline data processing, gather valuable insights, and enable workflows and analytics at scale to drive better decision-making and productivity.

In this post, we describe the scale of our AI offerings, the challenges with diverse AI workloads, and how we optimized mixed AI workload inference performance with AWS Graviton3 based c7g instances and achieved 20% throughput improvement, 30% latency reduction, and reduced our cost by 25–30%.

Sprinklr’s AI scale and challenges with diverse AI workloads

Our purpose-built AI processes unstructured customer experience data from millions of sources, providing actionable insights and improving productivity for customer-facing teams to deliver exceptional experiences at scale. To understand our scaling and cost challenges, let’s look at some representative numbers. Sprinklr’s platform uses thousands of servers that fine-tune and serve over 750 pre-built AI models across over 60 verticals, and run more than 10 billion predictions per day.

To deliver a tailored user experience across these verticals, we deploy patented AI models fine-tuned for specific business applications and use nine layers of machine learning (ML) to extract meaning from data across formats: automatic speech recognition, natural language processing, computer vision, network graph analysis, anomaly detection, trends, predictive analysis, natural language generation, and similarity engine.

The diverse and rich database of models brings unique challenges for choosing the most efficient deployment infrastructure that gives the best latency and performance.

For example, for mixed AI workloads, the AI inference is part of the search engine service with real-time latency requirements. In these cases, the model sizes are smaller, which means the communication overhead with GPUs or ML accelerator instances outweighs their compute performance benefits. Also, inference requests are infrequent, which means accelerators are more often idle and not cost-effective. Therefore, the production instances were not cost-effective for these mixed AI workloads, causing us to look for new instances that offer the right balance of scale and cost-effectiveness.

Cost-effective ML inference using AWS Graviton3

Graviton3 processors are optimized for ML workloads, including support for bfloat16, Scalable Vector Extension (SVE), twice the Single Instruction Multiple Data (SIMD) bandwidth, and 50% more memory bandwidth compared to AWS Graviton2 processors, making them an ideal choice for our mixed workloads. Our goal is to use the latest technologies for efficiency and cost savings, so when AWS released Graviton3-based Amazon Elastic Compute Cloud (Amazon EC2) instances, we were excited to try them in our mixed workloads, especially given our previous Graviton experience. For over 3 years, we have run our search infrastructure on Graviton2-based EC2 instances and our real-time and batched inference workloads on AWS Inferentia ML-accelerated instances, and in both cases we improved latency by 30% and achieved up to 40% price-performance benefits over comparable x86 instances.

To migrate our mixed AI workloads from x86-based instances to Graviton3-based c7g instances, we took a two-step approach. First, we had to experiment and benchmark in order to determine that Graviton3 was indeed the right solution for us. After that was confirmed, we had to perform the actual migration.

First, we benchmarked our workloads using the readily available Graviton Deep Learning Containers (DLCs) in a standalone environment. As early adopters of Graviton for ML workloads, it was initially challenging to identify the right software versions and runtime tunings. During this journey, we collaborated closely and frequently with our AWS technical account manager and the Graviton software engineering teams, who provided optimized software packages and detailed instructions on how to tune them to achieve optimum performance. In our test environment, we observed a 20% throughput improvement and 30% latency reduction across multiple natural language processing models.
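Benchmarks like these boil down to measuring latency percentiles and derived throughput over repeated inference calls against the same model on each instance type. The following is a generic, framework-agnostic sketch of such a harness, not Sprinklr’s actual benchmarking code:

```python
import time
import statistics

def benchmark(fn, warmup: int = 10, iters: int = 200) -> dict:
    # Warm up to exclude one-time costs such as model compilation and caching
    for _ in range(warmup):
        fn()
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    return {
        "p50_ms": latencies_ms[len(latencies_ms) // 2],
        "p99_ms": latencies_ms[min(iters - 1, int(iters * 0.99))],
        "throughput_rps": 1000.0 / statistics.mean(latencies_ms),
    }

# Example: wrap a model's predict call in a lambda and compare the stats
# produced on two instance types (a CPU-bound stand-in is used here)
stats = benchmark(lambda: sum(i * i for i in range(1000)))
print(sorted(stats))
```

Running the same harness against identical model callables on x86-based and Graviton3-based hosts gives the paired p50/p99 and throughput numbers needed for the comparisons reported in this section.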

After we had validated that Graviton3 met our needs, we integrated the optimizations into our production software stack. The AWS account team assisted us promptly, helping us ramp up quickly to meet our deployment timelines. Overall, migration to Graviton3-based instances was smooth, and it took less than 2 months to achieve the performance improvements in our production workloads.

Results

By migrating our mixed inference/search workloads to Graviton3-based c7g instances from the comparable x86-based instances, we achieved the following:

  • Higher performance – We realized 20% throughput improvement and 30% latency reduction.
  • Reduced cost – We achieved 25–30% cost savings.
  • Improved customer experience – By reducing the latency and increasing throughput, we significantly improved the performance of our products and services, providing the best user experience for our customers.
  • Sustainable AI – Because we saw a higher throughput on the same number of instances, we were able to lower our overall carbon footprint, and we made our products appealing to environmentally conscious customers.
  • Better software quality and maintenance – The AWS engineering team upstreamed all the software optimizations into PyTorch and TensorFlow open source repositories. As a result, our current software upgrade process on Graviton3-based instances is seamless. For example, PyTorch (v2.0+), TensorFlow (v2.9+), and Graviton DLCs come with Graviton3 optimizations and the user guides provide best practices for runtime tuning.

So far, we have migrated PyTorch and TensorFlow based DistilRoBERTa-base, spaCy clustering, Prophet, and XLM-R models to Graviton3-based c7g instances. These models serve intent detection, text clustering, creative insights, text classification, smart budget allocation, and image download services. These services power our unified customer experience (unified-cxm) platform and conversational AI to allow brands to build more self-serve use cases for their customers. Next, we are migrating ONNX and other larger models to Graviton3-based m7g general purpose and Graviton2-based g5g GPU instances to achieve similar performance improvements and cost savings.

Conclusion

Switching to Graviton3-based instances was quick in terms of engineering time, and resulted in 20% throughput improvement, 30% latency reduction, 25–30% cost savings, improved customer experience, and a lower carbon footprint for our workloads. Based on our experience, we will continue to seek new compute from AWS that will reduce our costs and improve the customer experience.

For further reading, refer to the following:


About the Authors

Sunita Nadampalli is a Software Development Manager at AWS. She leads Graviton software performance optimizations for Machine Learning and HPC workloads. She is passionate about open source software development and delivering high-performance and sustainable software solutions with Arm SoCs.

Gaurav Garg is a Sr. Technical Account Manager at AWS with 15 years of experience. He has a strong operations background. In his role he works with Independent Software Vendors to build scalable and cost-effective solutions with AWS that meet the business requirements. He is passionate about Security and Databases.

Ratnesh Jamidar is an AVP of Engineering at Sprinklr with 8 years of experience. He is a seasoned machine learning professional with expertise in designing and implementing large-scale, distributed, and highly available AI products and infrastructure.

Vinayak Trivedi is an Associate Director of Engineering at Sprinklr with 4 years of experience in Backend & AI. He is proficient in Applied Machine Learning & Data Science, with a history of building large-scale, scalable and resilient systems.


How Wiz is empowering organizations to remediate security risks faster with Amazon Bedrock

Wiz is a cloud security platform that enables organizations to secure everything they build and run in the cloud by rapidly identifying and removing critical risks. Over 40% of the Fortune 100 trust Wiz’s purpose-built cloud security platform to gain full-stack visibility, accurate risk prioritization, and enhanced business agility. Organizations can connect Wiz in minutes to scan the entire cloud environment without agents and identify the issues representing real risk. Security and cloud teams can then proactively remove risks and harden cloud environments with remediation workflows.

Artificial intelligence (AI) has revolutionized the way organizations function, paving the way for automation and improved efficiency in various tasks that were traditionally manual. One of these use cases is using AI in security organizations to improve security processes and increase your overall security posture. One of the major challenges in cloud security is discerning the best ways to resolve an identified issue in the most effective way to allow you to respond quickly.

Wiz has harnessed the power of generative AI to help organizations remove risks in their cloud environment faster. With Wiz’s new integration with Amazon Bedrock, Wiz customers can now generate guided remediation steps backed by foundation models (FMs) running on Amazon Bedrock to reduce their mean time to remediation (MTTR). Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

“The Wiz and Amazon Bedrock integration enables organizations to further enhance security and improve remediation time by leveraging a choice of powerful foundation models to generate GenAI-powered remediation steps.”

– Vivek Singh, Senior Manager, Product Management-Tech, AWS AI

In this post, we share how Wiz uses Amazon Bedrock to generate remediation guidance for customers that allow them to quickly address security risks in their cloud environment.

Detecting security risks in the cloud with the Wiz Security Graph

Wiz scans cloud environments without agents and runs deep risk assessment across network exposures, vulnerabilities, misconfigurations, identities, data, secrets, and malware. Wiz stores the entire technology stack as well as any risks detected on the Wiz Security Graph, which is backed by Amazon Neptune. Neptune enables Wiz to quickly traverse the graph and understand interconnected risk factors in seconds and how they create an attack path. The Security Graph allows Wiz to surface these critical attack paths in the form of Wiz Issues. For example, a Wiz Issue can alert of a publicly exposed Amazon Elastic Compute Cloud (Amazon EC2) instance that is vulnerable, has admin permissions, and can access sensitive data. The following graph illustrates this attack path.

Attack path

With its Security Graph, Wiz provides customers with pinpoint-accurate alerts on security risks in their environment, reduces the noise faced with traditional security tools, and enables organizations to focus on the most critical risks in their environment.
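As an illustration only (Wiz’s Security Graph runs on Amazon Neptune and is queried with graph query languages, not a Python dictionary), the attack-path idea above can be sketched as a breadth-first search over a toy graph:

```python
from collections import deque

# Toy security graph: nodes are resources, edges are risk relationships.
# This mirrors the example attack path described above: a publicly exposed
# EC2 instance with admin permissions that can reach sensitive data.
graph = {
    "internet": ["ec2-instance"],          # public exposure
    "ec2-instance": ["admin-role"],        # instance has admin permissions
    "admin-role": ["s3-sensitive-data"],   # role can access sensitive data
}

def find_attack_path(graph, source, target):
    """Breadth-first search for a path from an exposure to a critical asset."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no attack path between these nodes

path = find_attack_path(graph, "internet", "s3-sensitive-data")
```

If a path exists, the chain of nodes is the attack path to surface as an issue; if not, the exposure and the sensitive data are not connected.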

Remediating cloud risks with guided remediation provided by Amazon Bedrock

To help customers remediate security risks even faster, Wiz uses Amazon Bedrock to analyze metadata from Wiz Issues to generate effective remediation recommendations for customers. With Amazon Bedrock, Wiz combines its deep risk context with cutting-edge FMs to offer enhanced remediation guidance to customers. Customers can scale their remediation workflow and minimize their MTTR by generating straightforward-to-use copy-paste remediation steps that can be directly implemented into the tool of their choice, such as the AWS Command Line Interface (AWS CLI), Terraform, AWS CloudFormation, Pulumi, Go, and Python, or directly using the cloud environment console. The following screenshot showcases an example of the remediation steps generated by Amazon Bedrock for a Wiz Issue.

An example of the remediation steps generated by Amazon Bedrock for a Wiz Issue

Wiz sends Amazon Bedrock a prompt containing the relevant context for a security risk, along with instructions on how to present the results for the target platform. Amazon Bedrock’s native APIs allow Wiz to select the best model for each use case; when the response is received, it’s parsed and presented in a straightforward manner in the Wiz portal.

To fully operationalize this functionality in production, the Wiz backend has a service running on Amazon Elastic Kubernetes Service (Amazon EKS) that receives the customer request to generate remediation steps, collects the context of the alert the customer wants to remediate, and runs personally identifiable information (PII) redaction on the data to remove any sensitive data. Another service running on Amazon EKS then pulls the resulting data and sends it to Amazon Bedrock. This flow can run in any AWS Region supported by Amazon Bedrock to address customers’ compliance needs. In addition, to use Amazon Bedrock with least privilege, Wiz uses AWS permission sets and follows AWS best practices. The Wiz service that sends the prompt to Amazon Bedrock has a dedicated AWS Identity and Access Management (IAM) role that allows it to communicate only with the specific Amazon Bedrock service and only to generate those requests. Amazon Bedrock also has restrictions in place to block any data coming from a non-authorized service. Using these AWS services and the Wiz Security Graph, Wiz helps its customers adopt the most advanced LLMs to speed up the process of addressing complex security issues in a straightforward and secure manner. The following diagram illustrates this architecture.

System architecture
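The redact-then-generate flow described above can be sketched as follows. The PII patterns, prompt shape, and model ID are illustrative assumptions for this sketch, not Wiz’s actual implementation:

```python
import json
import re

# Simple, illustrative PII patterns; a production redactor would be far
# more thorough than these two regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
IP = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def redact_pii(text):
    """Remove obvious PII before the alert context leaves the backend."""
    return IP.sub("[REDACTED_IP]", EMAIL.sub("[REDACTED_EMAIL]", text))

def build_request(issue_context, target="AWS CLI"):
    """Assemble a remediation prompt for a chosen target platform."""
    prompt = (f"Generate step-by-step remediation for this cloud security "
              f"issue, formatted for {target}:\n{redact_pii(issue_context)}")
    # The model ID is an assumed example; any Amazon Bedrock model could be selected.
    return {"modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
            "body": json.dumps({"prompt": prompt})}

def generate_remediation(issue_context):
    """Send the redacted context to Amazon Bedrock (requires AWS credentials)."""
    import boto3
    client = boto3.client("bedrock-runtime")
    req = build_request(issue_context)
    resp = client.invoke_model(modelId=req["modelId"], body=req["body"])
    return resp["Body"].read()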

Wiz customers are already experiencing the advantages of our new AI-driven remediation:

“The faster we can remediate security risks, the more we can focus on driving broader strategic initiatives. With Wiz’s AI-powered remediation, we can quickly generate remediation steps that our security team and developers can simply copy-paste to remediate the issue.”

– Rohit Kohli, Deputy CISO, Genpact

By using Amazon Bedrock to generate AI-powered remediation steps, we learned that security teams can reduce the time spent investigating complex risks by 40%, allowing them to focus on mitigating more risks. Furthermore, they can empower developers to remediate risks by removing the need for security expertise and providing them with the exact steps to take. Not only does Wiz use AI to enhance security processes for customers, it also makes it effortless for customers to securely adopt AI in their organization with its AI Security Posture Management capabilities, empowering them to protect their AI models while increasing innovation.

Conclusion

Using generative AI for generating enhanced remediation steps marks a significant advancement in the realm of problem-solving and automation. By harnessing the power of AI models powered by Amazon Bedrock, Wiz users can quickly remediate risks with straightforward remediation guidance, reducing manual efforts and improving MTTR. Learn more about Wiz and check out a live demo.


About the Authors

Shaked Rotlevi is a Technical Product Marketing Manager at Wiz focusing on AI security. Prior to Wiz, she was a Solutions Architect at AWS working with public sector customers, as well as a Technical Program Manager for a security service team. In her spare time, she enjoys playing beach volleyball and hiking.

Itay Arbel is a Lead Product Manager at Wiz. Before joining Wiz, Itay was a product manager at Microsoft and earned an MBA at the University of Oxford, focusing on high tech and emerging technologies. Itay is Wiz’s product lead for helping organizations secure their AI pipelines and their usage of this new emerging technology.

Eitan Sela is a Generative AI and Machine Learning Specialist Solutions Architect at AWS. He works with AWS customers to provide guidance and technical assistance, helping them build and operate Generative AI and Machine Learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Adi Avni is a Senior Solutions Architect at AWS based in Israel. Adi works with AWS ISV customers, helping them to build innovative, scalable and cost-effective solutions on AWS. In his spare time, he enjoys sports and traveling with family and friends.


Code generation using Code Llama 70B and Mixtral 8x7B on Amazon SageMaker


In the ever-evolving landscape of machine learning and artificial intelligence (AI), large language models (LLMs) have emerged as powerful tools for a wide range of natural language processing (NLP) tasks, including code generation. Among these cutting-edge models, Code Llama 70B stands out as a true heavyweight, boasting an impressive 70 billion parameters. Developed by Meta and now available on Amazon SageMaker, this state-of-the-art LLM promises to revolutionize the way developers and data scientists approach coding tasks.

What are Code Llama 70B and Mixtral 8x7B?

Code Llama 70B is a variant of the Code Llama foundation model (FM), a fine-tuned version of Meta’s renowned Llama 2 model. This massive language model is specifically designed for code generation and understanding, capable of generating code from natural language prompts or existing code snippets. With its 70 billion parameters, Code Llama 70B offers unparalleled performance and versatility, making it a game-changer in the world of AI-assisted coding.

Mixtral 8x7B is a state-of-the-art sparse mixture of experts (MoE) foundation model released by Mistral AI. It supports multiple use cases such as text summarization, classification, text generation, and code generation. It is an 8x model, which means it contains eight distinct groups of parameters. The model has about 45 billion total parameters and supports a context length of 32,000 tokens. MoE is a type of neural network architecture that consists of multiple “experts,” where each expert is a neural network. In the context of transformer models, MoE replaces some feed-forward layers with sparse MoE layers. These layers have a certain number of experts, and a router network selects which experts process each token at each layer. MoE models enable more compute-efficient and faster inference compared to dense models.
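To make the routing idea concrete, here is a toy, pure-Python sketch of a sparse MoE layer with top-2 routing. The dimensions, expert count, and linear experts are illustrative only, not Mixtral’s actual configuration:

```python
import math
import random

random.seed(0)
NUM_EXPERTS, DIM, TOP_K = 8, 4, 2  # toy sizes, not Mixtral's real dimensions

# Each "expert" is a simple linear map; the router assigns a score per expert.
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def moe_layer(token):
    """Route a token vector to its top-k experts and mix their outputs."""
    scores = [sum(w * x for w, x in zip(r, token)) for r in router]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over only the selected experts' scores
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Only TOP_K of the NUM_EXPERTS experts actually run for this token
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for d, y in enumerate(matvec(experts[i], token)):
            out[d] += w * y
    return out

y = moe_layer([1.0, 0.5, -0.3, 0.2])
```

Because only two of the eight experts execute per token, the per-token compute is far lower than running all parameters, which is the source of the efficiency gain mentioned above.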

Key features and capabilities of Code Llama 70B and Mixtral 8x7B include:

  1. Code generation: These LLMs excel at generating high-quality code across a wide range of programming languages, including Python, Java, C++, and more. They can translate natural language instructions into functional code, streamlining the development process and accelerating project timelines.
  2. Code infilling: In addition to generating new code, they can seamlessly infill missing sections of existing code by providing the prefix and suffix. This feature is particularly valuable for enhancing productivity and reducing the time spent on repetitive coding tasks.
  3. Natural language interaction: The instruct variants of Code Llama 70B and Mixtral 8x7B support natural language interaction, allowing developers to engage in conversational exchanges to develop code-based solutions. This intuitive interface fosters collaboration and enhances the overall coding experience.
  4. Long context support: With the ability to handle context lengths of up to 48,000 tokens, Code Llama 70B can maintain coherence and consistency over extended code segments or conversations, ensuring relevant and accurate responses. Mixtral 8x7B has a context window of 32,000 tokens.
  5. Multi-language support: While both of these models excel at generating code, their capabilities extend beyond programming languages. They can also assist with natural language tasks, such as text generation, summarization, and question answering, making them versatile tools for various applications.

Harnessing the power of Code Llama 70B and Mistral models on SageMaker

Amazon SageMaker, a fully managed machine learning service, provides a seamless integration with Code Llama 70B, enabling developers and data scientists to use its capabilities with just a few clicks. Here’s how you can get started:

  1. One-click deployment: Code Llama 70B and Mixtral 8x7B are available in Amazon SageMaker JumpStart, a hub that provides access to pre-trained models and solutions. With a few clicks, you can deploy them and create a private inference endpoint for your coding tasks.
  2. Scalable infrastructure: The SageMaker scalable infrastructure ensures that foundation models can handle even the most demanding workloads, allowing you to generate code efficiently and without delays.
  3. Integrated development environment: SageMaker provides a seamless integrated development environment (IDE) that you can use to interact with these models directly from your coding environment. This integration streamlines the workflow and enhances productivity.
  4. Customization and fine-tuning: While Code Llama 70B and Mixtral 8x7B are powerful out-of-the-box models, you can use SageMaker to fine-tune and customize a model to suit your specific needs, further enhancing its performance and accuracy.
  5. Security and compliance: SageMaker JumpStart employs multiple layers of security, including data encryption, network isolation, VPC deployment, and customizable inference, to ensure the privacy and confidentiality of your data when working with LLMs.
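In addition to the console flow covered later in this post, the same deployment can be sketched with the SageMaker Python SDK. The model_id value is an assumption; look up the exact ID in the JumpStart model hub:

```python
def deploy_jumpstart_model(model_id="meta-textgeneration-llama-codellama-70b",
                           instance_type="ml.g5.48xlarge"):
    """Deploy a JumpStart model to a real-time endpoint (requires AWS setup)."""
    from sagemaker.jumpstart.model import JumpStartModel
    model = JumpStartModel(model_id=model_id)
    return model.deploy(initial_instance_count=1, instance_type=instance_type)

def build_payload(prompt, max_new_tokens=512, temperature=0.2):
    """Request-body shape used by JumpStart text-generation endpoints."""
    return {"inputs": prompt,
            "parameters": {"max_new_tokens": max_new_tokens,
                           "temperature": temperature}}
```

Once deployed, the returned predictor accepts payloads shaped like `build_payload("...")`, the same structure used in the inference scripts later in this post.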

Solution overview

The following figure shows how code generation can be done with the Code Llama and Mixtral models on SageMaker, as presented in this post.

You first deploy a SageMaker endpoint using an LLM from SageMaker JumpStart. For the examples presented in this post, you deploy either a Code Llama 70B or a Mixtral 8x7B endpoint. After the endpoint has been deployed, you can use it to generate code with the prompts provided in this post and the associated notebook, or with your own prompts. After the code has been generated with the endpoint, you can use a notebook to test the code and its functionality.

Prerequisites

In this section, you sign up for an AWS account and create an AWS Identity and Access Management (IAM) admin user.

If you’re new to SageMaker, we recommend that you read What is Amazon SageMaker?.

Use the following hyperlinks to finish setting up the prerequisites for an AWS account and SageMaker:

  1. Create an AWS Account: This walks you through setting up an AWS account
  2. When you create an AWS account, you get a single sign-in identity that has complete access to all of the AWS services and resources in the account. This identity is called the AWS account root user.
  3. Signing in to the AWS Management Console using the email address and password that you used to create the account gives you complete access to all of the AWS resources in your account. We strongly recommend that you not use the root user for everyday tasks, even the administrative ones.
  4. Adhere to the security best practices in IAM, and Create an Administrative User and Group. Then securely lock away the root user credentials and use them to perform only a few account and service management tasks.
  5. In the console, go to the SageMaker console and open the left navigation pane.
    1. Under Admin configurations, choose Domains.
    2. Choose Create domain.
    3. Choose Set up for single user (Quick setup). Your domain and user profile are created automatically.
  6. Follow the steps in Custom setup to Amazon SageMaker to set up SageMaker for your organization.

With the prerequisites complete, you’re ready to continue.

Code generation scenarios

The Mixtral 8x7B and Code Llama 70B models require an ml.g5.48xlarge instance. SageMaker JumpStart provides a simplified way to access and deploy over 100 different open source and third-party foundation models. In order to deploy an endpoint using SageMaker JumpStart, you might need to request a service quota increase to access an ml.g5.48xlarge instance for endpoint use. You can request service quota increases through the AWS Management Console, AWS Command Line Interface (AWS CLI), or API to allow access to those additional resources.

Code Llama use cases with SageMaker

While Code Llama excels at generating simple functions and scripts, its capabilities extend far beyond that. The models can generate complex code for advanced applications, such as building neural networks for machine learning tasks. Let’s explore an example of using Code Llama to create a neural network on SageMaker, starting with deploying the Code Llama model through SageMaker JumpStart.

  1. Launch SageMaker JumpStart
    Sign in to the console, navigate to SageMaker, and launch the SageMaker domain to open SageMaker Studio. Within SageMaker Studio, select JumpStart in the left-hand navigation menu.
  2. Search for Code Llama 70B
    In the JumpStart model hub, search for Code Llama 70B in the search bar. You should see the Code Llama 70B model listed under the Models category.
  3. Deploy the Model
    Select the Code Llama 70B model, and then choose Deploy. Enter an endpoint name (or keep the default value) and select the target instance type (for example, ml.g5.48xlarge). Choose Deploy to start the deployment process. You can leave the rest of the options as default.

Additional details on deployment can be found in Code Llama 70B is now available in Amazon SageMaker JumpStart.

  1. Create an inference endpoint
    After the deployment is complete, SageMaker will provide you with an inference endpoint URL. Copy this URL to use later.
  2. Set up your development environment
    You can interact with the deployed Code Llama 70B model using Python and the AWS SDK for Python (Boto3). First, make sure you have the required dependencies installed: pip install boto3

Note: This blog post section contains code that was generated with the assistance of Code Llama 70B, powered by Amazon SageMaker.

Generating a transformer model for natural language processing

Let’s walk through a code generation example with Code Llama 70B where you will generate a transformer model in Python using the Amazon SageMaker SDK.

Prompt:

<s>[INST]
<<SYS>>You are an expert code assistant that can teach a junior developer how to code. Your language of choice is Python. Don't explain the code, just generate the code block itself. Always use Amazon SageMaker SDK for python code generation. Add test case to test the code<</SYS>>

Generate a Python code that defines and trains a Transformer model for text classification on movie dataset. The python code should use Amazon SageMaker's TensorFlow estimator and be ready for deployment on SageMaker.
[/INST]

Response:

Code Llama generates a Python script for training a Transformer model on the sample dataset using TensorFlow and Amazon SageMaker.

Code example:
Create a new Python script (for example, code_llama_inference.py) and add the following code. Replace <YOUR_ENDPOINT_NAME> with the actual inference endpoint name provided by SageMaker JumpStart:

import boto3
import json

# Set up the SageMaker client
session = boto3.Session()
sagemaker_client = session.client("sagemaker-runtime")

# Set the inference endpoint name
endpoint_name = "<YOUR_ENDPOINT_NAME>"

def query_endpoint(payload):
    response = sagemaker_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=json.dumps(payload).encode('utf-8'),
    )
    response = response["Body"].read().decode("utf8")
    response = json.loads(response)
    return response

def print_completion(prompt: str, response: list) -> None:
    bold, unbold = '\033[1m', '\033[0m'
    print(f"{bold}> Input{unbold}\n{prompt}{bold}\n> Output{unbold}\n{response[0]['generated_text']}\n")

# Define the input prompt

prompt = """

<s>[INST]
<<SYS>>You are an expert code assistant that can teach a junior developer how to code. Your language of choice is Python. Don't explain the code, just generate the code block itself. Always use Amazon SageMaker SDK for python code generation. Add test case to test the code<</SYS>>
 
Generate a Python code that defines and trains a Transformer model for text classification on movie dataset. The python code should use Amazon SageMaker's TensorFlow estimator and be ready for deployment on SageMaker.
[/INST]
"""



# Send the request to the endpoint and decode the response

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 2000, "temperature": 0.2, "top_p": 0.9},
}


response = query_endpoint(payload)

print_completion(prompt, response)

Save the script and run it:

python code_llama_inference.py

The script will send the provided prompt to the Code Llama 70B model deployed on SageMaker, and the model’s response will be printed to the output.

Example output:

Input

<s>[INST]
<<SYS>>
You are an expert AI assistant specializing in generating Python code for natural language processing (NLP) tasks on Amazon SageMaker. Your code should follow best practices, be well-documented, and include examples of usage.
<</SYS>>

Generate a Python script that defines and trains a Transformer model for text classification on movie dataset. The script should use Amazon SageMaker's TensorFlow estimator and be ready for deployment on SageMaker.
[/INST]


> Output

[RESP]
import os
import json
import sagemaker
import boto3
import numpy as np
import pandas as pd
from sagemaker.tensorflow import TensorFlow
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_20newsgroups

# Set up SageMaker session
sagemaker_session = sagemaker.Session()
region = boto3.Session().region_name
bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/DEMO-xSum"

# Download dataset
train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

# Prepare dataset
train_data = pd.DataFrame(
{"text": train.data, "label": train.target.astype(np.int32)}
)
test_data = pd.DataFrame({"text": test.data, "label": test.target.astype(np.int32)})

# Split dataset
train_data, validation_data = train_test_split(train_data, test_size=0.2)

# Upload dataset to S3
train_input = sagemaker_session.upload_data(
path=train_data.to_csv(header=False, index=False),
bucket=bucket,
key_prefix=os.path.join(prefix, "train"),
)
validation_input = sagemaker_session.upload_data(
path=validation_data.to_csv(header=False, index=False),
bucket=bucket,
key_prefix=os.path.join(prefix, "validation"),
)
test_input = sagemaker_session.upload_data(
path=test_data.to_csv(header=False, index=False),
bucket=bucket,
key_prefix=os.path.join(prefix, "test"),
)

# Define hyperparameters
hyperparameters = {
"epochs": 1,
"batch_size": 64,
"max_seq_length": 256,
"model_dir": "/opt/ml/model",
"train_dir": "/opt/ml/input/data/train",
"validation_dir": "/opt/ml/input/data/validation",
"test_dir": "/opt/ml/input/data/test",
"train_filename": "train.csv",
"validation_filename": "validation.csv",
"test_filename": "test.csv",
"freeze_bert_layer": True,
"learning_rate": 0.0001,
"epsilon": 1e-08,
"bert_trainable_layers": None,
"bert_layer": -2,
"lambda_regularizer": 0.01,
"gradient_accumulation_steps": 2,
"steps_per_epoch": 100,
"validation_steps": 10,
"test_steps": 10,
"output_dir": "/opt/ml/output",
}

# Define estimator
estimator = TensorFlow(
entry_point="train.py",
source_dir="code",
role=sagemaker.get_execution_role(),
instance_count=1,
instance_type="ml.p3.2xlarge",
framework_version="2.4.1",
py_version="py37",
hyperparameters=hyperparameters,
)

# Train model
estimator.fit(
{
"train": train_input,
"validation": validation_input,
"test": test_input,
},
wait=True,
logs=True,
)

# Deploy model
predictor = estimator.deploy(
initial_instance_count=1, instance_type="ml.p3.2xlarge", wait=True
)

# Predict
predictor.predict(json.dumps({"instances": [test_data.iloc[0]["text"]]}))
<</RESP>>

You can modify the prompt variable to request different code generation tasks or engage in natural language interactions with the model.

This example demonstrates how to deploy and interact with the Code Llama 70B model on SageMaker JumpStart using Python and the AWS SDK. Because the model might be prone to minor errors in generating the response output, make sure you run the generated code to verify it. Further, you can instruct the model to fact-check the output and refine its response in order to fix any remaining errors in the code. With this setup, you can leverage the powerful code generation capabilities of Code Llama 70B within your development workflows, streamlining the coding process and unlocking new levels of productivity. Let’s take a look at some additional examples.
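The fact-check-and-refine loop suggested above can be sketched as follows. The prompt wording is an illustrative assumption, and send stands in for a helper like the query_endpoint function shown earlier:

```python
def build_refinement_prompt(generated_code, error_message=None):
    """Wrap previously generated code in an instruction to review and fix it."""
    instruction = ("Review the following Python code, fix any bugs, "
                   "and return the corrected code only.")
    if error_message:
        # Feeding back the actual error helps the model target the fix
        instruction += f" Running it produced this error: {error_message}"
    return f"<s>[INST] {instruction}\n\n{generated_code} [/INST]"

def refine(code, send, rounds=2):
    """Repeatedly ask the model to repair its own output."""
    for _ in range(rounds):
        code = send(build_refinement_prompt(code))
    return code
```

In practice you would run the generated code, capture any traceback, and pass it as error_message so each refinement round has concrete feedback to work from.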

Additional examples and use cases

Let’s walk through some other complex code generation scenarios. In the following sample, we’re running the script to generate a Deep Q reinforcement learning (RL) agent for playing the CartPole-v1 environment.

Generating a reinforcement learning agent

The following prompt was tested on Code Llama 70B to generate a Deep Q RL agent adept at playing the CartPole-v1 environment.

Prompt:

<s>[INST]
<<SYS>>
You are a skilled AI assistant capable of generating Python code for reinforcement learning tasks on Amazon SageMaker. Your code should be efficient, well-documented, and include examples of usage.
<</SYS>>

Could you please generate a Python script that implements a Deep Q-Network (DQN) agent for playing the CartPole-v1 environment? The script should use Amazon SageMaker's TensorFlow estimator and be ready for deployment on SageMaker.
[/INST]

Response: Code Llama generates a Python script for training a DQN agent on the CartPole-v1 environment using TensorFlow and Amazon SageMaker as showcased in our GitHub repository.

Generating a distributed training script

In this scenario, you will generate sample Python code for distributed machine learning training on Amazon SageMaker using Code Llama 70B.

Prompt:

<s>[INST]
<<SYS>>
You are an expert AI assistant skilled in generating Python code for distributed machine learning training on Amazon SageMaker. Your code should be optimized for performance, follow best practices, and include examples of usage.
<</SYS>>

Could you please generate a Python script that performs distributed training of a deep neural network for image classification on the ImageNet dataset? The script should use Amazon SageMaker's PyTorch estimator with distributed data parallelism and be ready for deployment on SageMaker.
[/INST]

Response: Code Llama generates a Python script for distributed training of a deep neural network on the ImageNet dataset using PyTorch and Amazon SageMaker. Additional details are available in our GitHub repository.

Mixtral 8x7B use cases with SageMaker

Compared to traditional LLMs, Mixtral 8x7B offers the advantage of faster decoding at the speed of a smaller, parameter-dense model despite containing more parameters. It also outperforms other open-access models on certain benchmarks and supports a longer context length.

  1. Launch SageMaker JumpStart
    Sign in to the console, navigate to SageMaker, and launch the SageMaker domain to open SageMaker Studio. Within SageMaker Studio, select JumpStart in the left-hand navigation menu.
  2. Search for Mixtral 8x7B Instruct
    In the JumpStart model hub, search for Mixtral 8x7B Instruct in the search bar. You should see the Mixtral 8x7B Instruct model listed under the Models category.
  3. Deploy the Model
    Select the Mixtral 8x7B Instruct model, and then choose Deploy. Enter an endpoint name (or keep the default value) and choose the target instance type (for example, ml.g5.48xlarge). Choose Deploy to start the deployment process. You can leave the rest of the options as default.

Additional details on deployment can be found in Mixtral-8x7B is now available in Amazon SageMaker JumpStart.

  1. Create an inference endpoint
    After the deployment is complete, SageMaker will provide you with an inference endpoint URL. Copy this URL to use later.

Generating a hyperparameter tuning script for SageMaker

Hyperparameters are external configuration variables that data scientists use to manage machine learning model training. Sometimes called model hyperparameters, the hyperparameters are manually set before training a model. They’re different from parameters, which are internal parameters automatically derived during the learning process and not set by data scientists. Hyperparameters directly control model structure, function, and performance.

When you build complex machine learning systems like deep learning neural networks, exploring all the possible combinations is impractical. Hyperparameter tuning can accelerate your productivity by trying many variations of a model. It looks for the best model automatically by focusing on the most promising combinations of hyperparameter values within the ranges that you specify. To get good results, you must choose the right ranges to explore.

SageMaker automatic model tuning (AMT) finds the best version of a model by running many training jobs on your dataset. To do this, AMT uses the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that create the best-performing model, as measured by a metric that you choose.
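Conceptually, the tuning loop resembles the following toy random search over the alpha and eta ranges used later in this section. AMT launches real training jobs and uses smarter search strategies (such as Bayesian optimization), so this is only an illustration:

```python
import random

random.seed(42)

def evaluate(alpha, eta):
    """Stand-in for a training job's validation metric (higher is better).
    In AMT, this would be a real SageMaker training job emitting the metric."""
    return -((alpha - 0.5) ** 2 + (eta - 0.1) ** 2)

def random_search(n_trials=20):
    """Sample candidates from the specified ranges and keep the best one."""
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {"alpha": random.uniform(0.1, 1.0),
                  "eta": random.uniform(0.01, 0.3)}
        score = evaluate(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

best_params, best_score = random_search()
```

The key ideas carry over directly: you specify the ranges to explore and the objective metric, and the tuner returns the hyperparameter values of the best-scoring trial.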

Note: This blog post section contains code that was generated with the assistance of the Mixtral 8x7B model, powered by Amazon SageMaker.

Prompt:

<s>[INST]
<<SYS>>
You are an AI assistant specializing in generating Python code for hyperparameter tuning of machine learning models on Amazon SageMaker. Your code should be efficient, well-documented, and include examples of usage.
<</SYS>>

Generate a Python script that performs hyperparameter tuning of an XGBoost model for regression on the Boston Housing dataset. The script should use Amazon SageMaker's XGBoost estimator and be ready for deployment on SageMaker. It should optimize for validation:accuracy, and optimize the hyperparameters alpha and eta.

[/INST]

Response:

Here's a Python script that performs hyperparameter tuning of an XGBoost model for regression on the Boston Housing dataset using Amazon SageMaker. This script uses the XGBoost estimator and optimizes for validation:accuracy while tuning the hyperparameters `alpha` and `eta`.

```python
import boto3
import numpy as np
import pandas as pd
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.tuner import IntegerParameter, ContinuousParameter, HyperparameterTuner

# Set up SageMaker session and specify the IAM role
sagemaker_session = boto3.Session(region_name='us-west-2')
role = sagemaker_session.get_execution_role()

# Specify the SageMaker XGBoost image version
xgboost_image = 'xgboost:1.3-1'

# Define the hyperparameter tuning configuration
hyperparameter_ranges = {
    'alpha': ContinuousParameter(0.1, 1.0),
    'eta': ContinuousParameter(0.01, 0.3)
}

objective_metric_name = 'validation:accuracy'
metric_definitions = [
    {
        'Name': 'validation:accuracy',
        'Regex': 'validation-accuracy: ([0-9\.]+)'
    }
]

tuner = HyperparameterTuner(
    XGBoost(
        role=role,
        train_instance_count=1,
        train_instance_type='ml.m5.large',
        sagemaker_session=sagemaker_session,
        hyperparameters={'objective': 'reg:squarederror', 'num_boost_round': 100},
        base_job_name='xgboost-tuner-boston'
    ),
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    max_jobs=10,
    max_parallel_jobs=3
)

# Load the Boston Housing dataset
boston_data = pd.read_csv('boston_housing.csv')
train_data = boston_data.sample(frac=0.8, random_state=42)
test_data = boston_data.drop(train_data.index)

train_input_path = sagemaker_session.upload_data(path=train_data, key_prefix='xgboost-tuner/input/train')
test_input_path = sagemaker_session.upload_data(path=test_data, key_prefix='xgboost-tuner/input/test')

# Set up the training and validation datasets
train_channel = sagemaker_session.input_channel(name='train', path=train_input_path)
test_channel = sagemaker_session.input_channel(name='test', path=test_input_path)

# Define the SageMaker XGBoost estimator for training
xgboost_estimator = XGBoost(
    role=role,
    train_instance_count=1,
    train_instance_type='ml.m5.large',
    sagemaker_session=sagemaker_session,
    hyperparameters={'objective': 'reg:squarederror', 'num_boost_round': 100},
    base_job_name='xgboost-tuner-boston'
)

# Perform hyperparameter tuning
tuner.fit({'train': train_channel, 'test': test_channel}, include_cls_metadata=False)

# Get the best hyperparameters and create a new XGBoost estimator
best_training_job = tuner.best_training_job()
best_hyperparameters = tuner.best_hyperparameters
print(f'Best hyperparameters: {best_hyperparameters}')

new_xgboost_estimator = XGBoost(
    role=role,
    train_instance_count=1,
    train_instance_type='ml.m5.large',
    sagemaker_session=sagemaker_session,
    hyperparameters=best_hyperparameters,
    base_job_name='xgboost-tuner-boston'
)

# Train the model with the best hyperparameters
new_xgboost_estimator.fit({'train': train_channel, 'test': test_channel}, include_cls_metadata=False)

# Save the trained model for deployment
model_data = new_xgboost_estimator.create_model()
model_name = 'xgboost-tuner-boston-model'
sagemaker_session.create_model(model_name=model_name, model_data=model_data)

# Deploy the model
predictor = sagemaker_session.create_predictor(model_name,
                                                base_job_name='xgboost-tuner-boston',
                                                deploy_prebuilt_image=True,
                                                entry_point='inference.py',
                                                source_dir='xgboost-tuner',
                                                instance_type='ml.m5.large',
                                                initial_instance_count=1)
```

Replace `'boston_housing.csv'` with the path to your Boston Housing dataset. This script assumes that the dataset has columns named `'CRIM'`, `'ZN'`, `'INDUS'`, `'CHAS'`, `'NOX'`, `'RM'`, `'AGE'`, `'DIS'`, `'RAD'`, `'TAX'`, `'PTRATIO'`, `'B'`, `'LSTAT'`, and `'MEDV'`.

To run this script, you need to have the AWS CLI and Boto3 installed. Make sure to configure your AWS credentials using `aws configure`.

The script also requires a file named `inference.py` in the `xgboost-tuner` directory for deployment. Here's a simple example of `inference.py`:

```python
import os
import json
import numpy as np
import pandas as pd
import xgboost as xgb

def model_fn(model_dir):
    model = xgb.Booster(model_file=os.path.join(model_dir, 'model.joblib'))
    return model

def input_fn(request_body, content_type):
    if content_type == 'text/csv':
        features = np.loadtxt(pd.compat.StringIO(request_body), delimiter=',')
        return {'features': features.reshape(1, -1)}
    raise ValueError("Request body must be csv-formatted.")

def predict_fn(input_data, model):
    features = input_data['features']
    prediction = model.predict(features)
    return {'prediction': float(prediction)}
```

Make sure to install the required libraries (`boto3`, `numpy`, `pandas`, `xgboost`, and `sagemaker`) before running the script.

Code Transformation: Java to Python

There are instances where users need to convert code written in one programming language to another. This is known as a cross-language transformation task, and foundation models can help automate the process.

Prompt:

<s>[INST]
<<SYS>>
You are a programmer specialized in Java and Python languages. Your code should be efficient, well-documented, and include examples of usage.
<</SYS>>
Convert the following Java code to its equivalent in Python:

<java>
import java.util.ArrayList;

public class Main {
    public static void main(String[] args) {
        ArrayList<String> cars = new ArrayList<String>();
        cars.add("Volvo");
        cars.add("BMW");
        cars.add("Ford");
        cars.add("Mazda");
        for (int i = 0; i < cars.size(); i++) {
            System.out.println(cars.get(i));
        }
    }
}
</java>
[/INST]

Response:

<python>
from java.util import ArrayList

def main():
    # Create an ArrayList of strings
    cars = ArrayList[str]()

    # Add elements to the ArrayList
    cars.add("Volvo")
    cars.add("BMW")
    cars.add("Ford")
    cars.add("Mazda")

    # Iterate over the ArrayList and print each element
    for i in range(cars.size()):
        print(cars.get(i))

if __name__ == "__main__":
    main()
</python>

Note: This Python code uses the Java ArrayList class, which is part of the JPype library that allows Python programs to call Java code. 
The code above will only work in an environment that has JPype installed and configured.

For a pure Python solution, here is the equivalent code:

<python>
def main():
    # Create a list of strings
    cars = ["Volvo", "BMW", "Ford", "Mazda"]

    # Iterate over the list and print each element
    for car in cars:
        print(car)

if __name__ == "__main__":
    main()
</python>

This Python code uses a built-in list data structure instead of the Java ArrayList class. The code above is more idiomatic and efficient in Python.

AWS CDK code for a three-tier web application

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework for defining cloud infrastructure as code with modern programming languages and deploying it through AWS CloudFormation.

The three-tier architecture pattern provides a general framework to ensure decoupled and independently scalable application components can be separately developed, managed, and maintained (often by distinct teams). A three-tier architecture is the most popular implementation of a multi-tier architecture and consists of a single presentation tier, logic tier, and data tier:

  • Presentation tier: Component that the user directly interacts with (for example, webpages and mobile app UIs).
  • Logic tier: Code required to translate user actions to application functionality (for example, CRUD database operations and data processing).
  • Data tier: Storage media (for example, databases, object stores, caches, and file systems) that hold the data relevant to the application.

Prompt:

<s><<SYS>>
You are a programmer specialized in AWS, and programming in Python and the AWS Cloud Development Kit. Your code should be efficient, well-documented, and include examples of usage.
<</SYS>>[INST]
Create a CDK script in Python for configuring and deploying a three tier web application in AWS. The architecture should consist of these three tiers:

- Presentation tier: The front-end application is all static content (HTML, CSS, JavaScript and images) which are generated by React utilities like create-react-app. Amazon CloudFront hosts all these objects. The web application, when used, downloads all the resources to the browser and starts to run from there. The web application connects to the backend calling the APIs.

- Logic tier: Logic layer is built using Lambda functions fronted by API Gateway REST APIs. This architecture shows multiple exposed services. There are multiple different Lambda functions each handling a different aspect of the application. The Lambda functions are behind API Gateway and accessible using API URL paths.

- Data tier: DynamoDB is used for the data storage.
[/INST]

Response:

Sure, here's an example CDK script in Python that configures and deploys a three-tier web application in AWS:
```python
from aws_cdk import (
    aws_apigateway as apigw,
    aws_cloudfront as cloudfront,
    aws_cloudfront_origins as origins,
    aws_dynamodb as ddb,
    aws_iam as iam,
    aws_lambda as _lambda,
    aws_logs as logs,
    core
)


class ThreeTierWebAppStack(core.Stack):

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Data tier
        self.table = ddb.Table(
            self, "DataTable",
            partition_key=ddb.Attribute(name="id", type=ddb.AttributeType.STRING),
            billing_mode=ddb.BillingMode.PAY_PER_REQUEST
        )

        # Logic tier
        self.lambda_function = _lambda.Function(
            self, "LambdaFunction",
            runtime=_lambda.Runtime.PYTHON_3_8,
            code=_lambda.Code.from_asset("lambda"),
            handler="lambda_function.lambda_handler",
            log_retention=logs.RetentionDays.ONE_DAY,
            environment={
                "TABLE_NAME": self.table.table_name
            }
        )

        self.lambda_function.add_to_role_policy(
            iam.PolicyStatement(
                actions=[
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem"
                ],
                resources=[self.table.table_arn]
            )
        )

        self.api = apigw.LambdaRestApi(
            self, "Api",
            handler=self.lambda_function,
            proxy=False
        )

        # Presentation tier
        self.distribution = cloudfront.Distribution(
            self, "Distribution",
            default_root_object="index.html",
            default_behavior=cloudfront.BehaviorOptions(
                origin=origins.HttpOrigin(
                    f"{self.api.rest_api_id}.execute-api.{self.region}.amazonaws.com",
                    origin_path="/prod"  # default deployment stage of LambdaRestApi
                ),
                viewer_protocol_policy=cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS
            )
        )
```
Here's a brief explanation of the code:

* The `ThreeTierWebAppStack` class extends the `core.Stack` class and is the main entry point for the CDK script.
* The `table` attribute is a DynamoDB table used for data storage.
* The `lambda_function` attribute is a Lambda function that handles the business logic of the application. It has an IAM role that allows it to read and write to the DynamoDB table.
* The `api` attribute is an API Gateway REST API that exposes the Lambda function as a web service.
* The `distribution` attribute is a CloudFront distribution that serves the static content of the application. It uses the API Gateway as the origin and redirects all HTTP traffic to HTTPS.

To use this script, you need to create a `lambda` directory in the same directory as the script and put your Lambda function code in a file named `lambda_function.py` in that directory. You also need to install the AWS CDK and its dependencies using `pip install aws-cdk.core aws-cdk.aws-lambda aws-cdk.aws-lambda-event-sources aws-cdk.aws-logs aws-cdk.aws-dynamodb aws-cdk.aws-apigateway aws-cdk.aws-iam aws-cdk.aws-cloudfront aws-cdk.aws-cloudfront-origins`.

Once you have done that, you can deploy the stack using `cdk deploy`. This will create the DynamoDB table, the Lambda function, the API Gateway, and the CloudFront distribution. The URL of the CloudFront distribution will be printed to the console after the deployment is complete. You can use this URL to access the web application.
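The stack expects the business logic in `lambda/lambda_function.py`. Below is a minimal sketch of what such a handler could look like, assuming a simple CRUD API over the DynamoDB table; the routing and item schema here are illustrative, not part of the generated stack:

```python
import json
import os

def make_response(status_code, body):
    """Build an API Gateway Lambda proxy integration response."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }

def lambda_handler(event, context):
    # boto3 is imported lazily so make_response can be unit-tested
    # without the AWS SDK or credentials present.
    import boto3

    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    method = event.get("httpMethod", "GET")
    if method == "GET":
        item_id = (event.get("queryStringParameters") or {}).get("id")
        item = table.get_item(Key={"id": item_id}).get("Item")
        return make_response(200 if item else 404, item or {"error": "not found"})
    if method in ("PUT", "POST"):
        item = json.loads(event["body"])
        table.put_item(Item=item)
        return make_response(200, item)
    return make_response(405, {"error": f"unsupported method {method}"})
```

Each API Gateway path you add with `api.root.add_resource(...)` can then map to a method handled by this function.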

Additional considerations

The following are some additional considerations when implementing these models:

  • Different models will produce different results, so you should conduct experiments with different foundation models and different prompts for your use case to achieve the desired results.
  • The analyses provided are not meant to replace human judgement. You should be mindful of potential hallucinations when working with generative AI, and use the analysis only as a tool to assist and speed up code generation.

Clean up

Delete the model endpoints deployed using Amazon SageMaker for Code Llama and Mistral to avoid incurring any additional costs in your account.

Shut down any SageMaker Notebook instances that were created for deploying or running the examples showcased in this blog post to avoid any notebook instance costs associated with the account.

Conclusion

The combination of exceptional capabilities from foundation models like Code Llama 70B and Mixtral 8x7B and the powerful machine learning platform of SageMaker presents a unique opportunity for developers and data scientists to revolutionize their coding workflows. The cutting-edge capabilities of FMs empower customers to generate high-quality code, infill missing sections, and engage in natural language interactions, all while using the scalability, security, and compliance of AWS.

The examples highlighted in this blog post demonstrate these models’ advanced capabilities in generating complex code for various machine learning tasks, such as natural language processing, reinforcement learning, distributed training, and hyperparameter tuning, all tailored for deployment on SageMaker. Developers and data scientists can now streamline their workflows, accelerate development cycles, and unlock new levels of productivity in the AWS Cloud.

Embrace the future of AI-assisted coding and unlock new levels of productivity with Code Llama 70B and Mixtral 8x7B on Amazon SageMaker. Start your journey today and experience the transformative power of these groundbreaking models.

References

  1. Code Llama 70B is now available in Amazon SageMaker JumpStart
  2. Fine-tune Code Llama on Amazon SageMaker JumpStart
  3. Mixtral-8x7B is now available in Amazon SageMaker JumpStart

About the Authors

Shikhar Kwatra is an AI/ML Solutions Architect at Amazon Web Services based in California. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partners in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.

Jose Navarro is an AI/ML Solutions Architect at AWS based in Spain. Jose helps AWS customers—from small startups to large enterprises—architect and take their end-to-end machine learning use cases to production. In his spare time, he loves to exercise, spend quality time with friends and family, and catch up on AI news and papers.

Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.

Build RAG applications using Jina Embeddings v2 on Amazon SageMaker JumpStart

Today, we are excited to announce that the Jina Embeddings v2 model, developed by Jina AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running model inference. This state-of-the-art model supports an 8,192-token context length. You can deploy the model with SageMaker JumpStart, a machine learning (ML) hub with foundation models, built-in algorithms, and pre-built ML solutions that you can deploy with just a few clicks.

Text embedding refers to the process of transforming text into numerical representations that reside in a high-dimensional vector space. Text embeddings have a broad range of applications in enterprise artificial intelligence (AI), including the following:

  • Multimodal search for ecommerce
  • Content personalization
  • Recommender systems
  • Data analytics

Jina Embeddings v2 is a state-of-the-art collection of text embedding models, trained by Berlin-based Jina AI, that boast high performance on several public benchmarks.

In this post, we walk through how to discover and deploy the jina-embeddings-v2 model as part of a Retrieval Augmented Generation (RAG)-based question answering system in SageMaker JumpStart. You can use this tutorial as a starting point for a variety of chatbot-based solutions for customer service, internal support, and question answering systems based on internal and private documents.

What is RAG?

RAG is the process of optimizing the output of a large language model (LLM) so it references an authoritative knowledge base outside of its training data sources before generating a response.

LLMs are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model. It’s a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.

What does Jina Embeddings v2 bring to RAG applications?

A RAG system uses a vector database to serve as a knowledge retriever. It must extract a query from a user’s prompt and send it to a vector database to reliably find as much semantic information as possible. The following diagram illustrates the architecture of a RAG application with Jina AI and Amazon SageMaker.

Jina Embeddings v2 is the preferred choice for experienced ML scientists for the following reasons:

  • State-of-the-art performance – We have shown on various text embedding benchmarks that Jina Embeddings v2 models excel on tasks such as classification, reranking, summarization, and retrieval. Some of the benchmarks demonstrating their performance are MTEB, an independent study of combining embedding models with reranking models, and the LoCo benchmark by a Stanford University group.
  • Long input-context length – Jina Embeddings v2 models support 8,192 input tokens. This makes the models especially powerful at tasks such as clustering for long documents like legal text or product documentation.
  • Support for bilingual text input – Recent research shows that multilingual models without specific language training show strong biases towards English grammatical structures in embeddings. Jina AI’s bilingual embedding models include jina-embeddings-v2-base-de, jina-embeddings-v2-base-zh, jina-embeddings-v2-base-es, and jina-embeddings-v2-base-code. They were trained to encode texts in a combination of English-German, English-Chinese, English-Spanish, and English-Code, respectively, allowing the use of either language as the query or target document in retrieval applications.
  • Cost-effectiveness of operating – Jina Embeddings v2 provides high performance on information retrieval tasks with relatively small models and compact embedding vectors. For example, jina-embeddings-v2-base-de has a size of 322 MB with a performance score of 60.1%. A smaller vector size means significant cost savings when storing the vectors in a vector database.
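To make the storage argument concrete, here is a rough back-of-the-envelope calculation. The corpus size is hypothetical; 768 dimensions matches the Jina Embeddings v2 base models, while 1536 is a typical dimensionality for larger embedding models:

```python
# Rough storage footprint of float32 embedding vectors for a corpus of
# one million chunks (hypothetical corpus size).
num_chunks = 1_000_000
bytes_per_float = 4            # float32

dims_small = 768               # Jina Embeddings v2 base models
dims_large = 1536              # a typical larger embedding model

small_gb = num_chunks * dims_small * bytes_per_float / 1024**3
large_gb = num_chunks * dims_large * bytes_per_float / 1024**3

print(f"768-dim index:  {small_gb:.2f} GiB")
print(f"1536-dim index: {large_gb:.2f} GiB")
```

Halving the embedding dimensionality halves the raw vector storage, and most vector databases bill roughly in proportion to index size.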

What is SageMaker JumpStart?

With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. Developers can deploy foundation models to dedicated SageMaker instances within a network-isolated environment, and customize models using SageMaker for model training and deployment.

You can now discover and deploy a Jina Embeddings v2 model with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines and Amazon SageMaker Debugger. With SageMaker JumpStart, the model is deployed in an AWS secure environment and under your VPC controls, helping provide data security.

Jina Embeddings models are available in AWS Marketplace so you can integrate them directly into your deployments when working in SageMaker.

AWS Marketplace enables you to find third-party software, data, and services that run on AWS and manage them from a centralized location. AWS Marketplace includes thousands of software listings and simplifies software licensing and procurement with flexible pricing options and multiple deployment methods.

Solution overview

We’ve prepared a notebook that constructs and runs a RAG question answering system using Jina Embeddings and the Mixtral 8x7B LLM in SageMaker JumpStart.

In the following sections, we give you an overview of the main steps needed to bring a RAG application to life using generative AI models on SageMaker JumpStart. Although we omit some of the boilerplate code and installation steps in this post for readability, you can access the full Python notebook to run it yourself.

Connecting to a Jina Embeddings v2 endpoint

To start using Jina Embeddings v2 models, complete the following steps:

  1. In SageMaker Studio, choose JumpStart in the navigation pane.
  2. Search for “jina” and you will see the provider page link and models available from Jina AI.
  3. Choose Jina Embeddings v2 Base – en, which is Jina AI’s English language embeddings model.
  4. Choose Deploy.
  5. In the dialog that appears, choose Subscribe, which will redirect you to the model’s AWS Marketplace listing, where you can subscribe to the model after accepting the terms of usage.
  6. After subscribing, return to SageMaker Studio and choose Deploy.
  7. You will be redirected to the endpoint configuration page, where you can select the instance most suitable for your use case and provide a name for the endpoint.
  8. Choose Deploy.

After you create the endpoint, you can connect to it with the following code snippet:

from jina_sagemaker import Client
 
client = Client(region_name=region)
# Use the same endpoint name that you gave the JumpStart endpoint in the previous step
endpoint_name = "my-jina-embeddings-endpoint"
 
client.connect_to_endpoint(endpoint_name=endpoint_name)

Preparing a dataset for indexing

In this post, we use a public dataset from Kaggle (CC0: Public Domain) that contains audio transcripts from the popular YouTube channel Kurzgesagt – In a Nutshell, which has over 20 million subscribers.

Each row in this dataset contains the title of a video, its URL, and the corresponding text transcript.

Enter the following code:

df.head()

Because these video transcripts can be quite long (the videos run around 10 minutes), you can chunk each transcript before indexing it, so that retrieval surfaces only the content relevant to a user's question rather than unrelated parts of the transcript:

def chunk_text(text, max_words=1024):
    """
    Divide text into chunks where each chunk contains the maximum number of full sentences under `max_words`.
    """
    sentences = text.split('.')
    chunk = []
    word_count = 0

    for sentence in sentences:
        # Strip the trailing period and surrounding whitespace
        sentence = sentence.strip('. ')
        if not sentence:
            continue

        words_in_sentence = len(sentence.split())
        if word_count + words_in_sentence <= max_words:
            chunk.append(sentence)
            word_count += words_in_sentence
        else:
            # Yield the current chunk and start a new one
            if chunk:
                yield '. '.join(chunk) + '.'
            chunk = [sentence]
            word_count = words_in_sentence

    # Yield the last chunk if it's not empty
    if chunk:
        yield '. '.join(chunk) + '.'

The max_words parameter defines the maximum number of words allowed in a chunk of indexed text. More sophisticated chunking strategies have been proposed in academic and non-peer-reviewed literature, but for simplicity we use a plain word limit in this post.
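To see the chunking behavior on a toy input, the example below runs the function with a small word budget (the function is restated here so the snippet runs standalone):

```python
# chunk_text as defined above, restated so this example is self-contained.
def chunk_text(text, max_words=1024):
    sentences = text.split('.')
    chunk, word_count = [], 0
    for sentence in sentences:
        sentence = sentence.strip('. ')
        if not sentence:
            continue
        words_in_sentence = len(sentence.split())
        if word_count + words_in_sentence <= max_words:
            chunk.append(sentence)
            word_count += words_in_sentence
        else:
            if chunk:
                yield '. '.join(chunk) + '.'
            chunk, word_count = [sentence], words_in_sentence
    if chunk:
        yield '. '.join(chunk) + '.'

# With a small limit, sentences are greedily grouped until the budget is hit:
sample = "First sentence here. Second sentence follows. A third one ends it."
chunks = list(chunk_text(sample, max_words=6))
# chunks[0] groups the first two sentences (6 words); chunks[1] holds the third.
```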

Index text embeddings for vector search

After you chunk the transcript text, you obtain embeddings for each chunk and link each chunk back to the original transcript and video title:

def generate_embeddings(text_df):
    """
    Generate an embedding for each chunk created in the previous step.
    """

    chunks = list(chunk_text(text_df['Text']))
    embeddings = []
 
    for i, chunk in enumerate(chunks):
      response = client.embed(texts=[chunk])
      chunk_embedding = response[0]['embedding']
      embeddings.append(np.array(chunk_embedding))
 
    text_df['chunks'] = chunks
    text_df['embeddings'] = embeddings
    return text_df
 
print("Embedding text chunks ...")
df = df.progress_apply(generate_embeddings, axis=1)

The dataframe df contains a column titled embeddings that can be put into any vector database of your choice. Embeddings can then be retrieved from the vector database using a function such as find_most_similar_transcript_segment(query, n), which will retrieve the n closest documents to the given input query by a user.
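The post references `find_most_similar_transcript_segment(query, n)` without defining it. One possible sketch, assuming the in-memory `df` and `client` from the previous steps and brute-force cosine scoring as a stand-in for a real vector database lookup:

```python
def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (lists or arrays)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def find_most_similar_transcript_segment(query, n=3):
    """
    Hypothetical in-memory retriever: embed the query with the same Jina
    endpoint and rank every stored chunk by cosine similarity. `client` and
    `df` are assumed from the preceding steps.
    """
    query_embedding = client.embed(texts=[query])[0]['embedding']
    scored = []
    for row_index, row in df.iterrows():
        for chunk, chunk_embedding in zip(row['chunks'], row['embeddings']):
            scored.append(
                (cosine_similarity(query_embedding, chunk_embedding), chunk, row_index)
            )
    scored.sort(key=lambda item: item[0], reverse=True)
    # Return (chunk_text, row_index) pairs for the n best matches.
    return [(chunk, row_index) for _, chunk, row_index in scored[:n]]
```

A production system would replace the double loop with a single approximate nearest neighbor query against the vector database holding the embeddings column.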

Prompt a generative LLM endpoint

For question answering based on an LLM, you can use the Mistral 7B-Instruct model on SageMaker JumpStart:

from sagemaker.jumpstart.model import JumpStartModel
from string import Template

# Define the LLM to be used and deploy it through SageMaker JumpStart.
jumpstart_model = JumpStartModel(model_id="huggingface-llm-mistral-7b-instruct", role=role)
model_predictor = jumpstart_model.deploy()

# Define the prompt template to be passed to the LLM
prompt_template = Template("""
  <s>[INST] Answer the question below only using the given context.
  The question from the user is based on transcripts of videos from a YouTube
    channel.
  The context is presented as a ranked list of information in the form of
    (video-title, transcript-segment), that is relevant for answering the
    user's question.
  The answer should only use the presented context. If the question cannot be
    answered based on the context, say so.
 
  Context:
  1. Video-title: $title_1, transcript-segment: $segment_1
  2. Video-title: $title_2, transcript-segment: $segment_2
  3. Video-title: $title_3, transcript-segment: $segment_3
 
  Question: $question
 
  Answer: [/INST]
""")

Query the LLM

Now, for a query sent by a user, you first find the semantically closest n transcript chunks from any Kurzgesagt video (using the vector distance between the chunk embeddings and the embedding of the user's query), and provide those chunks as context to the LLM for answering the query:

# Define the query and insert it into the prompt template together with the context to be used to answer the question
question = "Can climate change be reversed by individuals' actions?"
search_results = find_most_similar_transcript_segment(question)
 
prompt_for_llm = prompt_template.substitute(
    question = question,
    title_1 = df.iloc[search_results[0][1]]["Title"].strip(),
    segment_1 = search_results[0][0],
    title_2 = df.iloc[search_results[1][1]]["Title"].strip(),
    segment_2 = search_results[1][0],
    title_3 = df.iloc[search_results[2][1]]["Title"].strip(),
    segment_3 = search_results[2][0]
)

# Generate the answer to the question passed in the prompt
payload = {"inputs": prompt_for_llm}
model_predictor.predict(payload)

Based on the preceding question, the LLM might respond with an answer such as the following:

Based on the provided context, it does not seem that individuals can solve climate change solely through their personal actions. While personal actions such as using renewable energy sources and reducing consumption can contribute to mitigating climate change, the context suggests that larger systemic changes are necessary to address the issue fully.

Clean up

After you’re done running the notebook, make sure to delete all the resources that you created in the process so your billing is stopped. Use the following code:

model_predictor.delete_model()
model_predictor.delete_endpoint()

Conclusion

By taking advantage of the features of Jina Embeddings v2 to develop RAG applications, together with the streamlined access to state-of-the-art models on SageMaker JumpStart, developers and businesses are now empowered to create sophisticated AI solutions with ease.

Jina Embeddings v2’s extended context length, support for bilingual documents, and small model size enable enterprises to quickly build natural language processing use cases based on their internal datasets without relying on external APIs.

Get started with SageMaker JumpStart today, and refer to the GitHub repository for the complete code to run this sample.

Connect with Jina AI

Jina AI remains committed to leadership in bringing affordable and accessible AI embeddings technology to the world. Our state-of-the-art text embedding models support English and Chinese and soon will support German, with other languages to follow.

For more information about Jina AI’s offerings, check out the Jina AI website or join our community on Discord.


About the Authors

Francesco Kruk is a Product Management intern at Jina AI and is completing his Master’s at ETH Zurich in Management, Technology, and Economics. With a strong business background and his knowledge in machine learning, Francesco helps customers implement RAG solutions using Jina Embeddings in an impactful way.

Saahil Ognawala is Head of Product at Jina AI based in Munich, Germany. He leads the development of search foundation models and collaborates with clients worldwide to enable quick and efficient deployment of state-of-the-art generative AI products. With an academic background in machine learning, Saahil is now interested in scaled applications of generative AI in the knowledge economy.

Roy Allela is a Senior AI/ML Specialist Solutions Architect at AWS based in Munich, Germany. Roy helps AWS customers—from small startups to large enterprises—train and deploy large language models efficiently on AWS. Roy is passionate about computational optimization problems and improving the performance of AI workloads.
