Improve accuracy of Amazon Rekognition Face Search with user vectors

Improve accuracy of Amazon Rekognition Face Search with user vectors

In various industries, such as financial services, telecommunications, and healthcare, customers use a digital identity process, which usually involves several steps to verify end-users during online onboarding or step-up authentication. An example of one step that can be used is face search, which can help determine whether a new end-user’s face matches those associated with an existing account.

Building an accurate face search system involves several steps. The system must be able to detect human faces in images, extract the faces into vector representations, store face vectors in a database, and compare new faces against existing entries. Amazon Rekognition makes this effortless by giving you pre-trained models that are invoked via simple API calls.

Amazon Rekognition enables you to achieve very high face search accuracy with a single face image. In some cases, you can use multiple images of the same person’s face to create user vectors and improve accuracy even further. This is especially helpful when images have variations in lighting, poses, and appearances.

In this post, we demonstrate how to use the Amazon Rekognition Face Search APIs with user vectors to increase the similarity score for true matches and decrease the similarity score for true non-matches.

We compare the results of performing face matching with and without user vectors.

Amazon Rekognition face matching

Amazon Rekognition face matching enables measuring the similarity of a face vector extracted from one image to a face vector extracted from another image. A pair of face images is said to be a true match if both images contain the face of the same person, and a true non-match otherwise. Amazon Rekognition returns a score for the similarity of the source and target faces. The minimum similarity score is 0, implying very little similarity, and the maximum is 100.

For comparing a source face with a collection of target faces (1:N matching), Amazon Rekognition allows you to create a Collection object and populate it with faces from images using API calls.

When adding a face to a collection, Amazon Rekognition doesn’t store the actual image of the face but rather the face vector, a mathematical representation of the face. With the SearchFaces API, you can compare a source face with one or several collections of target faces.

In June 2023, AWS launched user vectors, a new capability that significantly improves face search accuracy by using multiple face images of a user. Now, you can create user vectors, which aggregate multiple face vectors of the same user. User vectors offer higher face search accuracy with more robust depictions, because they contain varying degrees of lighting, sharpness, pose, appearance, and more. This improves the accuracy compared to searching against individual face vectors.

In the following sections, we outline the process of using Amazon Rekognition user vectors. We guide you through creating a collection, storing face vectors in that collection, aggregating those face vectors into user vectors, and then comparing the results of searching against those individual face vectors and user vectors.

Solution overview

For this solution, we use an Amazon Rekognition collection of users, each with its associated indexed face vectors from a number of different images of faces for each user.

Let’s look at the workflow to build a collection with users and faces:

  1. Create an Amazon Rekognition collection.
  2. For each user, create a user in the collection.
  3. For each image of the user, add the face to the collection (IndexFaces, which returns face ID corresponding to each face vector).
  4. Associate all indexed face IDs with the user (this is necessary for user vectors).

Then, we will compare the following workflows:

Searching with a new given input image against individual face vectors in our collection:

  1. Get all faces from an image (DetectFaces).
  2. For each face, compare against individual faces in our collection (SearchFacesByImage).

Searching with a new given input image against user vectors in our collection:

  1. Get all faces from an image (DetectFaces).
  2. For each face, compare to the user vector (SearchUsersByImage).

Now let’s describe the solution in details.

Prerequisites

Add the following policy to your AWS Identity and Access Management (IAM) user or role. The policy grants you permission to the relevant Amazon Rekognition APIs and allows access to an Amazon Simple Storage Service (Amazon S3) bucket to store the images:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RekognitionPermissions",
            "Effect": "Allow",
            "Action": [
                "rekognition:CreateCollection",
                "rekognition:DeleteCollection",
                "rekognition:CreateUser",
                "rekognition:IndexFaces",
                "rekognition:DetectFaces",
                "rekognition:AssociateFaces",
                "rekognition:SearchUsersByImage",
                "rekognition:SearchFacesByImage"
            ],
            "Resource": "*"
        },
        {
            "Sid": "S3BucketPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<replace_with_your_bucket>/*",
                "arn:aws:s3:::<replace_with_your_bucket>"
            ]
        }
    ]
}

Create an Amazon Rekognition collection and add users and faces

First, we create an S3 bucket to store users’ images. We organize the bucket by creating a folder for each user that contains their personal images. Our images folder looks like the following structure:

── images
│   ├── photo.jpeg
│   ├── Swami
│   │   ├── Swami1.jpeg
│   │   └── Swami2.jpeg
│   └── Werner
│       ├── Werner1.jpeg
│       ├── Werner2.jpeg
│       └── Werner3.jpeg

Our S3 bucket has a directory for each user that stores their images. There are currently two folders, and each contains several images. You can add more folders for your users, each containing one or more images to be indexed.

Next, we create our Amazon Rekognition collection. We have supplied helpers.py, which contains different methods that we use:

  • create_collection – Create a new collection
  • delete_collection – Delete a collection
  • create_user – Create a new user in a collection
  • add_faces_to_collection – Add faces to collection
  • associate_faces – Associate face_ids to a user in a collection
  • get_subdirs – Get all subdirectories under the S3 prefix
  • get_files – Get all files under the S3 prefix

The following is an example method for creating an Amazon Rekognition collection:

import boto3
session = boto3.Session()
client = session.client('rekognition')

def create_collection(collection_id):
    try:
        # Create a collection
        print('Creating collection:' + collection_id)
        response = client.create_collection(CollectionId=collection_id)
        print('Collection ARN: ' + response['CollectionArn'])
        print('Status code: ' + str(response['StatusCode']))
        print('Done...')
    except client.exceptions.ResourceAlreadyExistsException:
        print('Resource already exits...')

Create the collection with the following code:

import helpers
collection_id = "faces-collection"
helpers.create_collection(collection_id)

Next, let’s add the face vectors into our collection and aggregate them into user vectors.

For each user in the S3 directory, we create a user vector in the collection. Then we index the face images for each user into the collection as individual face vectors, which generates face IDs. Lastly, we associate the face IDs to the appropriate user vector.

This creates two types of vectors in our collection:

  • Individual face vectors
  • User vectors, which are built based on the face vector IDs supplied using the method associate_faces

See the following code:

bucket = '<replace_with_your_bucket>'
prefix = 'images/'

# Get all the users directories from s3 containing the images
folder_list = helpers.get_subdirs(bucket, prefix)
print(f"Found users folders: {folder_list}")
print()

for user_id in folder_list:
    face_ids = []
    helpers.create_user(collection_id, user_id)
    # Get all files per user under the s3 user directory
    images = helpers.get_files(bucket, prefix + user_id + "/")
    print (f"Found images={images} for {user_id}")
    for image in images:
        face_id = helpers.add_faces_to_collection(bucket, image, collection_id)
        face_ids.append(face_id)
    helpers.associate_faces(collection_id, user_id, face_ids)
    print()

We use the following methods:

  • get_subdirs – Returns a list of all the users’ directories. In our example, the value is [Swami,Werner].
  • get_files – Returns all the images files under the S3 prefix for the user.
  • face_ids – This is a list containing all the face IDs belonging to a user. We use this list when calling the AssociateFaces API.

As explained earlier, you can add more users by adding folders for them (the folder dictates the user ID) and add your images in that folder (no ordering is required for the files).

Now that our environment is set up and we have both individual face vectors and user vectors, let’s compare our search quality against each of them. To do that, we use a new photo with multiple people and attempt to match their faces against our collection, first against the individual face vectors and then against the user vectors.

Face search of image against a collection of individual face vectors

To search against our individual face vectors, we use the Amazon Rekognition SearchFacesByImage API. This function uses a source face image to search against individual face vectors in our collection and returns faces that match our defined similarity score threshold.

An important consideration is that the SearchFacesByImage API will only operate on the largest face detected in the image. If multiple faces are present, you need to crop each individual face and pass it separately to the method for identification.

For extracting faces details from an image (such as their location on the image), we use the Amazon Rekognition DetectFaces API.

The following detect_faces_in_image method detects faces in an image. For each face, it performs the following actions:

  • Print its bounding box location
  • Crop the face from the image and check if such face exists in the collection and print the user or ‘Unknown’
  • Print the similarity score

The example Python code uses the Pillow library for doing the image manipulations (such as printing, drawing, and cropping).

We use a similarity score threshold of 99%, which is a common setting for identity verification use cases.

Run the following code:

import detect_users
from PIL import Image

# The image we would like to match faces against our collection.
file_key= "images/photo.jpeg"

img = detect_users.detect_faces_in_image(
    bucket, 
    file_key, 
    collection_id, 
    threshold=99
)
img.show() # or in Jupyter use display(img)

file_key is the S3 object key we want to match against our collection. We have supplied an example image (photo.jpeg) under the images folder.

The following image shows our results.

Using a threshold of 99%, only one person was identified. Dr. Werner Vogels was flagged as Unknown. If we run the same code using a lower threshold of 90 (set threshold=90), we get the following results.

Now we see Dr. Werner Vogel’s face has a similarity score of 96.86%. Next, let’s check if we can get the similarity score above our defined threshold by using user vectors.

Face search of image against a collection of user vectors

To search against our user vectors, we use the Amazon Rekognition SearchUsersByImage API. This function uses a source face image to search against user vectors in our collection and returns users that match our defined similarity score threshold.

The same consideration is relevant here – the SearchUsersByImage API will only operate on the largest face detected in the image. If there are multiple faces present, you need to crop each individual face and pass it separately to the method for identification.

For extracting faces details from an image (such as their location on the image), we use the Amazon Rekognition DetectFaces API.

The following detect_users_in_image method detects faces in an image. For each face, it performs the following actions:

  • Print its bounding box location
  • Crop the face from the image and check if such user face exists in our collection and print the user or ‘Unknown’
  • Print the similarity score

See the following code:

import boto3
import io
import math
from PIL import Image, ImageDraw, ImageFont

def detect_users_in_image(bucket, key, collection_id, threshold=80):

    session = boto3.Session()
    client = session.client('rekognition')

    # Load image from S3 bucket
    s3_connection = boto3.resource('s3')
    s3_object = s3_connection.Object(bucket, key)
    s3_response = s3_object.get()

    stream = io.BytesIO(s3_response['Body'].read())
    image = Image.open(stream)

    # Call DetectFaces to find faces in image
    response = client.detect_faces(
        Image={'S3Object': {'Bucket': bucket, 'Name': key}},
        Attributes=['ALL']
    )

    imgWidth, imgHeight = image.size
    draw = ImageDraw.Draw(image)

    # Calculate and display bounding boxes for each detected face
    for faceDetail in response['FaceDetails']:
        print('The detected face is between ' + str(faceDetail['AgeRange']['Low'])
              + ' and ' + str(faceDetail['AgeRange']['High']) + ' years old')

        box = faceDetail['BoundingBox']
        left = imgWidth * box['Left']
        top = imgHeight * box['Top']
        width = imgWidth * box['Width']
        height = imgHeight * box['Height']

        print('Left: ' + '{0:.0f}'.format(left))
        print('Top: ' + '{0:.0f}'.format(top))
        print('Face Width: ' + "{0:.0f}".format(width))
        print('Face Height: ' + "{0:.0f}".format(height))

        points = (
            (left, top),
            (left + width, top),
            (left + width, top + height),
            (left, top + height),
            (left, top)
        )

        # Crop the face box and convert it to byte array
        face = image.crop((left, top, left + width, top + height))
        imgByteArr = image_to_byte_array(face, image.format)

        # Search for a user in our collection using the cropped image
        user_response = client.search_users_by_image(
            CollectionId=collection_id,
            Image={'Bytes': imgByteArr},
            UserMatchThreshold=threshold
        )
        # print (user_response)

        # Extract user id and the similarity from the response
        if (user_response['UserMatches']):
            similarity = user_response['UserMatches'][0]['Similarity']
            similarity = (math.trunc(similarity * 100) / 100) if isinstance(similarity, float) else similarity
            user_id = user_response['UserMatches'][0]['User']['UserId']
            print(f"User {user_id} was found, similarity of {similarity}%")
            print("")
        else:
            user_id = "Unknown"
            similarity = 0

        draw.line(points, fill='#00d400', width=4)
        font = ImageFont.load_default(size=25)
        draw.text((left, top - 30), user_id, fill='#00d400', font=font)
        if similarity > 0:
            draw.text((left, top + 1), str(similarity), fill='#00d400', font=font)

    return image

The function returns a modified image with the results that can be saved to Amazon S3 or printed. The function also outputs statistics about the estimated ages of the faces to the terminal.

Run the following code:

import detect_users
from PIL import Image

# The image we would like to match faces against our collection.
file_key= "images/photo.jpeg"

img = detect_users.detect_users_in_image(
    bucket, 
    file_key, 
    collection_id, 
    threshold=99
)
img.show() # or in Jupyter use display(img)

The following image shows our results.

The users that exist in our collection were identified correctly with high similarity (over 99%).

We were able to increase the similarity score by using three face vectors per user vector. As we increase the number of face vectors used, we expect the similarity score for true matches to also increase. You can use up to 100 face vectors per user vector.

An end-to-end example code can be found in the GitHub repository. It includes a detailed Jupyter notebook that you can run on Amazon SageMaker Studio (or other alternatives).

Clean up

To delete the collection, use the following code:

helpers.delete_collection(collection_id)

Conclusion

In this post, we presented how to use Amazon Rekognition user vectors to implement face search against a collection of users’ faces. We demonstrated how to improve face search accuracy by using multiple face images per user and compared it against individual face vectors. Additionally, we described how you can use the different Amazon Rekognition APIs to detect faces. The provided example code serves as a solid foundation for constructing a functional face search system.

For more information about Amazon Rekognition user vectors, refer to Searching faces in a collection. If you’re new to Amazon Rekognition, you can use our Free Tier, which lasts 12 months and includes processing 5,000 images per month and storing 1,000 user vector objects per month.


About the Authors

Arik Porat is a Senior Startups Solutions Architect at Amazon Web Services. He works with startups to help them build and design their solutions in the cloud, and is passionate about machine learning and container-based solutions. In his spare time, Arik likes to play chess and video games.

Eliran Efron is a Startups Solutions Architect at Amazon Web Services. Eliran is a data and compute enthusiast, assisting startups designing their system architectures. In his spare time, Eliran likes to build and race cars in Touring races and build IoT devices.

Read More

Accelerate ML workflows with Amazon SageMaker Studio Local Mode and Docker support

Accelerate ML workflows with Amazon SageMaker Studio Local Mode and Docker support

We are excited to announce two new capabilities in Amazon SageMaker Studio that will accelerate iterative development for machine learning (ML) practitioners: Local Mode and Docker support. ML model development often involves slow iteration cycles as developers switch between coding, training, and deployment. Each step requires waiting for remote compute resources to start up, which delays validating implementations and getting feedback on changes.

With Local Mode, developers can now train and test models, debug code, and validate end-to-end pipelines directly on their SageMaker Studio notebook instance without the need for spinning up remote compute resources. This reduces the iteration cycle from minutes down to seconds, boosting developer productivity. Docker support in SageMaker Studio notebooks enables developers to effortlessly build Docker containers and access pre-built containers, providing a consistent development environment across the team and avoiding time-consuming setup and dependency management.

Local Mode and Docker support offer a streamlined workflow for validating code changes and prototyping models using local containers running on a SageMaker Studio notebook

instance. In this post, we guide you through setting up Local Mode in SageMaker Studio, running a sample training job, and deploying the model on an Amazon SageMaker endpoint from a SageMaker Studio notebook.

SageMaker Studio Local Mode

SageMaker Studio introduces Local Mode, enabling you to run SageMaker training, inference, batch transform, and processing jobs directly on your JupyterLab, Code Editor, or SageMaker Studio Classic notebook instances without requiring remote compute resources. Benefits of using Local Mode include:

  • Instant validation and testing of workflows right within integrated development environments (IDEs)
  • Faster iteration through local runs for smaller-scale jobs to inspect outputs and identify issues early
  • Improved development and debugging efficiency by eliminating the wait for remote training jobs
  • Immediate feedback on code changes before running full jobs in the cloud

The following figure illustrates the workflow using Local Mode on SageMaker.

Workflow using Local Mode on SageMaker

To use Local Mode, set instance_type='local' when running SageMaker Python SDK jobs such as training and inference. This will run them on the instances used by your SageMaker Studio IDEs instead of provisioning cloud resources.

Although certain capabilities such as distributed training are only available in the cloud, Local Mode removes the need to switch contexts for quick iterations. When you’re ready to take advantage of the full power and scale of SageMaker, you can seamlessly run your workflow in the cloud.

Docker support in SageMaker Studio

SageMaker Studio now also enables building and running Docker containers locally on your SageMaker Studio notebook instance. This new feature allows you to build and validate Docker images in SageMaker Studio before using them for SageMaker training and inference.

The following diagram illustrates the high-level Docker orchestration architecture within SageMaker Studio.

high-level Docker orchestration architecture within SageMaker Studio

With Docker support in SageMaker Studio, you can:

  • Build Docker containers with integrated models and dependencies directly within SageMaker Studio
  • Eliminate the need for external Docker build processes to simplify image creation
  • Run containers locally to validate functionality before deploying models to production
  • Reuse local containers when deploying to SageMaker for training and hosting

Although some advanced Docker capabilities like multi-container and custom networks are not supported as of this writing, the core build and run functionality is available to accelerate developing containers for bring your own container (BYOC) workflows.

Prerequisites

To use Local Mode in SageMaker Studio applications, you must complete the following prerequisites:

  • For pulling images from Amazon Elastic Container Registry (Amazon ECR), the account hosting the ECR image must provide access permission to the user’s Identity and Access Management (IAM) role. The domain’s role must also allow Amazon ECR access.
  • To enable Local Mode and Docker capabilities, you must set the EnableDockerAccess parameter to true for the domain’s DockerSettings using the AWS Command Line Interface (AWS CLI). This allows users in the domain to use Local Mode and Docker features. By default, Local Mode and Docker are disabled in SageMaker Studio. Any existing SageMaker Studio apps will need to be restarted for the Docker service update to take effect. The following is an example AWS CLI command for updating a SageMaker Studio domain:
aws sagemaker --region <REGION> 
update-domain --domain-id <DOMAIN-ID> 
--domain-settings-for-update '{"DockerSettings": {"EnableDockerAccess": "ENABLED"}}'
  • You need to update the SageMaker IAM role in order to be able to push Docker images to Amazon ECR:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:CompleteLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:InitiateLayerUpload",
        "ecr:BatchCheckLayerAvailability",
        "ecr:PutImage"
      ],
      "Resource": "arn:aws:ecr:us-east-2:123456789012:repository/<repositoryname>"
    },
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    }
  ]
}

Run Python files in SageMaker Studio spaces using Local Mode

SageMaker Studio JupyterLab and Code Editor (based on Code-OSS, Visual Studio Code – Open Source), extends SageMaker Studio so you can write, test, debug, and run your analytics and ML code using the popular lightweight IDE. For more details on how to get started with SageMaker Studio IDEs, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools and New – Code Editor, based on Code-OSS VS Code Open Source now available in Amazon SageMaker Studio. Complete the following steps:

  • Create a new Code Editor or JupyterLab space called my-sm-code-editor-space or my-sm-jupyterlab-space, respectively.
  • Choose Create spaceRun Python files in SageMaker Studio spaces using Local ModeRun Python files in SageMaker Studio spaces using Local Mode
  • Choose the ml.m5.large instance and set storage to 32 GB.
  • Choose Run spaceRun Python files in SageMaker Studio spaces using Local ModeRun Python files in SageMaker Studio spaces using Local Mode
  • Open the JupyterLab or Code Editor space and clone the GitHub repo.  Run Python files in SageMaker Studio spaces using Local ModeRun Python files in SageMaker Studio spaces using Local Mode
  • Clone the GitHub repo, with /home/sagemaker-user/ as the target folder.

Run Python files in SageMaker Studio spaces using Local ModeRun Python files in SageMaker Studio spaces using Local Mode

  • Create a new terminal.  Run Python files in SageMaker Studio spaces using Local ModeRun Python files in SageMaker Studio spaces using Local Mode
  • Install the Docker CLI and Docker Compose plugin following the instructions in the following GitHub repo. If chained commands fail, run the commands one at a time.

Run Python files in SageMaker Studio spaces using Local ModeRun Python files in SageMaker Studio spaces using Local Mode You must update the SageMaker SDK to the latest version.

  • Run pip install sagemaker -Uq in the terminal.

For Code Editor only, you need to set the Python environment to run in the current terminal.

  • In Code Editor, on the File menu¸ choose Preferences and Settings.

Run Python files in SageMaker Studio spaces using Local Mode

  • Search for and select Terminal: Execute in File Dir.

Run Python files in SageMaker Studio spaces using Local Mode

  • In Code Editor or JupyterLab, open the scikit_learn_script_mode_local_training_and_serving folder and run the scikit_learn_script_mode_local_training_and_serving.py file.

You can run the script by choosing Run in Code Editor or using the CLI in a JupyterLab terminal. Run Python files in SageMaker Studio spaces using Local ModeRun Python files in SageMaker Studio spaces using Local Mode You will be able to see how the model is trained locally. Then you deploy the model to a SageMaker endpoint locally, and calculate the root mean square error (RMSE). Run Python files in SageMaker Studio spaces using Local ModeRun Python files in SageMaker Studio spaces using Local Mode

Simulate training and inference in SageMaker Studio Classic using Local Mode

You can also use a notebook in SageMaker Studio Classic to run a small-scale training job on CIFAR10 using Local Mode, deploy the model locally, and perform inference.

Set up your notebook

To set up the notebook, complete the following steps:

  • Open SageMaker Studio Classic and clone the following GitHub repo.

Simulate training and inference in SageMaker Studio Classic using Local Mode

  • Open the pytorch_local_mode_cifar10.ipynb notebook in blog/pytorch_cnn_cifar10.

Simulate training and inference in SageMaker Studio Classic using Local Mode

  • For Image, choose PyTorch 2.1.0 Python 3.10 CPU Optimized.

Simulate training and inference in SageMaker Studio Classic using Local Mode Confirm that your notebook shows the correct instance and kernel selection. Simulate training and inference in SageMaker Studio Classic using Local Mode

  • Open a terminal by choosing Launch Terminal in the current SageMaker image.

Simulate training and inference in SageMaker Studio Classic using Local Mode

  • Install the Docker CLI and Docker Compose plugin following the instructions in the following GitHub repo.

Because you’re using Docker from SageMaker Studio Classic, remove sudo when running commands because the terminal already runs under superuser. For SageMaker Studio Classic, the installation commands depend on the SageMaker Studio app image OS. For example, DLC-based framework images are Ubuntu based, in which the following instructions would work. However, for a Debian-based image like DataScience Images, you must follow the instructions in the following GitHub repo. If chained commands fail, run the commands one at a time. You should see the Docker version displayed. Simulate training and inference in SageMaker Studio Classic using Local Mode

  • Leave the terminal window open, go back to the notebook, and start running it cell by cell.

Make sure to run the cell with pip install -U sagemaker so you’re using the latest version of the SageMaker Python SDK.

Local training

When you start running the local SageMaker training job, you will see the following log lines:

INFO:sagemaker.local.image:'Docker Compose' found using Docker CLI.
INFO:sagemaker.local.local_session:Starting training job

This indicates that the training was running locally using Docker.

Simulate training and inference in SageMaker Studio Classic using Local Mode

Be patient while the pytorch-training:2.1-cpu-py310 Docker image is pulled. Due to its large size (5.2 GB), it could take a few minutes.

Docker images will be stored in the SageMaker Studio app instance’s root volume, which is not accessible to end-users. The only way to access and interact with Docker images is via the exposed Docker API operations.

From a user confidentiality standpoint, the SageMaker Studio platform never accesses or stores user-specific images.

When the training is complete, you’ll be able to see the following success log lines:

8zlz1zbfta-sagemaker-local exited with code 0
Aborting on container exit...
Container 8zlz1zbfta-sagemaker-local  Stopping
Container 8zlz1zbfta-sagemaker-local  Stopped
INFO:sagemaker.local.image:===== Job Complete =====

Simulate training and inference in SageMaker Studio Classic using Local Mode

Local inference

Complete the following steps:

  • Deploy the SageMaker endpoint using SageMaker Local Mode.

Be patient while the pytorch-inference:2.1-cpu-py310 Docker image is pulled. Due to its large size (4.32 GB), it could take a few minutes.

Simulate training and inference in SageMaker Studio Classic using Local Mode

  • Invoke the SageMaker endpoint deployed locally using the test images.

Simulate training and inference in SageMaker Studio Classic using Local Mode

You will be able to see the predicted classes: frog, ship, car, and plane:

Predicted:  frog ship  car plane

Simulate training and inference in SageMaker Studio Classic using Local Mode

  • Because the SageMaker Local endpoint is still up, navigate back to the open terminal window and list the running containers:

docker ps

You’ll be able to see the running pytorch-inference:2.1-cpu-py310 container backing the SageMaker endpoint.

Simulate training and inference in SageMaker Studio Classic using Local Mode

  • To shut down the SageMaker local endpoint and stop the running container, because you can only run one local endpoint at a time, run the cleanup code.

Simulate training and inference in SageMaker Studio Classic using Local Mode

  • To make sure the Docker container is down, you can navigate to the opened terminal window, run docker ps, and make sure there are no running containers.
  • If you see a container running, run docker stop <CONTAINER_ID> to stop it.

Tips for using SageMaker Local Mode

If you’re using SageMaker for the first time, refer to Train machine learning models. To learn more about deploying models for inference with SageMaker, refer to Deploy models for inference.

Keep in mind the following recommendations:

  • Print input and output files and folders to understand dataset and model loading
  • Use 1–2 epochs and small datasets for quick testing
  • Pre-install dependencies in a Dockerfile to optimize environment setup
  • Isolate serialization code in endpoints for debugging

Configure Docker installation as a Lifecycle Configuration

You can define the Docker install process as a Lifecycle Configuration (LCC) script to simplify setup each time a new SageMaker Studio space starts. LCCs are scripts that SageMaker runs during events like space creation. Refer to the JupyterLab, Code Editor, or SageMaker Studio Classic LCC setup (using docker install cli as reference) to learn more.

Configure Docker installation as a Lifecycle Configuration

Configure Docker installation as a Lifecycle Configuration

Build and test custom Docker images in SageMaker Studio spaces

In this step, you install Docker inside the JupyterLab (or Code Editor) app space and use Docker to build, test, and publish custom Docker images with SageMaker Studio spaces. Spaces are used to manage the storage and resource needs of some SageMaker Studio applications. Each space has a 1:1 relationship with an instance of an application. Every supported application that is created gets its own space. To learn more about SageMaker spaces, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools. Make sure you provision a new space with at least 30 GB of storage to allow sufficient storage for Docker images and artifacts.

Install Docker inside a space

To install the Docker CLI and Docker Compose plugin inside a JupyterLab space, run the commands in the following GitHub repo. SageMaker Studio only supports Docker version 20.10.X.

Build Docker images

To confirm that Docker is installed and working inside your JupyterLab space, run the following code:

# to verify docker service
sagemaker-user@default:~$ docker version
Client: Docker Engine - Community
Version:           24.0.7
API version:       1.41 (downgraded from 1.43)
Go version:        go1.20.10
Git commit:        afdd53b
Built:             Thu Oct 26 09:07:41 2023
OS/Arch:           linux/amd64
Context:           default

Server:
Engine:
Version:          20.10.25
API version:      1.41 (minimum version 1.12)
Go version:       go1.20.10
Git commit:       5df983c
Built:            Fri Oct 13 22:46:59 2023
OS/Arch:          linux/amd64
Experimental:     false
containerd:
Version:          1.7.2
GitCommit:        0cae528dd6cb557f7201036e9f43420650207b58
runc:
Version:          1.1.7
GitCommit:        f19387a6bec4944c770f7668ab51c4348d9c2f38
docker-init:
Version:          0.19.0
GitCommit:        de40ad0

To build a custom Docker image inside a JupyterLab (or Code Editor) space, complete the following steps:

  • Create an empty Dockerfile:

touch Dockerfile

  • Edit the Dockerfile with the following commands, which create a simple flask web server image from the base python:3.10.13-bullseye image hosted on Docker Hub:
# Use the specified Python base image
FROM python:3.10.13-bullseye

# Create a code dir
RUN mkdir /code/

# Set the working directory in the container
WORKDIR /code

# Upgrade pip and install required packages
RUN python3 -m pip install --upgrade pip && 
python3 -m pip install flask

# Copy the app.py file to the container
COPY app.py /code/

# Set the command to run the app
ENTRYPOINT ["python", "app.py"]

The following code shows the contents of an example flask application file app.py:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/')
def hello():
return jsonify({"response": "Hello"})

if __name__ == '__main__':
app.run(host='0.0.0.0', port=6006)

Additionally, you can update the reference Dockerfile commands to include packages and artifacts of your choice.

  • Build a Docker image using the reference Dockerfile:

docker build --network sagemaker --tag myflaskapp:v1 --file ./Dockerfile .

Include --network sagemaker in your docker build command, otherwise the build will fail. Containers can’t be run in Docker default bridge or custom Docker networks. Containers are run in same network as the SageMaker Studio application container. Users can only use sagemaker for the network name.

  • When your build is complete, validate if the image exists. Re-tag the build as an ECR image and push. If you run into permission issues, run the aws ecr get-login-password… command and try to rerun the Docker push/pull:
sagemaker-user@default:~$ docker image list
REPOSITORY      TAG       IMAGE ID       CREATED          SIZE
myflaskapp      v1        d623f1538f20   27 minutes ago   489MB

sagemaker-user@default:~$ docker tag myflaskapp:v1 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1

sagemaker-user@default:~$ docker image list
REPOSITORY                                                  TAG       IMAGE ID       CREATED          SIZE
123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp     latest    d623f1538f20   27 minutes ago   489MB
myflaskapp                                                  v1        d623f1538f20   27 minutes ago   489MB

sagemaker-user@default:~$ aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com

sagemaker-user@default:~$ docker push 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:latest

Test Docker images

Having Docker installed inside a JupyterLab (or Code Editor) SageMaker Studio space allows you to test pre-built or custom Docker images as containers (or containerized applications). In this section, we use the docker run command to provision Docker containers inside a SageMaker Studio space to test containerized workloads like REST web services and Python scripts. Complete the following steps:

sagemaker-user@default:~$ docker image list
REPOSITORY                                                  TAG       IMAGE ID       CREATED       SIZE
  • If the test image doesn’t exist, run docker pull to pull the image into your local machine:

sagemaker-user@default:~$ docker pull 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1

  • If you encounter authentication issues, run the following commands:

aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com

  • Create a container to test your workload:

docker run --network sagemaker 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1

This spins up a new container instance and runs the application defined using Docker’s ENTRYPOINT:

sagemaker-user@default:~$ docker run --network sagemaker 905418447590.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1
* Serving Flask app 'app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:6006
* Running on http://169.255.255.2:6006
  • To test if your web endpoint is active, navigate to the URL https://<sagemaker-space-id>.studio.us-east-2.sagemaker.aws/jupyterlab/default/proxy/6006/.

You should see a JSON response similar to following screenshot.

Configure Docker installation as a Lifecycle Configuration

Clean up

To avoid incurring unnecessary charges, delete the resources that you created while running the examples in this post:

  1. In your SageMaker Studio domain, choose Studio Classic in the navigation pane, then choose Stop.
  2. In your SageMaker Studio domain, choose JupyterLab or Code Editor in the navigation pane, choose your app, and then choose Stop.

Conclusion

SageMaker Studio Local Mode and Docker support empower developers to build, test, and iterate on ML implementations faster without leaving their workspace. By providing instant access to test environments and outputs, these capabilities optimize workflows and improve productivity. Try out SageMaker Studio Local Model and Docker support using our quick onboard feature, which allows you to spin up a new domain for single users within minutes. Share your thoughts in the comments section!


About the Authors

Shweta SinghShweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading SageMaker Python SDK. She has worked in several product roles in Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and Masters of Science in Financial Engineering, both from New York University

Eitan SelaEitan Sela is a Generative AI and Machine Learning Specialist Solutions Architect ta AWS. He works with AWS customers to provide guidance and technical assistance, helping them build and operate Generative AI and Machine Learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Pranav MurthyPranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Mufaddal RohawalaMufaddal Rohawala is a Software Engineer at AWS. He works on the SageMaker Python SDK library for Amazon SageMaker. In his spare time, he enjoys travel, outdoor activities and is a soccer fan.

Read More

Significant new capabilities make it easier to use Amazon Bedrock to build and scale generative AI applications – and achieve impressive results

Significant new capabilities make it easier to use Amazon Bedrock to build and scale generative AI applications – and achieve impressive results

We introduced Amazon Bedrock to the world a little over a year ago, delivering an entirely new way to build generative artificial intelligence (AI) applications. With the broadest selection of first- and third-party foundation models (FMs) as well as user-friendly capabilities, Amazon Bedrock is the fastest and easiest way to build and scale secure generative AI applications. Now tens of thousands of customers are using Amazon Bedrock to build and scale impressive applications. They are innovating quickly, easily, and securely to advance their AI strategies. And we’re supporting their efforts by enhancing Amazon Bedrock with exciting new capabilities including even more model choice and features that make it easier to select the right model, customize the model for a specific use case, and safeguard and scale generative AI applications.

Customers across diverse industries from finance to travel and hospitality to healthcare to consumer technology are making remarkable progress. They are realizing real business value by quickly moving generative AI applications into production to improve customer experiences and increase operational efficiency. Consider the New York Stock Exchange (NYSE), the world’s largest capital market processing billions of transactions each day. NYSE is leveraging Amazon Bedrock’s choice of FMs and cutting-edge AI generative capabilities across several use cases, including the processing of thousands of pages of regulations to provide answers in easy-to-understand language

Global airline United Airlines modernized their Passenger Service System to translate legacy passenger reservation codes into plain English so that agents can provide swift and efficient customer support. LexisNexis Legal & Professional, a leading global provider of information and analytics, developed a personalized legal generative AI assistant on Lexis+ AI. LexisNexis customers receive trusted results two times faster than the nearest competing product and can save up to five hours per week for legal research and summarization. And HappyFox, an online help desk software, selected Amazon Bedrock for its security and performance, boosting the efficiency of its AI-powered automated ticket system in its customer support solution by 40% and agent productivity by 30%.

And across Amazon, we are continuing to innovate with generative AI to deliver more immersive, engaging experiences for our customers. Just last week Amazon Music announced Maestro. Maestro is an AI playlist generator powered by Amazon Bedrock that gives Amazon Music subscribers an easier, more fun way to create playlists based on prompts. Maestro is now rolling out in beta to a small number of U.S. customers on all tiers of Amazon Music.

With Amazon Bedrock, we’re focused on the key areas that customers need to build production-ready, enterprise-grade generative AI applications at the right cost and speed. Today I’m excited to share new features that we’re announcing across the areas of model choice, tools for building generative AI applications, and privacy and security.

1. Amazon Bedrock expands model choice with Llama 3 models and helps you find the best model for your needs

In these early days, customers are still learning and experimenting with different models to determine which ones to use for various purposes. They want to be able to easily try the latest models, and test which capabilities and features will give them the best results and cost characteristics for their use cases. The majority of Amazon Bedrock customers use more than one model, and Amazon Bedrock provides the broadest selection of first- and third-party large language models (LLMs) and other FMs.  This includes models from AI21 labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI, as well as our own Amazon Titan models. In fact, Joel Hron, head of AI and Thomson Reuters Labs at Thomson Reuters recently said this about their adoption of Amazon Bedrock, “Having the ability to use a diverse range of models as they come out was a key driver for us, especially given how quickly this space is evolving.” The cutting-edge models of the Mistral AI model family including Mistral 7B, Mixtral 8x7B, and Mistral Large have customers excited about their high performance in text generation, summarization, Q&A, and code generation. Since we introduced the Anthropic Claude 3 model family, thousands of customers have experienced how Claude 3 Haiku, Sonnet, and Opus have established new benchmarks across cognitive tasks with unrivaled intelligence, speed, and cost-efficiency. After the initial evaluation using Claude 3 Haiku and Opus in Amazon Bedrock, BlueOcean.ai, a brand intelligence platform, saw a cost reduction of over 50% when they were able to consolidate four separate API calls into a single, more efficient call.

Masahiro Oba, General Manager, Group Federated Governance of DX Platform at Sony Group corporation shared,

“While there are many challenges with applying generative AI to the business, Amazon Bedrock’s diverse capabilities help us to tailor generative AI applications to Sony’s business. We are able to take advantage of not only the powerful LLM capabilities of Claude 3, but also capabilities that help us safeguard applications at the enterprise-level. I’m really proud to be working with the Bedrock team to further democratize generative AI within the Sony Group.”

I recently sat down with Aaron Linsky, CTO of Artificial Investment Associate Labs at Bridgewater Associates, a premier asset management firm, where they are using generative AI to enhance their “Artificial Investment Associate,” a major leap forward for their customers. It builds on their experience of giving rules-based expert advice for investment decision-making. With Amazon Bedrock, they can use the best available FMs, such as Claude 3, for different tasks-combining fundamental market understanding with the flexible reasoning capabilities of AI. Amazon Bedrock allows for seamless model experimentation, enabling Bridgewater to build a powerful, self-improving investment system that marries systematic advice with cutting-edge capabilities–creating an evolving, AI-first process.

To bring even more model choice to customers, today, we are making Meta Llama 3 models available in Amazon Bedrock. Llama 3’s Llama 3 8B and Llama 3 70B models are designed for building, experimenting, and responsibly scaling generative AI applications. These models were significantly improved from the previous model architecture, including scaling up pretraining, as well as instruction fine-tuning approaches. Llama 3 8B excels in text summarization, classification, sentiment analysis, and translation, ideal for limited resources and edge devices. Llama 3 70B shines in content creation, conversational AI, language understanding, R&D, enterprises, accurate summarization, nuanced classification/sentiment analysis, language modeling, dialogue systems, code generation, and instruction following. Read more about Meta Llama 3 now available in Amazon Bedrock.

We are also announcing support coming soon for Cohere’s Command R and Command R+ enterprise FMs. These models are highly scalable and optimized for long-context tasks like retrieval-augmented generation (RAG) with citations to mitigate hallucinations, multi-step tool use for automating complex business tasks, and support for 10 languages for global operations. Command R+ is Cohere’s most powerful model optimized for long-context tasks, while Command R is optimized for large-scale production workloads. With the Cohere models coming soon in Amazon Bedrock, businesses can build enterprise-grade generative AI applications that balance strong accuracy and efficiency for day-to-day AI operations beyond proof-of-concept.

Amazon Titan Image Generator now generally available and Amazon Titan Text Embeddings V2 coming soon

In addition to adding the most capable 3P models, Amazon Titan Image Generator is generally available today. With Amazon Titan Image Generator, customers in industries like advertising, e-commerce, media, and entertainment can efficiently generate realistic, studio-quality images in large volumes and at low cost, utilizing natural language prompts. They can edit generated or existing images using text prompts, configure image dimensions, or specify the number of image variations to guide the model. By default, every image produced by Amazon Titan Image Generator contains an invisible watermark, which aligns with AWS’s commitment to promoting responsible and ethical AI by reducing the spread of misinformation. The Watermark Detection feature identifies images created by Image Generator, and is designed to be tamper-resistant, helping increase transparency around AI-generated content. Watermark Detection helps mitigate intellectual property risks and enables content creators, news organizations, risk analysts, fraud-detection teams, and others, to better identify and mitigate dissemination of misleading AI-generated content. Read more about Watermark Detection for Titan Image Generator.

Coming soon, Amazon Titan Text Embeddings V2 efficiently delivers more relevant responses for critical enterprise use cases like search. Efficient embeddings models are crucial to performance when leveraging RAG to enrich responses with additional information. Embeddings V2 is optimized for RAG workflows and provides seamless integration with Knowledge Bases for Amazon Bedrock to deliver more informative and relevant responses efficiently. Embeddings V2 enables a deeper understanding of data relationships for complex tasks like retrieval, classification, semantic similarity search, and enhancing search relevance. Offering flexible embedding sizes of 256, 512, and 1024 dimensions, Embeddings V2 prioritizes cost reduction while retaining 97% of the accuracy for RAG use cases, out-performing other leading models. Additionally, the flexible embedding sizes cater to diverse application needs, from low-latency mobile deployments to high-accuracy asynchronous workflows.

New Model Evaluation simplifies the process of accessing, comparing, and selecting LLMs and FMs

Choosing the appropriate model is a critical first step toward building any generative AI application. LLMs can vary drastically in performance based on the task, domain, data modalities, and other factors. For example, a biomedical model is likely to outperform general healthcare models in specific medical contexts, whereas a coding model may face challenges with natural language processing tasks. Using an excessively powerful model could lead to inefficient resource usage, while an underpowered model might fail to meet minimum performance standards – potentially providing incorrect results. And selecting an unsuitable FM at a project’s onset could undermine stakeholder confidence and trust.

With so many models to choose from, we want to make it easier for customers to pick the right one for their use case.

Amazon Bedrock’s Model Evaluation tool, now generally available, simplifies the selection process by enabling benchmarking and comparison against specific datasets and evaluation metrics, ensuring developers select the model that best aligns with their project goals. This guided experience allows developers to evaluate models across criteria tailored to each use case. Through Model Evaluation, developers select candidate models to assess – public options, imported custom models, or fine-tuned versions. They define relevant test tasks, datasets, and evaluation metrics, such as accuracy, latency, cost projections, and qualitative factors. Read more about Model Evaluation in Amazon Bedrock.

The ability to select from the top-performing FMs in Amazon Bedrock has been extremely beneficial for Elastic Security. James Spiteri, Director of Product Management at Elastic shared,

“With just a few clicks, we can assess a single prompt across multiple models simultaneously. This model evaluation functionality enables us to compare the outputs, metrics, and associated costs across different models, allowing us to make an informed decision on which model would be most suitable for what we are trying to accomplish. This has significantly streamlined our process, saving us a considerable amount of time in deploying our applications to production.”

2. Amazon Bedrock offers capabilities to tailor generative AI to your business needs

While models are incredibly important, it takes more than a model to build an application that is useful for an organization. That’s why Amazon Bedrock has capabilities to help you easily tailor generative AI solutions to specific use cases. Customers can use their own data to privately customize applications through fine-tuning or by using Knowledge Bases for a fully managed RAG experience to deliver more relevant, accurate, and customized responses. Agents for Amazon Bedrock allows developers to define specific tasks, workflows, or decision-making processes, enhancing control and automation while ensuring consistent alignment with an intended use case. Starting today, you can now use Agents with Anthropic Claude 3 Haiku and Sonnet models. We are also introducing an updated AWS console experience, supporting a simplified schema and return of control to make it easy for developers to get started. Read more about Agents for Amazon Bedrock, now faster and easier to use.

With new Custom Model Import, customers can leverage the full capabilities of Amazon Bedrock with their own models

All these features are essential to building generative AI applications, which is why we wanted to make them available to even more customers including those who have already invested significant resources in fine-tuning LLMs with their own data on different services or in training custom models from scratch. Many customers have customized models available on Amazon SageMaker, which provides the broadest array of over 250 pre-trained FMs. These FMs include cutting-edge models such as Mistral, Llama2, CodeLlama, Jurassic-2, Jamba, pplx-7B, 70B, and the impressive Falcon 180B. Amazon SageMaker helps with getting data organized and fine-tuned, building scalable and efficient training infrastructure, and then deploying models at scale in a low latency, cost-efficient manner. It has been a game changer for developers in preparing their data for AI, managing experiments, training models faster (e.g. Perplexity AI trains models 40% faster in Amazon SageMaker), lowering inference latency (e.g. Workday has reduced inference latency by 80% with Amazon SageMaker), and improving developer productivity (e.g. NatWest reduced its time-to-value for AI from 12-18 months to under seven months using Amazon SageMaker). However, operationalizing these customized models securely and integrating them into applications for specific business use cases still has challenges.

That is why today we’re introducing Amazon Bedrock Custom Model Import, which enables organizations to leverage their existing AI investments along with Amazon Bedrock’s capabilities. With Custom Model Import, customers can now import and access their own custom models built on popular open model architectures including Flan-T5, Llama, and Mistral, as a fully managed application programming interface (API) in Amazon Bedrock. Customers can take models that they customized on Amazon SageMaker, or other tools, and easily add them to Amazon Bedrock. After an automated validation, they can seamlessly access their custom model, as with any other model in Amazon Bedrock. They get all the same benefits, including seamless scalability and powerful capabilities to safeguard their applications, adherence to responsible AI principles – as well as the ability to expand a model’s knowledge base with RAG, easily create agents to complete multi-step tasks, and carry out fine tuning to keep teaching and refining models. All without needing to manage the underlying infrastructure.

With this new capability, we’re making it easy for organizations to choose a combination of Amazon Bedrock models and their own custom models while maintaining the same streamlined development experience. Today, Amazon Bedrock Custom Model Import is available in preview and supports three of the most popular open model architectures and with plans for more in the future. Read more about Custom Model Import for Amazon Bedrock.

ASAPP is a generative AI company with a 10-year history of building ML models.

“Our conversational generative AI voice and chat agent leverages these models to redefine the customer service experience. To give our customers end to end automation, we need LLM agents, knowledge base, and model selection flexibility. With Custom Model Import, we will be able to use our existing custom models in Amazon Bedrock. Bedrock will allow us to onboard our customers faster, increase our pace of innovation, and accelerate time to market for new product capabilities.”

– Priya Vijayarajendran, President, Technology.

3. Amazon Bedrock provides a secure and responsible foundation to implement safeguards easily

As generative AI capabilities progress and expand, building trust and addressing ethical concerns becomes even more important. Amazon Bedrock addresses these concerns by leveraging AWS’s secure and trustworthy infrastructure with industry-leading security measures, robust data encryption, and strict access controls.

Guardrails for Amazon Bedrock, now generally available, helps customers prevent harmful content and manage sensitive information within an application.

We also offer Guardrails for Amazon Bedrock, which is now generally available. Guardrails offers industry-leading safety protection, giving customers the ability to define content policies, set application behavior boundaries, and implement safeguards against potential risks. Guardrails for Amazon Bedrock is the only solution offered by a major cloud provider that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution. It helps customers block as much as 85% more harmful content than protection natively provided by FMs on Amazon Bedrock. Guardrails provides comprehensive support for harmful content filtering and robust personal identifiable information (PII) detection capabilities. Guardrails works with all LLMs in Amazon Bedrock as well as fine-tuned models, driving consistency in how models respond to undesirable and harmful content. You can configure thresholds to filter content across six categories – hate, insults, sexual, violence, misconduct (including criminal activity), and prompt attack (jailbreak and prompt injection). You can also define a set of topics or words that need to be blocked in your generative AI application, including harmful words, profanity, competitor names, and products. For example, a banking application can configure a guardrail to detect and block topics related to investment advice. A contact center application summarizing call center transcripts can use PII redaction to remove PIIs in call summaries, or a conversational chatbot can use content filters to block harmful content. Read more about Guardrails for Amazon Bedrock.

Companies like Aha!, a software company that helps more than 1 million people bring their product strategy to life, uses Amazon Bedrock to power many of their generative AI capabilities.

“We have full control over our information through Amazon Bedrock’s data protection and privacy policies, and can block harmful content through Guardrails for Amazon Bedrock. We just built on it to help product managers discover insights by analyzing feedback submitted by their customers. This is just the beginning. We will continue to build on advanced AWS technology to help product development teams everywhere prioritize what to build next with confidence.”

With even more choice of leading FMs and features that help you evaluate models and safeguard applications as well as leverage your prior investments in AI along with the capabilities of Amazon Bedrock, today’s launches make it even easier and faster for customers to build and scale generative AI applications. This blog post highlights only a subset of the new features. You can learn more about everything we’ve launched in the resources of this post, including asking questions and summarizing data from a single document without setting up a vector database in Knowledge Bases and the general availability of support for multiple data sources with Knowledge Bases.

Early adopters leveraging Amazon Bedrock’s capabilities are gaining a crucial head start – driving productivity gains, fueling ground-breaking discoveries across domains, and delivering enhanced customer experiences that foster loyalty and engagement. I’m excited to see what our customers will do next with these new capabilities.

As my mentor Werner Vogels always says “Now Go Build” and I’ll add “…with Amazon Bedrock!”

Resources

Check out the following resources to learn more about this announcement:


About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, and visualize, and predict.

Read More

Building scalable, secure, and reliable RAG applications using Knowledge Bases for Amazon Bedrock

Building scalable, secure, and reliable RAG applications using Knowledge Bases for Amazon Bedrock

Generative artificial intelligence (AI) has gained significant momentum with organizations actively exploring its potential applications. As successful proof-of-concepts transition into production, organizations are increasingly in need of enterprise scalable solutions. However, to unlock the long-term success and viability of these AI-powered solutions, it is crucial to align them with well-established architectural principles.

The AWS Well-Architected Framework provides best practices and guidelines for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. Aligning generative AI applications with this framework is essential for several reasons, including providing scalability, maintaining security and privacy, achieving reliability, optimizing costs, and streamlining operations. Embracing these principles is critical for organizations seeking to use the power of generative AI and drive innovation.

This post explores the new enterprise-grade features for Knowledge Bases on Amazon Bedrock and how they align with the AWS Well-Architected Framework. With Knowledge Bases for Amazon Bedrock, you can quickly build applications using Retrieval Augmented Generation (RAG) for use cases like question answering, contextual chatbots, and personalized search.

Here are some features which we will cover:

  1. AWS CloudFormation support
  2. Private network policies for Amazon OpenSearch Serverless
  3. Multiple S3 buckets as data sources
  4. Service Quotas support
  5. Hybrid search, metadata filters, custom prompts for the RetreiveAndGenerate API, and maximum number of retrievals.

AWS Well-Architected design principles

RAG-based applications built using Knowledge Bases for Amazon Bedrock can greatly benefit from following the AWS Well-Architected Framework. This framework has six pillars that help organizations make sure their applications are secure, high-performing, resilient, efficient, cost-effective, and sustainable:

  • Operational Excellence – Well-Architected principles streamline operations, automate processes, and enable continuous monitoring and improvement of generative AI app performance.
  • Security – Implementing strong access controls, encryption, and monitoring helps secure sensitive data used in your organization’s knowledge base and prevent misuse of generative AI.
  • Reliability – Well-Architected principles guide the design of resilient and fault-tolerant systems, providing consistent value delivery to users.
  • Performance Optimization – Choosing the appropriate resources, implementing caching strategies, and proactively monitoring performance metrics ensure that applications deliver fast and accurate responses, leading to optimal performance and an enhanced user experience.
  • Cost Optimization – Well-Architected guidelines assist in optimizing resource usage, using cost-saving services, and monitoring expenses, resulting in long-term viability of generative AI projects.
  • Sustainability – Well-Architected principles promote efficient resource utilization and minimizing carbon footprints, addressing the environmental impact of growing generative AI usage.

By aligning with the Well-Architected Framework, organizations can effectively build and manage enterprise-grade RAG applications using Knowledge Bases for Amazon Bedrock. Now, let’s dive deep into the new features launched within Knowledge Bases for Amazon Bedrock.

AWS CloudFormation support

For organizations building RAG applications, it’s important to provide efficient and effective operations and consistent infrastructure across different environments. This can be achieved by implementing practices such as automating deployment processes. To accomplish this, Knowledge Bases for Amazon Bedrock now offers support for AWS CloudFormation.

With AWS CloudFormation and the AWS Cloud Development Kit (AWS CDK), you can now create, update, and delete knowledge bases and associated data sources. Adopting AWS CloudFormation and the AWS CDK for managing knowledge bases and associated data sources not only streamlines the deployment process, but also promotes adherence to the Well-Architected principles. By performing operations (applications, infrastructure) as code, you can provide consistent and reliable deployments in multiple AWS accounts and AWS Regions, and maintain versioned and auditable infrastructure configurations.

The following is a sample CloudFormation script in JSON format for creating and updating a knowledge base in Amazon Bedrock:

{
    "Type" : "AWS::Bedrock::KnowledgeBase", 
    "Properties" : {
        "Name": String,
        "RoleArn": String,
        "Description": String,
        "KnowledgeBaseConfiguration": {
  		"Type" : String,
  		"VectorKnowledgeBaseConfiguration" : VectorKnowledgeBaseConfiguration
},
        "StorageConfiguration": StorageConfiguration,            
    } 
}

Type specifies a knowledge base as a resource in a top-level template. Minimally, you must specify the following properties:

  • Name – Specify a name for the knowledge base.
  • RoleArn – Specify the Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role with permissions to invoke API operations on the knowledge base. For more information, see Create a service role for Knowledge bases for Amazon Bedrock.
  • KnowledgeBaseConfiguration – Specify the embeddings configuration of the knowledge base. The following sub-properties are required:
    • Type – Specify the value VECTOR.
    • VectorKnowledgeBaseConfiguration – Contains details about the model used to create vector embeddings for the knowledge base.
  • StorageConfiguration – Specify information about the vector store in which the data source is stored. The following sub-properties are required:
    • Type – Specify the vector store service that you are using.
    • You would also need to select one of the vector stores supported by Knowledge Bases such OpenSearchServerless, Pinecone or Amazon PostgreSQL and provide configuration for the selected vector store.

For details on all the fields and providing configuration of various vector stores supported by Knowledge Bases for Amazon Bedrock, refer to AWS::Bedrock::KnowledgeBase.

Redis Enterprise Cloud vector stores are not supported as of this writing in AWS CloudFormation. For latest information, please refer to the documentation above.

After you create a knowledge base, you need to create a data source from the Amazon Simple Storage Service (Amazon S3) bucket containing the files for your knowledge base. It calls the CreateDataSource and DeleteDataSource APIs.

The following is the sample CloudFormation script in JSON format:

{
    "Type" : "AWS::Bedrock::DataSource", 
    "Properties" : {
        "KnowledgeBaseId": String,
        "Name": String,
        "RoleArn": String,
        "Description": String,
        "DataSourceConfiguration": {
  		"S3Configuration" : S3DataSourceConfiguration,
  		"Type" : String
},
ServerSideEncryptionConfiguration":ServerSideEncryptionConfiguration,           
"VectorIngestionConfiguration": VectorIngestionConfiguration
    } 
}

Type specifies a data source as a resource in a top-level template. Minimally, you must specify the following properties:

  • Name – Specify a name for the data source.
  • KnowledgeBaseId – Specify the ID of the knowledge base for the data source to belong to.
  • DataSourceConfiguration – Specify information about the S3 bucket containing the data source. The following sub-properties are required:
    • Type – Specify the value S3.
    • S3Configuration – Contains details about the configuration of the S3 object containing the data source.
  • VectorIngestionConfiguration – Contains details about how to ingest the documents in a data source. You need to provide “ChunkingConfiguration” where you can define your chunking strategy.
  • ServerSideEncryptionConfiguration – Contains the configuration for server-side encryption, where you can provide the Amazon Resource Name (ARN) of the AWS KMS key used to encrypt the resource.

For more information about setting up data sources in Amazon Bedrock, see Set up a data source for your knowledge base.

Note: You cannot change the chunking configuration after you create the data source.

The CloudFormation template allows you to define and manage your knowledge base resources using infrastructure as code (IaC). By automating the setup and management of the knowledge base, you can provide a consistent infrastructure across different environments. This approach aligns with the Operational Excellence pillar, which emphasizes performing operations as code. By treating your entire workload as code, you can automate processes, create consistent responses to events, and ultimately reduce human errors.

Private network policies for Amazon OpenSearch Serverless

For companies building RAG applications, it’s critical that the data remains secure and the network traffic does not go to public internet. To support this, Knowledge Bases for Amazon Bedrock now supports private network policies for Amazon OpenSearch Serverless.

Knowledge Bases for Amazon Bedrock provides an option for using OpenSearch Serverless as a vector store. You can now access OpenSearch Serverless collections that have a private network policy, which further enhances the security posture for your RAG application. To achieve this, you need to create an OpenSearch Serverless collection and configure it for private network access. First, create a vector index within the collection to store the embeddings. Then, while creating the collection, set Network access settings to Private and specify the VPC endpoint for access. Importantly, you can now provide private network access to OpenSearch Serverless collections specifically for Amazon Bedrock. To do this, select AWS service private access and specify bedrock.amazonaws.com as the service.

This private network configuration makes sure that your embeddings are stored securely and are only accessible by Amazon Bedrock, enhancing the overall security and privacy of your knowledge bases. It aligns closely with the Security Pillar of controlling traffic at all layers, because all network traffic is kept within the AWS backbone with these settings.

So far, we have explored the automation of creating, deleting, and updating knowledge base resources and the enhanced security through private network policies for OpenSearch Serverless to store vector embeddings securely. Now, let’s understand how to build more reliable, comprehensive, and cost-optimized RAG applications.

Multiple S3 buckets as data sources

Knowledge Bases for Amazon Bedrock now supports adding multiple S3 buckets as data sources within single knowledge base, including cross-account access. This enhancement increases the knowledge base’s comprehensiveness and accuracy by allowing users to aggregate and use information from various sources seamlessly.

The following are key features:

  • Multiple S3 buckets – Knowledge Bases for Amazon Bedrock can now incorporate data from multiple S3 buckets, enabling users to combine and use information from different sources effortlessly. This feature promotes data diversity and makes sure that relevant information is readily available for RAG-based applications.
  • Cross-account data access – Knowledge Bases for Amazon Bedrock supports the configuration of S3 buckets as data sources across different accounts. You can provide the necessary credentials to access these data sources, expanding the range of information that can be incorporated into their knowledge bases.
  • Efficient data management – When a data source or knowledge base is deleted, the related or existing items in the vector stores are automatically removed. This feature makes sure that the knowledge base remains up to date and free from obsolete or irrelevant data, maintaining the integrity and accuracy of the RAG process.

By supporting multiple S3 buckets as data sources, the need for creating multiple knowledge bases or redundant data copies is eliminated, thereby optimizing cost and promoting cloud financial management. Furthermore, the cross-account access capabilities enable the development of resilient architectures, aligning with the Reliability pillar of the AWS Well-Architected Framework, providing high availability and fault tolerance.

Other recently announced features for Knowledge Bases

To further enhance the reliability of your RAG application, Knowledge Bases for Amazon Bedrock now extends support for Service Quotas. This feature provides a single pane of glass to view applied AWS quota values and usage. For example, you now have quick access to information such as the allowed number of `RetrieveAndGenerate API requests per second.

This feature allows you to effectively manage resource quotas, prevent overprovisioning, and limit API request rates to safeguard services from potential abuse.

You can also enhance your application’s performance by using recently announced features like hybrid search, filtering based on metadata, custom prompts for the RetreiveAndGenerate API, and maximum number of retrievals. These features collectively improve the accuracy, relevance, and consistency of generated responses, and align with the Performance Efficiency pillar of the AWS Well-Architected Framework.

Knowledge Bases for Amazon Bedrock aligns with the Sustainability pillar of the AWS Well-Architected Framework by using managed services and optimizing resource utilization. As a fully managed service, Knowledge Bases for Amazon Bedrock removes the burden of provisioning, managing, and scaling the underlying infrastructure, thereby reducing the environmental impact associated with operating and maintaining these resources.

Additionally, by aligning with the AWS Well-Architected principles, organizations can design and operate their RAG applications in a sustainable manner. Practices such as automating deployments through AWS CloudFormation, implementing private network policies for secure data access, and using efficient services like OpenSearch Serverless contribute to minimizing the environmental impact of these workloads.

Overall, Knowledge Bases for Amazon Bedrock, combined with the AWS Well-Architected Framework, empowers organizations to build scalable, secure, and reliable RAG applications while prioritizing environmental sustainability through efficient resource utilization and the adoption of managed services.

Conclusion

The new enterprise-grade features, such as AWS CloudFormation support, private network policies, the ability to use multiple S3 buckets as data sources, and support for Service Quotas, make it straightforward to build scalable, secure, and reliable RAG applications with Knowledge Bases for Amazon Bedrock. Using AWS managed services and following Well-Architected best practices allows organizations to focus on delivering innovative generative AI solutions while providing operational excellence, robust security, and efficient resource utilization. As you build applications on AWS, aligning RAG applications with the AWS Well-Architected Framework provides a solid foundation for building enterprise-grade solutions that drive business value while adhering to industry standards.

For additional resources, refer to the following:


About the authors

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation Board. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.

Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and give prescriptive guidance to achieve their objective with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work she enjoys volunteering, gardening, cycling and hiking.

Read More

Integrate HyperPod clusters with Active Directory for seamless multi-user login

Integrate HyperPod clusters with Active Directory for seamless multi-user login

Amazon SageMaker HyperPod is purpose-built to accelerate foundation model (FM) training, removing the undifferentiated heavy lifting involved in managing and optimizing a large training compute cluster. With SageMaker HyperPod, you can train FMs for weeks and months without disruption.

Typically, HyperPod clusters are used by multiple users: machine learning (ML) researchers, software engineers, data scientists, and cluster administrators. They edit their own files, run their own jobs, and want to avoid impacting each other’s work. To achieve this multi-user environment, you can take advantage of Linux’s user and group mechanism and statically create multiple users on each instance through lifecycle scripts. The drawback to this approach, however, is that user and group settings are duplicated across multiple instances in the cluster, making it difficult to configure them consistently on all instances, such as when a new team member joins.

To solve this pain point, we can use Lightweight Directory Access Protocol (LDAP) and LDAP over TLS/SSL (LDAPS) to integrate with a directory service such as AWS Directory Service for Microsoft Active Directory. With the directory service, you can centrally maintain users and groups, and their permissions.

In this post, we introduce a solution to integrate HyperPod clusters with AWS Managed Microsoft AD, and explain how to achieve a seamless multi-user login environment with a centrally maintained directory.

Solution overview

The solution uses the following AWS services and resources:

We also use AWS CloudFormation to deploy a stack to create the prerequisites for the HyperPod cluster: VPC, subnets, security group, and Amazon FSx for Lustre volume.

The following diagram illustrates the high-level solution architecture.

Architecture diagram for HyperPod and Active Directory integration

In this solution, HyperPod cluster instances use the LDAPS protocol to connect to the AWS Managed Microsoft AD via an NLB. We use TLS termination by installing a certificate to the NLB. To configure LDAPS in HyperPod cluster instances, the lifecycle script installs and configures System Security Services Daemon (SSSD)—an open source client software for LDAP/LDAPS.

Prerequisites

This post assumes you already know how to create a basic HyperPod cluster without SSSD. For more details on how to create HyperPod clusters, refer to Getting started with SageMaker HyperPod and the HyperPod workshop.

Also, in the setup steps, you will use a Linux machine to generate a self-signed certificate and obtain an obfuscated password for the AD reader user. If you don’t have a Linux machine, you can create an EC2 Linux instance or use AWS CloudShell.

Create a VPC, subnets, and a security group

Follow the instructions in the Own Account section of the HyperPod workshop. You will deploy a CloudFormation stack and create prerequisite resources such as VPC, subnets, security group, and FSx for Lustre volume. You need to create both a primary subnet and backup subnet when deploying the CloudFormation stack, because AWS Managed Microsoft AD requires at least two subnets with different Availability Zones.

In this post, for simplicity, we use the same VPC, subnets, and security group for both the HyperPod cluster and directory service. If you need to use different networks between the cluster and directory service, make sure security groups and route tables are configured so that they can communicate each other.

Create AWS Managed Microsoft AD on Directory Service

Complete the following steps to set up your directory:

  1. On the Directory Service console, choose Directories in the navigation pane.
  2. Choose Set up directory.
  3. For Directory type, select AWS Managed Microsoft AD.
  4. Choose Next.
    Directory type selection screen
  5. For Edition, select Standard Edition.
  6. For Directory DNS name, enter your preferred directory DNS name (for example, hyperpod.abc123.com).
  7. For Admin password¸ set a password and save it for later use.
  8. Choose Next.
    Directory creation configuration screen
  9. In the Networking section, specify the VPC and two private subnets you created.
  10. Choose Next.
    Directory network configuration screen
  11. Review the configuration and pricing, then choose Create directory.
    Directory creation confirmation screen
    The directory creation starts. Wait until the status changes from Creating to Active, which can take 20–30 minutes.
  12. When the status changes to Active, open the detail page of the directory and take note of the DNS addresses for later use.Directory details screen

Create an NLB in front of Directory Service

To create the NLB, complete the following steps:

  1. On the Amazon EC2 console, choose Target groups in the navigation pane.
  2. Choose Create target groups.
  3. Create a target group with the following parameters:
    1. For Choose a target type, select IP addresses.
    2. For Target group name, enter LDAP.
    3. For Protocol: Port, choose TCP and enter 389.
    4. For IP address type, select IPv4.
    5. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    6. For Health check protocol, choose TCP.
  4. Choose Next.
    Load balancing target creation configuration screen
  5. In the Register targets section, register the directory service’s DNS addresses as the targets.
  6. For Ports, choose Include as pending below.Load balancing target registration screenThe addresses are added in the Review targets section with Pending status.
  7. Choose Create target group.Load balancing target review screen
  8. On the Load Balancers console, choose Create load balancer.
  9. Under Network Load Balancer, choose Create.Load balancer type choosing screen
  10. Configure an NLB with the following parameters:
    1. For Load balancer name, enter a name (for example, nlb-ds).
    2. For Scheme, select Internal.
    3. For IP address type, select IPv4.NLB creation basic configuration section
    4. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    5. Under Mappings, select the two private subnets and their CIDR ranges (which you created with the CloudFormation template).
    6. For Security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).NLB creation network mapping and security groups configurations
  11. In the Listeners and routing section, specify the following parameters:
    1. For Protocol, choose TCP.
    2. For Port, enter 389.
    3. For Default action, choose the target group named LDAP.

    Here, we are adding a listener for LDAP. We will add LDAPS later.

  12. Choose Create load balancer.NLB listeners routing configuration screenWait until the status changes from Provisioning to Active, which can take 3–5 minutes.
  13. When the status changes to Active, open the detail page of the provisioned NLB and take note of the DNS name (xyzxyz.elb.region-name.amazonaws.com) for later use.NLB details screen

Create a self-signed certificate and import it to Certificate Manager

To create a self-signed certificate, complete the following steps:

  1. On your Linux-based environment (local laptop, EC2 Linux instance, or CloudShell), run the following OpenSSL commands to create a self-signed certificate and private key:
    $ openssl genrsa 2048 > ldaps.key
    
    $ openssl req -new -key ldaps.key -out ldaps_server.csr
    
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [AU]:US
    State or Province Name (full name) [Some-State]:Washington
    Locality Name (eg, city) []:Bellevue
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:CorpName
    Organizational Unit Name (eg, section) []:OrgName
    Common Name (e.g., server FQDN or YOUR name) []:nlb-ds-abcd1234.elb.region.amazonaws.com
    Email Address []:your@email.address.com
    
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:
    
    $ openssl x509 -req -sha256 -days 365 -in ldaps_server.csr -signkey ldaps.key -out ldaps.crt
    
    Certificate request self-signature ok
    subject=C = US, ST = Washington, L = Bellevue, O = CorpName, OU = OrgName, CN = nlb-ds-abcd1234.elb.region.amazonaws.com, emailAddress = your@email.address.com
    
    $ chmod 600 ldaps.key

  2. On the Certificate Manager console, choose Import.
  3. Enter the certificate body and private key, from the contents of ldaps.crt and ldaps.key respectively.
  4. Choose Next.Certificate importing screen
  5. Add any optional tags, then choose Next.Certificate tag editing screen
  6. Review the configuration and choose Import.Certificate import review screen

Add an LDAPS listener

We added a listener for LDAP already in the NLB. Now we add a listener for LDAPS with the imported certificate. Complete the following steps:

  1. On the Load Balancers console, navigate to the NLB details page.
  2. On the Listeners tab, choose Add listener.NLB listers screen with add listener button
  3. Configure the listener with the following parameters:
    1. For Protocol, choose TLS.
    2. For Port, enter 636.
    3. For Default action, choose LDAP.
    4. For Certificate source, select From ACM.
    5. For Certificate, enter what you imported in ACM.
  4. Choose Add.NLB listener configuration screenNow the NLB listens to both LDAP and LDAPS. It is recommended to delete the LDAP listener because it transmits data without encryption, unlike LDAPS.NLB listerners list with LDAP and LDAPS

Create an EC2 Windows instance to administer users and groups in the AD

To create and maintain users and groups in the AD, complete the following steps:

  1. On the Amazon EC2 console, choose Instances in the navigation pane.
  2. Choose Launch instances.
  3. For Name, enter a name for your instance.
  4. For Amazon Machine Image, choose Microsoft Windows Server 2022 Base.
  5. For Instance type, choose t2.micro.
  6. In the Network settings section, provide the following parameters:
    1. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    2. For Subnet, choose either of two subnets you created with the CloudFormation template.
    3. For Common security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).
  7. For Configure storage, set storage to 30 GB gp2.
  8. In the Advanced details section, for Domain join directory¸ choose the AD you created.
  9. For IAM instance profile, choose an AWS Identity and Access Management (IAM) role with at least the AmazonSSMManagedEC2InstanceDefaultPolicy policy.
  10. Review the summary and choose Launch instance.

Create users and groups in AD using the EC2 Windows instance

With Remote Desktop, connect to the EC2 Windows instance you created in the previous step. Using an RDP client is recommended over using a browser-based Remote Desktop so that you can exchange the contents of the clipboard with your local machine using copy-paste operations. For more details about connecting to EC2 Windows instances, refer to Connect to your Windows instance.

If you are prompted for a login credential, use hyperpodAdmin (where hyperpod is the first part of your directory DNS name) as the user name, and use the admin password you set to the directory service.

  1. When the Windows desktop screen opens, choose Server Manager from the Start menu.Dashboard screen on Server Manager
  2. Choose Local Server in the navigation pane, and confirm that the domain is what you specified to the directory service.Local Server screen on Server Manager
  3. On the Manage menu, choose Add Roles and Features.Drop down menu opened from Manage button
  4. Choose Next until you are at the Features page.Add Roles and Features Wizard
  5. Expand the feature Remote Server Administration Tools, expand Role Administration Tools, and select AD DS and AD LDS Tools and Active Directory Rights Management Service.
  6. Choose Next and Install.Features selection screenFeature installation starts.
  7. When the installation is complete, choose Close.Feature installation progress screen
  8. Open Active Directory Users and Computers from the Start menu.Active Directory Users and Computers window
  9. Under hyperpod.abc123.com, expand hyperpod.
  10. Choose (right-click) hyperpod, choose New, and choose Organizational Unit.Context menu opened to create an Organizational Unit
  11. Create an organizational unit called Groups.Organizational Unit ceation dialog
  12. Choose (right-click) Groups, choose New, and choose Group.Context menu opened to create groups
  13. Create a group called ClusterAdmin.Group creation dialog for ClusterAdmin
  14. Create a second group called ClusterDev.Group creation dialog for ClusterDev
  15. Choose (right-click) Users, choose New, and choose User.
  16. Create a new user.User creation dialog
  17. Choose (right-click) the user and choose Add to a group.Context menu opened to add a user to a group
  18. Add your users to the groups ClusterAdmin or ClusterDev.Group selection screen to add a user to a groupUsers added to the ClusterAdmin group will have sudo privilege on the cluster.

Create a ReadOnly user in AD

Create a user called ReadOnly under Users. The ReadOnly user is used by the cluster to programmatically access users and groups in AD.

User creation dialog to create ReadOnly user

Take note of the password for later use.

Password entering screen for ReadOnly user

(For SSH public key authentication) Add SSH public keys to users

By storing an SSH public key to a user in AD, you can log in without entering a password. You can use an existing key pair, or you can create a new key pair with OpenSSH’s ssh-keygen command. For more information about generating a key pair, refer to Create a key pair for your Amazon EC2 instance.

  1. In Active Directory Users and Computers, on the View menu, enable Advanced Features.View menu opened to enable Advanced Features
  2. Open the Properties dialog of the user.
  3. On the Attribute Editor tab, choose altSecurityIdentities choose Edit.Attribute Editor tab on User Properties dialog
  4. For Value to add, choose Add.
  5. For Values, add an SSH public key.
  6. Choose OK.Attribute editing dialog for altSecurityIdentitiesConfirm that the SSH public key appears as an attribute.Attribute Editor tab with altSecurityIdentities configured

Get an obfuscated password for the ReadOnly user

To avoid including a plain text password in the SSSD configuration file, you obfuscate the password. For this step, you need a Linux environment (local laptop, EC2 Linux instance, or CloudShell).

Install the sssd-tools package on the Linux machine to install the Python module pysss for obfuscation:

# Ubuntu
$ sudo apt install sssd-tools

# Amazon Linux
$ sudo yum install sssd-tools

Run the following one-line Python script. Input the password of the ReadOnly user. You will get the obfuscated password.

$ python3 -c "import getpass,pysss; print(pysss.password().encrypt(getpass.getpass('AD reader user password: ').strip(), pysss.password().AES_256))"
AD reader user password: (Enter ReadOnly user password) 
AAAQACK2....

Create a HyperPod cluster with an SSSD-enabled lifecycle script

Next, you create a HyperPod cluster with LDAPS/Active Directory integration.

  1. Find the configuration file config.py in your lifecycle script directory, open it with your text editor, and edit the properties in the Config class and SssdConfig class:
    1. Set True for enable_sssd to enable setting up SSSD.
    2. The SssdConfig class contains configuration parameters for SSSD.
    3. Make sure you use the obfuscated password for the ldap_default_authtok property, not a plain text password.
    # Basic configuration parameters
    class Config:
             :
        # Set true if you want to install SSSD for ActiveDirectory/LDAP integration.
        # You need to configure parameters in SssdConfig as well.
        enable_sssd = True
    # Configuration parameters for ActiveDirectory/LDAP/SSSD
    class SssdConfig:
    
        # Name of domain. Can be default if you are not sure.
        domain = "default"
    
        # Comma separated list of LDAP server URIs
        ldap_uri = "ldaps://nlb-ds-xyzxyz.elb.us-west-2.amazonaws.com"
    
        # The default base DN to use for performing LDAP user operations
        ldap_search_base = "dc=hyperpod,dc=abc123,dc=com"
    
        # The default bind DN to use for performing LDAP operations
        ldap_default_bind_dn = "CN=ReadOnly,OU=Users,OU=hyperpod,DC=hyperpod,DC=abc123,DC=com"
    
        # "password" or "obfuscated_password". Obfuscated password is recommended.
        ldap_default_authtok_type = "obfuscated_password"
    
        # You need to modify this parameter with the obfuscated password, not plain text password
        ldap_default_authtok = "placeholder"
    
        # SSH authentication method - "password" or "publickey"
        ssh_auth_method = "publickey"
    
        # Home directory. You can change it to "/home/%u" if your cluster doesn't use FSx volume.
        override_homedir = "/fsx/%u"
    
        # Group names to accept SSH login
        ssh_allow_groups = {
            "controller" : ["ClusterAdmin", "ubuntu"],
            "compute" : ["ClusterAdmin", "ClusterDev", "ubuntu"],
            "login" : ["ClusterAdmin", "ClusterDev", "ubuntu"],
        }
    
        # Group names for sudoers
        sudoers_groups = {
            "controller" : ["ClusterAdmin", "ClusterDev"],
            "compute" : ["ClusterAdmin", "ClusterDev"],
            "login" : ["ClusterAdmin", "ClusterDev"],
        }
    

  2. Copy the certificate file ldaps.crt to the same directory (where config.py exists).
  3. Upload the modified lifecycle script files to your Amazon Simple Storage Service (Amazon S3) bucket, and create a HyperPod cluster with it.
  4. Wait until the status changes to InService.

Verification

Let’s verify the solution by logging in to the cluster with SSH. Because the cluster was created in a private subnet, you can’t directly SSH into the cluster from your local environment. You can choose from two options to connect to the cluster.

Option 1: SSH login through AWS Systems Manager

You can use AWS Systems Manager as a proxy for the SSH connection. Add a host entry to the SSH configuration file ~/.ssh/config using the following example. For the HostName field, specify the Systems Manger target name in the format of sagemaker-cluster:[cluster-id]_[instance-group-name]-[instance-id]. For the IdentityFile field, specify the file path to the user’s SSH private key. This field is not required if you chose password authentication.

Host MyCluster-LoginNode
    HostName sagemaker-cluster:abcd1234_LoginGroup-i-01234567890abcdef
    User user1
    IdentityFile ~/keys/my-cluster-ssh-key.pem
    ProxyCommand aws --profile default --region us-west-2 ssm start-session --target %h --document-name AWS-StartSSHSession --parameters portNumber=%p

Run the ssh command using the host name you specified. Confirm you can log in to the instance with the specified user.

$ ssh MyCluster-LoginNode
   :
   :
   ____              __  ___     __             __ __                  ___          __
  / __/__ ____ ____ /  |/  /__ _/ /_____ ____  / // /_ _____  ___ ____/ _ ___  ___/ /
 _ / _ `/ _ `/ -_) /|_/ / _ `/  '_/ -_) __/ / _  / // / _ / -_) __/ ___/ _ / _  /
/___/_,_/_, /__/_/  /_/_,_/_/_\__/_/   /_//_/_, / .__/__/_/ /_/   ___/_,_/
         /___/                                    /___/_/
You're on the controller
Instance Type: ml.m5.xlarge
user1@ip-10-1-111-222:~$

At this point, users can still use the Systems Manager default shell session to log in to the cluster as ssm-user with administrative privileges. To block the default Systems Manager shell access and enforce SSH access, you can configure your IAM policy by referring to the following example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:StartSession",
                "ssm:TerminateSession"
            ],
            "Resource": [
                "arn:aws:sagemaker:us-west-2:123456789012:cluster/abcd1234efgh",
                "arn:aws:ssm:us-west-2:123456789012:document/AWS-StartSSHSession"
            ],
            "Condition": {
                "BoolIfExists": {
                    "ssm:SessionDocumentAccessCheck": "true"
                }
            }
        }
    ]
}

For more details on how to enforce SSH access, refer to Start a session with a document by specifying the session documents in IAM policies.

Option 2: SSH login through bastion host

Another option to access the cluster is to use a bastion host as a proxy. You can use this option when the user doesn’t have permission to use Systems Manager sessions, or to troubleshoot when Systems Manager is not working.

  1. Create a bastion security group that allows inbound SSH access (TCP port 22) from your local environment.
  2. Update the security group for the cluster to allow inbound SSH access from the bastion security group.
  3. Create an EC2 Linux instance.
  4. For Amazon Machine Image, choose Ubuntu Server 20.04 LTS.
  5. For Instance type, choose t3.small.
  6. In the Network settings section, provide the following parameters:
    1. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    2. For Subnet, choose the public subnet you created with the CloudFormation template.
    3. For Common security groups, choose the bastion security group you created.
  7. For Configure storage, set storage to 8 GB.
  8. Identify the public IP address of the bastion host and the private IP address of the target instance (for example, the login node of the cluster), and add two host entries in the SSH config, by referring to the following example:
    Host Bastion
        HostName 11.22.33.44
        User ubuntu
        IdentityFile ~/keys/my-bastion-ssh-key.pem
    
    Host MyCluster-LoginNode-with-Proxy
        HostName 10.1.111.222
        User user1
        IdentityFile ~/keys/my-cluster-ssh-key.pem
        ProxyCommand ssh -q -W %h:%p Bastion

  9. Run the ssh command using the target host name you specified earlier, and confirm you can log in to the instance with the specified user:
    $ ssh MyCluster-LoginNode-with-Proxy
       :
       :
       ____              __  ___     __             __ __                  ___          __
      / __/__ ____ ____ /  |/  /__ _/ /_____ ____  / // /_ _____  ___ ____/ _ ___  ___/ /
     _ / _ `/ _ `/ -_) /|_/ / _ `/  '_/ -_) __/ / _  / // / _ / -_) __/ ___/ _ / _  /
    /___/_,_/_, /__/_/  /_/_,_/_/_\__/_/   /_//_/_, / .__/__/_/ /_/   ___/_,_/
             /___/                                    /___/_/
    You're on the controller
    Instance Type: ml.m5.xlarge
    user1@ip-10-1-111-222:~$

Clean up

Clean up the resources in the following order:

  1. Delete the HyperPod cluster.
  2. Delete the Network Load Balancer.
  3. Delete the load balancing target group.
  4. Delete the certificate imported to Certificate Manager.
  5. Delete the EC2 Windows instance.
  6. Delete the EC2 Linux instance for the bastion host.
  7. Delete the AWS Managed Microsoft AD.
  8. Delete the CloudFormation stack for the VPC, subnets, security group, and FSx for Lustre volume.

Conclusion

This post provided steps to create a HyperPod cluster integrated with Active Directory. This solution removes the hassle of user maintenance on large-scale clusters and allows you to manage users and groups centrally in one place.

For more information about HyperPod, check out the HyperPod workshop and the SageMaker HyperPod Developer Guide. Leave your feedback on this solution in the comments section.


About the Authors

Tomonori Shimomura is a Senior Solutions Architect on the Amazon SageMaker team, where he provides in-depth technical consultation to SageMaker customers and suggests product improvements to the product team. Before joining Amazon, he worked on the design and development of embedded software for video game consoles, and now he leverages his in-depth skills in Cloud side technology. In his free time, he enjoys playing video games, reading books, and writing software.

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

Monidipa Chakraborty currently serves as a Senior Software Development Engineer at Amazon Web Services (AWS), specifically within the SageMaker HyperPod team. She is committed to assisting customers by designing and implementing robust and scalable systems that demonstrate operational excellence. Bringing nearly a decade of software development experience, Monidipa has contributed to various sectors within Amazon, including Video, Retail, Amazon Go, and AWS SageMaker.

Satish Pasumarthi is a Software Developer at Amazon Web Services. With several years of software engineering and an ML background, he loves to bridge the gap between the ML and systems and is passionate to build systems that make large scale model training possible. He has worked on projects in a variety of domains, including Machine Learning frameworks, model benchmarking, building hyperpod beta involving a broad set of AWS services. In his free time, Satish enjoys playing badminton.

Read More

The executive’s guide to generative AI for sustainability

The executive’s guide to generative AI for sustainability

Organizations are facing ever-increasing requirements for sustainability goals alongside environmental, social, and governance (ESG) practices. A Gartner, Inc. survey revealed that 87 percent of business leaders expect to increase their organization’s investment in sustainability over the next years. This post serves as a starting point for any executive seeking to navigate the intersection of generative artificial intelligence (generative AI) and sustainability. It provides examples of use cases and best practices for using generative AI’s potential to accelerate sustainability and ESG initiatives, as well as insights into the main operational challenges of generative AI for sustainability. This guide can be used as a roadmap for integrating generative AI effectively within sustainability strategies while ensuring alignment with organizational objectives.

A roadmap to generative AI for sustainability

In the sections that follow, we provide a roadmap for integrating generative AI into sustainability initiatives

1. Understand the potential of generative AI for sustainability

Generative AI has the power to transform every part of a business with its wide range of capabilities. These include the ability to analyze massive amounts of data, identify patterns, summarize documents, perform translations, correct errors, or answer questions. These capabilities can be used to add value throughout the entire value chain of your organization. Figure 1 illustrates selected examples of use cases of generative AI for sustainability across the value chain.

Figure 1: Examples of generative AI for sustainability use cases across the value chain

According to KPMG’s 2024 ESG Organization Survey, investment in ESG capabilities is another top priority for executives as organizations face increasing regulatory pressure to disclose information about ESG impacts, risks, and opportunities. Within this context, you can use generative AI to advance your organization’s ESG goals.

The typical ESG workflow consists of multiple phases, each presenting unique pain points. Generative AI offers solutions that can address these pain points throughout the process and contribute to sustainability efforts. Figure 2 provides examples illustrating how generative AI can support each phase of the ESG workflow within your organization. These examples include speeding up market trend analysis, ensuring accurate risk management and compliance, and facilitating data collection or report generation. Note that ESG workflows may vary across different verticals, organizational maturities, and legislative frameworks. Factors such as industry-specific regulations, company size, and regional policies can influence the ESG workflow steps. Therefore, prioritizing use cases according to your specific needs and context and defining a clear plan to measure success is essential for optimal effectiveness.

Figure 2: Mapping generative AI benefits across the ESG workflow

2. Recognize the operational challenges of generative AI for sustainability

Understanding and appropriately addressing the challenges of implementing generative AI is crucial for organizations aiming to use its potential to address the organization’s sustainability goals and ESG initiatives. These challenges include collecting and managing high-quality data, integrating generative AI into existing IT systems, navigating ethical concerns, filling skills gaps and setting the organization up for success by bringing in key stakeholders such as the chief information security officer (CISO) or chief financial officer (CFO) early so you build responsibly. Legal challenges are a huge blocker for transitioning from proof of concept (POC) to production. Therefore, it’s essential to involve legal teams early in the process to build with compliance in mind. Figure 3 provides an overview of the main operational challenges of generative AI for sustainability.

Figure 3: Operational challenges of generative AI for sustainability

3. Set the right data foundations

As a CEO aiming to use generative AI to achieve sustainability goals, remember that data is your differentiator. Companies that lack ready access to high-quality data will not be able to customize generative AI models with their own data, thus missing out on realizing the full scaling potential of generative AI and creating a competitive advantage. Invest in acquiring diverse and high-quality datasets to enrich and accelerate your ESG initiatives. You can use resources such as the Amazon Sustainability Data Initiative or the AWS Data Exchange to simplify and expedite the acquisition and analysis of comprehensive datasets. Alongside external data acquisition, prioritize internal data management to maximize the potential of generative AI and use its capabilities in analyzing your organizational data and uncovering new insights.

From an operational standpoint, you can embrace foundation model ops (FMOps) and large language model ops (LLMOps) to make sure your sustainability efforts are data-driven and scalable. This involves documenting data lineage, data versioning, automating data processing, and monitoring data management costs.

4. Identify high-impact opportunities

You can use Amazon’s working backwards principle to pinpoint opportunities within your sustainability strategy where generative AI can make a significant impact. Prioritize projects that promise immediate enhancements in key areas within your organization. While ESG remains a key aspect of sustainability, tapping into industry-specific expertise across sectors such as energy, supply chain, and manufacturing, transportation, or agriculture can uncover diverse generative AI for sustainability use cases tailored to your business’s applications. Moreover, exploring alternative avenues, such as using generative AI for improving research and development, enabling customer self-service, optimizing energy usage in buildings or slowing down deforestation, can also provide impactful opportunities for sustainable innovation.

5. Use the right tools

Failing to use the appropriate tools can add complexity, compromise security, and reduce effectiveness in using generative AI for sustainability. The right tool should offer you choice and flexibility and enable you to customize your solutions to specific needs and requirements.

Figure 4 illustrates the AWS generative AI stack as of 2023, which offers a set of capabilities that encompass choice, breadth, and depth across all layers. Moreover, it is built on a data-first approach, ensuring that every aspect of its offerings is designed with security and privacy in mind.

Examples of tools you can use to advance sustainability initiatives are:

Amazon Bedrock – a fully managed service that provides access to high-performing FMs from leading AI companies through a single API, enabling you to choose the right model for your sustainability use cases.

AWS Trainium2 – Purpose-built for high-performance training of FMs and LLMs, Trainium2 provides up to 2x better energy efficiency (performance/watt) compared to first-generation Trainium chips.

Inferentia2-based Amazon EC2 Inf2 instances – These instances offer up to 50 percent better performance/watt over comparable Amazon Elastic Compute Cloud (Amazon EC2) instances. Purpose-built to handle deep learning models at scale, Inf2 instances are indispensable for deploying ultra-large models while meeting sustainability goals through improved energy efficiency.

Figure 4: AWS generative AI stack

6. Use the right approach

Generative AI isn’t a one-size-fits-all solution. Tailoring your approach by choosing the right modality and optimization strategy is crucial for maximizing its impact on sustainability initiatives. Figure 5 offers an overview on generative AI modalities and optimization strategies, including prompt engineering, Retrieval Augmented Generation, and fine-tuning or continued pre-training.

Figure 5: Generative AI modalities

In addition, figure 6 outlines the main generative AI optimization strategies, including prompt engineering, Retrieval Augmented Generation, and fine-tuning or continued pre-training.

Figure 6: Generative AI optimization strategies

7. Simplify the development of your applications by using generative AI agents

Generative AI agents offer a unique opportunity to drive sustainability initiatives forward with their advanced capabilities of automating a wide range of routine and repetitive tasks, such as data entry, customer support inquiries, and content generation. Moreover, they can orchestrate complex, multistep workflows by breaking down tasks into smaller, manageable steps, coordinating various actions, and ensuring the efficient execution of processes within your organization. For example, you can use Agents for Amazon Bedrock to configure an agent that monitors and analyzes energy usage patterns across your operations and identifies opportunities for energy savings. Alternatively, you can create a specialized agent that monitors compliance with sustainability regulations in real time.

8. Build robust feedback mechanisms for evaluation

Take advantage of feedback insights for strategic improvements, whether adjusting generative AI models or redefining objectives to ensure agility and alignment with sustainability challenges. Consider the following guidelines:

Implement real-time monitoring – Set up monitoring systems to track generative AI performance against sustainability benchmarks, focusing on efficiency and environmental impact. Establish a metrics pipeline to provide insights into the sustainability contributions of your generative AI initiatives.

Engage stakeholders for human-in-the-loop evaluation – Rely on human-in-the-loop auditing and regularly collect feedback from internal teams, customers, and partners to gauge the impact of generative AI–driven processes on the organization’s sustainability benchmarks. This enhances transparency and promotes trust in your commitment to sustainability.

Use automated testing for continuous improvement – With tools such as RAGAS and LangSmith, you can use LLM-based evaluation to identify and correct inaccuracies or hallucinations, facilitating rapid optimization of generative AI models in line with sustainability goals.

9. Measure impact and maximize ROI from generative AI for sustainability

Establish clear key performance indicators (KPIs) that capture the environmental impact, such as carbon footprint reduction, alongside economic benefits, such as cost savings or enhanced business agility. This dual focus ensures that your investments not only contribute to programs focused on environmental sustainability but also reinforces the business case for sustainability while empowering you to drive innovation and competitive advantage in sustainable practices. Share success stories internally and externally to inspire others and demonstrate your organization’s commitment to sustainability leadership.

10. Minimize resource usage throughout the generative AI lifecycle

In some cases, generative AI itself can have a high energy cost. To achieve maximum impact, consider the trade-off between the benefits of using generative AI for sustainability initiatives and the energy efficiency of the technology itself. Make sure to gain a deep understanding of the iterative generative AI lifecycle and optimize each phase for environmental sustainability. Typically, the journey into generative AI begins with identifying specific application requirements. From there, you have the option to either train your model from scratch or use an existing one. In most cases, opting for an existing model and customizing it is preferred. Following this step and evaluating your system thoroughly is essential before deployment. Lastly, continuous monitoring enables ongoing refinement and adjustments. Throughout this lifecycle, implementing AWS Well-Architected Framework best practices is recommended. Refer to Figure 7 for an overview of the generative AI lifecycle.

Figure 7: The generative AI lifecycle

11. Manage risks and implement responsibly

While generative AI holds significant promise for working towards your organization’s sustainability goals, it also poses challenges such as toxicity and hallucinations. Striking the right balance between innovation and the responsible use of generative AI is fundamental for mitigating risks and enabling responsible AI innovation. This balance must account for the assessment of risk in terms of several factors such as quality, disclosures, or reporting. To achieve this, adopting specific tools and capabilities and working with your security team experts to adopt security best practices is necessary. Scaling generative AI in a safe and secure manner requires putting in place guardrails that are customized to your use cases and aligned with responsible AI policies.

12. Invest in educating and training your teams

Continuously upskill your team and empower them with the right skills to innovate and actively contribute to achieving your organization’s sustainability goals. Identify relevant resources for sustainability and generative AI to ensure your teams stay updated with the essential skills required in both areas.

Conclusion

In this post, we provided a guide for executives to integrate generative AI into their sustainability strategies, focusing on both sustainability and ESG goals. The adoption of generative AI in sustainability efforts is not just about technological innovation. It is about fostering a culture of responsibility, innovation, and continuous improvement. By prioritizing high-quality data, identifying impactful opportunities, and fostering stakeholders’ engagement, companies can harness the transformative power of generative AI to not only achieve but surpass their sustainability goals.

How can AWS help?

Explore the AWS Solutions Library to discover ways to build sustainability solutions on AWS.

The AWS Generative AI Innovation Center can assist you in the process with expert guidance on ideation, strategic use case identification, execution, and scaling to production.

To learn more about how Amazon is using AI to reach our climate pledge commitment of net-zero carbon by 2040, explore the 7 ways AI is helping Amazon build a more sustainable future and business.


About the Authors

Wafae BakkaliDr. Wafae Bakkali is a Data Scientist at AWS. As a generative AI expert, Wafae is driven by the mission to empower customers in solving their business challenges through the utilization of generative AI techniques, ensuring they do so with maximum efficiency and sustainability.

Dr. Mehdi Noori is a Senior Scientist at AWS Generative AI Innovation Center. With a passion for bridging technology and innovation in the sustainability field, he assists AWS customers in unlocking the potential of Generative AI, turning potential challenges into opportunities for rapid experimentation and innovation. By focusing on scalable, measurable, and impactful uses of advanced AI technologies and streamlining the path to production, he helps customers achieve their sustainability goals.

Rahul Sareen is the GM for Sustainability Solutions and GTM at AWS. Rahul has a team of high performing individuals consisting of sustainability strategists, GTM specialists and technology architects to create great business outcomes for customer’s sustainability goals (everything from carbon emission tracking, sustainable packaging and operations, circular economy to renewable energy). Rahul’s team provides technical expertise (ML, GenAI, IoT) to solve sustainability use cases

Read More

Introducing automatic training for solutions in Amazon Personalize

Introducing automatic training for solutions in Amazon Personalize

Amazon Personalize is excited to announce automatic training for solutions. Solution training is fundamental to maintain the effectiveness of a model and make sure recommendations align with users’ evolving behaviors and preferences. As data patterns and trends change over time, retraining the solution with the latest relevant data enables the model to learn and adapt, enhancing its predictive accuracy. Automatic training generates a new solution version, mitigating model drift and keeping recommendations relevant and tailored to end-users’ current behaviors while including the newest items. Ultimately, automatic training provides a more personalized and engaging experience that adapts to changing preferences.

Amazon Personalize accelerates your digital transformation with machine learning (ML), making it effortless to integrate personalized recommendations into existing websites, applications, email marketing systems, and more. Amazon Personalize enables developers to quickly implement a customized personalization engine, without requiring ML expertise. Amazon Personalize provisions the necessary infrastructure and manages the entire ML pipeline, including processing the data, identifying features, using the appropriate algorithms, and training, optimizing, and hosting the customized models based on your data. All your data is encrypted to be private and secure.

In this post, we guide you through the process of configuring automatic training, so your solutions and recommendations maintain their accuracy and relevance.

Solution overview

A solution refers to the combination of an Amazon Personalize recipe, customized parameters, and one or more solution versions (trained models). When you create a custom solution, you specify a recipe matching your use case and configure training parameters. For this post, you configure automatic training in the training parameters.

Prerequisites

To enable automatic training for your solutions, you first need to set up Amazon Personalize resources. Start by creating a dataset group, schemas, and datasets representing your items, interactions, and user data. For instructions, refer to Getting Started (console) or Getting Started (AWS CLI).

After you finish importing your data, you are ready to create a solution.

Create a solution

To set up automatic training, complete the following steps:

  1. On the Amazon Personalize console, create a new solution.
  2. Specify a name for your solution, choose the type of solution you want to create, and choose your recipe.
  3. Optionally, add any tags. For more information about tagging Amazon Personalize resources, see Tagging Amazon Personalize resources.
  4. To use automatic training, in the Automatic training section, select Turn on and specify your training frequency.

Automatic training is enabled by default to train one time every 7 days. You can configure the training cadence to suit your business needs, ranging from one time every 1–30 days.

  1. If your recipe generates item recommendations or user segments, optionally use the Columns for training section to choose the columns Amazon Personalize considers when training solution versions.
  2. In the Hyperparameter configuration section, optionally configure any hyperparameter options based on your recipe and business needs.
  3. Provide any additional configurations, then choose Next.
  4. Review the solution details and confirm that your automatic training is configured as expected.
  5. Choose Create solution.

Amazon Personalize will automatically create your first solution version. A solution version refers to a trained ML model. When a solution version is created for the solution, Amazon Personalize trains the model backing the solution version based on the recipe and training configuration. It can take up to 1 hour for the solution version creation to start.

The following is sample code for creating a solution with automatic training using the AWS SDK:

import boto3 
personalize = boto3.client('personalize')

solution_config = {
    "autoTrainingConfig": {
        "schedulingExpression": "rate(3 days)"
    }
}

recipe = "arn:aws:personalize:::recipe/aws-similar-items"
name = "test_automatic_training"
response = personalize.create_solution(name=name, recipeArn=recipe_arn, datasetGroupArn=dataset_group_arn, 
                            performAutoTraining=True, solutionConfig=solution_config)

print(response['solutionArn'])
solution_arn = response['solutionArn'])

After a solution is created, you can confirm whether automatic training is enabled on the solution details page.

You can also use the following sample code to confirm via the AWS SDK that automatic training is enabled:

response = personalize.describe_solution(solutionArn=solution_arn)
print(response)

Your response will contain the fields performAutoTraining and autoTrainingConfig, displaying the values you set in the CreateSolution call.

On the solution details page, you will also see the solution versions that are created automatically. The Training type column specifies whether the solution version was created manually or automatically.

You can also use the following sample code to return a list of solution versions for the given solution:

response = personalize.list_solution_versions(solutionArn=solution_arn)['solutionVersions']
print("List Solution Version responsen")
for val in response:
    print(f"SolutionVersion: {val}")
    print("n")

Your response will contain the field trainingType, which specifies whether the solution version was created manually or automatically.

When your solution version is ready, you can create a campaign for your solution version.

Create a campaign

A campaign deploys a solution version (trained model) to generate real-time recommendations. With Amazon Personalize, you can streamline your workflow and automate the deployment of the latest solution version to campaigns via automatic syncing. To set up auto sync, complete the following steps:

  1. On the Amazon Personalize console, create a new campaign.
  2. Specify a name for your campaign.
  3. Choose the solution you just created.
  4. Select Automatically use the latest solution version.
  5. Set the minimum provisioned transactions per second.
  6. Create your campaign.

The campaign is ready when its status is ACTIVE.

The following is sample code for creating a campaign with syncWithLatestSolutionVersion set to true using the AWS SDK. You must also append the suffix $LATEST to the solutionArn in solutionVersionArn when you set syncWithLatestSolutionVersion to true.

campaign_config = {
    "syncWithLatestSolutionVersion": True
}
resource_name = "test_campaign_sync"
solution_version_arn = "arn:aws:personalize:<region>:<accountId>:solution/<solution_name>/$LATEST"
response = personalize.create_campaign(name=resource_name, solutionVersionArn=solution_version_arn, campaignConfig=campaign_config)
campaign_arn = response['campaignArn']
print(campaign_arn)

On the campaign details page, you can see whether the campaign selected has auto sync enabled. When enabled, your campaign will automatically update to use the most recent solution version, whether it was automatically or manually created.

Use the following sample code to confirm via the AWS SDK that syncWithLatestSolutionVersion is enabled:

response = personalize.describe_campaign(campaignArn=campaign_arn)
Print(response)

Your response will contain the field syncWithLatestSolutionVersion under campaignConfig, displaying the value you set in the CreateCampaign call.

You can enable or disable the option to automatically use the latest solution version on the Amazon Personalize console after a campaign is created by updating your campaign. Similarly, you can enable or disable syncWithLatestSolutionVersion with UpdateCampaign using the AWS SDK.

Conclusion

With automatic training, you can mitigate model drift and maintain recommendation relevance by streamlining your workflow and automating the deployment of the latest solution version in Amazon Personalize.

For more information about optimizing your user experience with Amazon Personalize, see the Amazon Personalize Developer Guide.


About the authors

Ba’Carri Johnson is a Sr. Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. With a background in computer science and strategy, she is passionate about product innovation. In her spare time, she enjoys traveling and exploring the great outdoors.

Ajay Venkatakrishnan is a Software Development Engineer on the Amazon Personalize team. In his spare time, he enjoys writing and playing soccer.

Pranesh Anubhav is a Senior Software Engineer for Amazon Personalize. He is passionate about designing machine learning systems to serve customers at scale. Outside of his work, he loves playing soccer and is an avid follower of Real Madrid.

Read More

Use Kubernetes Operators for new inference capabilities in Amazon SageMaker that reduce LLM deployment costs by 50% on average

Use Kubernetes Operators for new inference capabilities in Amazon SageMaker that reduce LLM deployment costs by 50% on average

We are excited to announce a new version of the Amazon SageMaker Operators for Kubernetes using the AWS Controllers for Kubernetes (ACK). ACK is a framework for building Kubernetes custom controllers, where each controller communicates with an AWS service API. These controllers allow Kubernetes users to provision AWS resources like buckets, databases, or message queues simply by using the Kubernetes API.

Release v1.2.9 of the SageMaker ACK Operators adds support for inference components, which until now were only available through the SageMaker API and the AWS Software Development Kits (SDKs). Inference components can help you optimize deployment costs and reduce latency. With the new inference component capabilities, you can deploy one or more foundation models (FMs) on the same Amazon SageMaker endpoint and control how many accelerators and how much memory is reserved for each FM. This helps improve resource utilization, reduces model deployment costs on average by 50%, and lets you scale endpoints together with your use cases. For more details, see Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency.

The availability of inference components through the SageMaker controller enables customers who use Kubernetes as their control plane to take advantage of inference components while deploying their models on SageMaker.

In this post, we show how to use SageMaker ACK Operators to deploy SageMaker inference components.

How ACK works

To demonstrate how ACK works, let’s look at an example using Amazon Simple Storage Service (Amazon S3). In the following diagram, Alice is our Kubernetes user. Her application depends on the existence of an S3 bucket named my-bucket.

How ACK Works

The workflow consists of the following steps:

  1. Alice issues a call to kubectl apply, passing in a file that describes a Kubernetes custom resource describing her S3 bucket. kubectl apply passes this file, called a manifest, to the Kubernetes API server running in the Kubernetes controller node.
  2. The Kubernetes API server receives the manifest describing the S3 bucket and determines if Alice has permissions to create a custom resource of kind s3.services.k8s.aws/Bucket, and that the custom resource is properly formatted.
  3. If Alice is authorized and the custom resource is valid, the Kubernetes API server writes the custom resource to its etcd data store.
  4. It then responds to Alice that the custom resource has been created.
  5. At this point, the ACK service controller for Amazon S3, which is running on a Kubernetes worker node within the context of a normal Kubernetes Pod, is notified that a new custom resource of kind s3.services.k8s.aws/Bucket has been created.
  6. The ACK service controller for Amazon S3 then communicates with the Amazon S3 API, calling the S3 CreateBucket API to create the bucket in AWS.
  7. After communicating with the Amazon S3 API, the ACK service controller calls the Kubernetes API server to update the custom resource’s status with information it received from Amazon S3.

Key components

The new inference capabilities build upon SageMaker’s real-time inference endpoints. As before, you create the SageMaker endpoint with an endpoint configuration that defines the instance type and initial instance count for the endpoint. The model is configured in a new construct, an inference component. Here, you specify the number of accelerators and amount of memory you want to allocate to each copy of a model, together with the model artifacts, container image, and number of model copies to deploy.

You can use the new inference capabilities from Amazon SageMaker Studio, the SageMaker Python SDK, AWS SDKs, and AWS Command Line Interface (AWS CLI). They are also supported by AWS CloudFormation. Now you also can use them with SageMaker Operators for Kubernetes.

Solution overview

For this demo, we use the SageMaker controller to deploy a copy of the Dolly v2 7B model and a copy of the FLAN-T5 XXL model from the Hugging Face Model Hub on a SageMaker real-time endpoint using the new inference capabilities.

Prerequisites

To follow along, you should have a Kubernetes cluster with the SageMaker ACK controller v1.2.9 or above installed. For instructions on how to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with Amazon Elastic Compute Cloud (Amazon EC2) Linux managed nodes using eksctl, see Getting started with Amazon EKS – eksctl. For instructions on installing the SageMaker controller, refer to Machine Learning with the ACK SageMaker Controller.

You need access to accelerated instances (GPUs) for hosting the LLMs. This solution uses one instance of ml.g5.12xlarge; you can check the availability of these instances in your AWS account and request these instances as needed via a Service Quotas increase request, as shown in the following screenshot.

Service Quotas Increase Request

Create an inference component

To create your inference component, define the EndpointConfig, Endpoint, Model, and InferenceComponent YAML files, similar to the ones shown in this section. Use kubectl apply -f <yaml file> to create the Kubernetes resources.

You can list the status of the resource via kubectl describe <resource-type>; for example, kubectl describe inferencecomponent.

You can also create the inference component without a model resource. Refer to the guidance provided in the API documentation for more details.

EndpointConfig YAML

The following is the code for the EndpointConfig file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: EndpointConfig
metadata:
  name: inference-component-endpoint-config
spec:
  endpointConfigName: inference-component-endpoint-config
  executionRoleARN: <EXECUTION_ROLE_ARN>
  productionVariants:
  - variantName: AllTraffic
    instanceType: ml.g5.12xlarge
    initialInstanceCount: 1
    routingConfig:
      routingStrategy: LEAST_OUTSTANDING_REQUESTS

Endpoint YAML

The following is the code for the Endpoint file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Endpoint
metadata:
  name: inference-component-endpoint
spec:
  endpointName: inference-component-endpoint
  endpointConfigName: inference-component-endpoint-config

Model YAML

The following is the code for the Model file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Model
metadata:
  name: dolly-v2-7b
spec:
  modelName: dolly-v2-7b
  executionRoleARN: <EXECUTION_ROLE_ARN>
  containers:
  - image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
    environment:
      HF_MODEL_ID: databricks/dolly-v2-7b
      HF_TASK: text-generation
---
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Model
metadata:
  name: flan-t5-xxl
spec:
  modelName: flan-t5-xxl
  executionRoleARN: <EXECUTION_ROLE_ARN>
  containers:
  - image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
    environment:
      HF_MODEL_ID: google/flan-t5-xxl
      HF_TASK: text-generation

InferenceComponent YAMLs

In the following YAML files, given that the ml.g5.12xlarge instance comes with 4 GPUs, we are allocating 2 GPUs, 2 CPUs and 1,024 MB of memory to each model:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: InferenceComponent
metadata:
  name: inference-component-dolly
spec:
  inferenceComponentName: inference-component-dolly
  endpointName: inference-component-endpoint
  variantName: AllTraffic
  specification:
    modelName: dolly-v2-7b
    computeResourceRequirements:
      numberOfAcceleratorDevicesRequired: 2
      numberOfCPUCoresRequired: 2
      minMemoryRequiredInMb: 1024
  runtimeConfig:
    copyCount: 1
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: InferenceComponent
metadata:
  name: inference-component-flan
spec:
  inferenceComponentName: inference-component-flan
  endpointName: inference-component-endpoint
  variantName: AllTraffic
  specification:
    modelName: flan-t5-xxl
    computeResourceRequirements:
      numberOfAcceleratorDevicesRequired: 2
      numberOfCPUCoresRequired: 2
      minMemoryRequiredInMb: 1024
  runtimeConfig:
    copyCount: 1

Invoke models

You can now invoke the models using the following code:

import boto3
import json

sm_runtime_client = boto3.client(service_name="sagemaker-runtime")
payload = {"inputs": "Why is California a great place to live?"}

response_dolly = sm_runtime_client.invoke_endpoint(
    EndpointName="inference-component-endpoint",
    InferenceComponentName="inference-component-dolly",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)
result_dolly = json.loads(response_dolly['Body'].read().decode())
print(result_dolly)

response_flan = sm_runtime_client.invoke_endpoint(
    EndpointName="inference-component-endpoint",
    InferenceComponentName="inference-component-flan",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)
result_flan = json.loads(response_flan['Body'].read().decode())
print(result_flan)

Update an inference component

To update an existing inference component, you can update the YAML files and then use kubectl apply -f <yaml file>. The following is an example of an updated file:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: InferenceComponent
metadata:
  name: inference-component-dolly
spec:
  inferenceComponentName: inference-component-dolly
  endpointName: inference-component-endpoint
  variantName: AllTraffic
  specification:
    modelName: dolly-v2-7b
    computeResourceRequirements:
      numberOfAcceleratorDevicesRequired: 2
      numberOfCPUCoresRequired: 4 # Update the numberOfCPUCoresRequired.
      minMemoryRequiredInMb: 1024
  runtimeConfig:
    copyCount: 1

Delete an inference component

To delete an existing inference component, use the command kubectl delete -f <yaml file>.

Availability and pricing

The new SageMaker inference capabilities are available today in AWS Regions US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Jakarta, Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Stockholm), Middle East (UAE), and South America (São Paulo). For pricing details, visit Amazon SageMaker Pricing.

Conclusion

In this post, we showed how to use SageMaker ACK Operators to deploy SageMaker inference components. Fire up your Kubernetes cluster and deploy your FMs using the new SageMaker inference capabilities today!


About the Authors

Rajesh Ramchander is a Principal ML Engineer in Professional Services at AWS. He helps customers at various stages in their AI/ML and GenAI journey, from those that are just getting started all the way to those that are leading their business with an AI-first strategy.

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Suryansh Singh is a Software Development Engineer at AWS SageMaker and works on developing ML-distributed infrastructure solutions for AWS customers at scale.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Johna LiuJohna Liu is a Software Development Engineer in the Amazon SageMaker team. Her current work focuses on helping developers efficiently host machine learning models and improve inference performance. She is passionate about spatial data analysis and using AI to solve societal problems.

Read More

Talk to your slide deck using multimodal foundation models hosted on Amazon Bedrock and Amazon SageMaker – Part 2

Talk to your slide deck using multimodal foundation models hosted on Amazon Bedrock and Amazon SageMaker – Part 2

In Part 1 of this series, we presented a solution that used the Amazon Titan Multimodal Embeddings model to convert individual slides from a slide deck into embeddings. We stored the embeddings in a vector database and then used the Large Language-and-Vision Assistant (LLaVA 1.5-7b) model to generate text responses to user questions based on the most similar slide retrieved from the vector database. We used AWS services including Amazon Bedrock, Amazon SageMaker, and Amazon OpenSearch Serverless in this solution.

In this post, we demonstrate a different approach. We use the Anthropic Claude 3 Sonnet model to generate text descriptions for each slide in the slide deck. These descriptions are then converted into text embeddings using the Amazon Titan Text Embeddings model and stored in a vector database. Then we use the Claude 3 Sonnet model to generate answers to user questions based on the most relevant text description retrieved from the vector database.

You can test both approaches for your dataset and evaluate the results to see which approach gives you the best results. In Part 3 of this series, we evaluate the results of both methods.

Solution overview

The solution provides an implementation for answering questions using information contained in text and visual elements of a slide deck. The design relies on the concept of Retrieval Augmented Generation (RAG). Traditionally, RAG has been associated with textual data that can be processed by large language models (LLMs). In this series, we extend RAG to include images as well. This provides a powerful search capability to extract contextually relevant content from visual elements like tables and graphs along with text.

This solution includes the following components:

  • Amazon Titan Text Embeddings is a text embeddings model that converts natural language text, including single words, phrases, or even large documents, into numerical representations that can be used to power use cases such as search, personalization, and clustering based on semantic similarity.
  • Claude 3 Sonnet is the next generation of state-of-the-art models from Anthropic. Sonnet is a versatile tool that can handle a wide range of tasks, from complex reasoning and analysis to rapid outputs, as well as efficient search and retrieval across vast amounts of information.
  • OpenSearch Serverless is an on-demand serverless configuration for Amazon OpenSearch Service. We use OpenSearch Serverless as a vector database for storing embeddings generated by the Amazon Titan Text Embeddings model. An index created in the OpenSearch Serverless collection serves as the vector store for our RAG solution.
  • Amazon OpenSearch Ingestion (OSI) is a fully managed, serverless data collector that delivers data to OpenSearch Service domains and OpenSearch Serverless collections. In this post, we use an OSI pipeline API to deliver data to the OpenSearch Serverless vector store.

The solution design consists of two parts: ingestion and user interaction. During ingestion, we process the input slide deck by converting each slide into an image, generating descriptions and text embeddings for each image. We then populate the vector data store with the embeddings and text description for each slide. These steps are completed prior to the user interaction steps.

In the user interaction phase, a question from the user is converted into text embeddings. A similarity search is run on the vector database to find a text description corresponding to a slide that could potentially contain answers to the user question. We then provide the slide description and the user question to the Claude 3 Sonnet model to generate an answer to the query. All the code for this post is available in the GitHub repo.

The following diagram illustrates the ingestion architecture.

The workflow consists of the following steps:

  1. Slides are converted to image files (one per slide) in JPG format and passed to the Claude 3 Sonnet model to generate text description.
  2. The data is sent to the Amazon Titan Text Embeddings model to generate embeddings. In this series, we use the slide deck Train and deploy Stable Diffusion using AWS Trainium & AWS Inferentia from the AWS Summit in Toronto, June 2023 to demonstrate the solution. The sample deck has 31 slides, therefore we generate 31 sets of vector embeddings, each with 1536 dimensions. We add additional metadata fields to perform rich search queries using OpenSearch’s powerful search capabilities.
  3. The embeddings are ingested into an OSI pipeline using an API call.
  4. The OSI pipeline ingests the data as documents into an OpenSearch Serverless index. The index is configured as the sink for this pipeline and is created as part of the OpenSearch Serverless collection.

The following diagram illustrates the user interaction architecture.

The workflow consists of the following steps:

  1. A user submits a question related to the slide deck that has been ingested.
  2. The user input is converted into embeddings using the Amazon Titan Text Embeddings model accessed using Amazon Bedrock. An OpenSearch Service vector search is performed using these embeddings. We perform a k-nearest neighbor (k-NN) search to retrieve the most relevant embeddings matching the user query.
  3. The metadata of the response from OpenSearch Serverless contains a path to the image and description corresponding to the most relevant slide.
  4. A prompt is created by combining the user question and the image description. The prompt is provided to Claude 3 Sonnet hosted on Amazon Bedrock.
  5. The result of this inference is returned to the user.

We discuss the steps for both stages in the following sections, and include details about the output.

Prerequisites

To implement the solution provided in this post, you should have an AWS account and familiarity with FMs, Amazon Bedrock, SageMaker, and OpenSearch Service.

This solution uses the Claude 3 Sonnet and Amazon Titan Text Embeddings models hosted on Amazon Bedrock. Make sure that these models are enabled for use by navigating to the Model access page on the Amazon Bedrock console.

If models are enabled, the Access status will state Access granted.

If the models are not available, enable access by choosing Manage model access, selecting the models, and choosing Request model access. The models are enabled for use immediately.

Use AWS CloudFormation to create the solution stack

You can use AWS CloudFormation to create the solution stack. If you have created the solution for Part 1 in the same AWS account, be sure to delete that before creating this stack.

AWS Region Link
us-east-1
us-west-2

After the stack is created successfully, navigate to the stack’s Outputs tab on the AWS CloudFormation console and note the values for MultimodalCollectionEndpoint and OpenSearchPipelineEndpoint. You use these in the subsequent steps.

The CloudFormation template creates the following resources:

  • IAM roles – The following AWS Identity and Access Management (IAM) roles are created. Update these roles to apply least-privilege permissions, as discussed in Security best practices.
    • SMExecutionRole with Amazon Simple Storage Service (Amazon S3), SageMaker, OpenSearch Service, and Amazon Bedrock full access.
    • OSPipelineExecutionRole with access to the S3 bucket and OSI actions.
  • SageMaker notebook – All code for this post is run using this notebook.
  • OpenSearch Serverless collection – This is the vector database for storing and retrieving embeddings.
  • OSI pipeline – This is the pipeline for ingesting data into OpenSearch Serverless.
  • S3 bucket – All data for this post is stored in this bucket.

The CloudFormation template sets up the pipeline configuration required to configure the OSI pipeline with HTTP as source and the OpenSearch Serverless index as sink. The SageMaker notebook 2_data_ingestion.ipynb displays how to ingest data into the pipeline using the Requests HTTP library.

The CloudFormation template also creates network, encryption and data access policies required for your OpenSearch Serverless collection. Update these policies to apply least-privilege permissions.

The CloudFormation template name and OpenSearch Service index name are referenced in the SageMaker notebook 3_rag_inference.ipynb. If you change the default names, make sure you update them in the notebook.

Test the solution

After you have created the CloudFormation stack, you can test the solution. Complete the following steps:

  1. On the SageMaker console, choose Notebooks in the navigation pane.
  2. Select MultimodalNotebookInstance and choose Open JupyterLab.
  3. In File Browser, traverse to the notebooks folder to see notebooks and supporting files.

The notebooks are numbered in the sequence in which they run. Instructions and comments in each notebook describe the actions performed by that notebook. We run these notebooks one by one.

  1. Choose 1_data_prep.ipynb to open it in JupyterLab.
  2. On the Run menu, choose Run All Cells to run the code in this notebook.

This notebook will download a publicly available slide deck, convert each slide into the JPG file format, and upload these to the S3 bucket.

  1. Choose 2_data_ingestion.ipynb to open it in JupyterLab.
  2. On the Run menu, choose Run All Cells to run the code in this notebook.

In this notebook, you create an index in the OpenSearch Serverless collection. This index stores the embeddings data for the slide deck. See the following code:

session = boto3.Session()
credentials = session.get_credentials()
auth = AWSV4SignerAuth(credentials, g.AWS_REGION, g.OS_SERVICE)

os_client = OpenSearch(
  hosts = [{'host': host, 'port': 443}],
  http_auth = auth,
  use_ssl = True,
  verify_certs = True,
  connection_class = RequestsHttpConnection,
  pool_maxsize = 20
)

index_body = """
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "vector_embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "name": "hnsw",
          "engine": "nmslib",
          "parameters": {}
        }
      },
      "image_path": {
        "type": "text"
      },
      "slide_text": {
        "type": "text"
      },
      "slide_number": {
        "type": "text"
      },
      "metadata": { 
        "properties" :
          {
            "filename" : {
              "type" : "text"
            },
            "desc":{
              "type": "text"
            }
          }
      }
    }
  }
}
"""
index_body = json.loads(index_body)
try:
  response = os_client.indices.create(index_name, body=index_body)
  logger.info(f"response received for the create index -> {response}")
except Exception as e:
  logger.error(f"error in creating index={index_name}, exception={e}")

You use the Claude 3 Sonnet and Amazon Titan Text Embeddings models to convert the JPG images created in the previous notebook into vector embeddings. These embeddings and additional metadata (such as the S3 path and description of the image file) are stored in the index along with the embeddings. The following code snippet shows how Claude 3 Sonnet generates image descriptions:

def get_img_desc(image_file_path: str, prompt: str):
    # read the file, MAX image size supported is 2048 * 2048 pixels
    with open(image_file_path, "rb") as image_file:
        input_image_b64 = image_file.read().decode('utf-8')
  
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/jpeg",
                                "data": input_image_b64
                            },
                        },
                        {"type": "text", "text": prompt},
                    ],
                }
            ],
        }
    )
    
    response = bedrock.invoke_model(
        modelId=g.CLAUDE_MODEL_ID,
        body=body
    )

    resp_body = json.loads(response['body'].read().decode("utf-8"))
    resp_text = resp_body['content'][0]['text'].replace('"', "'")

    return resp_text

The image descriptions are passed to the Amazon Titan Text Embeddings model to generate vector embeddings. These embeddings and additional metadata (such as the S3 path and description of the image file) are stored in the index along with the embeddings. The following code snippet shows the call to the Amazon Titan Text Embeddings model:

def get_text_embedding(bedrock: botocore.client, prompt_data: str) -> np.ndarray:
    body = json.dumps({
        "inputText": prompt_data,
    })    
    try:
        response = bedrock.invoke_model(
            body=body, modelId=g.TITAN_MODEL_ID, accept=g.ACCEPT_ENCODING, contentType=g.CONTENT_ENCODING
        )
        response_body = json.loads(response['body'].read())
        embedding = response_body.get('embedding')
    except Exception as e:
        logger.error(f"exception={e}")
        embedding = None

    return embedding

The data is ingested into the OpenSearch Serverless index by making an API call to the OSI pipeline. The following code snippet shows the call made using the Requests HTTP library:

data = json.dumps([{
    "image_path": input_image_s3, 
    "slide_text": resp_text, 
    "slide_number": slide_number, 
    "metadata": {
        "filename": obj_name, 
        "desc": "" 
    }, 
    "vector_embedding": embedding
}])

r = requests.request(
    method='POST', 
    url=osi_endpoint, 
    data=data,
    auth=AWSSigV4('osis'))
  1. Choose 3_rag_inference.ipynb to open it in JupyterLab.
  2. On the Run menu, choose Run All Cells to run the code in this notebook.

This notebook implements the RAG solution: you convert the user question into embeddings, find a similar image description from the vector database, and provide the retrieved description to Claude 3 Sonnet to generate an answer to the user question. You use the following prompt template:

  llm_prompt: str = """

  Human: Use the summary to provide a concise answer to the question to the best of your abilities. If you cannot answer the question from the context then say I do not know, do not make up an answer.
  <question>
  {question}
  </question>

  <summary>
  {summary}
  </summary>

  Assistant:"""

The following code snippet provides the RAG workflow:

def get_llm_response(bedrock: botocore.client, question: str, summary: str) -> str:
    prompt = llm_prompt.format(question=question, summary=summary)
    
    body = json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    })
        
    try:
        response = bedrock.invoke_model(
        modelId=g.CLAUDE_MODEL_ID,
        body=body)

        response_body = json.loads(response['body'].read().decode("utf-8"))
        llm_response = response_body['content'][0]['text'].replace('"', "'")
        
    except Exception as e:
        logger.error(f"exception while slide_text={summary[:10]}, exception={e}")
        llm_response = None

    return llm_response


# create prompt and convert to embeddings question: str = "How does Inf2 compare in performance to comparable EC2 instances? I need numbers." text_embedding = get_text_embedding(bedrock, question) # vector db search vector_db_response: Dict = find_similar_data(text_embeddings) # download image for local notebook display s3_img_path = vector_db_response.get('hits', {}).get('hits')[0].get('_source').get('image_path') logger.info(f"going to answer the question="{question}" using the image "{s3_img_path}"") !aws s3 cp {s3_img_path} . local_img_path = os.path.basename(s3_img_path) display(filename=local_img_path) # Ask Claude 3 Sonnet slide_text = vector_db_response.get('hits', {}).get('hits')[0].get('_source').get('slide_text') llm_response = get_llm_response(bedrock, question, slide_text) print(llm_response)

Results

The following table contains some user questions and responses generated by our implementation. The Question column captures the user question, and the Answer column is the textual response generated by Claude 3 Sonnet. The Image column shows the k-NN slide match returned by the OpenSearch Serverless vector search.

Multimodal RAG results

Question Answer Image
How does Inf2 compare in performance to comparable EC2 instances? I need numbers. Based on the information provided in the summary, the key performance comparison numbers given for the Amazon EC2 Inf2 instances compared to other instances are: 1. Up to 4x higher throughput. 2. Up to 10x lower latency. The summary does not provide specific numbers comparing the Inf2 instances to other EC2 instance types. However, it highlights the throughput and latency improvements as the main performance advantages of the Inf2 instances over comparable instances for running generative AI models.
As per the AI/ML flywheel, what do the AWS AI/ML services provide? Based on the summary, the AWS AI/ML services provide a range of capabilities that fuel an AI/ML flywheel. This flywheel consists of innovation in developing new and better algorithms, enabling more use-cases and applications, driving wider adoption and demand, which in turn leads to further investment in research and optimization to drive more innovation. The AWS AI/ML services seem to offer the tools, resources, and infrastructure to support this continuous cycle of innovation, application development, adoption, and reinvestment in the field of artificial intelligence and machine learning.
Compared to GPT-2, how many more parameters does GPT-3 have? What is the numerical difference between the parameter size of GPT-2 and GPT-3? According to the information provided in the summary, GPT-3 from 2020 had 175B (175 billion) parameters, while GPT-2 from 2019 had 1.5B (1.5 billion) parameters. Therefore, the numerical difference between the parameter size of GPT-2 and GPT-3 is 173.5 billion parameters. GPT-3 has approximately 115 times more parameters than GPT-2.
What are quarks in particle physics? Unfortunately, the given summary does not contain any information about quarks in particle physics. The summary describes an image related to the progression of natural language processing and generative AI technologies, but it does not mention anything about particle physics or the concept of quarks.

Query your index

You can use OpenSearch Dashboards to interact with the OpenSearch API to run quick tests on your index and ingested data.

Cleanup

To avoid incurring future charges, delete the resources. You can do this by deleting the stack using the AWS CloudFormation console.

Conclusion

Enterprises generate new content all the time, and slide decks are a common way to share and disseminate information internally within the organization and externally with customers or at conferences. Over time, rich information can remain buried and hidden in non-text modalities like graphs and tables in these slide decks.

You can use this solution and the power of multimodal FMs such as the Amazon Titan Text Embeddings and Claude 3 Sonnet to discover new information or uncover new perspectives on content in slide decks. You can try different Claude models available on Amazon Bedrock by updating the CLAUDE_MODEL_ID in the globals.py file.

This is Part 2 of a three-part series. We used the Amazon Titan Multimodal Embeddings and the LLaVA model in Part 1. In Part 3, we will compare the approaches from Part 1 and Part 2.

Portions of this code are released under the Apache 2.0 License.


About the authors

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Manju Prasad is a Senior Solutions Architect at Amazon Web Services. She focuses on providing technical guidance in a variety of technical domains, including AI/ML. Prior to joining AWS, she designed and built solutions for companies in the financial services sector and also for a startup. She is passionate about sharing knowledge and fostering interest in emerging talent.

Archana Inapudi is a Senior Solutions Architect at AWS, supporting a strategic customer. She has over a decade of cross-industry expertise leading strategic technical initiatives. Archana is an aspiring member of the AI/ML technical field community at AWS. Prior to joining AWS, Archana led a migration from traditional siloed data sources to Hadoop at a healthcare company. She is passionate about using technology to accelerate growth, provide value to customers, and achieve business outcomes.

Antara Raisa is an AI and ML Solutions Architect at Amazon Web Services, supporting strategic customers based out of Dallas, Texas. She also has previous experience working with large enterprise partners at AWS, where she worked as a Partner Success Solutions Architect for digital-centered customers.

Read More

Scale AI training and inference for drug discovery through Amazon EKS and Karpenter

Scale AI training and inference for drug discovery through Amazon EKS and Karpenter

This is a guest post co-written with the leadership team of Iambic Therapeutics.

Iambic Therapeutics is a drug discovery startup with a mission to create innovative AI-driven technologies to bring better medicines to cancer patients, faster.

Our advanced generative and predictive artificial intelligence (AI) tools enable us to search the vast space of possible drug molecules faster and more effectively. Our technologies are versatile and applicable across therapeutic areas, protein classes, and mechanisms of action. Beyond creating differentiated AI tools, we have established an integrated platform that merges AI software, cloud-based data, scalable computation infrastructure, and high-throughput chemistry and biology capabilities. The platform both enables our AI—by supplying data to refine our models—and is enabled by it, capitalizing on opportunities for automated decision-making and data processing.

We measure success by our ability to produce superior clinical candidates to address urgent patient need, at unprecedented speed: we advanced from program launch to clinical candidates in just 24 months, significantly faster than our competitors.

In this post, we focus on how we used Karpenter on Amazon Elastic Kubernetes Service (Amazon EKS) to scale AI training and inference, which are core elements of the Iambic discovery platform.

The need for scalable AI training and inference

Every week, Iambic performs AI inference across dozens of models and millions of molecules, serving two primary use cases:

  • Medicinal chemists and other scientists use our web application, Insight, to explore chemical space, access and interpret experimental data, and predict properties of newly designed molecules. All of this work is done interactively in real time, creating a need for inference with low latency and medium throughput.
  • At the same time, our generative AI models automatically design molecules targeting improvement across numerous properties, searching millions of candidates, and requiring enormous throughput and medium latency.

Guided by AI technologies and expert drug hunters, our experimental platform generates thousands of unique molecules each week, and each is subjected to multiple biological assays. The generated data points are automatically processed and used to fine-tune our AI models every week. Initially, our model fine-tuning took hours of CPU time, so a framework for scaling model fine-tuning on GPUs was imperative.

Our deep learning models have non-trivial requirements: they are gigabytes in size, are numerous and heterogeneous, and require GPUs for fast inference and fine-tuning. Looking to cloud infrastructure, we needed a system that allows us to access GPUs, scale up and down quickly to handle spiky, heterogeneous workloads, and run large Docker images.

We wanted to build a scalable system to support AI training and inference. We use Amazon EKS and were looking for the best solution to auto scale our worker nodes. We chose Karpenter for Kubernetes node auto scaling for a number of reasons:

  • Ease of integration with Kubernetes, using Kubernetes semantics to define node requirements and pod specs for scaling
  • Low-latency scale-out of nodes
  • Ease of integration with our infrastructure as code tooling (Terraform)

The node provisioners support effortless integration with Amazon EKS and other AWS resources such as Amazon Elastic Compute Cloud (Amazon EC2) instances and Amazon Elastic Block Store volumes. The Kubernetes semantics used by the provisioners support directed scheduling using Kubernetes constructs such as taints or tolerations and affinity or anti-affinity specifications; they also facilitate control over the number and types of GPU instances that may be scheduled by Karpenter.

Solution overview

In this section, we present a generic architecture that is similar to the one we use for our own workloads, which allows elastic deployment of models using efficient auto scaling based on custom metrics.

The following diagram illustrates the solution architecture.

The architecture deploys a simple service in a Kubernetes pod within an EKS cluster. This could be a model inference, data simulation, or any other containerized service, accessible by HTTP request. The service is exposed behind a reverse-proxy using Traefik. The reverse proxy collects metrics about calls to the service and exposes them via a standard metrics API to Prometheus. The Kubernetes Event Driven Autoscaler (KEDA) is configured to automatically scale the number of service pods, based on the custom metrics available in Prometheus. Here we use the number of requests per second as a custom metric. The same architectural approach applies if you choose a different metric for your workload.

Karpenter monitors for any pending pods that can’t run due to lack of sufficient resources in the cluster. If such pods are detected, Karpenter adds more nodes to the cluster to provide the necessary resources. Conversely, if there are more nodes in the cluster than what is needed by the scheduled pods, Karpenter removes some of the worker nodes and the pods get rescheduled, consolidating them on fewer instances. The number of HTTP requests per second and number of nodes can be visualized using a Grafana dashboard. To demonstrate auto scaling, we run one or more simple load-generating pods, which send HTTP requests to the service using curl.

Solution deployment

In the step-by-step walkthrough, we use AWS Cloud9 as an environment to deploy the architecture. This enables all steps to be completed from a web browser. You can also deploy the solution from a local computer or EC2 instance.

To simplify deployment and improve reproducibility, we follow the principles of the do-framework and the structure of the depend-on-docker template. We clone the aws-do-eks project and, using Docker, we build a container image that is equipped with the necessary tooling and scripts. Within the container, we run through all the steps of the end-to-end walkthrough, from creating an EKS cluster with Karpenter to scaling EC2 instances.

For the example in this post, we use the following EKS cluster manifest:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: do-eks-yaml-karpenter
version: '1.28'
region: us-west-2
tags:
  karpenter.sh/discovery: do-eks-yaml-karpenter
iam:
  withOIDC: true
addons:
  - name: aws-ebs-csi-driver
    version: v1.26.0-eksbuild.1
wellKnownPolicies:
  ebsCSIController: true
managedNodeGroups:
  - name: c5-xl-do-eks-karpenter-ng
    instanceType: c5.xlarge
    instancePrefix: c5-xl
    privateNetworking: true
    minSize: 0
    desiredCapacity: 2
    maxSize: 10
    volumeSize: 300
    iam:
      withAddonPolicies:
        cloudWatch: true
        ebs: true

This manifest defines a cluster named do-eks-yaml-karpenter with the EBS CSI driver installed as an add-on. A managed node group with two c5.xlarge nodes is included to run system pods that are needed by the cluster. The worker nodes are hosted in private subnets, and the cluster API endpoint is public by default.

You could also use an existing EKS cluster instead of creating one. We deploy Karpenter by following the instructions in the Karpenter documentation, or by running the following script, which automates the deployment instructions.

The following code shows the Karpenter configuration we use in this example:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata: null
    labels:
      cluster-name: do-eks-yaml-karpenter
    annotations:
      purpose: karpenter-example
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
        requirements:
          - key: karpenter.sh/capacity-type
            operator: In
            values:
              - spot
              - on-demand
          - key: karpenter.k8s.aws/instance-category
            operator: In
            values:
              - c
              - m
              - r
              - g
              - p
          - key: karpenter.k8s.aws/instance-generation
            operator: Gt
            values:
              - '2'
  disruption:
    consolidationPolicy: WhenUnderutilized
    #consolidationPolicy: WhenEmpty
    #consolidateAfter: 30s
    expireAfter: 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
      karpenter.sh/discovery: "do-eks-yaml-karpenter"
  securityGroupSelectorTerms:
    - tags:
      karpenter.sh/discovery: "do-eks-yaml-karpenter"
  role: "KarpenterNodeRole-do-eks-yaml-karpenter"
  tags:
    app: autoscaling-test
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 80Gi
        volumeType: gp3
        iops: 10000
        deleteOnTermination: true
        throughput: 125
  detailedMonitoring: true

We define a default Karpenter NodePool with the following requirements:

  • Karpenter can launch instances from both spot and on-demand capacity pools
  • Instances must be from the “c” (compute optimized), “m” (general purpose), “r” (memory optimized), or “g” and “p” (GPU accelerated) computing families
  • Instance generation must be greater than 2; for example, g3 is acceptable, but g2 is not

The default NodePool also defines disruption policies. Underutilized nodes will be removed so pods can be consolidated to run on fewer or smaller nodes. Alternatively, we can configure empty nodes to be removed after the specified time period. The expireAfter setting specifies the maximum lifetime of any node, before it is stopped and replaced if necessary. This helps reduce security vulnerabilities as well as avoid issues that are typical for nodes with long uptimes, such as file fragmentation or memory leaks.

By default, Karpenter provisions nodes with a small root volume, which can be insufficient for running AI or machine learning (ML) workloads. Some of the deep learning container images can be tens of GB in size, and we need to make sure there is enough storage space on the nodes to run pods using these images. To do that, we define EC2NodeClass with blockDeviceMappings, as shown in the preceding code.

Karpenter is responsible for auto scaling at the cluster level. To configure auto scaling at the pod level, we use KEDA to define a custom resource called ScaledObject, as shown in the following code:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-prometheus-hpa
  namespace: hpa-example
spec:
  scaleTargetRef:
    name: php-apache
  minReplicaCount: 1
  cooldownPeriod: 30
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus- server.prometheus.svc.cluster.local:80
        metricName: http_requests_total
        threshold: '1'
        query: rate(traefik_service_requests_total{service="hpa-example-php-apache-80@kubernetes",code="200"}[2m])

The preceding manifest defines a ScaledObject named keda-prometheus-hpa, which is responsible for scaling the php-apache deployment and always keeps at least one replica running. It scales the pods of this deployment based on the metric http_requests_total available in Prometheus obtained by the specified query, and targets to scale up the pods so that each pod serves no more than one request per second. It scales down the replicas after the request load has been below the threshold for longer than 30 seconds.

The deployment spec for our example service contains the following resource requests and limits:

resources:
  limits:
    cpu: 500m
    nvidia.com/gpu: 1
  requests:
    cpu: 200m
    nvidia.com/gpu: 1

With this configuration, each of the service pods will use exactly one NVIDIA GPU. When new pods are created, they will be in Pending state until a GPU is available. Karpenter adds GPU nodes to the cluster as needed to accommodate the pending pods.

A load-generating pod sends HTTP requests to the service with a pre-set frequency. We increase the number of requests by increasing the number of replicas in the load-generator deployment.

A full scaling cycle with utilization-based node consolidation is visualized in a Grafana dashboard. The following dashboard shows the number of nodes in the cluster by instance type (top), the number of requests per second (bottom left), and the number of pods (bottom right).

Scaling Dashboard 1

We start with just the two c5.xlarge CPU instances that the cluster was created with. Then we deploy one service instance, which requires a single GPU. Karpenter adds a g4dn.xlarge instance to accommodate this need. We then deploy the load generator, which causes KEDA to add more service pods and Karpenter adds more GPU instances. After optimization, the state settles on one p3.8xlarge instance with 8 GPUs and one g5.12xlarge instance with 4 GPUs.

When we scale the load-generating deployment to 40 replicas, KEDA creates additional service pods to maintain the required request load per pod. Karpenter adds g4dn.metal and g4dn.12xlarge nodes to the cluster to provide the needed GPUs for the additional pods. In the scaled state, the cluster contains 16 GPU nodes and serves about 300 requests per second. When we scale down the load generator to 1 replica, the reverse process takes place. After the cooldown period, KEDA reduces the number of service pods. Then as fewer pods run, Karpenter removes the underutilized nodes from the cluster and the service pods get consolidated to run on fewer nodes. When the load generator pod is removed, a single service pod on a single g4dn.xlarge instance with 1 GPU remains running. When we remove the service pod as well, the cluster is left in the initial state with only two CPU nodes.

We can observe this behavior when the NodePool has the setting consolidationPolicy: WhenUnderutilized.

With this setting, Karpenter dynamically configures the cluster with as few nodes as possible, while providing sufficient resources for all pods to run and also minimizing cost.

The scaling behavior shown in the following dashboard is observed when the NodePool consolidation policy is set to WhenEmpty, along with consolidateAfter: 30s.

Scaling Dashboard 2

In this scenario, nodes are stopped only when there are no pods running on them after the cool-off period. The scaling curve appears smooth, compared to the utilization-based consolidation policy; however, it can be seen that more nodes are used in the scaled state (22 vs. 16).

Overall, combining pod and cluster auto scaling makes sure that the cluster scales dynamically with the workload, allocating resources when needed and removing them when not in use, thereby maximizing utilization and minimizing cost.

Outcomes

Iambic used this architecture to enable efficient use of GPUs on AWS and migrate workloads from CPU to GPU. By using EC2 GPU powered instances, Amazon EKS, and Karpenter, we were able to enable faster inference for our physics-based models and fast experiment iteration times for applied scientists who rely on training as a service.

The following table summarizes some of the time metrics of this migration.

Task CPUs GPUs
Inference using diffusion models for physics-based ML models 3,600 seconds

100 seconds

(due to inherent batching of GPUs)

ML model training as a service 180 minutes 4 minutes

The following table summarizes some of our time and cost metrics.

Task Performance/Cost
CPUs GPUs
ML model training

240 minutes

average $0.70 per training task

20 minutes

average $0.38 per training task

Summary

In this post, we showcased how Iambic used Karpenter and KEDA to scale our Amazon EKS infrastructure to meet the latency requirements of our AI inference and training workloads. Karpenter and KEDA are powerful open source tools that help auto scale EKS clusters and workloads running on them. This helps optimize compute costs while meeting performance requirements. You can check out the code and deploy the same architecture in your own environment by following the complete walkthrough in this GitHub repo.


About the Authors

Matthew Welborn is the director of Machine Learning at Iambic Therapeutics. He and his team leverage AI to accelerate the identification and development of novel therapeutics, bringing life-saving medicines to patients faster.

Paul Whittemore is a Principal Engineer at Iambic Therapeutics. He supports delivery of the infrastructure for the Iambic AI-driven drug discovery platform.

Alex Iankoulski is a Principal Solutions Architect, ML/AI Frameworks, who focuses on helping customers orchestrate their AI workloads using containers and accelerated computing infrastructure on AWS.

Read More