Amazon AWS – Page 36

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

January 28, 2025

by Paolo Di Francesco Amazon AWS

Evaluating large language models (LLMs) is crucial as LLM-based systems become increasingly powerful and relevant in our society. Rigorous testing allows us to understand an LLM’s capabilities, limitations, and potential biases, and provide actionable feedback to identify and mitigate risk. Furthermore, evaluation processes are important not only for LLMs, but are becoming essential for assessing prompt template quality, input data quality, and ultimately, the entire application stack. As LLMs take on more significant roles in areas like healthcare, education, and decision support, robust evaluation frameworks are vital for building trust and realizing the technology’s potential while mitigating risks.

Developers interested in using LLMs should prioritize a comprehensive evaluation process for several reasons. First, it assesses the model’s suitability for specific use cases, because performance can vary significantly across different tasks and domains. Evaluations are also a fundamental tool during application development to validate the quality of prompt templates. This process makes sure that solutions align with the company’s quality standards and policy guidelines before deploying them to production. Regular interval evaluation also allows organizations to stay informed about the latest advancements, making informed decisions about upgrading or switching models. Moreover, a thorough evaluation framework helps companies address potential risks when using LLMs, such as data privacy concerns, regulatory compliance issues, and reputational risk from inappropriate outputs. By investing in robust evaluation practices, companies can maximize the benefits of LLMs while maintaining responsible AI implementation and minimizing potential drawbacks.

To support robust generative AI application development, it’s essential to keep track of models, prompt templates, and datasets used throughout the process. This record-keeping allows developers and researchers to maintain consistency, reproduce results, and iterate on their work effectively. By documenting the specific model versions, fine-tuning parameters, and prompt engineering techniques employed, teams can better understand the factors contributing to their AI system’s performance. Similarly, maintaining detailed information about the datasets used for training and evaluation helps identify potential biases and limitations in the model’s knowledge base. This comprehensive approach to tracking key components not only facilitates collaboration among team members but also enables more accurate comparisons between different iterations of the AI application. Ultimately, this systematic approach to managing models, prompts, and datasets contributes to the development of more reliable and transparent generative AI applications.

In this post, we show how to use FMEval and Amazon SageMaker to programmatically evaluate LLMs. FMEval is an open source LLM evaluation library, designed to provide data scientists and machine learning (ML) engineers with a code-first experience to evaluate LLMs for various aspects, including accuracy, toxicity, fairness, robustness, and efficiency. In this post, we only focus on the quality and responsible aspects of model evaluation, but the same approach can be extended by using other libraries for evaluating performance and cost, such as LLMeter and FMBench, or richer quality evaluation capabilities like those provided by Amazon Bedrock Evaluations.

SageMaker is a data, analytics, and AI/ML platform, which we will use in conjunction with FMEval to streamline the evaluation process. We specifically focus on SageMaker with MLflow. MLflow is an open source platform for managing the end-to-end ML lifecycle, including experimentation, reproducibility, and deployment. The managed MLflow in SageMaker simplifies the deployment and operation of tracking servers, and offers seamless integration with other AWS services, making it straightforward to track experiments, package code into reproducible runs, and share and deploy models.

By combining FMEval’s evaluation capabilities with SageMaker with MLflow, you can create a robust, scalable, and reproducible workflow for assessing LLM performance. This approach can enable you to systematically evaluate models, track results, and make data-driven decisions in your generative AI development process.

Using FMEval for model evaluation

FMEval is an open-source library for evaluating foundation models (FMs). It consists of three main components:

Data config – Specifies the dataset location and its structure.
Model runner – Composes input, and invokes and extracts output from your model. Thanks to this construct, you can evaluate any LLM by configuring the model runner according to your model.
Evaluation algorithm – Computes evaluation metrics to model outputs. Different algorithms have different metrics to be specified.

You can use pre-built components because it provides native components for both Amazon Bedrock and Amazon SageMaker JumpStart, or create custom ones by inheriting from the base core component. The library supports various evaluation scenarios, including pre-computed model outputs and on-the-fly inference. FMEval offers flexibility in dataset handling, model integration, and algorithm implementation. Refer to Evaluate large language models for quality and responsibility or the Evaluating Large Language Models with fmeval paper to dive deeper into FMEval, or see the official GitHub repository.

Using SageMaker with MLflow to track experiments

The fully managed MLflow capability on SageMaker is built around three core components:

MLflow tracking server – This component can be quickly set up through the Amazon SageMaker Studio interface or using the API for more granular configurations. It functions as a standalone HTTP server that provides various REST API endpoints for monitoring, recording, and visualizing experiment runs. This allows you to keep track of your ML experiments.
MLflow metadata backend – This crucial part of the tracking server is responsible for storing all the essential information about your experiments. It keeps records of experiment names, run identifiers, parameter settings, performance metrics, tags, and locations of artifacts. This comprehensive data storage makes sure that you can effectively manage and analyze your ML projects.
MLflow artifact repository – This component serves as a storage space for all the files and objects generated during your ML experiments. These can include trained models, datasets, log files, and visualizations. The repository uses an Amazon Simple Storage Service (Amazon S3) bucket within your AWS account, making sure that your artifacts are stored securely and remain under your control.

The following diagram depicts the different components and where they run within AWS.

Code walkthrough

You can follow the full sample code from the GitHub repository.

Prerequisites

You must have the following prerequisites:

A running MLflow tracking server within an Amazon SageMaker Studio domain
A JupyterLab application within the same SageMaker Studio domain
Active subscriptions to the Amazon Bedrock models you want to evaluate and permissions to invoke these models
Permissions to deploy foundation models via Amazon SageMaker JumpStart

Refer to the documentation best practices regarding AWS Identity and Access Management (IAM) policies for SageMaker, MLflow, and Amazon Bedrock on how to set up permissions for the SageMaker execution role. Remember to always following the least privilege access principle.

Evaluate a model and log to MLflow

We provide two sample notebooks to evaluate models hosted in Amazon Bedrock (Bedrock.ipynb) and models deployed to SageMaker Hosting using SageMaker JumpStart (JumpStart.ipynb). The workflow implemented in these two notebooks is essentially the same, although a few differences are noteworthy:

Models hosted in Amazon Bedrock can be consumed directly using an API without any setup, providing a “serverless” experience, whereas models in SageMaker JumpStart require the user first to deploy the models. Although deploying models through SageMaker JumpStart is a straightforward operation, the user is responsible for managing the lifecycle of the endpoint.
ModelRunners implementations differ. FMEval provides native implementations for both Amazon Bedrock, using the BedrockModelRunner class, and SageMaker JumpStart, using the JumpStartModelRunner class. We discuss the main differences in the following section.

ModelRunner definition

For BedrockModelRunner, we need to find the model content_template. We can find this information conveniently on the Amazon Bedrock console in the API request sample section, and look at value of the body. The following example is the content template for Anthropic’s Claude 3 Haiku:

output_jmespath = "content[0].text"
content_template = """{
  "anthropic_version": "bedrock-2023-05-31",
  "max_tokens": 512,
  "temperature": 0.5,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": $prompt
        }
      ]
    }
  ]
}"""

model_runner = BedrockModelRunner(
    model_id=model_id,
    output=output_jmespath,
    content_template=content_template,
)

For JumpStartModelRunner, we need to find the model and model_version. This information can be retrieved directly using the get_model_info_from_endpoint(endpoint_name=endpoint_name) utility provided by the SageMaker Python SDK, where endpoint_name is the name of the SageMaker endpoint where the SageMaker JumpStart model is hosted. See the following code example:

from sagemaker.jumpstart.session_utils import get_model_info_from_endpoint

model_id, model_version, , , _ = get_model_info_from_endpoint(endpoint_name=endpoint_name)

model_runner = JumpStartModelRunner(
    endpoint_name=endpoint_name,
    model_id=model_id,
    model_version=model_version,
)

DataConfig definition

For each model runner, we want to evaluate three categories: Summarization, Factual Knowledge, and Toxicity. For each of this category, we prepare a DataConfig object for the appropriate dataset. The following example shows only the data for the Summarization category:

dataset_path = Path("datasets")

dataset_uri_summarization = dataset_path / "gigaword_sample.jsonl"
if not dataset_uri_summarization.is_file():
    print("ERROR - please make sure the file, gigaword_sample.jsonl, exists.")

data_config_summarization = DataConfig(
    dataset_name="gigaword_sample",
    dataset_uri=dataset_uri_summarization.as_posix(),
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="document",
    target_output_location="summary",
)

Evaluation sets definition

We can now create an evaluation set for each algorithm we want to use in our test. For the Summarization evaluation set, replace with your own prompt according to the input signature identified earlier. fmeval uses $model_input as placeholder to get the input from your evaluation dataset. See the following code:

summarization_prompt = "Summarize the following text in one sentence: $model_input"

summarization_accuracy = SummarizationAccuracy()

evaluation_set_summarization = EvaluationSet(
  data_config_summarization,
  summarization_accuracy,
  summarization_prompt,
)

We are ready now to group the evaluation sets:

evaluation_list = [
    evaluation_set_summarization,
    evaluation_set_factual,
    evaluation_set_toxicity,
]

Evaluate and log to MLflow

We set up the MLflow experiment used to track the evaluations. We then create a new run for each model, and run all the evaluations for that model within that run, so that the metrics will all appear together. We use the model_id as the run name to make it straightforward to identify this run as part of a larger experiment, and run the evaluation using the run_evaluation_sets() defined in utils.py. See the following code:

run_name = f"{model_id}"

experiment_name = "fmeval-mlflow-simple-runs"
experiment = mlflow.set_experiment(experiment_name)

with mlflow.start_run(run_name=run_name) as run:
    run_evaluation_sets(model_runner, evaluation_list)

It is up to the user to decide how to best organize the results in MLflow. In fact, a second possible approach is to use nested runs. The sample notebooks implement both approaches to help you decide which one fits best your needs.

experiment_name = "fmeval-mlflow-nested-runs"
experiment = mlflow.set_experiment(experiment_name)

with mlflow.start_run(run_name=run_name, nested=True) as run:
    run_evaluation_sets_nested(model_runner, evaluation_list)

Run evaluations

Tracking the evaluation process involves storing information about three aspects:

The input dataset
The parameters of the model being evaluated
The scores for each evaluation

We provide a helper library (fmeval_mlflow) to abstract the logging of these aspects to MLflow, streamlining the interaction with the tracking server. For the information we want to store, we can refer to the following three functions:

log_input_dataset(data_config: DataConfig | list[DataConfig]) – Log one or more input datasets to MLflow for evaluation purposes
log_runner_parameters(model_runner: ModelRunner, custom_parameters_map: dict | None = None, model_id: str | None = None,) – Log the parameters associated with a given ModelRunner instance to MLflow
log_metrics(eval_output: list[EvalOutput], log_eval_output_artifact: bool = False) – Log metrics and artifacts for a list of SingleEvalOutput instances to MLflow.

When the evaluations are complete, we can analyze the results directly in the MLflow UI for a first visual assessment.

In the following screenshots, we show the visualization differences between logging using simple runs or nested runs.

You might want to create your own custom visualizations. For example, spider plots are often used to make visual comparison across multiple metrics. In the notebook compare_models.ipynb, we provide an example on how to use metrics stored in MLflow to generate such plots, which ultimately can also be stored in MLflow as part of your experiments. The following screenshots show some example visualizations.

Clean up

Once created, an MLflow tracking server will incur costs until you delete or stop it. Billing for tracking servers is based on the duration the servers have been running, the size selected, and the amount of data logged to the tracking servers. You can stop the tracking servers when they are not in use to save costs or delete them using the API or SageMaker Studio UI. For more details on pricing, see Amazon SageMaker pricing.

Similarly, if you deployed a model using SageMaker, endpoints are priced by deployed infrastructure time rather than by requests. You can avoid unnecessary charges by deleting your endpoints when you’re done with the evaluation.

Conclusion

In this post, we demonstrated how to create an evaluation framework for LLMs by combining SageMaker managed MLflow with FMEval. This integration provides a comprehensive solution for tracking and evaluating LLM performance across different aspects including accuracy, toxicity, and factual knowledge.

To enhance your evaluation journey, you can explore the following:

Get started with FMeval and SageMaker managed MLflow by following our code examples in the provided GitHub repository
Implement systematic evaluation practices in your LLM development workflow using the demonstrated approach
Use MLflow’s tracking capabilities to maintain detailed records of your evaluations, making your LLM development process more transparent and reproducible
Explore different evaluation metrics and datasets available in FMEval to comprehensively assess your LLM applications

By adopting these practices, you can build more reliable and trustworthy LLM applications while maintaining a clear record of your evaluation process and results.

About the authors

Paolo Di Francesco is a Senior Solutions Architect at Amazon Web Services (AWS). He holds a PhD in Telecommunications Engineering and has experience in software engineering. He is passionate about machine learning and is currently focusing on using his experience to help customers reach their goals on AWS, in particular in discussions around MLOps. Outside of work, he enjoys playing football and reading.

Dr. Alessandro Cerè is a GenAI Evaluation Specialist and Solutions Architect at AWS. He assists customers across industries and regions in operationalizing and governing their generative AI systems at scale, ensuring they meet the highest standards of performance, safety, and ethical considerations. Bringing a unique perspective to the field of AI, Alessandro has a background in quantum physics and research experience in quantum communications and quantum memories. In his spare time, he pursues his passion for landscape and underwater photography.

Create a SageMaker inference endpoint with custom model & extended container

January 27, 2025

by Aidan Ricci Amazon AWS

Amazon SageMaker provides a seamless experience for building, training, and deploying machine learning (ML) models at scale. Although SageMaker offers a wide range of built-in algorithms and pre-trained models through Amazon SageMaker JumpStart, there are scenarios where you might need to bring your own custom model or use specific software dependencies not available in SageMaker managed container images. Examples for this could include use cases like geospatial analysis, bioinformatics research, or quantum machine learning. In such cases, SageMaker allows you to extend its functionality by creating custom container images and defining custom model definitions. This approach enables you to package your model artifacts, dependencies, and inference code into a container image, which you can deploy as a SageMaker endpoint for real-time inference. This post walks you through the end-to-end process of deploying a single custom model on SageMaker using NASA’s Prithvi model. The Prithvi model is a first-of-its-kind temporal Vision transformer pre-trained by the IBM and NASA team on contiguous US Harmonised Landsat Sentinel 2 (HLS) data. It can be finetuned for image segmentation using the mmsegmentation library for use cases like burn scars detection, flood mapping, and multi-temporal crop classification. Due to its unique architecture and fine-tuning dependency on the MMCV library, it is an effective example of how to deploy complex custom models to SageMaker. We demonstrate how to use the flexibility of SageMaker to deploy your own custom model, tailored to your specific use case and requirements. Whether you’re working with unique model architectures, specialized libraries, or specific software versions, this approach empowers you to harness the scalability and management capabilities of SageMaker while maintaining control over your model’s environment and dependencies.

Solution overview

To run a custom model that needs unique packages as a SageMaker endpoint, you need to follow these steps:

If your model requires additional packages or package versions unavailable from the SageMaker managed container images, you will need to extend one of the container images. By extending a SageMaker managed container vs. creating one from scratch, you can focus on your specific use case and model development instead of the container infrastructure.
Write a Python model definition using the SageMaker inference.py file format.
Define your model artifacts and inference file within a specific file structure, archive your model files as a tar.gz file, and upload your files to Amazon Simple Storage Service (Amazon S3).
With your model code and an extended SageMaker container, use Amazon SageMaker Studio to create a model, endpoint configuration, and endpoint.
Query the inference endpoint to confirm your model is running correctly.

The following diagram illustrates the solution architecture and workflow:

Prerequisites

You need the following prerequisites before you can proceed. For this post, we use the us-east-1 AWS Region:

Have access to a POSIX based (Mac/Linux) system or SageMaker notebooks. This post doesn’t cover setting up SageMaker access and assumes a notebook accessible to the internet. However, this is not a security best practice and should not be done in production. To learn how to create a SageMaker notebook within a virtual private cloud (VPC), see Connect to SageMaker AI Within your VPC.
Make sure you have AWS Identity and Access Management (IAM) permissions for SageMaker access; S3 bucket create, read, and PutObject access; AWS CodeBuild access; Amazon Elastic Container Registry (Amazon ECR) repository access; and the ability to create IAM roles.

Download the Prithvi model artifacts and flood data fine-tuning:

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash 
sudo apt-get install git-lfs 
git lfs install 
git clone https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M 
git clone https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-sen1floods11

Extend a SageMaker container image for your model

Although AWS provides pre-built container images optimized for deep learning on the AWS Deep Learning Containers (DLCs) GitHub for PyTorch and TensorFlow use cases, there are scenarios where models require additional libraries not included in these containers. The installation of these dependencies can take minutes or hours, so it’s more efficient to pre-build these dependencies into a custom container image. For this example, we deploy the Prithvi model, which is dependent on the MMCV library for advanced computer vision techniques. This library is not available within any of the SageMaker DLCs, so you will have to create an extended container to add it. Both MMCV and Prithvi are third-party models which have not undergone AWS security reviews, so please review these models yourself or use at your own risk. This post uses CodeBuild and a Docker Dockerfile to build the extended container.

Complete the following steps:

CodeBuild requires a source location containing the source code. Create an S3 bucket to serve as this source location using the following commands:

# generate a unique postfix 
BUCKET_POSTFIX=$(python3 -S -c "import uuid; print(str(uuid.uuid4().hex)[:10])") 
echo "export BUCKET_POSTFIX=${BUCKET_POSTFIX}" &gt;&gt; ~/.bashrc 
echo "Your bucket name will be customsagemakercontainer-codebuildsource-${BUCKET_POSTFIX}" 
# make your bucket 
aws s3 mb s3://customsagemakercontainer-codebuildsource-${BUCKET_POSTFIX}

Create an ECR repository to store the custom container image produced by the CodeBuild project. Record the repository URI as an environment variable.

CONTAINER_REPOSITORY_NAME="prithvi" 
aws ecr create-repository --repository-name ${CONTAINER_REPOSITORY_NAME} 
export REPOSITORY_URI=$(aws ecr describe-repositories --repository-names ${CONTAINER_REPOSITORY_NAME} --query 'repositories[0].repositoryUri' --output text)

Create a Dockerfile for the custom container. You use an AWS Deep Learning SageMaker framework container as the base image because it includes required dependencies such as SageMaker libraries, PyTorch, and CUDA.

This Docker container installs the Prithvi model and MMCV v1.6.2. These models are third-party models not produced by AWS and therefore may have security vulnerabilities. Use at your own risk.

cat > Dockerfile << EOF 
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker
WORKDIR /root
RUN DEBIAN_FRONTEND=noninteractive apt-get update -y
RUN DEBIAN_FRONTEND=noninteractive apt-get upgrade -y
RUN git clone https://github.com/NASA-IMPACT/hls-foundation-os.git
RUN wget https://github.com/open-mmlab/mmcv/archive/refs/tags/v1.6.2.tar.gz
RUN tar xvzf v1.6.2.tar.gz
WORKDIR /root/hls-foundation-os
RUN pip install -e . 
RUN pip install -U openmim 
WORKDIR /root/mmcv-1.6.2
RUN MMCV_WITH_OPS=1 pip install -e . -v

EOF

Create a buildspec file to define the build process for the CodeBuild project. This buildspec file will instruct CodeBuild to install the nvidia-container-toolkit to make sure the Docker container has GPU access, run the Dockerfile build, and push the built container image to your ECR repository.

cat > buildspec.yml << EOF
version: 0.2

phases:
    pre_build:
        commands:
        - echo Logging in to Amazon ECR...
        - IMAGE_TAG=sagemaker-gpu
        - curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
        - sudo yum install -y nvidia-container-toolkit
        - sudo nvidia-ctk runtime configure --runtime=docker
        - | 
          cat > /etc/docker/daemon.json << EOF
          {
              "runtimes": {
                  "nvidia": {
                      "args": [],
                      "path": "nvidia-container-runtime"
                  }
              },
              "default-runtime": "nvidia"
          }
          EOF
        - kill $(ps aux | grep -v grep | grep "/usr/local/bin/dockerd --host" | awk '{print $2}')
        - sleep 10
        - nohup /usr/local/bin/dockerd --host=unix:///var/run/docker.sock --host=tcp://127.0.0.1:2375 --storage-driver=overlay2 &
        - sleep 10
        - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $REPOSITORY_URI_AWSDL
        - docker pull $REPOSITORY_URI_AWSDL/pytorch-inference:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker

    build:
        commands:
        - echo Build started at $(date)
        - echo Building the Docker image...
        - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $REPOSITORY_URI
        - DOCKER_BUILDKIT=0 docker build -t $REPOSITORY_URI:latest .
        - docker tag $REPOSITORY_URI:latest $REPOSITORY_URI:$IMAGE_TAG

    post_build:
        commands:
        - echo Build completed at $(date)
        - echo Pushing the Docker images...
        - docker push $REPOSITORY_URI:latest
        - docker push $REPOSITORY_URI:$IMAGE_TAG

EOF

Zip and upload the Dockerfile and buildspec.yml files to the S3 bucket. This zip file will serve as the source code for the CodeBuild project.
- To install zip on a SageMaker notebook, run the following command:
```
sudo apt install zip
```
- With zip installed, run the following command:
```
zip prithvi_container_source.zip Dockerfile buildspec.yml
aws s3 cp prithvi_container_source.zip s3://customsagemakercontainer-codebuildsource-${BUCKET_POSTFIX}/
```

Create a CodeBuild service role so CodeBuild can access the required AWS services for the build.

First, create a file defining the role’s trust policy:

cat > create-role.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "codebuild.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

Create a file defining the service role’s permissions. This role has a few wildcard permissions (/* or *). These can give more permissions than needed and break the rule of least privilege. For more information about defining least privilege for production use cases, see Grant least privilege.

cat > put-role-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudWatchLogsPolicy",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:*:log-group:/aws/codebuild/*:*"
    },
    {
      "Sid": "S3GetObjectPolicy",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::customsagemakercontainer-codebuildsource-${BUCKET_POSTFIX}",
        "arn:aws:s3:::customsagemakercontainer-codebuildsource-${BUCKET_POSTFIX}/*"
        ]
    },
    {
      "Sid": "S3BucketIdentity",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketAcl",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::customsagemakercontainer-codebuildsource-${BUCKET_POSTFIX}"
    },
    {
      "Sid": "ECRAccess",
      "Effect": "Allow",
      "Action": [
        "ecr:GetImage",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability",
        "ecr:CompleteLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:GetDownloadUrlForLayer",
        "ecr:InitiateLayerUpload",
        "ecr:PutImage",
        "ecr:ListImages",
        "ecr:DescribeRepositories",
        "ecr:DescribeImages",
        "ecr:DescribeRegistry",
        "ecr:TagResource"
      ],
      "Resource": ["arn:aws:ecr:*:*:repository/prithvi",
                    "arn:aws:ecr:*:763104351884:repository/*"]
    },
    {
      "Sid": "ECRAuthToken",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    }
  ]
}
EOF

Create the CodeBuild service role:

aws iam create-role --role-name CodeBuildServiceRole --assume-role-policy-document file://create-role.json

Capture the name of the role Amazon Resource Name (ARN) from the CLI command response and record as an environment variable:
```
export CODEBUILD_SERVICE_ROLE_ARN=$(aws iam get-role --role-name CodeBuildServiceRole --query 'Role.Arn' --output text)
```

Attach the permission policies to the service role:

aws iam put-role-policy --role-name CodeBuildServiceRole --policy-name CodeBuildServiceRolePolicy --policy-document file://put-role-policy.json

Define the configurations for the CodeBuild build project using the build project JSON specification:

cat > codebuild-project.json << EOF
{
    "name": "prithvi-container-build",
    "description": "build process to modify the AWS SageMaker Deep Learning Container to include HLS/Prithvi",
    "source": {
      "type": "S3",
      "location": "customsagemakercontainer-codebuildsource-${BUCKET_POSTFIX}/prithvi_container_source.zip",
      "buildspec": "buildspec.yml",
      "insecureSsl": false
    },
    "artifacts": {
      "type": "NO_ARTIFACTS"
    },
    "cache": {
      "type": "NO_CACHE"
    },
    "environment": {
      "type": "LINUX_GPU_CONTAINER",
      "image": "aws/codebuild/amazonlinux2-x86_64-standard:5.0",
      "computeType": "BUILD_GENERAL1_SMALL",
      "environmentVariables": [
        {
          "name": "REPOSITORY_URI_AWSDL",
          "value": "763104351884.dkr.ecr.us-east-1.amazonaws.com",
          "type": "PLAINTEXT"
        },
        {
          "name": "AWS_REGION",
          "value": "us-east-1",
          "type": "PLAINTEXT"
        },
        {
            "name": "REPOSITORY_URI",
            "value": "$REPOSITORY_URI",
            "type": "PLAINTEXT"
          }
      ],
      "imagePullCredentialsType": "CODEBUILD",
      "privilegedMode": true
    },
    "serviceRole": "$CODEBUILD_SERVICE_ROLE_ARN",
    "timeoutInMinutes": 60,
    "queuedTimeoutInMinutes": 480,
    "logsConfig": {
      "cloudWatchLogs": {
        "status": "ENABLED"
      }
  }
}
EOF

Create the CodeBuild project using the codebuild-project.json specification defined in the previous step:
```
aws codebuild create-project --cli-input-json file://codebuild-project.json
```

Run a build for the CodeBuild project:

aws codebuild start-build --project-name prithvi-container-build

The build will take approximately 30 minutes to complete and cost approximately $1.50 to run. The CodeBuild compute instance type gpu1.small costs $0.05 per minute.

After you run the preceding command, you can press Ctrl+C to exit and run future commands. The build will already be running on AWS and will not be canceled by closing the command.

Monitor the status of the build using the following command and wait until you observe buildStatus=SUCCEEDED before proceeding to the next step:

export BUILD_ID=$(aws codebuild list-builds-for-project --project-name prithvi-container-build --query 'ids[0]' --output text) aws codebuild batch-get-builds --ids ${BUILD_ID} | grep buildStatus

After your CodeBuild project has completed, make sure that you do not close your terminal. The environment variables here will be used again.

Build your `inference.py` file

To run a custom model for inference on AWS, you need to build out an inference.py file that initializes your model, defines the input and output structure, and produces your inference results. In this file, you must define four functions:

model_fn – Initializes your model
input_fn – Defines how your data should be input and how to convert to a usable format
predict_fn – Takes the input data and receives the prediction
output_fn – Converts the prediction into an API call format

We use the following completed inference.py file for the SageMaker endpoint in this post. Download this inference.py to continue because it includes the helper functions to process the TIFF files needed for this model’s input. The following code is contained within the inference.py and is only shown to provide an explanation of what is being done in the file.

`model_fn`

The model_fn function builds your model, which is called and used within the predict_fn function. This function loads the model weights into a torch model checkpoint, opens the model config, defines global variables, instantiates the model, loads the model checkpoint into the model, and returns the model.

def model_fn(model_dir):
    # implement custom code to load the model
    # load weights
    weights_path = "./code/prithvi/Prithvi_100M.pt"
    checkpoint = torch.load(weights_path, map_location="cpu")

    # read model config
    model_cfg_path = "./code/prithvi/Prithvi_100M_config.yaml"
    with open(model_cfg_path) as f:
        model_config = yaml.safe_load(f)

    model_args, train_args = model_config["model_args"], model_config["train_params"]

    global means
    global stds
    means = np.array(train_args["data_mean"]).reshape(-1, 1, 1)
    stds = np.array(train_args["data_std"]).reshape(-1, 1, 1)
    # let us use only 1 frame for now (the model was trained on 3 frames)
    model_args["num_frames"] = 1

    # instantiate model
    model = MaskedAutoencoderViT(**model_args)
    model.eval()

    # load weights into model
    # strict=false because we are loading with only 1 frame, but the warning is expected
    del checkpoint['pos_embed']
    del checkpoint['decoder_pos_embed']
    _ = model.load_state_dict(checkpoint, strict=False)
    
    return model

`input_fn`

This function defines the expected input for the model and how to load the input for use in predict_fn. The endpoint expects a string URL path linked to a TIFF file you can find online from the Prithvi demo on Hugging Face. This function also defines the content type of the request sent in the body (such as application/json, image/tiff).

def input_fn(input_data, content_type):
    # decode the input data  (e.g. JSON string -> dict)
    # statistics used to normalize images before passing to the model
    raster_data = load_raster(input_data, crop=(224, 224))
    return raster_data

`predict_fn`:

In predict_fn, you create the prediction from the given input. In this case, creating the prediction image uses two helper functions specific to this endpoint (preprocess_image and enhance_raster_for_visualization). You can find both functions here. The preprocess_image function normalizes the image, then the function uses torch.no_grad to disable gradient calculations for the model. This is useful during inference to decrease inference time and reduce memory usage. Next, the function collects the prediction from the instantiated model. The mask ratio determines the number of pixels on the image zeroed out during inference. The two unpatchify functions convert the smaller patchified results produced by the model back to the original image space. The function normalized.clone() clones the normalized images and replaces the masked Regions from rec_img with the Regions from the pred_img. Finally, the function reshapes the image back into TIFF format, removes the normalization, and returns the image in raster format, which is valuable for visualization. The result of this is an image that can be converted to bytes for the user and then visualized on the user’s screen.

def predict_fn(data, model):
    normalized = preprocess_image(data)
    with torch.no_grad():
        mask_ratio = 0.5
        _, pred, mask = model(normalized, mask_ratio=mask_ratio)
        mask_img = model.unpatchify(mask.unsqueeze(-1).repeat(1, 1, pred.shape[-1])).detach().cpu()
        pred_img = model.unpatchify(pred).detach().cpu()#.numpy()
        rec_img = normalized.clone()
        rec_img[mask_img == 1] = pred_img[mask_img == 1]
        rec_img_np = (rec_img.numpy().reshape(6, 224, 224) * stds) + means
        print(rec_img_np.shape)
        return enhance_raster_for_visualization(rec_img_np, ref_img=data)

`output_fn`

output_fn returns the TIFF image received from predict_fn as an array of bytes.

def output_fn(prediction, accept):
    print(prediction.shape)
    return prediction.tobytes()

Test your `inference.py` file

Now that you have downloaded the complete inference.py file, there are two options to test your model before compressing the files and uploading them to Amazon S3:

Test the inference.py functions on your custom container within an Amazon Elastic Compute Cloud (Amazon EC2) instance
Test your endpoint on a local mode SageMaker endpoint (requires a GPU or GPU-based workspace for this model)

Model file structure, tar.gz compressing, and S3 upload

Before you start this step, download the Prithvi model artifacts and the Prithvi flood fine-tuning of the model. The first link will provide all of the model data from the base Prithvi model, and the flood fine-tuning of the model builds upon the model to perform flood plain detection on satellite images. Install git-lfs using brew on Mac or using https://git-lfs.com/ on Windows to install the GitHub repo’s large files.

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install

git clone https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M
cd Prithvi-100M
git lfs pull
git checkout c2435efd8f92329d3ca79fc8a7a70e21e675a650
git clone https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M-sen1floods11
cd Prithvi-100M-sen1floods11
git lfs pull
git checkout d48937bf588cd506dd73bc4deca543446ca5530d

To create a SageMaker model on the SageMaker console, you must store your model data within Amazon S3 because your SageMaker endpoint will pull your model artifacts directly from Amazon S3 using a tar.gz format. Within your tar.gz file, the data must have a specific file format defined by SageMaker. The following is the file structure for the Prithvi foundation model (our requirements are installed on the container, so requirements.txt has been left intentionally blank):

./model
./model/code/inference.py
./model/code/sen1floods11_Prithvi_100M.py (extended model config)
./model/code/sen1floods11_Prithvi_100M.pth (extended model weights)
./model/code/requirements.txt
./model/code/prithvi/Prithvi_100M.pt (extended model weights)
./model/code/prithvi/Prithvi_100M_config.yaml (model config)
./model/code/prithvi/Prithvi.py (model)

This folder structure remains true for other models as well. The /code folder must hold the inference.py file and any files used within inference.py. These additional files are generally model artifacts (configs, weights, and so on). In our case, this will be the whole Prithvi base model folder as well as the weights and configs for the fine-tuned version we will use. Because we have already installed these packages within our container, this is not used; however, there still must be a requirements.txt file, otherwise your endpoint will fail to build. All other files belong in the root folder.

With the preceding file structure in place, open your terminal and route into the model folder.

Run the following command in your terminal:
```
tar -czvf model.tar.gz ./
```
The command will create a compressed version of your model files called model.tar.gz from the files in your current directory. You can now upload this file into an S3 bucket.
If using SageMaker, run the following command:
```
sudo apt-get install uuid-runtime
```

Now create a new S3 bucket. The following CLI commands create an S3 bucket and upload your model.tar.gz file:

# generate a unique postfix

BUCKET_POSTFIX=$(uuidgen --random | cut -d'-' -f1)
echo "export BUCKET_POSTFIX=${BUCKET_POSTFIX}" >> ~/.bashrc
echo "Your bucket name will be customsagemakercontainer-model-${BUCKET_POSTFIX}"

# make your bucket
aws s3 mb s3://customsagemakercontainer-model-${BUCKET_POSTFIX}

# upload to your bucket
aws s3 cp model.tar.gz s3://customsagemakercontainer-model-${BUCKET_POSTFIX}/model.tar.gz

The file you uploaded will be used in the next step to define the model to be created in the endpoint.

Create SageMaker model, SageMaker endpoint configuration, and SageMaker endpoint

You now create a SageMaker inference endpoint using the CLI. There are three steps to creating a SageMaker endpoint: create a model, create an endpoint configuration, and create an endpoint.

In this post, you will create a public SageMaker endpoint because this will simplify running and testing the endpoint. For details about how to limit access to SageMaker endpoints, refer to Deploy models with SageMaker Studio.

Complete the following steps:

Get the ECR repository’s ARN:

export REPOSITORY_ARN=$(aws ecr describe-repositories --repository-names ${CONTAINER_REPOSITORY_NAME} --query 'repositories[0].repositoryArn' --output text)

Create a role for the SageMaker service to assume. Create a file defining the role’s trust policy.

cat > create-sagemaker-role.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

Create a file defining the service role’s permissions:

cat > put-sagemaker-role-policy.json << EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::customsagemakercontainer-model-${BUCKET_POSTFIX}"
            ]
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::customsagemakercontainer-model-${BUCKET_POSTFIX}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:BatchPutMetrics",
                "ecr:GetAuthorizationToken",
                "ecr:ListImages"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage"
            ],
            "Resource": [
                "${REPOSITORY_ARN}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "cloudwatch:namespace": [
                        "*SageMaker*",
                        "*Sagemaker*",
                        "*sagemaker*"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:CreateLogGroup",
                "logs:DescribeLogStreams"
            ],
            "Resource": "arn:aws:logs:*:*:log-group:/aws/sagemaker/*"
        }
    ]
}
EOF

Create the SageMaker service role:

aws iam create-role --role-name SageMakerInferenceRole --assume-role-policy-document file://create-sagemaker-role.json

export SAGEMAKER_INFERENCE_ROLE_ARN=$(aws iam get-role --role-name SageMakerInferenceRole --query 'Role.Arn' --output text)

Attach the permission policies to the service role:
```
aws iam put-role-policy --role-name SageMakerInferenceRole --policy-name SageMakerInferenceServiceRolePolicy --policy-document file://put-sagemaker-role-policy.json
```
The model definition will include the role you created, the ECR container image, and the Amazon S3 location of the model.tar.gz file that you created previously.

Create a JSON file that defines the model and run the create-model command:

cat > create_model.json << EOF
{
    "ModelName": "prithvi",
    "PrimaryContainer": {
        "Image": "$REPOSITORY_URI:latest",
        "ModelDataUrl": "s3://customsagemakercontainer-model-${BUCKET_POSTFIX}/model.tar.gz"
    },
    "ExecutionRoleArn": "$SAGEMAKER_INFERENCE_ROLE_ARN"
}
EOF

aws sagemaker create-model --cli-input-json file://create_model.json

A SageMaker endpoint configuration specifies the infrastructure that the model will be hosted on. The model will be hosted on a ml.g4dn.xlarge instance for GPU-based acceleration.

Create the endpoint configuration JSON file and create the SageMaker endpoint configuration:

cat > create_endpoint_config.json << EOF
{
    "EndpointConfigName": "prithvi-endpoint-config",
    "ProductionVariants": [
        {
            "VariantName": "default-variant",
            "ModelName": "prithvi",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.g4dn.xlarge",
            "InitialVariantWeight": 1.0
        }
    ]
}
EOF

aws sagemaker create-endpoint-config --cli-input-json file://create_endpoint_config.json

Create the SageMaker endpoint by referencing the endpoint configuration created in the previous step:
```
aws sagemaker create-endpoint --endpoint-name prithvi-endpoint --endpoint-config-name prithvi-endpoint-config
```
The ml.g4dn.xlarge inference endpoint will cost $0.736 per hour while running. It will take several minutes for the endpoint to finish deploying.

Check the status using the following command, waiting for it to return InService:

aws sagemaker describe-endpoint --endpoint-name prithvi-endpoint --query "EndpointStatus" --output text

When the endpoint’s status is InService, proceed to the next section.

Test your custom SageMaker inference endpoint

To test your SageMaker endpoint, you will query your endpoint with an image and display it. The following command sends a URL that references a TIFF image to the SageMaker endpoint, the model sends back a byte array, and the command reforms the byte array into an image. Open up a notebook locally or on Sagemaker Studio JupyterLab. The below code will need to be run outside of the command line to view the image

from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

payload = "https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-1.0-100M/resolve/main/examples/HLS.L30.T13REN.2018013T172747.v2.0.B02.B03.B04.B05.B06.B07_cropped.tif"

predictor = Predictor(endpoint_name="prithvi-endpoint")
predictor.serializer = JSONSerializer()

predictions = predictor.predict(payload)

This Python code creates a predictor object for your endpoint and sets the predictor’s serializer to NumPy for the conversion on the endpoint. It queries the predictor object using a payload of a URL pointing to a TIFF image. You use a helper function to display the image and enhance the raster. You will be able to find that helper function here. After you add the helper function, display the image:

import numpy as np
import matplotlib.pyplot as plt

NO_DATA_FLOAT = 0.0001
PERCENTILES = (0.1, 99.9)
NO_DATA = -9999

nppred = np.frombuffer(predictions).reshape((224, 224, 3))

def enhance_raster_for_visualization(raster, ref_img=None):
    if ref_img is None:
        ref_img = raster
    channels = []
    for channel in range(raster.shape[0]):
        valid_mask = np.ones_like(ref_img[channel], dtype=bool)
        valid_mask[ref_img[channel] == NO_DATA_FLOAT] = False
        mins, maxs = np.percentile(ref_img[channel][valid_mask], PERCENTILES)
        normalized_raster = (raster[channel] - mins) / (maxs - mins)
        normalized_raster[~valid_mask] = 0
        clipped = np.clip(normalized_raster, 0, 1)
        channels.append(clipped)
    clipped = np.stack(channels)
    channels_last = np.moveaxis(clipped, 0, -1)[..., :3]
    rgb = channels_last[..., ::-1]
    return rgb
    
raster_for_visualization = enhance_raster_for_visualization(nppred)
plt.imshow(nppred)
plt.show()

You should observe an image that has been taken from a satellite.

Clean up

To clean up the resources from this post and avoid incurring costs, follow these steps:

Delete the SageMaker endpoint, endpoint configuration, and model.
Delete the ECR image and repository.
Delete the model.tar.gz file in the S3 bucket that was created.
Delete the customsagemakercontainer-model and customsagemakercontainer-codebuildsource S3 buckets.

Conclusion

In this post, we extended a SageMaker container to include custom dependencies, wrote a Python script to run a custom ML model, and deployed that model on the SageMaker container within a SageMaker endpoint for real-time inference. This solution produces a running GPU-enabled endpoint for inference queries. You can use this same process to create custom model SageMaker endpoints by extending other SageMaker containers and writing an inference.py file for new custom models. Furthermore, with adjustments, you could create a multi-model SageMaker endpoint for custom models or run a batch processing endpoint for scenarios where you run large batches of queries at once. These solutions enable you to go beyond the most popular models used today and customize models to fit your own unique use case.

About the Authors

Aidan is a solutions architect supporting US federal government health customers. He assists customers by developing technical architectures and providing best practices on Amazon Web Services (AWS) cloud with a focus on AI/ML services. In his free time, Aidan enjoys traveling, lifting, and cooking

Nate is a solutions architect supporting US federal government sciences customers. He assists customers in developing technical architectures on Amazon Web Services (AWS), with a focus on data analytics and high performance computing. In his free time, he enjoys skiing and golfing.

Charlotte is a solutions architect on the aerospace & satellite team at Amazon Web Services (AWS), where she helps customers achieve their mission objectives through innovative cloud solutions. Charlotte specializes in machine learning with a focus on Generative AI. In her free time, she enjoys traveling, painting, and running.

Image and video prompt engineering for Amazon Nova Canvas and Amazon Nova Reel

January 27, 2025

by Yanyan Zhang Amazon AWS

Amazon has introduced two new creative content generation models on Amazon Bedrock: Amazon Nova Canvas for image generation and Amazon Nova Reel for video creation. These models transform text and image inputs into custom visuals, opening up creative opportunities for both professional and personal projects. Nova Canvas, a state-of-the-art image generation model, creates professional-grade images from text and image inputs, ideal for applications in advertising, marketing, and entertainment. Similarly, Nova Reel, a cutting-edge video generation model, supports the creation of short videos from text and image inputs, complete with camera motion controls using natural language. Although these models are powerful tools for creative expression, their effectiveness relies heavily on how well users can communicate their vision through prompts.

This post dives deep into prompt engineering for both Nova Canvas and Nova Reel. We share proven techniques for describing subjects, environments, lighting, and artistic style in ways the models can understand. For Nova Reel, we explore how to effectively convey camera movements and transitions through natural language. Whether you’re creating marketing content, prototyping designs, or exploring creative ideas, these prompting strategies can help you unlock the full potential of the visual generation capabilities of Amazon Nova.

Solution overview

To get started with Nova Canvas and Nova Reel, you can either use the Image/Video Playground on the Amazon Bedrock console or access the models through APIs. For detailed setup instructions, including account requirements, model access, and necessary permissions, refer to Creative content generation with Amazon Nova.

Generate images with Nova Canvas

Prompting for image generation models like Amazon Nova Canvas is both an art and a science. Unlike large language models (LLMs), Nova Canvas doesn’t reason or interpret command-based instructions explicitly. Instead, it transforms prompts into images based on how well the prompt captures the essence of the desired image. Writing effective prompts is key to unlocking the full potential of the model for various use cases. In this post, we explore how to craft effective prompts, examine essential elements of good prompts, and dive into typical use cases, each with compelling example prompts.

Essential elements of good prompts

A good prompt serves as a descriptive image caption rather than command-based instructions. It should provide enough detail to clearly describe the desired outcome while maintaining brevity (limited to 1,024 characters). Instead of giving command-based instructions like “Make a beautiful sunset,” you can achieve better results by describing the scene as if you’re looking at it: “A vibrant sunset over mountains, with golden rays streaming through pink clouds, captured from a low angle.” Think of it as painting a vivid picture with words to guide the model effectively.

Effective prompts start by clearly defining the main subject and action:

Subject – Clearly define the main subject of the image. Example: “A blue sports car parked in front of a grand villa.”
Action or pose – Specify what the subject is doing or how it is positioned. Example: “The car is angled slightly towards the camera, its doors open, showcasing its sleek interior.”

Add further context:

Environment – Describe the setting or background. Example: “A grand villa overlooking Lake Como, surrounded by manicured gardens and sparkling lake waters.”

After you define the main focus of the image, you can refine the prompt further by specifying additional attributes such as visual style, framing, lighting, and technical parameters. For instance:

Lighting – Include lighting details to set the mood. Example: “Soft, diffused lighting from a cloudy sky highlights the car’s glossy surface and the villa’s stone facade.”
Camera position and framing – Provide information about perspective and composition. Example: “A wide-angle shot capturing the car in the foreground and the villa’s grandeur in the background, with Lake Como visible beyond.”
Style – Mention the visual style or medium. Example: “Rendered in a cinematic style with vivid, high-contrast details.”

Finally, you can use a negative prompt to exclude specific elements or artifacts from your composition, aligning it more closely to your vision:

Negative prompt – Use the negativeText parameter to exclude unwanted elements in your main prompt. For instance, to remove any birds or people from the image, rather than adding “no birds or people” to the prompt, add “birds, people” to the negative prompt instead. Avoid using negation words like “no,” “not,” or “without” in both the prompt and negative prompt because they might lead to unintended consequences.

Let’s bring all these elements together with an example prompt with a basic and enhanced version that shows these techniques in action.

Basic: A car in front of a house

Enhanced: A blue luxury sports car parked in front of a grand villa overlooking Lake Como. The setting features meticulously manicured gardens, with the lake’s sparkling waters and distant mountains in the background. The car’s sleek, polished surface reflects the surrounding elegance, enhanced by soft, diffused lighting from a cloudy sky. A wide-angle shot capturing the car, villa, and lake in harmony, rendered in a cinematic style with vivid, high-contrast details.

Example image generation prompts

By mastering these techniques, you can create stunning visuals for a wide range of applications using Amazon Nova Canvas. The following images have been generated using Nova Canvas at a 1280×720 pixel resolution with a CFG scale of 6.5 and seed of 0 for reproducibility (see Request and response structure for image generation for detailed explanations of these parameters). This resolution also matches the image dimensions expected by Nova Reel, allowing seamless integration for video experimentation. Let’s examine some example prompts and their resulting images to see these techniques in action.


Aerial view of sparse arctic tundra landscape, expansive white terrain with meandering frozen rivers and scattered rock formations. High-contrast black and white composition showcasing intricate patterns of ice and snow, emphasizing texture and geological diversity. Bird’s-eye perspective capturing the abstract beauty of the arctic wilderness.	An overhead shot of premium over-ear headphones resting on a reflective surface, showcasing the symmetry of the design. Dramatic side lighting accentuates the curves and edges, casting subtle shadows that highlight the product’s premium build quality

An angled view of a premium matte metal water bottle with bamboo accents, showcasing its sleek profile. The background features a soft blur of a serene mountain lake. Golden hour sunlight casts a warm glow on the bottle’s surface, highlighting its texture. Captured with a shallow depth of field for product emphasis.	Watercolor scene of a cute baby dragon with pearlescent mint-green scales crouched at the edge of a garden puddle, tiny wings raised. Soft pastel flowers and foliage frame the composition. Loose, wet-on-wet technique for a dreamy atmosphere, with sunlight glinting off ripples in the puddle.

Abstract figures emerging from digital screens, glitch art aesthetic with RGB color shifts, fragmented pixel clusters, high contrast scanlines, deep shadows cast by volumetric lighting	An intimate portrait of a seasoned fisherman, his face filling the frame. His thick gray beard is flecked with sea spray, and his knit cap is pulled low over his brow. The warm glow of sunset bathes his weathered features in golden light, softening the lines of his face while still preserving the character earned through years at sea. His eyes reflect the calm waters of the harbor behind him.

Generate video with Nova Reel

Video generation models work best with descriptive prompts rather than command-based instructions. When crafting prompts, focus on what you want to see rather than telling the model what to do. Think of writing a detailed caption or scene description like you’re explaining a video that already exists. For example, describing elements like the main subjects, what they’re doing, the setting around them, how the scene is lit, the overall artistic style, and any camera movements can help create more accurate results. The key is to paint a complete picture through description rather than giving step-by-step directions. This means instead of saying “create a dramatic scene,” you could describe “a stormy beach at sunset with crashing waves and dark clouds, filmed with a slow aerial shot moving over the coastline.” The more specific and descriptive you can be about the visual elements you want, the better the output will be. You might want to include details about the subject, action, environment, lighting, style, and camera motion.

When writing a video generation prompt for Nova Reel, be mindful of the following requirements and best practices:

Prompts must be no longer than 512 characters.
For best results, place camera movement descriptions at the start or end of your prompt.
Specify what you want to include rather than what to exclude. For example, instead of “fruit basket with no bananas,” say “fruit basket with apples, oranges, and pears.”

When describing camera movements in your video prompts, be specific about the type of motion you want—whether it’s a smooth dolly shot (moving forward/backward), a pan (sweeping left/right), or a tilt (moving up/down). For more dynamic effects, you can request aerial shots, orbit movements, or specialized techniques like dolly zooms. You can also specify the speed of the movement; refer to camera controls for more ideas.

Example video generation prompts

Let’s look at some of the outputs from Amazon Nova Reel.


Track right shot of single red balloon floating through empty subway tunnel. Balloon glows from within, casting soft red light on concrete walls. Cinematic 4k, moody lighting	Dolly in shot of peaceful deer drinking from forest stream. Sunlight filtering through and bokeh of other deer and forest plants. 4k cinematic.

Pedestal down in a pan in modern kitchen penne pasta with heavy cream white sauce on top, mushrooms and garlic, steam coming out.	Orbit shot of crystal light bulb centered on polished marble surface, gentle floating gears spinning inside with golden glow. Premium lighting. 4k cinematic.

Generate image-based video using Nova Reel

In addition to the fundamental text-to-video, Amazon Nova Reel also supports an image-to-video feature that allows you to use an input reference image to guide video generation. Using image-based prompts for video generation offers significant advantages: it speeds up your creative iteration process and provides precise control over the final output. Instead of relying solely on text descriptions, you can visually define exactly how you want your video to start. These images can come from Nova Canvas or from other sources where you have appropriate rights to use them. This approach has two main strategies:

Simple camera motion – Start with your reference image to establish the visual elements and style, then add minimal prompts focused purely on camera movement, such as “dolly forward.” This approach keeps the scene largely static while creating dynamic motion through camera direction.
Dynamic transformation – This strategy involves describing specific actions and temporal changes in your scene. Detail how elements should evolve over time, framing your descriptions as summaries of the desired transformation rather than step-by-step commands. This allows for more complex scene evolution while maintaining the visual foundation established by your reference image.

This method streamlines your creative workflow by using your image as the visual foundation. Rather than spending time refining text prompts to achieve the right look, you can quickly create an image in Nova Canvas (or source it elsewhere) and use it as your starting point in Nova Reel. This approach leads to faster iterations and more predictable results compared to pure text-to-video generation.

Let’s take some images we created earlier with Amazon Nova Canvas and use them as reference frames to generate videos with Amazon Nova Reel.


Slow dolly forward	A 3d animated film: A mint-green baby dragon is talking dramatically. Emotive, high quality animation.

Track right of a premium matte metal water bottle with bamboo accents with background serene mountain lake ripples moving	Orbit shot of premium over-ear headphones on a reflective surface. Dramatic side lighting accentuates the curves and edges, casting subtle shadows that highlight the product’s premium build quality.

Tying it all together

You can transform your storytelling by combining multiple generated video clips into a captivating narrative. Although Nova Reel handles the video generation, you can further enhance the content using preferred video editing software to add thoughtful transitions, background music, and narration. This combination creates an immersive journey that transports viewers into compelling storytelling. The following example showcases Nova Reel’s capabilities in creating engaging visual narratives. Each clip is crafted with professional lighting and cinematic quality.

Best practices

Consider the following best practices:

Refine iteratively – Start simple and refine your prompt based on the output.
Be specific – Provide detailed descriptions for better results.
Diversify adjectives – Avoid overusing generic descriptors like “beautiful” or “amazing;” opt for specific terms like “serene” or “ornate.”
Refine prompts with AI – Use multimodal understanding models like Amazon Nova (Pro, Lite, or Micro) to help you convert your high-level idea into a refined prompt based on best practices. Generative AI can also help you quickly generate different variations of your prompt to help you experiment with different combinations of style, framing, and more, as well as other creative explorations. Experiment with community-built tools like the Amazon Nova Canvas Prompt Refiner in PartyRock, the Amazon Nova Canvas Prompting Assistant, and the Nova Reel Prompt Optimizer.
Maintain a templatized prompt library – Keep a catalog of effective prompts and their results to refine and adapt over time. Create reusable templates for common scenarios to save time and maintain consistency.
Learn from others – Explore community resources and tools to see effective prompts and adapt them.
Follow trends – Monitor updates or new model features, because prompt behavior might change with new capabilities.

Fundamentals of prompt engineering for Amazon Nova

Effective prompt engineering for Amazon Nova Canvas and Nova Reel follows an iterative process, designed to refine your input and achieve the desired output. This process can be visualized as a logical flow chart, as illustrated in the following diagram.

The process consists of the following steps:

Begin by crafting your initial prompt, focusing on descriptive elements such as subject, action, environment, lighting, style, and camera position or motion.
After generating the first output, assess whether it aligns with your vision. If the result is close to what you want, proceed to the next step. If not, return to Step 1 and refine your prompt.
When you’ve found a promising direction, maintain the same seed value. This maintains consistency in subsequent generations while you make minor adjustments.
Make small, targeted changes to your prompt. These could involve adjusting descriptors, adding or removing elements, and more.
Generate a new output with your refined prompt, keeping the seed value constant.
Evaluate the new output. If it’s satisfactory, move to the next step. If not, return to Step 4 and continue refining.
When you’ve achieved a satisfactory result, experiment with different seed values to create variations of your successful prompt. This allows you to explore subtle differences while maintaining the core elements of your desired output.
Select the best variations from your generated options as your final output.

This iterative approach allows for systematic improvement of your prompts, leading to more accurate and visually appealing results from Nova Canvas and Nova Reel models. Remember, the key to success lies in making incremental changes, maintaining consistency where needed, and being open to exploring variations after you’ve found a winning formula.

Conclusion

Understanding effective prompt engineering for Nova Canvas and Nova Reel unlocks exciting possibilities for creating stunning images and compelling videos. By following the best practices outlined in this guide—from crafting descriptive prompts to iterative refinement—you can transform your ideas into production-ready visual assets.

Ready to start creating? Visit the Amazon Bedrock console today to experiment with Nova Canvas and Nova Reel in the Amazon Bedrock Playground or using the APIs. For detailed specifications, supported features, and additional examples, refer to the following resources:

About the authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Kris Schultz has spent over 25 years bringing engaging user experiences to life by combining emerging technologies with world class design. In his role as Senior Product Manager, Kris helps design and build AWS services to power Media & Entertainment, Gaming, and Spatial Computing.

Sanju Sunny is a Digital Innovation Specialist with Amazon ProServe. He engages with customers in a variety of industries around Amazon’s distinctive customer-obsessed innovation mechanisms in order to rapidly conceive, validate and prototype new products, services and experiences.

Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.

Security best practices to consider while fine-tuning models in Amazon Bedrock

January 24, 2025

by Vishal Naik Amazon AWS

Amazon Bedrock has emerged as the preferred choice for tens of thousands of customers seeking to build their generative AI strategy. It offers a straightforward, fast, and secure way to develop advanced generative AI applications and experiences to drive innovation.

With the comprehensive capabilities of Amazon Bedrock, you have access to a diverse range of high-performing foundation models (FMs), empowering you to select the most suitable option for your specific needs, customize the model privately with your own data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and create managed agents that run complex business tasks.

Fine-tuning pre-trained language models allows organizations to customize and optimize the models for their specific use cases, providing better performance and more accurate outputs tailored to their unique data and requirements. By using fine-tuning capabilities, businesses can unlock the full potential of generative AI while maintaining control over the model’s behavior and aligning it with their goals and values.

In this post, we delve into the essential security best practices that organizations should consider when fine-tuning generative AI models.

Security in Amazon Bedrock

Cloud security at AWS is the highest priority. Amazon Bedrock prioritizes security through a comprehensive approach to protect customer data and AI workloads.

Amazon Bedrock is built with security at its core, offering several features to protect your data and models. The main aspects of its security framework include:

Access control – This includes features such as:
- Fine-grained access control using AWS Identity and Access Management (IAM)
- Resource-based policies to control access to specific Amazon Bedrock resources
Data encryption – Amazon Bedrock offers the following encryption:
- Data at rest is encrypted using AWS Key Management Service (AWS KMS)
- Data in transit is encrypted using TLS 1.2 or higher
Network security – Amazon Bedrock offers several security options, including:
- Support for AWS PrivateLink to establish private connectivity between your virtual private cloud (VPC) and Amazon Bedrock
- VPC endpoints for secure communication within your AWS environment
Compliance – Amazon Bedrock is in alignment with various industry standards and regulations, including HIPAA, SOC, and PCI DSS

Solution overview

Model customization is the process of providing training data to a model to improve its performance for specific use cases. Amazon Bedrock currently offers the following customization methods:

Continued pre-training – Enables tailoring an FM’s capabilities to specific domains by fine-tuning its parameters with unlabeled, proprietary data, allowing continuous improvement as more relevant data becomes available.
Fine-tuning – Involves providing labeled data to train a model on specific tasks, enabling it to learn the appropriate outputs for given inputs. This process adjusts the model’s parameters, enhancing its performance on the tasks represented by the labeled training dataset.
Distillation – Process of transferring knowledge from a larger more intelligent model (known as teacher) to a smaller, faster, cost-efficient model (known as student).

Model customization in Amazon Bedrock involves the following actions:

Create training and validation datasets.
Set up IAM permissions for data access.
Configure a KMS key and VPC.
Create a fine-tuning or pre-training job with hyperparameter tuning.
Analyze results through metrics and evaluation.
Purchase provisioned throughput for the custom model.
Use the custom model for tasks like inference.

In this post, we explain these steps in relation to fine-tuning. However, you can apply the same concepts for continued pre-training as well.

The following architecture diagram explains the workflow of Amazon Bedrock model fine-tuning.

The workflow steps are as follows:

The user submits an Amazon Bedrock fine-tuning job within their AWS account, using IAM for resource access.
The fine-tuning job initiates a training job in the model deployment accounts.
To access training data in your Amazon Simple Storage Service (Amazon S3) bucket, the job employs Amazon Security Token Service (AWS STS) to assume role permissions for authentication and authorization.
Network access to S3 data is facilitated through a VPC network interface, using the VPC and subnet details provided during job submission.
The VPC is equipped with private endpoints for Amazon S3 and AWS KMS access, enhancing overall security.
The fine-tuning process generates model artifacts, which are stored in the model provider AWS account and encrypted using the customer-provided KMS key.

This workflow provides secure data handling across multiple AWS accounts while maintaining customer control over sensitive information using customer managed encryption keys.

The customer is in control of the data; model providers don’t have access to the data, and they don’t have access to a customer’s inference data or their customization training datasets. Therefore, data will not be available to model providers for them to improve their base models. Your data is also unavailable to the Amazon Bedrock service team.

In the following sections, we go through the steps of fine-tuning and deploying the Meta Llama 3.1 8B Instruct model in Amazon Bedrock using the Amazon Bedrock console.

Prerequisites

Before you get started, make sure you have the following prerequisites:

An AWS account
An IAM federation role with access to do the following:
- Create, edit, view, and delete VPC network and security resources
- Create, edit, view, and delete KMS keys
- Create, edit, view, and delete IAM roles and policies for model customization
- Create, upload, view, and delete S3 buckets to access training and validation data and permission to write output data to Amazon S3
- List FMs on the base model that will be used for fine-tuning
- Create a custom training job for the Amazon Bedrock FM
- Provisioned model throughputs
- List custom models and invoke model permissions on the fine-tuned model
Model access, which you can request through the Amazon Bedrock console

For this post, we use the us-west-2 AWS Region. For instructions on assigning permissions to the IAM role, refer to Identity-based policy examples for Amazon Bedrock and How Amazon Bedrock works with IAM.

Prepare your data

To fine-tune a text-to-text model like Meta Llama 3.1 8B Instruct, prepare a training and optional validation dataset by creating a JSONL file with multiple JSON lines.

Each JSON line is a sample containing a prompt and completion field. The format is as follows:

{"prompt": "<prompt1>", "completion": "<expected generated text>"}
{"prompt": "<prompt2>", "completion": "<expected generated text>"}

The following is an example from a sample dataset used as one-line input for fine-tuning Meta Llama 3.1 8B Instruct in Amazon Bedrock. In JSONL format, each record is one text line.

{"prompt": "consumer complaints and resolutions for financial products", "completion": "{'Date received': '01/01/24', 'Product': 'Credit card', 'Sub-product': 'Store credit card', 'Issue': 'Other features, terms, or problems', 'Sub-issue': 'Other problem', 'Consumer complaint narrative': None, 'Company public response': None, 'Company': 'Bread Financial Holdings, Inc.', 'State': 'MD', 'ZIP code': '21060', 'Tags': 'Servicemember', 'Consumer consent provided?': 'Consent not provided', 'Submitted via': 'Web', 'Date sent to company': '01/01/24', 'Company response to consumer': 'Closed with non-monetary relief', 'Timely response?': 'Yes', 'Consumer disputed?': None, 'Complaint ID': 8087806}"}

Create a KMS symmetric key

When uploading your training data to Amazon S3, you can use server-side encryption with AWS KMS. You can create KMS keys on the AWS Management Console, the AWS Command Line Interface (AWS CLI) and SDKs, or an AWS CloudFormation template. Complete the following steps to create a KMS key in the console:

On the AWS KMS console, choose Customer managed keys in the navigation pane.
Choose Create key.
Create a symmetric key. For instructions, see Create a KMS key.

Create an S3 bucket and configure encryption

Complete the following steps to create an S3 bucket and configure encryption:

On the Amazon S3 console, choose Buckets in the navigation pane.
Choose Create bucket.
For Bucket name, enter a unique name for your bucket.

For Encryption type¸ select Server-side encryption with AWS Key Management Service keys.
For AWS KMS key, select Choose from your AWS KMS keys and choose the key you created.

Complete the bucket creation with default settings or customize as needed.

Upload the training data

Complete the following steps to upload the training data:

On the Amazon S3 console, navigate to your bucket.
Create the folders fine-tuning-datasets and outputs and keep the bucket encryption settings as server-side encryption.
Choose Upload and upload your training data file.

Create a VPC

To create a VPC using Amazon Virtual Private Cloud (Amazon VPC), complete the following steps:

On the Amazon VPC console, choose Create VPC.
Create a VPC with private subnets in all Availability Zones.

Create an Amazon S3 VPC gateway endpoint

You can further secure your VPC by setting up an Amazon S3 VPC endpoint and using resource-based IAM policies to restrict access to the S3 bucket containing the model customization data.

Let’s create an Amazon S3 gateway endpoint and attach it to VPC with custom IAM resource-based policies to more tightly control access to your Amazon S3 files.

The following code is a sample resource policy. Use the name of the bucket you created earlier.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "RestrictAccessToTrainingBucket",
			"Effect": "Allow",
			"Principal": "*",
			"Action": [
				"s3:GetObject",
				"s3:PutObject",
				"s3:ListBucket"
			],
			"Resource": [
				"arn:aws:s3:::$your-bucket",
				"arn:aws:s3:::$your-bucket/*"
			]
		}
	]
}

Create a security group for the AWS KMS VPC interface endpoint

A security group acts as a virtual firewall for your instance to control inbound and outbound traffic. This VPC endpoint security group only allows traffic originating from the security group attached to your VPC private subnets, adding a layer of protection. Complete the following steps to create the security group:

On the Amazon VPC console, choose Security groups in the navigation pane.
Choose Create security group.
For Security group name, enter a name (for example, bedrock-kms-interface-sg).
For Description, enter a description.
For VPC, choose your VPC.

Add an inbound rule to HTTPS traffic from the VPC CIDR block.

Create a security group for the Amazon Bedrock custom fine-tuning job

Now you can create a security group to establish rules for controlling Amazon Bedrock custom fine-tuning job access to the VPC resources. You use this security group later during model customization job creation. Complete the following steps:

On the Amazon VPC console, choose Security groups in the navigation pane.
Choose Create security group.
For Security group name, enter a name (for example, bedrock-fine-tuning-custom-job-sg).
For Description, enter a description.
For VPC, choose your VPC.

Add an inbound rule to allow traffic from the security group.

Create an AWS KMS VPC interface endpoint

Now you can create an interface VPC endpoint (PrivateLink) to establish a private connection between the VPC and AWS KMS.

For the security group, use the one you created in the previous step.

Attach a VPC endpoint policy that controls the access to resources through the VPC endpoint. The following code is a sample resource policy. Use the Amazon Resource Name (ARN) of the KMS key you created earlier.

{
	"Statement": [
		{
			"Sid": "AllowDecryptAndView",
			"Principal": {
				"AWS": "*"
			},
			"Effect": "Allow",
			"Action": [
				"kms:Decrypt",
				"kms:DescribeKey",
				"kms:ListAliases",
				"kms:ListKeys"
			],
			"Resource": "$Your-KMS-KEY-ARN"
		}
	]
}

Now you have successfully created the endpoints needed for private communication.

Create a service role for model customization

Let’s create a service role for model customization with the following permissions:

A trust relationship for Amazon Bedrock to assume and carry out the model customization job
Permissions to access your training and validation data in Amazon S3 and to write your output data to Amazon S3
If you encrypt any of the following resources with a KMS key, permissions to decrypt the key (see Encryption of model customization jobs and artifacts)
A model customization job or the resulting custom model
The training, validation, or output data for the model customization job
Permission to access the VPC

Let’s first create the required IAM policies:

On the IAM console, choose Policies in the navigation pane.
Choose Create policy.
Under Specify permissions¸ use the following JSON to provide access on S3 buckets, VPC, and KMS keys. Provide your account, bucket name, and VPC settings.

You can use the following IAM permissions policy as a template for VPC permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeVpcs",
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups"
            ],
            "Resource": "*"
        }, 
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
            ],
            "Resource":[
               "arn:aws:ec2:${{region}}:${{account-id}}:network-interface/*"
            ],
            "Condition": {
               "StringEquals": { 
                   "aws:RequestTag/BedrockManaged": ["true"]
                },
                "ArnEquals": {
                   "aws:RequestTag/BedrockModelCustomizationJobArn": ["arn:aws:bedrock:${{region}}:${{account-id}}:model-customization-job/*"]
               }
            }
        }, 
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
            ],
            "Resource":[
               "arn:aws:ec2:${{region}}:${{account-id}}:subnet/${{subnet-id}}",
               "arn:aws:ec2:${{region}}:${{account-id}}:subnet/${{subnet-id2}}",
               "arn:aws:ec2:${{region}}:${{account-id}}:security-group/security-group-id"
            ]
        }, 
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission",
            ],
            "Resource": "*",
            "Condition": {
               "ArnEquals": {
                   "ec2:Subnet": [
                       "arn:aws:ec2:${{region}}:${{account-id}}:subnet/${{subnet-id}}",
                       "arn:aws:ec2:${{region}}:${{account-id}}:subnet/${{subnet-id2}}"
                   ],
                   "ec2:ResourceTag/BedrockModelCustomizationJobArn": ["arn:aws:bedrock:${{region}}:${{account-id}}:model-customization-job/*"]
               },
               "StringEquals": { 
                   "ec2:ResourceTag/BedrockManaged": "true"
               }
            }
        }, 
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:${{region}}:${{account-id}}:network-interface/*",
            "Condition": {
                "StringEquals": {
                    "ec2:CreateAction": [
                        "CreateNetworkInterface"
                    ]    
                },
                "ForAllValues:StringEquals": {
                    "aws:TagKeys": [
                        "BedrockManaged",
                        "BedrockModelCustomizationJobArn"
                    ]
                }
            }
        }
    ]
}

You can use the following IAM permissions policy as a template for Amazon S3 permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::training-bucket",
                "arn:aws:s3:::training-bucket/*",
                "arn:aws:s3:::validation-bucket",
                "arn:aws:s3:::validation-bucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::output-bucket",
                "arn:aws:s3:::output-bucket/*"
            ]
        }
    ]
}

Now let’s create the IAM role.

On the IAM console, choose Roles in the navigation pane.
Choose Create roles.
Create a role with the following trust policy (provide your AWS account ID):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "account-id"
                },
                "ArnEquals": {
                    "aws:SourceArn": "arn:aws:bedrock:us-west-2:account-id:model-customization-job/*"
                }
            }
        }
    ] 
}

Assign your custom VPC and S3 bucket access policies.

Give a name to your role and choose Create role.

Update the KMS key policy with the IAM role

In the KMS key you created in the previous steps, you need to update the key policy to include the ARN of the IAM role. The following code is a sample key policy:

{
    "Version": "2012-10-17",
    "Id": "key-consolepolicy-3",
    "Statement": [
        {
            "Sid": "BedrockFineTuneJobPermissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "$IAM Role ARN"
            },
            "Action": [
                "kms:Decrypt",
                "kms:GenerateDataKey",
                "kms:Encrypt",
                "kms:DescribeKey",
                "kms:CreateGrant",
                "kms:RevokeGrant"
            ],
            "Resource": "$ARN of the KMS key"
        }
     ]
}

For more details, refer to Encryption of model customization jobs and artifacts.

Initiate the fine-tuning job

Complete the following steps to set up your fine-tuning job:

On the Amazon Bedrock console, choose Custom models in the navigation pane.
In the Models section, choose Customize model and Create fine-tuning job.

Under Model details, choose Select model.
Choose Llama 3.1 8B Instruct as the base model and choose Apply.

For Fine-tuned model name, enter a name for your custom model.
Select Model encryption to add a KMS key and choose the KMS key you created earlier.
For Job name, enter a name for the training job.
Optionally, expand the Tags section to add tags for tracking.

Under VPC Settings, choose the VPC, subnets, and security group you created as part of previous steps.

When you specify the VPC subnets and security groups for a job, Amazon Bedrock creates elastic network interfaces (ENIs) that are associated with your security groups in one of the subnets. ENIs allow the Amazon Bedrock job to connect to resources in your VPC.

We recommend that you provide at least one subnet in each Availability Zone.

Under Input data, specify the S3 locations for your training and validation datasets.

Under Hyperparameters, set the values for Epochs, Batch size, Learning rate, and Learning rate warm up steps for your fine-tuning job.

Refer to Custom model hyperparameters for additional details.

Under Output data, for S3 location, enter the S3 path for the bucket storing fine-tuning metrics.
Under Service access, select a method to authorize Amazon Bedrock. You can select Use an existing service role and use the role you created earlier.
Choose Create Fine-tuning job.

Monitor the job

On the Amazon Bedrock console, choose Custom models in the navigation pane and locate your job.

You can monitor the job on the job details page.

Purchase provisioned throughput

After fine-tuning is complete (as shown in the following screenshot), you can use the custom model for inference. However, before you can use a customized model, you need to purchase provisioned throughput for it.

Complete the following steps:

On the Amazon Bedrock console, under Foundation models in the navigation pane, choose Custom models.
On the Models tab, select your model and choose Purchase provisioned throughput.

For Provisioned throughput name, enter a name.
Under Select model, make sure the model is the same as the custom model you selected earlier.
Under Commitment term & model units, configure your commitment term and model units. Refer to Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock for additional insights. For this post, we choose No commitment and use 1 model unit.

Under Estimated purchase summary, review the estimated cost and choose Purchase provisioned throughput.

After the provisioned throughput is in service, you can use the model for inference.

Use the model

Now you’re ready to use your model for inference.

On the Amazon Bedrock console, under Playgrounds in the navigation pane, choose Chat/text.
Choose Select model.
For Category, choose Custom models under Custom & self-hosted models.
For Model, choose the model you just trained.
For Throughput, choose the provisioned throughput you just purchased.
Choose Apply.

Now you can ask sample questions, as shown in the following screenshot.

Implementing these procedures allows you to follow security best practices when you deploy and use your fine-tuned model within Amazon Bedrock for inference tasks.

When developing a generative AI application that requires access to this fine-tuned model, you have the option to configure it within a VPC. By employing a VPC interface endpoint, you can make sure communication between your VPC and the Amazon Bedrock API endpoint occurs through a PrivateLink connection, rather than through the public internet.

This approach further enhances security and privacy. For more information on this setup, refer to Use interface VPC endpoints (AWS PrivateLink) to create a private connection between your VPC and Amazon Bedrock.

Clean up

Delete the following AWS resources created for this demonstration to avoid incurring future charges:

Amazon Bedrock model provisioned throughput
VPC endpoints
VPC and associated security groups
KMS key
IAM roles and policies
S3 bucket and objects

Conclusion

In this post, we implemented secure fine-tuning jobs in Amazon Bedrock, which is crucial for protecting sensitive data and maintaining the integrity of your AI models.

By following the best practices outlined in this post, including proper IAM role configuration, encryption at rest and in transit, and network isolation, you can significantly enhance the security posture of your fine-tuning processes.

By prioritizing security in your Amazon Bedrock workflows, you not only safeguard your data and models, but also build trust with your stakeholders and end-users, enabling responsible and secure AI development.

As a next step, try the solution out in your account and share your feedback.

About the Authors

Vishal Naik is a Sr. Solutions Architect at Amazon Web Services (AWS). He is a builder who enjoys helping customers accomplish their business needs and solve complex challenges with AWS solutions and best practices. His core area of focus includes Generative AI and Machine Learning. In his spare time, Vishal loves making short films on time travel and alternate universe themes.

Sumeet Tripathi is an Enterprise Support Lead (TAM) at AWS in North Carolina. He has over 17 years of experience in technology across various roles. He is passionate about helping customers to reduce operational challenges and friction. His focus area is AI/ML and Energy & Utilities Segment. Outside work, He enjoys traveling with family, watching cricket and movies.

Secure a generative AI assistant with OWASP Top 10 mitigation

January 24, 2025

by Syed Jaffry Amazon AWS

A common use case with generative AI that we usually see customers evaluate for a production use case is a generative AI-powered assistant. However, before it can be deployed, there is the typical production readiness assessment that includes concerns such as understanding the security posture, monitoring and logging, cost tracking, resilience, and more. The highest priority of these production readiness assessments is usually security. If there are security risks that can’t be clearly identified, then they can’t be addressed, and that can halt the production deployment of the generative AI application.

In this post, we show you an example of a generative AI assistant application and demonstrate how to assess its security posture using the OWASP Top 10 for Large Language Model Applications, as well as how to apply mitigations for common threats.

Generative AI scoping framework

Start by understanding where your generative AI application fits within the spectrum of managed vs. custom. Use the AWS generative AI scoping framework to understand the specific mix of the shared responsibility for the security controls applicable to your application. For example, Scope 1 “Consumer Apps” like PartyRock or ChatGPT are usually publicly facing applications, where most of the application internal security is owned and controlled by the provider, and your responsibility for security is on the consumption side. Contrast that with Scope 4/5 applications, where not only do you build and secure the generative AI application yourself, but you are also responsible for fine-tuning and training the underlying large language model (LLM). The security controls in scope for Scope 4/5 applications will range more broadly from the frontend to LLM model security. This post will focus on the Scope 3 generative AI assistant application, which is one of the more frequent use cases seen in the field.

The following figure of the AWS Generative AI Security Scoping Matrix summarizes the types of models for each scope.

OWASP Top 10 for LLMs

Using the OWASP Top 10 for understanding threats and mitigations to an application is one of the most common ways application security is assessed. The OWASP Top 10 for LLMs takes a tried and tested framework and applies it to generative AI applications to help us discover, understand, and mitigate the novel threats for generative AI.

Solution overview

Let’s start with a logical architecture of a typical generative AI assistant application overlying the OWASP Top 10 for LLM threats, as illustrated in the following diagram.

In this architecture, the end-user request usually goes through the following components:

Authentication layer – This layer validates that the user connecting to the application is who they say they are. This is typically done through some sort of an identity provider (IdP) capability like Okta, AWS IAM Identity Center, or Amazon Cognito.
Application controller – This layer contains most of the application business logic and determines how to process the incoming user request by generating the LLM prompts and processing LLM responses before they are sent back to the user.
LLM and LLM agent – The LLM provides the core generative AI capability to the assistant. The LLM agent is an orchestrator of a set of steps that might be necessary to complete the desired request. These steps might involve both the use of an LLM and external data sources and APIs.
Agent plugin controller – This component is responsible for the API integration to external data sources and APIs. This component also holds the mapping between the logical name of an external component, which the LLM agent might refer to, and the physical name.
RAG data store – The Retrieval Augmented Generation (RAG) data store delivers up-to-date, precise, and access-controlled knowledge from various data sources such as data warehouses, databases, and other software as a service (SaaS) applications through data connectors.

The OWASP Top 10 for LLM risks map to various layers of the application stack, highlighting vulnerabilities from UIs to backend systems. In the following sections, we discuss risks at each layer and provide an application design pattern for a generative AI assistant application in AWS that mitigates these risks.

The following diagram illustrates the assistant architecture on AWS.

Authentication layer (Amazon Cognito)

Common security threats such as brute force attacks, session hijacking, and denial of service (DoS) attacks can occur. To mitigate these risks, implement best practices like multi-factor authentication (MFA), rate limiting, secure session management, automatic session timeouts, and regular token rotation. Additionally, deploying edge security measures such as AWS WAF and distributed denial of service (DDoS) mitigation helps block common web exploits and maintain service availability during attacks.

In the preceding architecture diagram, AWS WAF is integrated with Amazon API Gateway to filter incoming traffic, blocking unintended requests and protecting applications from threats like SQL injection, cross-site scripting (XSS), and DoS attacks. AWS WAF Bot Control further enhances security by providing visibility and control over bot traffic, allowing administrators to block or rate-limit unwanted bots. This feature can be centrally managed across multiple accounts using AWS Firewall Manager, providing a consistent and robust approach to application protection.

Amazon Cognito complements these defenses by enabling user authentication and data synchronization. It supports both user pools and identity pools, enabling seamless management of user identities across devices and integration with third-party identity providers. Amazon Cognito offers security features, including MFA, OAuth 2.0, OpenID Connect, secure session management, and risk-based adaptive authentication, to help protect against unauthorized access by evaluating sign-in requests for suspicious activity and responding with additional security measures like MFA or blocking sign-ins. Amazon Cognito also enforces password reuse prevention, further protecting against compromised credentials.

AWS Shield Advanced adds an extra layer of defense by providing enhanced protection against sophisticated DDoS attacks. Integrated with AWS WAF, Shield Advanced delivers comprehensive perimeter protection, using tailored detection and health-based assessments to enhance response to attacks. It also offers round-the-clock support from the AWS Shield Response Team and includes DDoS cost protection, making applications remain secure and cost-effective. Together, Shield Advanced and AWS WAF create a security framework that protects applications against a wide range of threats while maintaining availability.

This comprehensive security setup addresses LLM10:2025 Unbound Consumption and LLM02:2025 Sensitive Information Disclosure, making sure that applications remain both resilient and secure.

Application controller layer (LLM orchestrator Lambda function)

The application controller layer is usually vulnerable to risks such as LLM01:2025 Prompt Injection, LLM05:2025 Improper Output Handling, and LLM 02:2025 Sensitive Information Disclosure. Outside parties might frequently attempt to exploit this layer by crafting unintended inputs to manipulate the LLM, potentially causing it to reveal sensitive information or compromise downstream systems.

In the physical architecture diagram, the application controller is the LLM orchestrator AWS Lambda function. It performs strict input validation by extracting the event payload from API Gateway and conducting both syntactic and semantic validation. By sanitizing inputs, applying allowlisting and deny listing of keywords, and validating inputs against predefined formats or patterns, the Lambda function helps prevent LLM01:2025 Prompt Injection attacks. Additionally, by passing the user_id downstream, it enables the downstream application components to mitigate the risk of sensitive information disclosure, addressing concerns related to LLM02:2025 Sensitive Information Disclosure.

Amazon Bedrock Guardrails provides an additional layer of protection by filtering and blocking sensitive content, such as personally identifiable information (PII) and other custom sensitive data defined by regex patterns. Guardrails can also be configured to detect and block offensive language, competitor names, or other undesirable terms, making sure that both inputs and outputs are safe. You can also use guardrails to prevent LLM01:2025 Prompt Injection attacks by detecting and filtering out harmful or manipulative prompts before they reach the LLM, thereby maintaining the integrity of the prompt.

Another critical aspect of security is managing LLM outputs. Because the LLM might generate content that includes executable code, such as JavaScript or Markdown, there is a risk of XSS attacks if this content is not properly handled. To mitigate this risk, apply output encoding techniques, such as HTML entity encoding or JavaScript escaping, to neutralize any potentially harmful content before it is presented to users. This approach addresses the risk of LLM05:2025 Improper Output Handling.

Implementing Amazon Bedrock prompt management and versioning allows for continuous improvement of the user experience while maintaining the overall security of the application. By carefully managing changes to prompts and their handling, you can enhance functionality without introducing new vulnerabilities and mitigating LLM01:2025 Prompt Injection attacks.

Treating the LLM as an untrusted user and applying human-in-the-loop processes over certain actions are strategies to lower the likelihood of unauthorized or unintended operations.

LLM and LLM agent layer (Amazon Bedrock LLMs)

The LLM and LLM agent layer frequently handles interactions with the LLM and faces risks such as LLM10: Unbounded Consumption, LLM05:2025 Improper Output Handling, and LLM02:2025 Sensitive Information Disclosure.

DoS attacks can overwhelm the LLM with multiple resource-intensive requests, degrading overall service quality while increasing costs. When interacting with Amazon Bedrock hosted LLMs, setting request parameters such as the maximum length of the input request will minimize the risk of LLM resource exhaustion. Additionally, there is a hard limit on the maximum number of queued actions and total actions an Amazon Bedrock agent can take to fulfill a customer’s intent, which limits the number of actions in a system reacting to LLM responses, avoiding unnecessary loops or intensive tasks that could exhaust the LLM’s resources.

Improper output handling leads to vulnerabilities such as remote code execution, cross-site scripting, server-side request forgery (SSRF), and privilege escalation. The inadequate validation and management of the LLM-generated outputs before they are sent downstream can grant indirect access to additional functionality, effectively enabling these vulnerabilities. To mitigate this risk, treat the model as any other user and apply validation of the LLM-generated responses. The process is facilitated with Amazon Bedrock Guardrails using filters such as content filters with configurable thresholds to filter harmful content and safeguard against prompt attacks before they are processed further downstream by other backend systems. Guardrails automatically evaluate both user input and model responses to detect and help prevent content that falls into restricted categories.

Amazon Bedrock Agents execute multi-step tasks and securely integrate with AWS native and third-party services to reduce the risk of insecure output handling, excessive agency, and sensitive information disclosure. In the architecture diagram, the action group Lambda function under the agents is used to encode all the output text, making it automatically non-executable by JavaScript or Markdown. Additionally, the action group Lambda function parses each output from the LLM at every step executed by the agents and controls the processing of the outputs accordingly, making sure they are safe before further processing.

Sensitive information disclosure is a risk with LLMs because malicious prompt engineering can cause LLMs to accidentally reveal unintended details in their responses. This can lead to privacy and confidentiality violations. To mitigate the issue, implement data sanitization practices through content filters in Amazon Bedrock Guardrails.

Additionally, implement custom data filtering policies based on user_id and strict user access policies. Amazon Bedrock Guardrails helps filter content deemed sensitive, and Amazon Bedrock Agents further reduces the risk of sensitive information disclosure by allowing you to implement custom logic in the preprocessing and postprocessing templates to strip any unexpected information. If you have enabled model invocation logging for the LLM or implemented custom logging logic in your application to record the input and output of the LLM in Amazon CloudWatch, measures such as CloudWatch Log data protection are important in masking sensitive information identified in the CloudWatch logs, further mitigating the risk of sensitive information disclosure.

Agent plugin controller layer (action group Lambda function)

The agent plugin controller frequently integrates with internal and external services and applies custom authorization to internal and external data sources and third-party APIs. At this layer, the risk of LLM08:2025 Vector & Embedding Weaknesses and LLM06:2025 Excessive Agency are in effect. Untrusted or unverified third-party plugins could introduce backdoors or vulnerabilities in the form of unexpected code.

Apply least privilege access to the AWS Identity and Access Management (IAM) roles of the action group Lambda function, which interacts with plugin integrations to external systems to help mitigate the risk of LLM06:2025 Excessive Agency and LLM08:2025 Vector & Embedding Weaknesses. This is demonstrated in the physical architecture diagram; the agent plugin layer Lambda function is associated with a least privilege IAM role for secure access and interface with other internal AWS services.

Additionally, after the user identity is determined, restrict the data plane by applying user-level access control by passing the user_id to downstream layers like the agent plugin layer. Although this user_id parameter can be used in the agent plugin controller Lambda function for custom authorization logic, its primary purpose is to enable fine-grained access control for third-party plugins. The responsibility lies with the application owner to implement custom authorization logic within the action group Lambda function, where the user_id parameter can be used in combination with predefined rules to apply the appropriate level of access to third-party APIs and plugins. This approach wraps deterministic access controls around a non-deterministic LLM and enables granular access control over which users can access and execute specific third-party plugins.

Combining user_id-based authorization on data and IAM roles with least privilege on the action group Lambda function will generally minimize the risk of LLM08:2025 Vector & Embedding Weaknesses and LLM06:2025 Excessive Agency.

RAG data store layer

The RAG data store is responsible for securely retrieving up-to-date, precise, and user access-controlled knowledge from various first-party and third-party data sources. By default, Amazon Bedrock encrypts all knowledge base-related data using an AWS managed key. Alternatively, you can choose to use a customer managed key. When setting up a data ingestion job for your knowledge base, you can also encrypt the job using a custom AWS Key Management Service (AWS KMS) key.

If you decide to use the vector store in Amazon OpenSearch Service for your knowledge base, Amazon Bedrock can pass a KMS key of your choice to it for encryption. Additionally, you can encrypt the sessions in which you generate responses from querying a knowledge base with a KMS key. To facilitate secure communication, Amazon Bedrock Knowledge Bases uses TLS encryption when interacting with third-party vector stores, provided that the service supports and permits TLS encryption in transit.

Regarding user access control, Amazon Bedrock Knowledge Bases uses filters to manage permissions. You can build a segmented access solution on top of a knowledge base using metadata and filtering feature. During runtime, your application must authenticate and authorize the user, and include this user information in the query to maintain accurate access controls. To keep the access controls updated, you should periodically resync the data to reflect any changes in permissions. Additionally, groups can be stored as a filterable attribute, further refining access control.

This approach helps mitigate the risk of LLM02:2025 Sensitive Information Disclosure and LLM08:2025 Vector & Embedding Weaknesses, to assist in that only authorized users can access the relevant data.

Summary

In this post, we discussed how to classify your generative AI application from a security shared responsibility perspective using the AWS Generative AI Security Scoping Matrix. We reviewed a common generative AI assistant application architecture and assessed its security posture using the OWASP Top 10 for LLMs framework, and showed how to apply the OWASP Top 10 for LLMs threat mitigations using AWS service controls and services to strengthen the architecture of your generative AI assistant application. Learn more about building generative AI applications with AWS Workshops for Bedrock.

About the Authors

Syed Jaffry is a Principal Solutions Architect with AWS. He advises software companies on AI and helps them build modern, robust and secure application architectures on AWS.

Amit Kumar Agrawal is a Senior Solutions Architect at AWS where he has spent over 5 years working with large ISV customers. He helps organizations build and operate cost-efficient and scalable solutions in the cloud, driving their business and technical outcomes.

Tej Nagabhatla is a Senior Solutions Architect at AWS, where he works with a diverse portfolio of clients ranging from ISVs to large enterprises. He specializes in providing architectural guidance across a wide range of topics around AI/ML, security, storage, containers, and serverless technologies. He helps organizations build and operate cost-efficient, scalable cloud applications. In his free time, Tej enjoys music, playing basketball, and traveling.

Streamline custom environment provisioning for Amazon SageMaker Studio: An automated CI/CD pipeline approach

January 23, 2025

by Muni Annachi Amazon AWS

Attaching a custom Docker image to an Amazon SageMaker Studio domain involves several steps. First, you need to build and push the image to Amazon Elastic Container Registry (Amazon ECR). You also need to make sure that the Amazon SageMaker domain execution role has the necessary permissions to pull the image from Amazon ECR. After the image is pushed to Amazon ECR, you create a SageMaker custom image on the AWS Management Console. Lastly, you update the SageMaker domain configuration to specify the custom image Amazon Resource Name (ARN). This multi-step process needs to be followed manually every time end-users create new custom Docker images to make them available in SageMaker Studio.

In this post, we explain how to automate this process. This approach allows you to update the SageMaker configuration without writing additional infrastructure code, provision custom images, and attach them to SageMaker domains. By adopting this automation, you can deploy consistent and standardized analytics environments across your organization, leading to increased team productivity and mitigating security risks associated with using one-time images.

The solution described in this post is geared towards machine learning (ML) engineers and platform teams who are often responsible for managing and standardizing custom environments at scale across an organization. For individual data scientists seeking a self-service experience, we recommend that you use the native Docker support in SageMaker Studio, as described in Accelerate ML workflows with Amazon SageMaker Studio Local Mode and Docker support. This feature allows data scientists to build, test, and deploy custom Docker containers directly within the SageMaker Studio integrated development environment (IDE), enabling you to iteratively experiment with your analytics environments seamlessly within the familiar SageMaker Studio interface.

Solution overview

The following diagram illustrates the solution architecture.

We deploy a pipeline using AWS CodePipeline, which automates a custom Docker image creation and attachment of the image to a SageMaker domain. The pipeline first checks out the code base from the GitHub repo and creates custom Docker images based on the configuration declared in the config files. After successfully creating and pushing Docker images to Amazon ECR, the pipeline validates the image by scanning and checking for security vulnerabilities in the image. If no critical or high-security vulnerabilities are found, the pipeline continues to the manual approval stage before deployment. After manual approval is complete, the pipeline deploys the SageMaker domain and attaches custom images to the domain automatically.

Prerequisites

The prerequisites for implementing the solution described in this post include:

An AWS account and access to the account and access to deploy AWS CloudFormation templates
js and the npm command line interface
The AWS Cloud Development Kit (AWS CDK) installed
Version 2 of the AWS Command Line Interface (AWS CLI) installed
The Git command line interface installed on your computer for cloning the repository
The complete AWS CDK bootstrapping in your AWS account

Deploy the solution

Complete the following steps to implement the solution:

Log in to your AWS account using the AWS CLI in a shell terminal (for more details, see Authenticating with short-term credentials for the AWS CLI).
Run the following command to make sure you have successfully logged in to your AWS account:

aws sts get-caller-identity

Fork the the GitHub repo to your GitHub account .
Clone the forked repo to your local workstation using the following command:

git clone <clone_url_of_forked_repo>

Log in to the console and create an AWS CodeStar connection to the GitHub repo in the previous step. For instructions, see Create a connection to GitHub (console).
Copy the ARN for the connection you created.
Go to the terminal and run the following command to cd into the repository directory:

cd streamline-sagemaker-custom-images-cicd

Run the following command to install all libraries from npm:

npm install

Run the following commands to run a shell script in the terminal. This script will take your AWS account number and AWS Region as input parameters and deploy an AWS CDK stack, which deploys components such as CodePipeline, AWS CodeBuild, the ECR repository, and so on. Use an existing VPC to setup VPC_ID export variable below. If you don’t have a VPC, create one with at least two subnets and use it.

export AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=<YOUR_AWS_REGION>
export VPC_ID=<VPC_ID_TO_DEPLOY>
export CODESTAR_CONNECTION_ARN=<CODE_STAR_CONNECTION_ARN_CREATED_IN_ABOVE_STEP>
export REPOSITORY_OWNER=<YOUR_GITHUB_LOGIN_ID>

Run the following command to deploy the AWS infrastructure using the AWS CDK V2 and make sure to wait for the template to succeed:

cdk deploy PipelineStack --require-approval never

On the CodePipeline console, choose Pipelines in the navigation pane.
Choose the link for the pipeline named sagemaker-custom-image-pipeline.

You can follow the progress of the pipeline on the console and provide approval in the manual approval stage to deploy the SageMaker infrastructure. Pipeline takes approximately 5-8 min to build image and move to manual approval stage
Wait for the pipeline to complete the deployment stage.

The pipeline creates infrastructure resources in your AWS account with a SageMaker domain and a SageMaker custom image. It also attaches the custom image to the SageMaker domain.

On the SageMaker console, choose Domains under Admin configurations in the navigation pane.

Open the domain named team-ds, and navigate to the Environment

You should be able to see one custom image that is attached.

How custom images are deployed and attached

CodePipeline has a stage called BuildCustomImages that contains the automated steps to create a SageMaker custom image using the SageMaker Custom Image CLI and push it to the ECR repository created in the AWS account. The AWS CDK stack at the deployment stage has the required steps to create a SageMaker domain and attach a custom image to the domain. The parameters to create the SageMaker domain, custom image, and so on are configured in JSON format and used in the SageMaker stack under the lib directory. Refer to the sagemakerConfig section in environments/config.json for declarative parameters.

Add more custom images

Now you can add your own custom Docker image to attach to the SageMaker domain created by the pipeline. For the custom images being created, refer to Dockerfile specifications for the Docker image specifications.

cd into the images directory in the repository in the terminal:

cd images

Create a new directory (for example, custom) under the images directory:

mkdir custom

Add your own Dockerfile to this directory. For testing, you can use the following Dockerfile config:

FROM public.ecr.aws/amazonlinux/amazonlinux:2
ARG NB_USER="sagemaker-user"
ARG NB_UID="1000"
ARG NB_GID="100"
RUN yum update -y && 
    yum install python3 python3-pip shadow-utils -y && 
    yum clean all
RUN yum install --assumeyes python3 shadow-utils && 
    useradd --create-home --shell /bin/bash --gid "${NB_GID}" --uid ${NB_UID} ${NB_USER} && 
    yum clean all && 
    python3 -m pip install jupyterlab
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install --upgrade urllib3==1.26.6
USER ${NB_UID}
CMD jupyter lab --ip 0.0.0.0 --port 8888 
--ServerApp.base_url="/jupyterlab/default" 
--ServerApp.token='' 
--ServerApp.allow_origin='*'

Update the images section in the json file under the environments directory to add the new image directory name you have created:

"images": [
      "repositoryName": "research-platform-ecr",
       "tags":[
         "jlab",
         "custom" << Add here
       ]
      }
    ]

Update the same image name in customImages under the created SageMaker domain configuration:

"customImages":[
          "jlab",
          "custom" << Add here
 ],

Commit and push changes to the GitHub repository.
You should see CodePipeline is triggered upon push. Follow the progress of the pipeline and provide manual approval for deployment.

After deployment is completed successfully, you should be able to see that the custom image you have added is attached to the domain configuration (as shown in the following screenshot).

Clean up

To clean up your resources, open the AWS CloudFormation console and delete the stacks SagemakerImageStack and PipelineStack in that order. If you encounter errors such as “S3 Bucket is not empty” or “ECR Repository has images,” you can manually delete the S3 bucket and ECR repository that was created. Then you can retry deleting the CloudFormation stacks.

Conclusion

In this post, we showed how to create an automated continuous integration and delivery (CI/CD) pipeline solution to build, scan, and deploy custom Docker images to SageMaker Studio domains. You can use this solution to promote consistency of the analytical environments for data science teams across your enterprise. This approach helps you achieve machine learning (ML) governance, scalability, and standardization.

About the Authors

Muni Annachi, a Senior DevOps Consultant at AWS, boasts over a decade of expertise in architecting and implementing software systems and cloud platforms. He specializes in guiding non-profit organizations to adopt DevOps CI/CD architectures, adhering to AWS best practices and the AWS Well-Architected Framework. Beyond his professional endeavors, Muni is an avid sports enthusiast and tries his luck in the kitchen.

Ajay Raghunathan is a Machine Learning Engineer at AWS. His current work focuses on architecting and implementing ML solutions at scale. He is a technology enthusiast and a builder with a core area of interest in AI/ML, data analytics, serverless, and DevOps. Outside of work, he enjoys spending time with family, traveling, and playing football.

Arun Dyasani is a Senior Cloud Application Architect at AWS. His current work focuses on designing and implementing innovative software solutions. His role centers on crafting robust architectures for complex applications, leveraging his deep knowledge and experience in developing large-scale systems.

Shweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles in Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Masters of Science in Financial Engineering, both from New York University.

Jenna Eun is a Principal Practice Manager for the Health and Advanced Compute team at AWS Professional Services. Her team focuses on designing and delivering data, ML, and advanced computing solutions for the public sector, including federal, state and local governments, academic medical centers, nonprofit healthcare organizations, and research institutions.

Meenakshi Ponn Shankaran is a Principal Domain Architect at AWS in the Data & ML Professional Services Org. He has extensive expertise in designing and building large-scale data lakes, handling petabytes of data. Currently, he focuses on delivering technical leadership to AWS US Public Sector clients, guiding them in using innovative AWS services to meet their strategic objectives and unlock the full potential of their data.

Enhance your customer’s omnichannel experience with Amazon Bedrock and Amazon Lex

January 23, 2025

by Michael Cho Amazon AWS

The rise of AI has opened new avenues for enhancing customer experiences across multiple channels. Technologies like natural language understanding (NLU) are employed to discern customer intents, facilitating efficient self-service actions. Automatic speech recognition (ASR) translates spoken words into text, enabling seamless voice interactions. With Amazon Lex bots, businesses can use conversational AI to integrate these capabilities into their call centers. Amazon Lex uses ASR and NLU to comprehend customer needs, guiding them through their journey. These AI technologies have significantly reduced agent handle times, increased Net Promoter Scores (NPS), and streamlined self-service tasks, such as appointment scheduling.

The advent of generative AI further expands the potential to enhance omnichannel customer experiences. However, concerns about security, compliance, and AI hallucinations often deter businesses from directly exposing customers to large language models (LLMs) through their omnichannel solutions. This is where the integration of Amazon Lex and Amazon Bedrock becomes invaluable. In this setup, Amazon Lex serves as the initial touchpoint, managing intent classification, slot collection, and fulfillment. Meanwhile, Amazon Bedrock acts as a secondary validation layer, intervening when Amazon Lex encounters uncertainties in understanding customer inputs.

In this post, we demonstrate how to integrate LLMs into your omnichannel experience using Amazon Lex and Amazon Bedrock.

Enhancing customer interactions with LLMs

The following are three scenarios illustrating how LLMs can enhance customer interactions:

Intent classification – These scenarios occur when a customer clearly articulates their intent, but the lack of utterance training data results in poor performance by traditional models. For example, a customer might call in and say, “My basement is flooded, there is at least a foot of water, and I have no idea what to do.” Traditional NLU models might lack the training data to handle this out-of-band response, because they’re typically trained on sample utterances like “I need to make a claim,” “I have a flood claim,” or “Open claim,” which are mapped to a hypothetical StartClaim intent. However, an LLM, when provided with the context of each intent including a description and sample utterances, can accurately determine that the customer is dealing with a flooded basement and is seeking to start a claim.
Assisted slot resolution (built-in) and custom slot assistance (custom) – These scenarios occur when a customer says an out-of-band response to a slot collection. For select built-in slot types such as AMAZON.Date, AMAZON.Country, and AMAZON.Confirmation, Amazon Lex currently has a built-in capability to handle slot resolution for select built-in slot types. For custom slot types, you would need to implement custom logic using AWS Lambda for slot resolution and additional validation. This solution handles custom slot resolution by using LLMs to clarify and map these inputs to the correct slots. For example, interpreting “Toyota Tundra” as “truck” or “the whole dang top of my house is gone” as “roof.” This allows you to integrate generative AI to validate both your pre-built slots and your custom slots.
Background noise mitigation – Many customers can’t control the background noise when calling into a call center. This noise might include a loud TV, a sidebar conversation, or non-human sounds being transcribed as voice (for example, a car passing by and is transcribed as “uhhh”). In such cases, the NLU model, depending on its training data, might misclassify the caller’s intent or require the caller to repeat themselves. However, with an LLM, you can provide the transcript with appropriate context to distinguish the noise from the customer’s actual statement. For example, if a TV show is playing in the background and the customer says “my car” when asked about their policy, the transcription might read “Tune in this evening for my car.” The LLM can ignore the irrelevant portion of the transcription and focus on the relevant part, “my car,” to accurately understand the customer’s intent.

As demonstrated in these scenarios, the LLM is not controlling the conversation. Instead, it operates within the boundaries defined by intents, intent descriptions, slots, sample slots, and utterances from Amazon Lex. This approach helps guide the customer along the correct path, reducing the risks of hallucination and manipulation of the customer-facing application. Furthermore, this approach reduces cost, because NLU is used when possible, and the LLM acts as a secondary check before re-prompting the customer.

You can further enhance this AI-driven experience by integrating it with your contact center solution, such as Amazon Connect. By combining the capabilities of Amazon Lex, Amazon Bedrock, and Amazon Connect, you can deliver a seamless and intelligent customer experience across your channels.

When customers reach out, whether through voice or chat, this integrated solution provides a powerful, AI-driven interaction:

Amazon Connect manages the initial customer contact, handling call routing and channel selection.
Amazon Lex processes the customer’s input, using NLU to identify intent and extract relevant information.
In cases where Amazon Lex might not fully understand the customer’s intent or when a more nuanced interpretation is needed, advanced language models in Amazon Bedrock can be invoked to provide deeper analysis and understanding.
The combined insights from Amazon Lex and Amazon Bedrock guide the conversation flow in Amazon Connect, determining whether to provide automated responses, request more information, or route the customer to a human agent.

Solution overview

In this solution, Amazon Lex will connect to Amazon Bedrock through Lambda, and invoke an LLM of your choice on Amazon Bedrock when assistance in intent classification and slot resolution is needed throughout the conversation. For instance, if an ElicitIntent call defaults to the FallbackIntent, the Lambda function runs to have Amazon Bedrock determine if the user potentially used out-of-band phrases that should be properly mapped. Additionally, we can augment the prompts sent to the model for intent classification and slot resolution with business context to yield more accurate results. Example prompts for intent classification and slot resolution is available in the GitHub repo.

The following diagram illustrates the solution architecture:

The workflow consists of the following steps:

Messages are sent to the Amazon Lex omnichannel using Amazon Connect (text and voice), messaging apps (text), and third-party contact centers (text and voice). Amazon Lex NLU maps user utterances to specific intents.
The Lambda function is invoked at certain phases of the conversation where Amazon Lex NLU didn’t identify the user utterance, such as during the fallback intent or during slot fulfillment.
Lambda calls foundation models (FMs) selected from an AWS CloudFormation template through Amazon Bedrock to identify the intent, identify the slot, or determine if the transcribed messages contain background noise.
Amazon Bedrock returns the identified intent or slot, or responds that it is unable to classify the utterance as a related intent or slot.
Lambda sets the state of Amazon Lex to either move forward in the selected intent or re-prompt the user for input.
Amazon Lex continues the conversation by either re-prompting the user or continuing to fulfill the intent.

Prerequisites

You should have the following prerequisites:

An AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies.
Familiarity with AWS services such as Amazon Lex, AWS Lambda, and Amazon Bedrock.
Access enabled for FMs on Amazon Bedrock. For instructions, see Access Amazon Bedrock foundation models. In this solution, you have a choice to use Anthropic’s Claude 3 Haiku, Anthropic’s Claude 3.5 Haiku, or Anthropic’s Claude 3.5 Sonnet.

Deploy the omnichannel Amazon Lex bot

To deploy this solution, complete the following steps:

Choose Launch Stack to launch a CloudFormation stack in us-east-1:
For Stack name, enter a name for your stack. This post uses the name FNOLBot.
In the Parameters section, select the model you want to use.
Review the IAM resource creation and choose Create stack.

After a few minutes, your stack should be complete. The core resources are as follows:

Amazon Lex bot – FNOLBot
Lambda function – ai-assist-lambda-{Stack-Name}
IAM roles – {Stack-Name}-AIAssistLambdaRole, and {Stack-Name}-BotRuntimeRole

Test the omnichannel bot

To test the bot, navigate to FNOLBot on the Amazon Lex console and open a test window. For more details, see Testing a bot using the console.

Intent classification

Let’s test how, instead of saying “I would like to make a claim,” the customer can ask more complex questions:

In the test window, enter in “My neighbor’s tree fell on my garage. What steps should I take with my insurance company?”
Choose Inspect.

In the response, the intent has been identified as GatherFNOLInfo.

Background noise mitigation with intent classification

Let’s simulate making a request with background noise:

Refresh the bot by choosing the refresh icon.
In the test window, enter “Hi yes I’m calling about yeah yeah one minute um um I need to make a claim.”
Choose Inspect.

In the response, the intent has been identified as GatherFNOLInfo.

Slot assistance

Let’s test how instead of saying explicit slot values, we can use generative AI to help fill the slot:

Refresh the bot by choosing the refresh icon.
Enter “I need to make a claim.”

The Amazon Lex bot will then ask “What portion of the home was damaged?”

Enter “the whole dang top of my house was gone.”

The bot will then ask “Please describe any injuries that occurred during the incident.”

Enter “I got a pretty bad cut from the shingles.”
Choose Inspect.

You will notice that the Damage slot has been filled with “roof” and the PersonalInjury slot has been filled with “laceration.”

Background noise mitigation with slot assistance

We now simulate how Amazon Lex uses ASR transcribing background noise. The first scenario is a conversation where the user is having a conversation with others while talking to the Amazon Lex bot. In the second scenario, a TV on in the background is so loud that it gets transcribed by ASR.

Refresh the bot by choosing the refresh icon.
Enter “I need to make a claim.”

The Amazon Lex bot will then ask “What portion of the home was damaged?”

Enter “yeah i really need that soon um the roof was damaged.”

The bot will then ask “Please describe any injuries that occurred during the incident.”

Enter “tonight on the nightly news reporters are on the scene um i got a pretty bad cut.”
Choose Inspect.

You will notice that the Damage slot has been filled with “roof” and the PersonalInjury slot has been filled with “laceration.”

Clean up

To avoid incurring additional charges, delete the CloudFormation stacks you deployed.

Conclusion

In this post, we showed you how to set up Amazon Lex for an omnichannel chatbot experience and Amazon Bedrock to be your secondary validation layer. This allows your customers to potentially provide out-of-band responses both at the intent and slot collection levels without having to be re-prompted, allowing for a seamless customer experience. As we demonstrated, whether the user comes in and provides a robust description of their intent and slot or if they use phrases that are outside of the Amazon Lex NLU training data, the LLM is able to correctly identify the correct intent and slot.

If you have an existing Amazon Lex bot deployed, you can edit the Lambda code to further enhance the bot. Try out the solution from CloudFormation stack or code in the GitHub repo and let us know if you have any questions in the comments.

About the Authors

Michael Cho is a Solutions Architect at AWS, where he works with customers to accelerate their mission on the cloud. He is passionate about architecting and building innovative solutions that empower customers. Lately, he has been dedicating his time to experimenting with Generative AI for solving complex business problems.

Joe Morotti is a Solutions Architect at Amazon Web Services (AWS), working with Financial Services customers across the US. He has held a wide range of technical roles and enjoy showing customer’s art of the possible. His passion areas include conversational AI, contact center, and generative AI. In his free time, he enjoys spending quality time with his family exploring new places and over analyzing his sports team’s performance.

Vikas Shah is an Enterprise Solutions Architect at Amazon Web Services. He is a technology enthusiast who enjoys helping customers find innovative solutions to complex business challenges. His areas of interest are ML, IoT, robotics and storage. In his spare time, Vikas enjoys building robots, hiking, and traveling.

Introducing multi-turn conversation with an agent node for Amazon Bedrock Flows (preview)

January 22, 2025

by Christian Kamwangala Amazon AWS

Amazon Bedrock Flows offers an intuitive visual builder and a set of APIs to seamlessly link foundation models (FMs), Amazon Bedrock features, and AWS services to build and automate user-defined generative AI workflows at scale. Amazon Bedrock Agents offers a fully managed solution for creating, deploying, and scaling AI agents on AWS. With Flows, you can provide explicitly stated, user-defined decision logic to execute workflows, and add Agents as a node in a flow to use FMs to dynamically interpret and execute tasks based on contextual reasoning for certain steps in your workflow.

Today, we’re excited to announce multi-turn conversation with an agent node (preview), a powerful new capability in Flows. This new capability enhances the agent node functionality, enabling dynamic, back-and-forth conversations between users and flows, similar to a natural dialogue in a flow execution.

With this new feature, when an agent node requires clarification or additional context from the user before it can continue, it can intelligently pause the flow’s execution and request user-specific information. After the user sends the requested information, the flow seamlessly resumes the execution with the enriched input, maintaining the executionId of the conversation.

This creates a more interactive and context-aware experience, because the node can adapt its behavior based on user responses. The following sequence diagram shows the flow steps.

Multi-turn conversations make it straightforward to developers to create agentic workflows that can adapt and reason dynamically. This is particularly valuable for complex scenarios where a single interaction might not be sufficient to fully understand and address the user’s needs.

In this post, we discuss how to create a multi-turn conversation and explore how this feature can transform your AI applications.

Solution overview

Consider ACME Corp, a leading fictional online travel agency developing an AI-powered holiday trip planner using Flows. They face several challenges in their implementation:

Their planner can’t engage in dynamic conversations, requiring all trip details upfront instead of asking follow-up questions
They face challenges to orchestrate complex, multi-step travel planning processes that require coordinating flights, accommodations, activities, and transportation across multiple destinations, often leading to inefficiencies and suboptimal customer experiences
Their application can’t dynamically adapt its recommendations when users modify their preferences or introduce new constraints during the planning process

Let’s explore how the new multi-turn conversation capability in Flows addresses these challenges and enables ACME Corp to build a more intelligent, context-aware, and efficient holiday trip planner that truly enhances the customer’s travel planning experience.

The flow offers two distinct interaction paths. For general travel inquiries, users receive instant responses powered by an LLM. However, when users want to search or book flights and hotels, they are connected to an agent who guides them through the process, collecting essential information while maintaining the session until completion. The workflow is illustrated in the following diagram.

Prerequisites

For this example, you need the following:

An AWS account and a user with an AWS Identity and Access Management (IAM) role authorized to use Bedrock. For guidance, refer to Getting started with Amazon Bedrock. Make sure the role includes the permissions for using Flows, as explained in Prerequisites for Amazon Bedrock Flows, and the permissions for using Agents, as explained in Prerequisites for creating Amazon Bedrock Agents.
Access provided to the models you use for invocation and evaluation. For guidance, see Manage access to Amazon Bedrock foundation models.
Create an Amazon Bedrock Agent to automate the task for the travel agency application by orchestrating interactions between the FM, APIs calls, and user conversations. Our travel agent offers four essential booking functions: searching available flights, securing flight reservations, finding suitable hotel accommodations, and completing hotel bookings. For an example of how to create a travel agent, refer to Agents for Amazon Bedrock now support memory retention and code interpretation (preview). Make sure the agent has user input functionality enabled. This setting allows the agent to gather all required details through natural conversation, even when the initial request is incomplete.

Create a multi-turn conversation flow

To create a multi-turn conversation flow, complete the following steps:

On the Bedrock console, choose Flows under Builder tools in the navigation pane.
Start creating a new flow called ACME-Corp-trip-planner.

For detailed instructions on creating a Flow, see Amazon Bedrock Flows is now generally available with enhanced safety and traceability.

Bedrock provides different node types to build your prompt flow.

Choose the prompt node to evaluate the input intention. It will classify the intentions as categoryLetter=A if the user wants to search or book a hotel or flight and categoryLetter=B if the user is asking for destination information. If you’re using Amazon Bedrock Prompt Management, you can select the prompt from there.

For this node, we use the following message in the prompt configuration:

You are a query classifier. Analyze the {{input}} and respond with a single letter:

A: Travel planning/booking queries for hotel and flights Example: "Find flights to London"
B: Destination information queries Example: "What's the weather in Paris?"

Return only 'A' or 'B' based on the primary intent.

For our example, we chose Amazon’s Nova Lite model and set the temperature inference parameter to 0.1 to minimize hallucinations and enhance output reliability. You can select other available Amazon Bedrock models.

Create the Condition node with the following information and connect with the Query Classifier node. For this node, the condition value is:
```
Name: Booking
Condition: categoryLetter=="A"
```

Create a second prompt node for the LLM guide invocation. The input of the node is the output of the Condition node output “If all conditions are false.” To end this flow branch, add a Flow output node and connect the prompt node output to it.

You are AcmeGuide, an enthusiastic and knowledgeable travel guide. 
Your task is to provide accurate and comprehensive information about travel destinations to users. 
When answering a user's query, cover the following key aspects:

- Weather and best times to visit
- Famous local figures and celebrities
- Major attractions and landmarks
- Local culture and cuisine
- Essential travel tips

Answer the user's question {{query}}. 

Present the information in a clear and engaging manner. 
If you are unsure about specific details, acknowledge this and provide the most reliable information available. 
Avoid any hallucinations or fabricated content. 
Provide your response immediately after these instructions, without any preamble or additional text.

For our example, we chose Amazon’s Nova Lite model and set the temperature inference parameter to 0.1 to minimize hallucinations and enhance output reliability.

Finally, create the agent node and configure it to use the agent that was created previously. The input of the node is the output of the Condition node output “Conditions Booking.” To end this flow branch, add a Flow output node and connect the agent node output to it.
Choose Save to save your flow.

Test the flow

You’re now ready to test the flow through the Amazon Bedrock console or API. First, we ask for information about Paris. In the response, you can review the flow traces, which provide detailed visibility into the execution process. These traces help you monitor and debug response times for each step, track the processing of customer inputs, verify if guardrails are properly applied, and identify any bottlenecks in the system. Flow traces offer a comprehensive overview of the entire response generation process, allowing for more efficient troubleshooting and performance optimization.,

Next, we continue our conversation and request to book a travel to Paris. As you can see, now with the multi-turn support in Flows, our agent node is able to ask follow-up questions to gather all information and make the booking.

We continue talking to our agent, providing all required information, and finally, the agent makes the booking for us. In the traces, you can check the ExecutionId that maintains the session for the multi-turn requests.

After the confirmation, the agent has successfully completed the user request.

Use Amazon Bedrock Flows APIs

You can also interact with flows programmatically using the InvokeFlow API, as shown in the following code. During the initial invocation, the system automatically generates a unique executionId, which maintains the session for 1 hour. This executionId is essential for subsequent InvokeFlow API calls, because it provides the agent with contextual information necessary for maintaining conversation history and completing actions.

{
  "flowIdentifier": " MQM2RM1ORA",
  "flowAliasIdentifier": "T00ZXPGI35",
  "inputs": [
    {
      "content": {
        "document": "Book a flight to paris"
      },
      "nodeName": "FlowInputNode",
      "nodeOutputName": "document"
    }
  ]
}

If the agent node in the flow decides that it needs more information from the user, the response stream (responseStream) from InvokeFlow includes a FlowMultiTurnInputRequestEvent event object. The event has the requested information in the content(FlowMultiTurnInputContent) field.

The following is an example FlowMultiTurnInputRequestEvent JSON object:

{
  "nodeName": "Trip_planner",
  "nodeType": "AgentNode",
  "content": {
      "document": "Certainly! I'd be happy to help you book a flight to Paris. 
To get started, I need some more information:
1. What is your departure airport (please provide the IATA airport code if possible)?
2. What date would you like to travel (in YYYYMMDD format)?
3. Do you have a preferred time for the flight (in HHMM format)?
Once I have these details, I can search for available flights for you."
  }
}

Because the flow can’t continue until more input is received, the flow also emits a FlowCompletionEvent event. A flow always emits the FlowMultiTurnInputRequestEvent before the FlowCompletionEvent. If the value of completionReason in the FlowCompletionEvent event is INPUT_REQUIRED, the flow needs more information before it can continue.

The following is an example FlowCompletionEvent JSON object:

{
  "completionReason": "INPUT_REQUIRED"
}

Send the user response back to the flow by calling the InvokeFlow API again. Be sure to include the executionId for the conversation.

The following is an example JSON request for the InvokeFlow API, which provides additional information required by an agent node:

{
  "flowIdentifier": "MQM2RM1ORA",
  "flowAliasIdentifier": "T00ZXPGI35",
  "executionId": "b6450554-f8cc-4934-bf46-f66ed89b60a0",
  "inputs": [
    {
      "content": {
        "document": "Madrid on Valentine's day 2025"
      },
      "nodeName": "Trip_planner",
      "nodeInputName": "agentInputText"
    }
  ]
}

This back and forth continues until no more information is needed and the agent has all that is required to complete the user’s request. When no more information is needed, the flow emits a FlowOutputEvent event, which contains the final response.

The following is an example FlowOutputEvent JSON object:

{
  "nodeName": "FlowOutputNode",
  "content": {
      "document": "Great news! I've successfully booked your flight to Paris. Here are the details:

- Date: February 14, 2025 (Valentine's Day)
- Departure: Madrid (MAD) at 20:43 (8:43 PM)
- Arrival: Paris (CDG)

Your flight is confirmed."
  }
}

The flow also emits a FlowCompletionEvent event. The value of completionReason is SUCCESS.

The following is an example FlowCompletionEvent JSON object:

{
  "completionReason": "SUCCESS"
}

To get started with multi-turn invocation, use the following example code. It handles subsequent interactions using the same executionId and maintains context throughout the conversation. You need to specify your flow’s ID in FLOW_ID and its alias ID in FLOW_ALIAS_ID (refer to View information about flows in Amazon Bedrock for instructions on obtaining these IDs).

The system will prompt for additional input as needed, using the executionId to maintain context across multiple interactions, providing a coherent and continuous conversation flow while executing the requested actions.

"""
Runs an Amazon Bedrock flow and handles multi-turn interactions
"""
import boto3
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def invoke_flow(client, flow_id, flow_alias_id, input_data, execution_id=None):
    """
    Invoke an Amazon Bedrock flow and handle the response stream.

    Args:
        client: Boto3 client for Bedrock
        flow_id: The ID of the flow to invoke
        flow_alias_id: The alias ID of the flow
        input_data: Input data for the flow
        execution_id: Execution ID for continuing a flow. Defaults to None for first run.

    Returns:
        Dict containing flow_complete status, input_required info, and execution_id
    """
    request_params = {
        "flowIdentifier": flow_id,
        "flowAliasIdentifier": flow_alias_id,
        "inputs": [input_data]
    }
    
    if execution_id:
        request_params["executionId"] = execution_id

    response = client.invoke_flow(**request_params)
    execution_id = response.get('executionId', execution_id)
    
    input_required = None
    flow_status = ""

    for event in response['responseStream']:
        if 'flowCompletionEvent' in event:
            flow_status = event['flowCompletionEvent']['completionReason']
        elif 'flowMultiTurnInputRequestEvent' in event:
            input_required = event
        elif 'flowOutputEvent' in event:
            print(event['flowOutputEvent']['content']['document'])
        elif 'flowTraceEvent' in event:
            print("Flow trace:", event['flowTraceEvent'])

    return {
        "flow_status": flow_status,
        "input_required": input_required,
        "execution_id": execution_id
    }

def create_input_data(text, node_name="FlowInputNode", is_initial_input=True):
    """
    Create formatted input data dictionary.
    
    Args:
        text: The input text
        node_name: Name of the node (defaults to "FlowInputNode")
        is_initial_input: Boolean indicating if this is the first input (defaults to True)
    
    Returns:
        Dict containing the formatted input data
    """
    input_data = {
        "content": {"document": text},
        "nodeName": node_name
    }

    if is_initial_input:
        input_data["nodeOutputName"] = "document"
    else:
        input_data["nodeInputName"] = "agentInputText"

    return input_data

def main():
    FLOW_ID = "MQM2RM1ORA"
    FLOW_ALIAS_ID = "T00ZXPGI35"
    
    session = boto3.Session(
        region_name='us-east-1'
    )
    bedrock_agent_client = session.client(
        'bedrock-multi-turn', 
    )

    execution_id = None

    try:
        # Initial input
        user_input = input("Enter input: ")
        input_data = create_input_data(user_input, is_initial_input=True)

        while True:
            result = invoke_flow(
                bedrock_agent_client, 
                FLOW_ID, 
                FLOW_ALIAS_ID, 
                input_data, 
                execution_id
            )
        
            if result['flow_status'] == "SUCCESS":
                break
            
            if result['flow_status'] == "INPUT_REQUIRED":
                more_input = result['input_required']
                prompt = f"{more_input['flowMultiTurnInputRequestEvent']['content']['document']}: "
                user_input = input(prompt)
                # Subsequent inputs
                input_data = create_input_data(
                    user_input,
                    more_input['flowMultiTurnInputRequestEvent']['nodeName'],
                    is_initial_input=False
                )
            
            execution_id = result['execution_id']

    except Exception as e:
        logger.error(f"Error occurred: {str(e)}", exc_info=True)

if __name__ == "__main__":
    main()

Clean up

To clean up your resources, delete the flow, agent, AWS Lambda functions created for the agent, and knowledge base.

Conclusion

The introduction of multi-turn conversation capability in Flows marks a significant advancement in building sophisticated conversational AI applications. In this post, we demonstrated how this feature enables developers to create dynamic, context-aware workflows that can handle complex interactions while maintaining conversation history and state. The combination of the Flows visual builder interface and APIs with powerful agent capabilities makes it straightforward to develop and deploy intelligent applications that can engage in natural, multi-step conversations.

With this new capability, businesses can build more intuitive and responsive AI solutions that better serve their customers’ needs. Whether you’re developing a travel booking system, customer service or other conversational application, multi-turn conversation with Flows provides the tools needed to create sophisticated AI workflows with minimal complexity.

We encourage you to explore these capabilities on the Bedrock console and start building your own multi-turn conversational applications today. For more information and detailed documentation, visit the Amazon Bedrock User Guide. We look forward to seeing the innovative solutions you will create with these powerful new features.

About the Authors

Christian Kamwangala is an AI/ML and Generative AI Specialist Solutions Architect at AWS, based in Paris, France. He helps enterprise customers architect and implement cutting-edge AI solutions using the comprehensive suite of AWS tools, with a focus on production-ready systems that follow industry best practices. In his spare time, Christian enjoys exploring nature and spending time with family and friends.

Irene Arroyo Delgado is an AI/ML and GenAI Specialist Solutions Architect at AWS. She focuses on bringing out the potential of generative AI for each use case and productionizing ML workloads to achieve customers’ desired business outcomes by automating end-to-end ML lifecycles. In her free time, Irene enjoys traveling and hiking.

Video security analysis for privileged access management using generative AI and Amazon Bedrock

January 22, 2025

by Ken Haynes Amazon AWS

Security teams in highly regulated industries like financial services often employ Privileged Access Management (PAM) systems to secure, manage, and monitor the use of privileged access across their critical IT infrastructure. Security and compliance regulations require that security teams audit the actions performed by systems administrators using privileged credentials. Keystroke logging (the action of recording the keys struck on a keyboard into a log) and video recording of the server console sessions is a feature of PAM systems that enable security teams to meet these security and compliance obligations.

Keystroke logging produces a dataset that can be programmatically parsed, making it possible to review the activity in these sessions for anomalies, quickly and at scale. However, the capturing of keystrokes into a log is not always an option. Operating systems like Windows are predominantly interacted with through a graphical user interface, restricting the PAM system to capturing the activity in these privileged access sessions as video recordings of the server console.

Video recordings can’t be easily parsed like log files, requiring security team members to playback the recordings to review the actions performed in them. A typical PAM system of a financial services organization can produce over 100,000 hours of video recordings each month. If only 30% of these video recordings come from Windows Servers, it would require a workforce of 1,000 employees, working around the clock, to review them all. As a result, security teams are constrained to performing random spot-checks, impacting their ability to detect security anomalies by bad actors.

The following graphic is a simple example of Windows Server Console activity that could be captured in a video recording.

AI services have revolutionized the way we process, analyze, and extract insights from video content. These services use advanced machine learning (ML) algorithms and computer vision techniques to perform functions like object detection and tracking, activity recognition, and text and audio recognition. However, to describe what is occurring in the video from what can be visually observed, we can harness the image analysis capabilities of generative AI.

Advancements in multi-modal large language models (MLLMs), like Anthropic’s state-of-the-art Claude 3, offer cutting-edge computer vision techniques, enabling Anthropic’s Claude to interpret visual information and understand the relationships, activities, and broader context depicted in images. Using this capability, security teams can process all the video recordings into transcripts. Security analytics can then be performed against the transcripts, enabling organizations to improve their security posture by increasing their ability to detect security anomalies by bad actors.

In this post, we show you how to use Amazon Bedrock and Anthropic’s Claude 3 to solve this problem. We explain the end-to-end solution workflow, the prompts needed to produce the transcript and perform security analysis, and provide a deployable solution architecture.

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the AWS tools without having to manage any infrastructure.

Solution workflow

Our solution requires a two-stage workflow of video transcription and security analysis. The first stage uses Anthropic’s Claude to produce a transcript of the video recordings. The second stage uses Anthropic’s Claude to analyze the transcript for security anomalies.

Stage 1: Video transcription

Many of the MLLMs available at the time of writing, including Anthropic’s Claude, are unable to directly process sequential visual data formats like MPEG and AVI, and of those that can, their performance and accuracy are below what can be achieved when analyzing static images. Because of that, we need to break the video recordings into a sequence of static images for Anthropic’s Claude to analyze.

The following diagram depicts the workflow we will use to perform the video transcription.

The first step in our workflow extracts one still frame image a second from our video recording. Then we engineer images into a prompt that instructs Anthropic’s Claude Haiku 3 to analyze them and produce a visual transcript. At the time of writing, Anthropic’s Claude on Amazon Bedrock is limited to accepting up to 20 images at one time; therefore, to transcribe videos longer than 20 seconds, we need to submit the images in batches to produce a transcript of each 20-second segment. After all segments have been individually transcribed, we engineer them into another prompt instructing Anthropic’s Claude Sonnet 3 to aggregate the segments into a complete transcript.

Stage 2: Security analysis

The second stage can be performed several times to run different queries against the combined transcript for security analysis.

The following diagram depicts the workflow we will use to perform the security analysis of the aggregated video transcripts.

The type of security analysis performed against the transcripts will vary depending on factors like the data classification or criticality of the server the recording was taken from. The following are some common examples of the security analysis that could be performed:

Compliance with change request runbook – Compare the actions described in the transcript with the steps defined in the runbook of the associated change request. Highlight any actions taken that don’t appear to be part of the runbook.
Sensitive data access and exfiltration risk – Analyze the actions described in the transcript to determine whether any sensitive data may have been accessed, changed, or copied to an external location.
Privilege elevation risk – Analyze the actions described in the transcript to determine whether any attempts were made to elevate privileges or gain unauthorized access to a system.

This workflow provides the mechanical function of processing the video recordings through Anthropic’s Claude into transcripts and performing security analysis. The key to the capability of the solution is the prompts we have engineered to instruct Anthropic’s Claude what to do.

Prompt engineering

Prompt engineering is the process of carefully designing the input prompts or instructions that are given to LLMs and other generative AI systems. These prompts are crucial in determining the quality, relevance, and coherence of the output generated by the AI.

For a comprehensive guide to prompt engineering, refer to Prompt engineering techniques and best practices: Learn by doing with Anthropic’s Claude 3 on Amazon Bedrock.

Video transcript prompt (Stage 1)

The utility of our solution relies on the accuracy of the transcripts we receive from Anthropic’s Claude when it is passed the images to analyze. We must also account for limitations in the data that we ask Anthropic’s Claude to analyze. The image sequences we pass to Anthropic’s Claude will often lack the visual indicators necessary to conclusively determine what actions are being performed. For example, the use of shortcut keys like Ctrl + S to save a document can’t be detected from an image of the console. The click of a button or menu items could also occur in the 1 fps time lapse between the still frame images. These limitations can lead Anthropic’s Claude to make inaccurate assumptions about the action being performed. To counter this, we include instructions in our prompt to not make assumptions and tag where it can’t categorically determine whether an action has been performed or not.

The outputs from generative AI models can never be 100% accurate, but we can engineer a complex prompt that will provide a transcript with a level of accuracy sufficient for our security analysis purposes. We provide an example prompt with the solution that we detail further and that you can adapt and modify at will. Using the task context, detailed task description and rules, immediate task, and instructions to think step-by-step in our prompt, we influence the accuracy of the image analysis by describing the role and task to be performed by Anthropic’s Claude. With the examples and output formatting elements, we can control the consistency of the transcripts we receive as the output.

To learn more about creating complex prompts and gain practical experience, refer to the Complex Prompts from Scratch lab in our Prompt Engineering with Anthropic’s Claude 3 workshop.

The following is an example of our task context:

You are a Video Transcriptionist who specializes in watching recordings from Windows 
Server Consoles, providing a summary description of what tasks you visually observe 
taking place in videos.  You will carefully watch through the video and document the 
various tasks, configurations, and processes that you see being performed by the IT 
Systems Administrator. Your goal is to create a comprehensive, step-by-step transcript 
that captures all the relevant details.

The following is the detailed task description and rules:

Here is a description of how you will function:
- You receive an ordered sequence of still frame images taken from a sample of a video 
recording.
- You will analyze each of the still frame images in the video sequence, comparing the 
previous image to the current image, and determine a list of actions being performed by 
the IT Systems Administrator.
- You will capture detail about the applications being launched, websites accessed, 
files accessed or updated.
- Where you identify a Command Line Interface in use by the IT Systems Administrator, 
you will capture the commands being executed.
- If there are many small actions such as typing text letter by letter then you can 
summarize them as one step.
- If there is a big change between frames and the individual actions have not been 
captured then you should describe what you think has happened. Precede that description 
with the word ASSUMPTION to clearly mark that you are making an assumption.

The following are examples:

Here is an example.
<example>
1. The Windows Server desktop is displayed.
2. The administrator opens the Start menu.
3. The administrator uses the search bar to search for and launch the Paint application.
4. The Paint application window opens, displaying a blank canvas.
5. The administrator selects the Text tool from the toolbar in Paint.
6. The administrator types the text "Hello" using the keyboard.
7. The administrator types the text "World!" using the keyboard, completing the phrase 
"Hello World!".
8. The administrator adds a smiley face emoticon ":" and ")" to the end of the text.
9. ASSUMPTION: The administrator saves the Paint file.
10. ASSUMPTION: The administrator closes the Paint application.
</example>

The following summarizes the immediate task:

Analyze the actions the administrator performs.

The following are instructions to think step-by-step:

Think step-by-step before you narrate what action the administrator took in 
<thinking></thinking> tags.
First, observe the images thoroughly and write down the key UI elements that are 
relevant to administrator input, for example text input, mouse clicks, and buttons.
Then identify which UI elements changed from the previous frame to the current frame. 
Then think about all the potential administrator actions that resulted in the change.
Finally, write down the most likely action that the user took in 
<narration></narration> tags.

Lastly, the following is an example of output formatting:

Detail each of the actions in a numbered list.
Do not provide any preamble, only output the list of actions and start with 1.
Put your response in <narration></narration> tags.

Aggregate transcripts prompt (Stage 1)

To create the aggregated transcript, we pass all of the segment transcripts to Anthropic’s Claude in a single prompt along with instructions on how to combine them and format the output:

Combine the lists of actions in the provided messages.
List all the steps as a numbered list and start with 1.
You must keep the ASSUMPTION: where it is used.
Keep the style of the list of actions.
Do not provide any preamble, and only output the list of actions.

Security analysis prompts (Stage 2)

The prompts we use for the security analysis require the aggregated transcript to be provided to Anthropic’s Claude in the prompt along with a description of the security analysis to be performed.

The following prompt is for compliance with a change request runbook:

You are an IT Security Auditor. You will be given two documents to compare.
The first document is a runbook for an IT Change Management Ticket that describes the 
steps an IT Administrator is going to perform.
The second document is a transcript of a video recording taken in the Windows Server 
Console that the IT Administrator used to complete the steps described in the runbook. 
Your task is to compare the transcript with the runbook and assess whether there are 
any anomalies that could be a security concern.

You carefully review the two documents provided - the runbook for an IT Change 
Management Ticket and the transcript of the video recording from the Windows Server 
Console - to identify any anomalies that could be a security concern.

As the IT Security Auditor, you will provide your assessment as follows:
1. Comparison of the Runbook and Transcript:
- You will closely examine each step in the runbook and compare it to the actions 
taken by the IT Administrator in the transcript.
- You will look for any deviations or additional steps that were not outlined in the 
runbook, which could indicate unauthorized or potentially malicious activities.
- You will also check if the sequence of actions in the transcript matches the steps 
described in the runbook.
2. Identification of Anomalies:
- You will carefully analyze the transcript for any unusual commands, script executions,
 or access to sensitive systems or data that were not mentioned in the runbook.
- You will look for any indications of privilege escalation, unauthorized access 
attempts, or the use of tools or techniques that could be used for malicious purposes.
- You will also check for any discrepancies between the reported actions in the runbook 
and the actual actions taken, as recorded in the transcript.

Here are the two documents.  The runbook for the IT Change Management ticket is provided 
in <runbook> tags.  The transcript is provided in <transcript> tags.

The following prompt is for sensitive data access and exfiltration risk:

You are an IT Security Auditor. You will be given a transcript that describes the actions 
performed by an IT Administrator on a Window Server.  Your task is to assess whether there 
are any actions taken, such as accessing, changing or copying of sensitive data, that could 
be a breach of data privacy, data security or a data exfiltration risk.

The transcript is provided in <transcript> tags.

The following prompt is for privilege elevation risk:

You are an IT Security Auditor. You will be given a transcript that describes the actions 
performed by an IT Administrator on a Window Server. Your task is to assess whether there 
are any actions taken that could represent an attempt to elevate privileges or gain 
unauthorized access to a system.

The transcript is provided in <transcript> tags.

Solution overview

The serverless architecture provides a video processing pipeline to run Stage 1 of the workflow, and a simple UI for the Stage 2 security analysis of the aggregated transcripts. This architecture can be used for demonstration purposes and testing with your own video recordings and prompts; however, it is not suitable for a production use.

The following diagram illustrates the solution architecture.

In Stage 1, video recordings are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, which sends a notification of the object creation to Amazon EventBridge. An EventBridge rule then triggers the AWS Step Functions workflow to begin processing the video recording into a transcript. The Step Functions workflow generates the still frame images from the video recording and uploads them to another S3 bucket. Then the workflow runs parallel tasks to submit the images, for each 20-second segment, to Amazon Bedrock for transcribing before writing the output to an Amazon DynamoDB table. The segment transcripts are passed to the final task in the workflow, which submits them to Amazon Bedrock, with instructions to combine them into an aggregated transcript, which is written to DynamoDB.

The UI is provided by a simple Streamlit application with access to the DynamoDB and Amazon Bedrock APIs. Through the Streamlit application, users can read the transcripts from DynamoDB and submit them to Amazon Bedrock for security analysis.

Solution implementation

The solution architecture we’ve presented provides a starting point for security teams looking to improve their security posture. For a detailed solution walkthrough and guidance on how to implement this solution, refer to the Video Security Analysis for Privileged Access Management using GenAI GitHub repository. This will guide you through the prerequisite tools, enabling models in Amazon Bedrock, cloning the repository, and using the AWS Cloud Development Kit (AWS CDK) to deploy into your own AWS account.

We welcome your feedback, questions, and contributions as we continue to refine and expand this approach to video-based security analysis.

Conclusion

In this post, we showed you an innovative solution to a challenge faced by security teams in highly regulated industries: the efficient security analysis of vast amounts of video recordings from Privileged Access Management (PAM) systems. We demonstrated how you can use Anthropic’s Claude 3 family of models and Amazon Bedrock to perform the complex task of analyzing video recordings of server console sessions and perform queries to highlight any potential security anomalies.

We also provided a template for how you can analyze sequences of still frame images taken from a video recording, which could be applied to different types of video content. You can use the techniques described in this post to develop your own video transcription solution. By tailoring the prompt engineering to your video content type, you can adapt the solution to your use case. Furthermore, by using model evaluation in Amazon Bedrock, you can improve the accuracy of the results you receive from your prompt.

To learn more, the Prompt Engineering with Anthropic’s Claude 3 workshop is an excellent resource for you to gain hands-on experience in your own AWS account.

About the authors

Ken Haynes is a Senior Solutions Architect in AWS Global Financial Services and has been with AWS since September 2022. Prior to AWS, Ken worked for Santander UK Technology and Deutsche Bank helping them build their cloud foundations on AWS, Azure, and GCP.

Rim Zaafouri is a technologist at heart and a cloud enthusiast. As an AWS Solutions Architect, she guides financial services businesses in their cloud adoption journey and helps them to drive innovation, with a particular focus on serverless technologies and generative AI. Beyond the tech world, Rim is an avid fitness enthusiast and loves exploring new destinations around the world.

Patrick Sard works as a Solutions Architect accompanying financial institutions in EMEA through their cloud transformation journeys. He has helped multiple enterprises harness the power of AI and machine learning on AWS. He’s currently guiding organizations to unlock the transformative potential of Generative AI technologies. When not architecting cloud solutions, you’ll likely find Patrick on a tennis court, applying the same determination to perfect his game as he does to solving complex technical challenges.

How Cato Networks uses Amazon Bedrock to transform free text search into structured GraphQL queries

January 22, 2025

by Asaf Fried Amazon AWS

This is a guest post authored by Asaf Fried, Daniel Pienica, Sergey Volkovich from Cato Networks.

Cato Networks is a leading provider of secure access service edge (SASE), an enterprise networking and security unified cloud-centered service that converges SD-WAN, a cloud network, and security service edge (SSE) functions, including firewall as a service (FWaaS), a secure web gateway, zero trust network access, and more.

On our SASE management console, the central events page provides a comprehensive view of the events occurring on a specific account. With potentially millions of events over a selected time range, the goal is to refine these events using various filters until a manageable number of relevant events are identified for analysis. Users can review different types of events such as security, connectivity, system, and management, each categorized by specific criteria like threat protection, LAN monitoring, and firmware updates. However, the process of adding filters to the search query is manual and can be time consuming, because it requires in-depth familiarity with the product glossary.

To address this challenge, we recently enabled customers to perform free text searches on the event management page, allowing new users to run queries with minimal product knowledge. This was accomplished by using foundation models (FMs) to transform natural language into structured queries that are compatible with our products’ GraphQL API.

In this post, we demonstrate how we used Amazon Bedrock, a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using AWS tools without having to manage the infrastructure. Amazon Bedrock enabled us to enrich FMs with product-specific knowledge and convert free text inputs from users into structured search queries for the product API that can greatly enhance user experience and efficiency in data management applications.

Solution overview

The Events page includes a filter bar with both event and time range filters. These filters need to be added and updated manually for each query. The following screenshot shows an example of the event filters (1) and time filters (2) as seen on the filter bar (source: Cato knowledge base).

The event filters are a conjunction of statements in the following form:

Key – The field name
Operator – The evaluation operator (for example, is, in, includes, greater than, etc.)
Value – A single value or list of values

For example, the following screenshot shows a filter for action in [ Alert, Block ].

The time filter is a time range following ISO 8601 time intervals standard.

For example, the following screenshot shows a time filter for UTC.2024-10-{01/00:00:00--02/00:00:00}.

Converting free text to a structured query of event and time filters is a complex natural language processing (NLP) task that can be accomplished using FMs. Customizing an FM that is specialized on a specific task is often done using one of the following approaches:

Prompt engineering – Add instructions in the context/input window of the model to help it complete the task successfully.
Retrieval Augmented Generation (RAG) – Retrieve relevant context from a knowledge base, based on the input query. This context is augmented to the original query. This approach is used for reducing the amount of context provided to the model to relevant data only.
Fine-tuning – Train the FM on data relevant to the task. In this case, the relevant context will be embedded into the model weights, instead of being part of the input.

For our specific task, we’ve found prompt engineering sufficient to achieve the results we needed.

Because the event filters on the Events page are specific to our product, we need to provide the FM with the exact instructions for how to generate them, based on free text queries. The main considerations when creating the prompt are:

Include the relevant context – This includes the following:
- The available keys, operators, and values the model can use.
- Specific instructions. For example, numeric operators can only be used with keys that have numeric values.
Make sure it’s simple to validate – Given the extensive number of instructions and limitations, we can’t trust the model output without checking the results for validity. For example, what if the model generates a filter with a key not supported by our API?

Instead of asking the FM to generate the GraphQL API request directly, we can use the following method:

Instruct the model to return a response following a well-known JSON schema validation IETF standard.
Validate the JSON schema on the response.
Translate it to a GraphQL API request.

Request prompt

Based on the preceding examples, the system prompt will be structured as follows:

# Genral Instructions

Your task is to convert free text queries to a JSON format that will be used to query security and network events in a SASE management console of Cato Networks. You are only allowed to output text in JSON format. Your output will be validated against the following schema that is compatible with the IETF standard:

# Schema definition
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Query Schema",
   "description": "Query object to be executed in the 'Events' management console page. ",
    "type": "object",
    "properties":
    {
        "filters":
        {
            "type": "array",
           "description": "List of filters to apply in the query, based on the free text query provided.",
            "items":
            {
                "oneOf":
                [
                    {
                        "$ref": "#/$defs/Action"
                    },
                    .
                    .
                    .
                ]
            }
        },
        "time":
        {
            "description": "Start datetime and end datetime to be used in the query.",
            "type": "object",
            "required":
            [
                "start",
                "end"
            ],
            "properties":
            {
                "start":
                {
                    "description": "start datetime",
                    "type": "string",
                    "format": "date-time"
                },
                "end":
                {
                    "description": "end datetime",
                    "type": "string",
                    "format": "date-time"
                }
            }
        },
        "$defs":
        {
            "Operator":
            {
                "description": "The operator used in the filter.",
                "type": "string",
                "enum":
                [
                    "is",
                    "in",
                    "not_in",
                    .
                    .
                    .
                ]
            },
            "Action":
            {
                "required":
                [
                    "id",
                    "operator",
                    "values"
                ],
                "description": "The action taken in the event.",
                "properties":
                {
                    "id":
                    {
                        "const": "action"
                    },
                    "operator":
                    {
                        "$ref": "#/$defs/Operator"
                    },
                    "values":
                    {
                        "type": "array",
                        "minItems": 1,
                        "items":
                        {
                            "type": "string",
                            "enum":
                            [
                                "Block",
                                "Allow",
                                "Monitor",
                                "Alert",
                                "Prompt"
                            ]
                        }
                    }
                }
            },
            .
            .
            .
        }
    }
}

Each user query (appended to the system prompt) will be structured as follows:

# Free text query
Query: {free_text_query}

# Add current timestamp for context (used for time filters) 
Context: If you need a reference to the current datetime, it is {datetime}, and the current day of the week is {day_of_week}

The same JSON schema included in the prompt can also be used to validate the model’s response. This step is crucial, because model behavior is inherently non-deterministic, and responses that don’t comply with our API will break the product functionality.

In addition to validating alignment, the JSON schema can also point out the exact schema violation. This allows us to create a policy based on different failure types. For example:

If there are missing fields marked as required, output a translation failure to the user
If the value given for an event filter doesn’t comply with the format, remove the filter and create an API request from other values, and output a translation warning to the user

After the FM successfully translates the free text into structured output, converting it into an API request—such as GraphQL—is a straightforward and deterministic process.

To validate this approach, we’ve created a benchmark with hundreds of text queries and their corresponding expected JSON outputs. For example, let’s consider the following text query:

Security events with high risk level from IPS and Anti Malware engines

For this query, we expect the following response from the model, based on the JSON schema provided:

{
    "filters":
    [
        {
            "id": "risk_level",
            "operator": "is",
            "values":
            [
                "High"
            ]
        },
        {
            "id": "event_type",
            "operator": "is",
            "values":
            [
                "Security"
            ]
        },
        {
            "id": "event_subtype ",
            "operator": "in",
            "values":
            [
                "IPS",
                "Anti Malware"
            ]
        }
    ]
}

For each response of the FM, we define three different outcomes:

Success:
- Valid JSON
- Valid by schema
- Full match of filters
Partial:
- Valid JSON
- Valid by schema
- Partial match of filters
Error:
- Invalid JSON or invalid by schema

Because translation failures lead to a poor user experience, releasing the feature was contingent on achieving an error rate below 0.05, and the selected FM was the one with the highest success rate (ratio of responses with full match of filters) passing this criterion.

Working with Amazon Bedrock

Amazon Bedrock is a fully managed service that simplifies access to a wide range of state-of-the-art FMs through a single, serverless API. It offers a production-ready service capable of efficiently handling large-scale requests, making it ideal for enterprise-level deployments.

Amazon Bedrock enabled us to efficiently transition between different models, making it simple to benchmark and optimize for accuracy, latency, and cost, without the complexity of managing the underlying infrastructure. Additionally, some vendors within the Amazon Bedrock landscape, such as Cohere and Anthropic’s Claude, offer models with native understanding of JSON schemas and structured data, further enhancing their applicability to our specific task.

Using our benchmark, we evaluated several FMs on Amazon Bedrock, taking into account accuracy, latency, and cost. Based on the results, we selected anthropic.claude-3-5-sonnet-20241022-v2:0, which met the error rate criterion and achieved the highest success rate while maintaining reasonable costs and latency. Following this, we proceeded to develop the complete solution, which includes the following components:

Management console – Cato’s management application that the user interacts with to view their account’s network and security events.
GraphQL server – A backend service that provides a GraphQL API for accessing data in a Cato account.
Amazon Bedrock – The cloud service that handles hosting and serving requests to the FM.
Natural language search (NLS) service – An Amazon Elastic Kubernetes Service (Amazon EKS) hosted service to bridge between Cato’s management console and Amazon Bedrock. This service is responsible for creating the complete prompt for the FM and validating the response using the JSON schema.

The following diagram illustrates the workflow from the user’s manual query to the extraction of relevant events.

With the new capability, users can also use free text query mode, which is processed as shown in the following diagram.

The following screenshot of the Events page displays free text query mode in action.

Business impact

The recent feature update has received positive customer feedback. Users, especially those unfamiliar with Cato, have found the new search capability more intuitive, making it straightforward to navigate and engage with the system. Additionally, the inclusion of multi-language input, natively supported by the FM, has made the Events page more accessible for non-native English speakers to use, helping them interact and find insights in their own language.

One of the standout impacts is the significant reduction in query time—cut down from minutes of manual filtering to near-instant results. Account admins using the new feature have reported near-zero time to value, experiencing immediate benefits with minimal learning curve.

Conclusion

Accurately converting free text inputs into structured data is crucial for applications that involve data management and user interaction. In this post, we introduced a real business use case from Cato Networks that significantly improved user experience.

By using Amazon Bedrock, we gained access to state-of-the-art generative language models with built-in support for JSON schemas and structured data. This allowed us to optimize for cost, latency, and accuracy without the complexity of managing the underlying infrastructure.

Although a prompt engineering solution met our needs, users handling complex JSON schemas might want to explore alternative approaches to reduce costs. Including the entire schema in the prompt can lead to a significantly high token count for a single query. In such cases, consider using Amazon Bedrock to fine-tune a model, to embed product knowledge more efficiently.

About the Authors

Asaf Fried leads the Data Science team in Cato Research Labs at Cato Networks. Member of Cato Ctrl. Asaf has more than six years of both academic and industry experience in applying state-of-the-art and novel machine learning methods to the domain of networking and cybersecurity. His main research interests include asset discovery, risk assessment, and network-based attacks in enterprise environments.

Daniel Pienica is a Data Scientist at Cato Networks with a strong passion for large language models (LLMs) and machine learning (ML). With six years of experience in ML and cybersecurity, he brings a wealth of knowledge to his work. Holding an MSc in Applied Statistics, Daniel applies his analytical skills to solve complex data problems. His enthusiasm for LLMs drives him to find innovative solutions in cybersecurity. Daniel’s dedication to his field is evident in his continuous exploration of new technologies and techniques.

Sergey Volkovich is an experienced Data Scientist at Cato Networks, where he develops AI-based solutions in cybersecurity & computer networks. He completed an M.Sc. in physics at Bar-Ilan University, where he published a paper on theoretical quantum optics. Before joining Cato, he held multiple positions across diverse deep learning projects, ranging from publishing a paper on discovering new particles at the Weizmann Institute to advancing computer networks and algorithmic trading. Presently, his main area of focus is state-of-the-art natural language processing.

Omer Haim is a Senior Solutions Architect at Amazon Web Services, with over 6 years of experience dedicated to solving complex customer challenges through innovative machine learning and AI solutions. He brings deep expertise in generative AI and container technologies, and is passionate about working backwards from customer needs to deliver scalable, efficient solutions that drive business value and technological transformation.

Using FMEval for model evaluation

Using SageMaker with MLflow to track experiments

Code walkthrough

Prerequisites

Evaluate a model and log to MLflow

ModelRunner definition

DataConfig definition

Evaluation sets definition

Evaluate and log to MLflow

Run evaluations

Clean up

Conclusion

About the authors

Solution overview

Prerequisites

Extend a SageMaker container image for your model

Build your inference.py file

model_fn

input_fn

predict_fn:

output_fn

Test your inference.py file

Model file structure, tar.gz compressing, and S3 upload

Create SageMaker model, SageMaker endpoint configuration, and SageMaker endpoint

Test your custom SageMaker inference endpoint

Clean up

Conclusion

About the Authors

Solution overview

Generate images with Nova Canvas

Essential elements of good prompts

Example image generation prompts

Generate video with Nova Reel

Example video generation prompts

Generate image-based video using Nova Reel

Tying it all together

Best practices

Fundamentals of prompt engineering for Amazon Nova

Conclusion

About the authors

Security in Amazon Bedrock

Solution overview

Prerequisites

Prepare your data

Create a KMS symmetric key

Create an S3 bucket and configure encryption

Upload the training data

Create a VPC

Create an Amazon S3 VPC gateway endpoint

Create a security group for the AWS KMS VPC interface endpoint

Create a security group for the Amazon Bedrock custom fine-tuning job

Create an AWS KMS VPC interface endpoint

Create a service role for model customization

Update the KMS key policy with the IAM role

Initiate the fine-tuning job

Monitor the job

Purchase provisioned throughput

Use the model

Clean up

Conclusion

About the Authors

Generative AI scoping framework

OWASP Top 10 for LLMs

Solution overview

Authentication layer (Amazon Cognito)

Application controller layer (LLM orchestrator Lambda function)

LLM and LLM agent layer (Amazon Bedrock LLMs)

Agent plugin controller layer (action group Lambda function)

RAG data store layer

Summary

About the Authors

Solution overview

Prerequisites

Deploy the solution

How custom images are deployed and attached

Add more custom images

Clean up

Conclusion

About the Authors

Enhancing customer interactions with LLMs

Build your `inference.py` file

`model_fn`

`input_fn`

`predict_fn`:

`output_fn`

Test your `inference.py` file