Ray jobs on Amazon SageMaker HyperPod: scalable and resilient distributed AI

Foundation model (FM) training and inference have led to a significant increase in computational needs across the industry. These models require massive amounts of accelerated compute to train and operate effectively, pushing the boundaries of traditional computing infrastructure. They require efficient systems for distributing workloads across multiple GPU-accelerated servers while optimizing developer velocity and performance.

Ray is an open source framework that makes it straightforward to create, deploy, and optimize distributed Python jobs. At its core, Ray offers a unified programming model that allows developers to seamlessly scale their applications from a single machine to a distributed cluster. It provides a set of high-level APIs for tasks, actors, and data that abstract away the complexities of distributed computing, enabling developers to focus on the core logic of their applications. Ray promotes the same coding patterns for both a simple machine learning (ML) experiment and a scalable, resilient production application. Ray’s key features include efficient task scheduling, fault tolerance, and automatic resource management, making it a powerful tool for building a wide range of distributed applications, from ML models to real-time data processing pipelines. With its growing ecosystem of libraries and tools, Ray has become a popular choice for organizations looking to use the power of distributed computing to tackle complex and data-intensive problems.
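
To make the tasks and actors model concrete, the following is a minimal sketch that runs on a single machine and, unchanged, on a multi-node Ray cluster; the function and class here are illustrative and not part of the AWS examples later in this post.

import ray

ray.init()  # Start Ray locally; on a cluster, ray.init(address="auto") connects to the running cluster

# A task: a stateless function executed remotely and in parallel
@ray.remote
def square(x):
    return x * x

# An actor: a stateful worker process that holds state across calls
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

# Launch four tasks in parallel and gather their results
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]

# Create an actor and call its methods; state persists between calls
counter = Counter.remote()
print(ray.get([counter.increment.remote() for _ in range(3)]))  # [1, 2, 3]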

Amazon SageMaker HyperPod is a purpose-built infrastructure to develop and deploy large-scale FMs. SageMaker HyperPod not only provides the flexibility to create and use your own software stack, but also provides optimal performance through same-spine placement of instances, as well as built-in resiliency. Combining the resiliency of SageMaker HyperPod and the efficiency of Ray provides a powerful framework to scale up your generative AI workloads.

In this post, we demonstrate the steps involved in running Ray jobs on SageMaker HyperPod.

Overview of Ray

This section provides a high-level overview of the Ray tools and frameworks for AI/ML workloads. We primarily focus on ML training use cases.

Ray is an open-source distributed computing framework designed to run highly scalable and parallel Python applications. Ray manages, executes, and optimizes compute needs across AI workloads. It unifies infrastructure through a single, flexible framework—enabling AI workloads from data processing, to model training, to model serving and beyond.

For distributed jobs, Ray provides intuitive tools for parallelizing and scaling ML workflows. It allows developers to focus on their training logic without the complexities of resource allocation, task scheduling, and inter-node communication.

At a high level, Ray is made up of three layers:

  • Ray Core: The foundation of Ray, providing primitives for parallel and distributed computing
  • Ray AI libraries:
    • Ray Train – A library that simplifies distributed training by offering built-in support for popular ML frameworks like PyTorch, TensorFlow, and Hugging Face
    • Ray Tune – A library for scalable hyperparameter tuning
    • Ray Serve – A library for distributed model deployment and serving
  • Ray clusters: A distributed computing platform where worker nodes run user code as Ray tasks and actors, generally in the cloud

In this post, we dive deep into running Ray clusters on SageMaker HyperPod. A Ray cluster consists of a single head node and a number of connected worker nodes. The head node orchestrates task scheduling, resource allocation, and communication between nodes. The Ray worker nodes execute the distributed workloads using Ray tasks and actors, such as model training or data preprocessing.

Ray clusters and Kubernetes clusters pair well together. By running a Ray cluster on Kubernetes using the KubeRay operator, both Ray users and Kubernetes administrators benefit from the smooth path from development to production. For this use case, we use a SageMaker HyperPod cluster orchestrated through Amazon Elastic Kubernetes Service (Amazon EKS).

The KubeRay operator enables you to run a Ray cluster on a Kubernetes cluster. KubeRay creates the following custom resource definitions (CRDs):

  • RayCluster – The primary resource for managing Ray instances on Kubernetes. The nodes in a Ray cluster manifest as pods in the Kubernetes cluster.
  • RayJob – A single executable job designed to run on an ephemeral Ray cluster. It serves as a higher-level abstraction for submitting tasks or batches of tasks to be executed by the Ray cluster. A RayJob also manages the lifecycle of the Ray cluster, making it ephemeral by automatically spinning up the cluster when the job is submitted and shutting it down when the job is complete.
  • RayService – Combines a Ray cluster and a Ray Serve application that runs on top of it into a single Kubernetes manifest. It allows for the deployment of Ray applications that need to be exposed for external communication, typically through a service endpoint.

For the remainder of this post, we don’t focus on RayJob or RayService; we focus on creating a persistent Ray cluster to run distributed ML training jobs.

When Ray clusters are paired with SageMaker HyperPod clusters, Ray clusters unlock enhanced resiliency and auto-resume capabilities, which we will dive deeper into later in this post. This combination provides a solution for handling dynamic workloads, maintaining high availability, and providing seamless recovery from node failures, which is crucial for long-running jobs.

Overview of SageMaker HyperPod

In this section, we introduce SageMaker HyperPod and its built-in resiliency features to provide infrastructure stability.

Generative AI workloads such as training, inference, and fine-tuning involve building, maintaining, and optimizing large clusters of thousands of GPU accelerated instances. For distributed training, the goal is to efficiently parallelize workloads across these instances in order to maximize cluster utilization and minimize time to train. For large-scale inference, it’s important to minimize latency, maximize throughput, and seamlessly scale across those instances for the best user experience. SageMaker HyperPod is a purpose-built infrastructure to address these needs. It removes the undifferentiated heavy lifting involved in building, maintaining, and optimizing a large GPU accelerated cluster. It also provides flexibility to fully customize your training or inference environment and compose your own software stack. You can use either Slurm or Amazon EKS for orchestration with SageMaker HyperPod.

Due to their massive size and the need to train on large amounts of data, FMs are often trained and deployed on large compute clusters composed of thousands of AI accelerators such as GPUs and AWS Trainium. A single failure in one of these thousands of accelerators can interrupt the entire training process, requiring manual intervention to identify, isolate, debug, repair, and recover the faulty node in the cluster. This workflow can take several hours for each failure, and as the scale of the cluster grows, it’s common to see a failure every few days or even every few hours. SageMaker HyperPod provides resiliency against infrastructure failures by using agents that continuously run health checks on cluster instances, fix faulty instances, reload the last valid checkpoint, and resume the training—without user intervention. As a result, you can train your models up to 40% faster. You can also SSH into an instance in the cluster for debugging and gather insights on hardware-level optimization during multi-node training. Orchestrators like Slurm or Amazon EKS facilitate efficient allocation and management of resources, provide optimal job scheduling, monitor resource utilization, and automate fault tolerance.

Solution overview

This section provides an overview of how to run Ray jobs for multi-node distributed training on SageMaker HyperPod. We go over the architecture and the process of creating a SageMaker HyperPod cluster, installing the KubeRay operator, and deploying a Ray training job.

Although this post provides a step-by-step guide to manually create the cluster, feel free to check out the aws-do-ray project, which aims to simplify the deployment and scaling of distributed Python applications using Ray on Amazon EKS or SageMaker HyperPod. It uses Docker to containerize the tools necessary to deploy and manage Ray clusters, jobs, and services. In addition to the aws-do-ray project, we’d like to highlight the Amazon SageMaker HyperPod EKS workshop, which offers an end-to-end experience for running various workloads on SageMaker HyperPod clusters. There are multiple examples of training and inference workloads in the GitHub repository awsome-distributed-training.

As introduced earlier in this post, KubeRay simplifies the deployment and management of Ray applications on Kubernetes. The following diagram illustrates the solution architecture.

SMHP EKS Architecture

Create a SageMaker HyperPod cluster

Prerequisites

Before deploying Ray on SageMaker HyperPod, you need a HyperPod cluster.

If you prefer to deploy HyperPod on an existing EKS cluster, follow the instructions here, which include the following:

  • EKS cluster – You can associate SageMaker HyperPod compute to an existing EKS cluster that satisfies the set of prerequisites. Alternatively, and as the recommended approach, you can deploy a ready-made EKS cluster with a single AWS CloudFormation template. Refer to the GitHub repo for instructions on setting up an EKS cluster.
  • Custom resources – Running multi-node distributed training requires various resources, such as device plugins, Container Storage Interface (CSI) drivers, and training operators, to be pre-deployed on the EKS cluster. You also need to deploy additional resources for the health monitoring agent and deep health check. HyperPodHelmCharts simplify the process using Helm, one of the most commonly used package managers for Kubernetes. Refer to Install packages on the Amazon EKS cluster using Helm for installation instructions.

The following provides an example workflow for creating a HyperPod cluster on an existing EKS cluster after deploying the prerequisites. This is for reference only and not required for the quick deploy option.

cat > cluster-config.json << EOL
{
    "ClusterName": "ml-cluster",
    "Orchestrator": {
        "Eks": {
            "ClusterArn": "${EKS_CLUSTER_ARN}"
        }
    },
    "InstanceGroups": [
        {
            "InstanceGroupName": "worker-group-1",
            "InstanceType": "ml.p5.48xlarge",
            "InstanceCount": 4,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://amzn-s3-demo-bucket",
                "OnCreate": "on_create.sh"
            },
            "ExecutionRole": "${EXECUTION_ROLE}",
            "ThreadsPerCore": 1,
            "OnStartDeepHealthChecks": [
                "InstanceStress",
                "InstanceConnectivity"
            ]
        },
        {
            "InstanceGroupName": "head-group",
            "InstanceType": "ml.m5.2xlarge",
            "InstanceCount": 1,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://amzn-s3-demo-bucket",
                "OnCreate": "on_create.sh"
            },
            "ExecutionRole": "${EXECUTION_ROLE}",
            "ThreadsPerCore": 1,
        }
    ],
    "VpcConfig": {
        "SecurityGroupIds": [
            "${SECURITY_GROUP_ID}"
        ],
        "Subnets": [
            "${SUBNET_ID}"
        ]
    },
    "NodeRecovery": "Automatic"
}
EOL

The provided configuration file contains two key highlights:

  • “OnStartDeepHealthChecks”: [“InstanceStress”, “InstanceConnectivity”] – Instructs SageMaker HyperPod to conduct a deep health check whenever new GPU or Trainium instances are added
  • “NodeRecovery”: “Automatic” – Enables SageMaker HyperPod automated node recovery

You can create a SageMaker HyperPod compute with the following AWS Command Line Interface (AWS CLI) command (AWS CLI version 2.17.47 or newer is required):

aws sagemaker create-cluster \
    --cli-input-json file://cluster-config.json
{
"ClusterArn": "arn:aws:sagemaker:us-east-2:xxxxxxxxxx:cluster/wccy5z4n4m49"
}

To verify the cluster status, you can use the following command:

aws sagemaker list-clusters --output table

This command displays the cluster details, including the cluster name, status, and creation time:

------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                    ListClusters                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
||                                                                 ClusterSummaries                                                                 ||
|+----------------------------------------------------------------+---------------------------+----------------+------------------------------------+|
||                           ClusterArn                           |        ClusterName        | ClusterStatus  |           CreationTime             ||
|+----------------------------------------------------------------+---------------------------+----------------+------------------------------------+|
||  arn:aws:sagemaker:us-west-2:xxxxxxxxxxxx:cluster/zsmyi57puczf |         ml-cluster        |   InService     |  2025-03-03T16:45:05.320000+00:00  ||
|+----------------------------------------------------------------+---------------------------+----------------+------------------------------------+|

Alternatively, you can verify the cluster status on the SageMaker console. After a brief period, you can observe that the status for the nodes transitions to Running.
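
If you prefer to check the status programmatically, the following is a minimal boto3 sketch; it assumes the AWS SDK for Python is installed, credentials are configured, and the cluster name matches the ml-cluster configuration above.

import boto3

sagemaker = boto3.client("sagemaker")

# List HyperPod clusters and print a short status summary
response = sagemaker.list_clusters()
for summary in response["ClusterSummaries"]:
    print(summary["ClusterName"], summary["ClusterStatus"])

# Describe a specific cluster, for example the one created above
cluster = sagemaker.describe_cluster(ClusterName="ml-cluster")
print(cluster["ClusterStatus"])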

Create an FSx for Lustre shared file system

For us to deploy the Ray cluster, we need the SageMaker HyperPod cluster to be up and running, and additionally we need a shared storage volume (for example, an Amazon FSx for Lustre file system). This is a shared file system that the SageMaker HyperPod nodes can access. This file system can be provisioned statically before launching your SageMaker HyperPod cluster or dynamically afterwards.

Specifying a shared storage location (such as cloud storage or NFS) is optional for single-node clusters, but it is required for multi-node clusters. Using a local path will raise an error during checkpointing for multi-node clusters.

The Amazon FSx for Lustre CSI driver uses IAM roles for service accounts (IRSA) to authenticate AWS API calls. To use IRSA, an IAM OpenID Connect (OIDC) provider needs to be associated with the OIDC issuer URL that comes provisioned with your EKS cluster.

Create an IAM OIDC identity provider for your cluster with the following command:

eksctl utils associate-iam-oidc-provider --cluster $EKS_CLUSTER_NAME --approve

Deploy the FSx for Lustre CSI driver:

helm repo add aws-fsx-csi-driver https://kubernetes-sigs.github.io/aws-fsx-csi-driver
helm repo update
helm upgrade --install aws-fsx-csi-driver aws-fsx-csi-driver/aws-fsx-csi-driver \
  --namespace kube-system

This Helm chart includes a service account named fsx-csi-controller-sa that gets deployed in the kube-system namespace.

Use the eksctl CLI to create an AWS Identity and Access Management (IAM) role bound to the service account used by the driver, attaching the AmazonFSxFullAccess AWS managed policy:

eksctl create iamserviceaccount \
  --name fsx-csi-controller-sa \
  --override-existing-serviceaccounts \
  --namespace kube-system \
  --cluster $EKS_CLUSTER_NAME \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonFSxFullAccess \
  --approve \
  --role-name AmazonEKSFSxLustreCSIDriverFullAccess \
  --region $AWS_REGION

The --override-existing-serviceaccounts flag lets eksctl know that the fsx-csi-controller-sa service account already exists on the EKS cluster, so it skips creating a new one and updates the metadata of the current service account instead.

Annotate the driver’s service account with the Amazon Resource Name (ARN) of the AmazonEKSFSxLustreCSIDriverFullAccess IAM role that was created:

SA_ROLE_ARN=$(aws iam get-role --role-name AmazonEKSFSxLustreCSIDriverFullAccess --query 'Role.Arn' --output text)

kubectl annotate serviceaccount -n kube-system fsx-csi-controller-sa \
  eks.amazonaws.com/role-arn=${SA_ROLE_ARN} --overwrite=true

This annotation lets the driver know what IAM role it should use to interact with the FSx for Lustre service on your behalf.

Verify that the service account has been properly annotated:

kubectl get serviceaccount -n kube-system fsx-csi-controller-sa -o yaml

Restart the fsx-csi-controller deployment for the changes to take effect:

kubectl rollout restart deployment fsx-csi-controller -n kube-system

The FSx for Lustre CSI driver presents you with two options for provisioning a file system:

  • Dynamic provisioning – This option uses Persistent Volume Claims (PVCs) in Kubernetes. You define a PVC with desired storage specifications. The CSI driver automatically provisions the FSx for Lustre file system for you based on the PVC request. This allows for straightforward scaling and eliminates the need to manually create file systems.
  • Static provisioning – In this method, you manually create the FSx for Lustre file system before using the CSI driver. You will need to configure details like subnet ID and security groups for the file system. Then, you can use the driver to mount this pre-created file system within your container as a volume.

For this example, we use dynamic provisioning. Start by creating a storage class that uses the fsx.csi.aws.com provisioner:

cat <<EOF > storageclass.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: ${SUBNET_ID}
  securityGroupIds: ${SECURITYGROUP_ID}
  deploymentType: PERSISTENT_2
  automaticBackupRetentionDays: "0"
  copyTagsToBackups: "true"
  perUnitStorageThroughput: "250"
  dataCompressionType: "LZ4"
  fileSystemTypeVersion: "2.12"
mountOptions:
  - flock
EOF

kubectl apply -f storageclass.yaml

The storage class references the following values:

  • SUBNET_ID – The subnet ID in which the FSx for Lustre file system is created. This should be the same private subnet that was used for HyperPod creation.
  • SECURITYGROUP_ID – The security group IDs that will be attached to the file system. This should be the same security group ID that is used in HyperPod and EKS.

Next, create a PVC named fsx-claim that uses the fsx-sc storage class:

cat <<EOF > pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: fsx-sc
  resources:
    requests:
      storage: 1200Gi
EOF

kubectl apply -f pvc.yaml

This PVC will start the dynamic provisioning of an FSx for Lustre file system based on the specifications provided in the storage class.

Create the Ray cluster

Now that we have both the SageMaker HyperPod cluster and the FSx for Lustre file system created, we can set up the Ray cluster:

  1. Set up dependencies. We will create a new namespace in our Kubernetes cluster and install the KubeRay operator using a Helm chart.

We recommend using KubeRay operator version 1.2.0 or higher, which supports automatic Ray Pod eviction and replacement in case of failures (for example, hardware issues on EKS or SageMaker HyperPod nodes).

# Create KubeRay namespace
kubectl create namespace kuberay
# Deploy the KubeRay operator with the Helm chart repository
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
#Install both CRDs and Kuberay operator v1.2.0
helm install kuberay-operator kuberay/kuberay-operator --version 1.2.0 --namespace kuberay
# The KubeRay operator pod will be deployed onto one of the cluster nodes
kubectl get pods --namespace kuberay

  2. Create a Ray container image for the Ray cluster manifest. With the recent deprecation of the `rayproject/ray-ml` images starting from Ray version 2.31.0, it’s necessary to create a custom container image for our Ray cluster. Therefore, we build on top of the `rayproject/ray:2.42.1-py310-gpu` image, which has all the necessary Ray dependencies, and include our training dependencies to build our own custom image. Feel free to modify this Dockerfile as needed.

First, create a Dockerfile that builds upon the base Ray GPU image and includes only the necessary dependencies:

cat <<EOF > Dockerfile
 
FROM rayproject/ray:2.42.1-py310-gpu
# Install Python dependencies for PyTorch, Ray, Hugging Face, and more
RUN pip install --no-cache-dir \
    torch torchvision torchaudio \
    numpy \
    pytorch-lightning \
    transformers datasets evaluate tqdm click \
    ray[train] ray[air] \
    ray[train-torch] ray[train-lightning] \
    torchdata \
    torchmetrics \
    torch_optimizer \
    accelerate \
    scikit-learn \
    Pillow==9.5.0 \
    protobuf==3.20.3
 
RUN pip install --upgrade datasets transformers
 
# Set the user
USER ray
WORKDIR /home/ray
 
# Verify ray installation
RUN which ray && \
    ray --version
  
# Default command
CMD [ "/bin/bash" ]
 
EOF

Then, build and push the image to your container registry (Amazon ECR) using the provided script:

export AWS_REGION=$(aws configure get region)
export ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export REGISTRY=${ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com/
 
echo "This process may take 10-15 minutes to complete..."
 
echo "Building image..."
 
docker build --platform linux/amd64 -t ${REGISTRY}aws-ray-custom:latest .
 
# Create registry if needed
REGISTRY_COUNT=$(aws ecr describe-repositories | grep "aws-ray-custom" | wc -l)
if [ "$REGISTRY_COUNT" == "0" ]; then
    aws ecr create-repository --repository-name aws-ray-custom
fi
 
# Login to registry
echo "Logging in to $REGISTRY ..."
aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $REGISTRY
 
echo "Pushing image to $REGISTRY ..."
 
# Push image to registry
docker image push ${REGISTRY}aws-ray-custom:latest 

Now, our Ray container image is in Amazon ECR with all necessary Ray dependencies, as well as code library dependencies.

  3. Create a Ray cluster manifest. We use a Ray cluster to host our training jobs. The Ray cluster is the primary resource for managing Ray instances on Kubernetes. It represents a cluster of Ray nodes, including a head node and multiple worker nodes. The Ray cluster CRD determines how the Ray nodes are set up, how they communicate, and how resources are allocated among them. The nodes in a Ray cluster manifest as pods in the EKS or SageMaker HyperPod cluster.

Note that there are two distinct sections in the cluster manifest: the `headGroupSpec` defines the head node of the Ray cluster, and the `workerGroupSpecs` define the worker nodes. Although a job could technically run on the head node as well, it is common to separate the head node from the worker nodes where jobs are executed. Therefore, the head node can typically be a smaller instance (for example, we chose an ml.m5.2xlarge). Because the head node also manages cluster-level metadata, it can be beneficial to run it on a non-GPU node to minimize the risk of node failure, because GPUs are a common source of node failures.

cat <<'EOF' > raycluster.yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: rayml
  labels:
    controller-tools.k8s.io: "1.0"
spec:
  # Ray head pod template
  headGroupSpec:
    # The `rayStartParams` are used to configure the `ray start` command.
    # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
    # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
    rayStartParams:
      dashboard-host: '0.0.0.0'
    #pod template
    template:
      spec:
        #nodeSelector:
        #  node.kubernetes.io/instance-type: "ml.m5.2xlarge"
        securityContext:
          runAsUser: 0
          runAsGroup: 0
          fsGroup: 0
        containers:
        - name: ray-head
          image: ${REGISTRY}aws-ray-custom:latest     ## IMAGE: Here you may choose which image your head pod will run
          env:                                ## ENV: Here is where you can send stuff to the head pod
            - name: RAY_GRAFANA_IFRAME_HOST   ## PROMETHEUS AND GRAFANA
              value: http://localhost:3000
            - name: RAY_GRAFANA_HOST
              value: http://prometheus-grafana.prometheus-system.svc:80
            - name: RAY_PROMETHEUS_HOST
              value: http://prometheus-kube-prometheus-prometheus.prometheus-system.svc:9090
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh","-c","ray stop"]
          resources:
            limits:                                    ## LIMITS: Set resource limits for your head pod
              cpu: 1
              memory: 8Gi
            requests:                                    ## REQUESTS: Set resource requests for your head pod
              cpu: 1
              memory: 8Gi
          ports:
          - containerPort: 6379
            name: gcs-server
          - containerPort: 8265 # Ray dashboard
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8000
            name: serve
          volumeMounts:                                    ## VOLUMEMOUNTS
          - name: fsx-storage
            mountPath: /fsx
          - name: ray-logs
            mountPath: /tmp/ray
        volumes:
          - name: ray-logs
            emptyDir: {}
          - name: fsx-storage
            persistentVolumeClaim:
              claimName: fsx-claim
  workerGroupSpecs:
  # the pod replicas in this group typed worker
  - replicas: 4                                    ## REPLICAS: How many worker pods you want 
    minReplicas: 1
    maxReplicas: 10
    # logical group name; here called gpu-group, but it can also be functional
    groupName: gpu-group
    rayStartParams:
      num-gpus: "8"
    #pod template
    template:
      spec:
        #nodeSelector:
        # node.kubernetes.io/instance-type: "ml.p5.48xlarge"
        securityContext:
          runAsUser: 0
          runAsGroup: 0
          fsGroup: 0
        containers:
        - name: ray-worker
          image: ${REGISTRY}aws-ray-custom:latest             ## IMAGE: Here you may choose which image your worker pods will run
          env:
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh","-c","ray stop"]
          resources:
            limits:                                    ## LIMITS: Set resource limits for your worker pods
              nvidia.com/gpu: 8
              #vpc.amazonaws.com/efa: 32  
            requests:                                    ## REQUESTS: Set resource requests for your worker pods
              nvidia.com/gpu: 8
              #vpc.amazonaws.com/efa: 32
          volumeMounts:                                    ## VOLUMEMOUNTS
          - name: ray-logs
            mountPath: /tmp/ray
          - name: fsx-storage
            mountPath: /fsx
        volumes:
        - name: fsx-storage
          persistentVolumeClaim:
            claimName: fsx-claim
        - name: ray-logs
          emptyDir: {}
EOF

  4. Deploy the Ray cluster:

envsubst < raycluster.yaml | kubectl apply -f -

  5. Optionally, expose the Ray dashboard using port forwarding:

# Gets name of kubectl service that runs the head pod
export SERVICEHEAD=$(kubectl get service | grep head-svc | awk '{print $1}' | head -n 1)
# Port forwards the dashboard from the head pod service
kubectl port-forward --address 0.0.0.0 service/${SERVICEHEAD} 8265:8265 > /dev/null 2>&1 &

Now, you can go to http://localhost:8265/ to view the Ray dashboard.

  6. To launch a training job, there are a few options:
    1. Use the Ray jobs submission SDK, where you can submit jobs to the Ray cluster through the Ray dashboard port (8265 by default) where Ray listens for job requests. To learn more, see Quickstart using the Ray Jobs CLI.
    2. Execute a Ray job in the head pod where you exec directly into the head pod and then submit your job. To learn more, see RayCluster Quickstart.

For this example, we use the first method and submit the job through the SDK. We run the submission from a local environment where the training code is available in the directory passed as --working-dir. Relative to this path, we specify the main training Python script (in this example, fsdp-ray.py).
Within the working-dir folder, we can also include additional scripts we might need to run the training.

The fsdp-ray.py example is located in aws-do-ray/Container-Root/ray/raycluster/jobs/fsdp-ray/fsdp-ray.py in the aws-do-ray GitHub repo.

# Within jobs/ folder
ray job submit --address http://localhost:8265 --working-dir "fsdp-ray" -- python3 fsdp-ray.py
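
The same submission can be done programmatically with the Ray Jobs Python SDK, which talks to the dashboard endpoint exposed by the port forward. The following is a minimal sketch assuming the port forward from the earlier step is active and the fsdp-ray directory is available locally.

from ray.job_submission import JobSubmissionClient

# Connect to the Ray dashboard exposed through the port forward
client = JobSubmissionClient("http://localhost:8265")

# Submit the job; working_dir is uploaded to the cluster and used as the job's working directory
job_id = client.submit_job(
    entrypoint="python3 fsdp-ray.py",
    runtime_env={"working_dir": "fsdp-ray"},
)
print(f"Submitted job: {job_id}")

# Optionally poll the job status
print(client.get_job_status(job_id))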

For our Python training script to run, we need to make sure our training scripts are correctly set up to use Ray. This includes the following steps:

  • Configure a model to run distributed and on the correct CPU/GPU device
  • Configure a data loader to shard data across the workers and place data on the correct CPU or GPU device
  • Configure a training function to report metrics and save checkpoints
  • Configure scaling and CPU or GPU resource requirements for a training job
  • Launch a distributed training job with a TorchTrainer class

For further details on how to adjust your existing training script to get the most out of Ray, refer to the Ray documentation.
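
As an illustration of the first two items in the preceding list, the following is a minimal sketch of the typical adjustments inside a training function; the dataset, batch size, and learning rate are placeholders. This function is then passed to a TorchTrainer together with a ScalingConfig, as shown in the checkpointing example later in this post.

import torch
from torch.utils.data import DataLoader, TensorDataset

import ray.train.torch


def train_func(config):
    # Placeholder dataset; replace with your own data loading logic
    dataset = TensorDataset(torch.randn(256, 4), torch.randn(256, 1))
    dataloader = DataLoader(dataset, batch_size=config["batch_size"], shuffle=True)

    # Shard the data loader across Ray Train workers and move batches to the right device
    dataloader = ray.train.torch.prepare_data_loader(dataloader)

    # Wrap the model in DDP and place it on the correct CPU or GPU device
    model = ray.train.torch.prepare_model(torch.nn.Linear(4, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(config["num_epochs"]):
        for X, y in dataloader:
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(X), y)
            loss.backward()
            optimizer.step()
        # Report metrics and save checkpoints here (see the checkpointing example later in this post)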

The following diagram illustrates the complete architecture you have built after completing these steps.

Ray on Hyperpod EKS Architecture

Implement training job resiliency with the job auto resume functionality

Ray is designed with robust fault tolerance mechanisms to provide resilience in distributed systems where failures are inevitable. These failures generally fall into two categories: application-level failures, which stem from bugs in user code or external system issues, and system-level failures, caused by node crashes, network disruptions, or internal bugs in Ray. To address these challenges, Ray provides tools and strategies that enable applications to detect, recover, and adapt seamlessly, providing reliability and performance in distributed environments. In this section, we look at two of the most common types of failures and how to implement fault tolerance for them, which SageMaker HyperPod complements: Ray Train worker failures and Ray worker node failures.

  • Ray Train worker – This is a worker process specifically used for training tasks within Ray Train, Ray’s distributed training library. These workers handle individual tasks or shards of a distributed training job. Each worker is responsible for processing a portion of the data, training a subset of the model, or performing computation during distributed training. They are coordinated by the Ray Train orchestration logic to collectively train a model.
  • Ray worker node – At the Ray level, this is a Ray node in a Ray cluster. It’s part of the Ray cluster infrastructure and is responsible for running tasks, actors, and other processes as orchestrated by the Ray head node. Each worker node can host multiple Ray processes that execute tasks or manage distributed objects. At the Kubernetes level, a Ray worker node is a Kubernetes pod that is managed by a KubeRay operator. For this post, we will be talking about the Ray worker nodes at the Kubernetes level, so we will refer to them as pods.

At the time of writing, there are no official updates regarding head pod fault tolerance and auto resume capabilities. Though head pod failures are rare, in the unlikely event of such a failure, you will need to manually restart your training job. However, you can still resume progress from the last saved checkpoint. To minimize the risk of hardware-related head pod failures, it’s advised to place the head pod on a dedicated, CPU-only SageMaker HyperPod node, because GPU failures are a common training job failure point.

Ray Train worker failures

Ray Train is designed with fault tolerance to handle worker failures, such as RayActorErrors. When a failure occurs, the affected workers are stopped, and new ones are automatically started to maintain operations. However, for training progress to continue seamlessly after a failure, saving and loading checkpoints is essential. Without proper checkpointing, the training script will restart, but all progress will be lost. Checkpointing is therefore a critical component of Ray Train’s fault tolerance mechanism and needs to be implemented in your code.

Automatic recovery

When a failure is detected, Ray shuts down the failed workers and provisions new ones. While this happens, we can tell the training function to keep retrying until training can continue. Each instance of recovery from a worker failure is considered a retry. We can set the number of retries through the max_failures attribute of the FailureConfig, which is set in the RunConfig passed to the Trainer (for example, TorchTrainer). See the following code:

from ray.train import RunConfig, FailureConfig
# Tries to recover a run up to this many times.
run_config = RunConfig(failure_config=FailureConfig(max_failures=2))
# No limit on the number of retries.
run_config = RunConfig(failure_config=FailureConfig(max_failures=-1))

For more information, see Handling Failures and Node Preemption.

Checkpoints

A checkpoint in Ray Train is a lightweight interface representing a directory stored either locally or remotely. For example, a cloud-based checkpoint might point to s3://my-bucket/checkpoint-dir, and a local checkpoint might point to /tmp/checkpoint-dir. To learn more, see Saving checkpoints during training.

To save a checkpoint in the training loop, you first need to write your checkpoint to a local directory, which can be temporary. When saving, you can use checkpoint utilities from other frameworks like torch.save, pl.Trainer.save_checkpoint, accelerator.save_model, save_pretrained, tf.keras.Model.save, and more. Then you create a checkpoint from the directory using Checkpoint.from_directory. Finally, report the checkpoint to Ray Train using ray.train.report(metrics, checkpoint=...). The metrics reported alongside the checkpoint are used to keep track of the best-performing checkpoints. Reporting will upload the checkpoint to persistent storage.

If you save checkpoints with ray.train.report(..., checkpoint=...) and run on a multi-node cluster, Ray Train will raise an error if NFS or cloud storage is not set up. This is because Ray Train expects all workers to be able to write the checkpoint to the same persistent storage location.

Finally, clean up the local temporary directory to free up disk space (for example, by exiting the tempfile.TemporaryDirectory context). We can save a checkpoint every epoch or every few iterations.

The following diagram illustrates this setup.

Ray Checkpointing Architecture

The following code is an example of saving checkpoints using native PyTorch:

import os
import tempfile

import numpy as np
import torch
import torch.nn as nn
from torch.optim import Adam

import ray.train.torch
from ray import train
from ray.train import Checkpoint, ScalingConfig
from ray.train.torch import TorchTrainer


def train_func(config):
    n = 100
    # create a toy dataset
    # data   : X - dim = (n, 4)
    # target : Y - dim = (n, 1)
    X = torch.Tensor(np.random.normal(0, 1, size=(n, 4)))
    Y = torch.Tensor(np.random.uniform(0, 1, size=(n, 1)))
    # toy neural network : 1-layer
    # Wrap the model in DDP
    model = ray.train.torch.prepare_model(nn.Linear(4, 1))
    criterion = nn.MSELoss()

    optimizer = Adam(model.parameters(), lr=3e-4)
    for epoch in range(config["num_epochs"]):
        y = model.forward(X)
        loss = criterion(y, Y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        metrics = {"loss": loss.item()}

        with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
            checkpoint = None

            should_checkpoint = epoch % config.get("checkpoint_freq", 1) == 0
            # In standard DDP training, where the model is the same across all ranks,
            # only the global rank 0 worker needs to save and report the checkpoint
            if train.get_context().get_world_rank() == 0 and should_checkpoint:
                torch.save(
                    model.module.state_dict(),  # NOTE: Unwrap the model.
                    os.path.join(temp_checkpoint_dir, "model.pt"),
                )
                checkpoint = Checkpoint.from_directory(temp_checkpoint_dir)

            train.report(metrics, checkpoint=checkpoint)


trainer = TorchTrainer(
    train_func,
    train_loop_config={"num_epochs": 5},
    scaling_config=ScalingConfig(num_workers=2),
)
result = trainer.fit()

Ray Train also comes with CheckpointConfig, a way to configure checkpointing options:

from ray.train import RunConfig, CheckpointConfig
# Example 1: Only keep the 2 *most recent* checkpoints and delete the others.
run_config = RunConfig(checkpoint_config=CheckpointConfig(num_to_keep=2))
# Example 2: Only keep the 2 *best* checkpoints and delete the others.
run_config = RunConfig(
    checkpoint_config=CheckpointConfig(
        num_to_keep=2,
        # *Best* checkpoints are determined by these params:
        checkpoint_score_attribute="mean_accuracy",
        checkpoint_score_order="max",
    ),
    # This will store checkpoints on S3.
    storage_path="s3://remote-bucket/location",
)

To restore training state from a checkpoint if your training job were to fail and retry, you should modify your training loop to auto resume and then restore a Ray Train job. By pointing to the path of your saved checkpoints, you can restore your trainer and continue training. Here’s a quick example:

from ray.train.torch import TorchTrainer

restored_trainer = TorchTrainer.restore(
    path="~/ray_results/dl_trainer_restore",  # Can also be a cloud storage path like S3
    datasets=get_datasets(),
)
result = restored_trainer.fit()

To streamline restoration, you can add auto resume logic to your script. This checks if a valid experiment directory exists and restores the trainer if available. If not, it starts a new experiment:

experiment_path = "~/ray_results/dl_restore_autoresume"
if TorchTrainer.can_restore(experiment_path):
    trainer = TorchTrainer.restore(experiment_path, datasets=get_datasets())
else:
    trainer = TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        datasets=get_datasets(),
        scaling_config=train.ScalingConfig(num_workers=2),
        run_config=train.RunConfig(
            storage_path="~/ray_results",
            name="dl_restore_autoresume",
        ),
    )
result = trainer.fit()

To summarize, to provide fault tolerance and auto resume when using Ray Train libraries, set your max_failures parameter in the FailureConfig (we recommend setting it to -1 to make sure it will keep retrying until the SageMaker HyperPod node is rebooted or replaced), and make sure you have enabled checkpointing in your code.
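
Putting these pieces together, the following is a minimal sketch of a RunConfig that combines unlimited retries with checkpointing to the shared FSx for Lustre mount created earlier; the /fsx/checkpoints path and the experiment name are illustrative, and train_func refers to the training function from the checkpointing example above.

from ray.train import RunConfig, FailureConfig, CheckpointConfig, ScalingConfig
from ray.train.torch import TorchTrainer

run_config = RunConfig(
    # Shared, persistent storage visible to all workers (the FSx for Lustre mount)
    storage_path="/fsx/checkpoints",
    name="fsdp-training",
    # Retry indefinitely; SageMaker HyperPod reboots or replaces the faulty node in the meantime
    failure_config=FailureConfig(max_failures=-1),
    # Keep only the two most recent checkpoints to save space
    checkpoint_config=CheckpointConfig(num_to_keep=2),
)

trainer = TorchTrainer(
    train_func,  # the training function from the checkpointing example above
    train_loop_config={"num_epochs": 5},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    run_config=run_config,
)
result = trainer.fit()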

Ray worker pod failures

In addition to the aforementioned mechanisms to recover from Ray Train worker failures, Ray also provides fault tolerance at the worker pod level. When a worker pod fails (this includes scenarios in which the raylet process fails), the running tasks and actors on it will fail and the objects owned by worker processes of this pod will be lost. In this case, the tasks, actors, and objects fault tolerance mechanisms will start and try to recover the failures using other worker pods.

These mechanisms will be implicitly handled by the Ray Train library. To learn more about the underlying fault tolerance at the tasks, actors, objects (implemented at the Ray Core level), see Fault Tolerance.

In practice, this means that in case of a worker pod failure, the following occurs:

  • If there is a free worker pod in the Ray cluster, Ray will recover the failed worker pod by replacing it with the free worker pod.
  • If there is no free worker pod, but in the underlying SageMaker HyperPod cluster there are free SageMaker HyperPod nodes, Ray will schedule a new worker pod onto one of the free SageMaker HyperPod nodes. This pod will join the running Ray cluster and the failure will be recovered using this new worker pod.

In the context of KubeRay, Ray worker nodes are represented by Kubernetes pods, and failures at this level can include issues such as pod eviction or preemption caused by software-level factors.

However, another critical scenario to consider is hardware failures. If the underlying SageMaker HyperPod node becomes unavailable due to a hardware issue, such as a GPU error, it would inevitably cause the Ray worker pod running on that node to fail as well. Now the fault tolerance and auto-healing mechanisms of your SageMaker HyperPod cluster start and will reboot or replace the faulty node. After the new healthy node is added into the SageMaker HyperPod cluster, Ray will schedule a new worker pod onto the SageMaker HyperPod node and recover the interrupted training. In this case, both the Ray fault tolerance mechanism and the SageMaker HyperPod resiliency features work together seamlessly and make sure that even in case of a hardware failure, your ML training workload can auto resume and pick up from where it was interrupted.

As you have seen, there are various built-in resiliency and fault-tolerance mechanisms that allow your Ray Train workload on SageMaker HyperPod to recover and auto resume. Because these mechanisms will essentially recover by restarting the training job, it is crucial that checkpointing is implemented in the training script. It is also generally advised to save the checkpoints on a shared and persistent path, such as an Amazon Simple Storage Service (Amazon S3) bucket or FSx for Lustre file system.

Clean up

To delete your SageMaker HyperPod cluster created in this post, you can either use the SageMaker AI console or use the following AWS CLI command:

aws sagemaker delete-cluster --cluster-name <cluster_name>

Cluster deletion will take a few minutes. You can confirm successful deletion after you see no clusters on the SageMaker AI console.

If you used the CloudFormation stack to create resources, you can delete it using the following command:

aws cloudformation delete-stack --stack-name <stack_name>

Conclusion

This post demonstrated how to set up and deploy Ray clusters on SageMaker HyperPod, highlighting key considerations such as storage configuration and fault tolerance and auto resume mechanisms.

Running Ray jobs on SageMaker HyperPod offers a powerful solution for distributed AI/ML workloads, combining the flexibility of Ray with the robust infrastructure of SageMaker HyperPod. This integration provides enhanced resiliency and auto resume capabilities, which are crucial for long-running and resource-intensive tasks. By using Ray’s distributed computing framework and the built-in features of SageMaker HyperPod, you can efficiently manage complex ML workflows, specifically training workloads as covered in this post. As AI/ML workloads continue to grow in scale and complexity, the combination of Ray and SageMaker HyperPod offers a scalable, resilient, and efficient platform for tackling the most demanding computational challenges in machine learning.

To get started with SageMaker HyperPod, refer to the Amazon EKS Support in Amazon SageMaker HyperPod workshop and the Amazon SageMaker HyperPod Developer Guide. To learn more about the aws-do-ray framework, refer to the GitHub repo.


About the Authors

Mark Vinciguerra is an Associate Specialist Solutions Architect at Amazon Web Services (AWS) based in New York. He focuses on the Automotive and Manufacturing sector, specializing in helping organizations architect, optimize, and scale artificial intelligence and machine learning solutions, with particular expertise in autonomous vehicle technologies. Prior to AWS, he went to Boston University and graduated with a degree in Computer Engineering.

Florian Stahl is a Worldwide Specialist Solutions Architect at AWS, based in Hamburg, Germany. He specializes in Artificial Intelligence, Machine Learning, and Generative AI solutions, helping customers optimize and scale their AI/ML workloads on AWS. With a background as a Data Scientist, Florian focuses on working with customers in the Autonomous Vehicle space, bringing deep technical expertise to help organizations design and implement sophisticated machine learning solutions. He works closely with customers worldwide to transform their AI initiatives and maximize the value of their machine learning investments on AWS.

Anoop Saha is a Sr GTM Specialist at Amazon Web Services (AWS) focusing on Gen AI model training and inference. He is partnering with top foundation model builders, strategic customers, and AWS service teams to enable distributed training and inference at scale on AWS and lead joint GTM motions. Before AWS, Anoop held several leadership roles at startups and large corporations, primarily focusing on silicon and system architecture of AI infrastructure.

Alex Iankoulski is a Principal Solutions Architect, ML/AI Frameworks, who focuses on helping customers orchestrate their AI workloads using containers and accelerated computing infrastructure on AWS. He is also the author of the open source do framework and a Docker captain who loves applying container technologies to accelerate the pace of innovation while solving the world’s biggest challenges.


Using Large Language Models on Amazon Bedrock for multi-step task execution

The goal of this blog post is to show you how a large language model (LLM) can be used to perform tasks that require multi-step dynamic reasoning and execution. Examples of tasks that require dynamic reasoning and execution are answering questions of the form “What is the average length of stay for patients with [specific condition] across different hospitals?” or “How do prescription trends for [specific drug] vary across different regions?” Traditionally, answering these queries required the expertise of business intelligence specialists and data engineers, often resulting in time-consuming processes and potential bottlenecks.

However, the advances in LLMs have opened up exciting new possibilities to break down complex tasks into a series of steps, use tools to complete each step, and produce a final solution.

In this context, the term tools refer to external capabilities or APIs that the model can access and interact with to extend its functionality beyond text-based responses. These tools allow LLMs to perform specialized tasks such as retrieving real-time information, running code, browsing the web, or generating images. By using tools, LLMs can offer more accurate, context-aware, and actionable outputs, enabling them to effectively assist with complex queries that require access to data or functions outside their internal knowledge base.
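
To make the idea of tool use concrete, the following is a minimal sketch of declaring a tool and passing it to a model through the Amazon Bedrock Converse API with boto3. The model ID, tool name, and input schema here are illustrative assumptions; the solution in this post defines its own set of tools, described later.

import boto3

bedrock = boto3.client("bedrock-runtime")

# Declare a tool the model is allowed to call (name and schema are illustrative)
tool_config = {
    "tools": [
        {
            "toolSpec": {
                "name": "get_patients",
                "description": "Retrieves a list of patients from the dataset.",
                "inputSchema": {"json": {"type": "object", "properties": {}}},
            }
        }
    ]
}

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "How many patients are in the dataset?"}]}],
    toolConfig=tool_config,
)

# If the model decides to use a tool, the response content contains a toolUse block
print(response["output"]["message"]["content"])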

In this post, we showcase an example of such capabilities for a patient record retrieval solution built on top of an existing dataset by relying on APIs only instead of text-to-SQL. You can find the code for this post on GitHub.

Overview of solution

The goal of the solution is to accurately answer analytical questions that require multi-step reasoning and execution. The following is an example of an expected interaction with the agent.

User: Give me the first and last name of the patient with least vaccines and their vaccine count.

AI: Based on the results of our analysis, I can answer your question: The patient with the least number of vaccines is Sharleen176 Kulas532, and their vaccine count is 1.

Here’s a breakdown of how we arrived at this answer:

1. We first retrieved patients and immunization records.

2. We then grouped the immunization records by patient_id and counted the number of vaccines for each patient.

3. We ordered this list by the vaccine count in ascending order.

4. We limited the result to the top 1 (the patient with the least vaccines).

5. We joined this result with the patient information to get the first and last name.

6. Finally, we selected only the relevant information (first name, last name, and vaccine count).

The result shows that Sharleen176 Kulas532 has received only 1 vaccine, which is the lowest among patients in the database.

For this solution, we use the Synthetic Patient Generation dataset, which contains multiple tables of healthcare industry related synthetic data.

You can set up the dataset at the project’s root folder using the following steps:

curl https://synthetichealth.github.io/synthea-sample-data/downloads/synthea_sample_data_csv_apr2020.zip > dataset.zip

unzip dataset.zip

mv csv dataset

The solution consists of two core steps: Plan and Execute. In its simplest form, it can be represented by the following diagram.


Fig 1: Simple execution flow – solution overview

In a more complex scheme, you can add multiple layers of validation and provide relevant APIs to increase the success rate of the LLM.

Fig 2: Complex execution flow – solution overview

Plan

In the Plan stage, the LLM is given a set of predefined API function signatures along with a brief description of what each function does. These function signatures act as tools that the LLM can use to formulate a plan to answer a user’s query. The goal is to have the LLM reason through the steps required to arrive at the answer, much like a human would.

Why the plan stage is important

The Plan stage is critical because it allows the LLM to create a structured, logical sequence of actions that will be executed in the next stage. By planning, the LLM can break down a complex question into manageable steps, making sure that the right APIs are called in the correct order. This structured approach helps to minimize errors and increases the likelihood of producing accurate results.

Providing function signatures

In this stage, the LLM is given a set of function signatures that represent the tools it can use. Each function signature includes the name of the function, the parameters it accepts, and the type of value it returns. Here’s an example of a few function signatures:

  • def get_patients() -> List[Patient]:
    Retrieves a list of patients from the dataset.
  • def get_immunization() -> List[Immunization]:
    Retrieves a list of immunization records from the dataset.
  • def filter(list: List[object], keys: List[str], values: List[str]) -> List[object]:
    Filters a given list based on specified keys and values.
  • def join(a: List, b: List, left_key: str, right_key: str, how: JoinMode) -> List:
    Joins two lists based on matching keys, using a specified join mode (e.g., INNER, LEFT, RIGHT).

These function signatures act as building blocks for the LLM to generate a plan. The LLM must choose the appropriate functions and sequence them in a logical order to achieve the desired outcome.
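
To ground these signatures, here is a minimal sketch of how a few of the simpler tools might be implemented over the Synthea CSV files loaded as lists of dictionaries. These implementations are illustrative assumptions, not the exact code from the GitHub repository; only the INNER join mode is implemented here.

import csv
from typing import List

def get_patients() -> List[dict]:
    # Load the Synthea patients table as a list of row dictionaries
    with open("dataset/patients.csv", newline="") as f:
        return list(csv.DictReader(f))

def get_immunization() -> List[dict]:
    with open("dataset/immunizations.csv", newline="") as f:
        return list(csv.DictReader(f))

def filter(rows: List[dict], keys: List[str], values: List[str]) -> List[dict]:
    # Keep rows where every requested key matches the corresponding value
    return [r for r in rows if all(str(r.get(k)) == v for k, v in zip(keys, values))]

def join(a: List[dict], b: List[dict], left_key: str, right_key: str, how: str = "INNER") -> List[dict]:
    # Simple inner join: index the right side by key, then merge matching rows
    index = {}
    for row in b:
        index.setdefault(row.get(right_key), []).append(row)
    joined = []
    for row in a:
        for match in index.get(row.get(left_key), []):
            joined.append({**match, **row})
    return joined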

Retrieval Augmented Generation (RAG) improves the selection process by narrowing down the tools an LLM sees based on the task, simplifying the prompt. In a project with many tools, RAG makes sure that only the most relevant tools are surfaced for a given query, reducing complexity and helping the LLM make more accurate decisions. This focused exposure enhances performance by preventing the model from being overwhelmed by irrelevant options.
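
The following is a minimal illustration of this narrowing step using simple keyword overlap; a production system would typically embed the tool descriptions and the query and rank them by vector similarity.

from typing import Dict, List

def select_relevant_tools(query: str, tool_descriptions: Dict[str, str], top_k: int = 3) -> List[str]:
    """Rank tools by how many query words appear in their description."""
    query_words = set(query.lower().split())
    scored = [
        (sum(word in description.lower() for word in query_words), name)
        for name, description in tool_descriptions.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

tools = {
    "get_patients": "Retrieves a list of patients from the dataset.",
    "get_immunization": "Retrieves a list of immunization records from the dataset.",
    "filter": "Filters a given list based on specified keys and values.",
    "join": "Joins two lists based on matching keys.",
}

print(select_relevant_tools("Which patient has the fewest immunization records?", tools))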

Generating a plan

After the function signatures are provided, the LLM is prompted to create a plan. The plan typically consists of a series of steps, each represented as a JSON object. Each step indicates a function that needs to be executed, the parameters that need to be passed, and the expected outcome (often referred to as evidence).

For example, if the task is to find the patient with the least number of vaccines, the LLM might generate a plan that includes the following steps:

  1. Retrieve patients: Use the get_patients() function to get a list of patients.
  2. Retrieve immunization records: Use the get_immunization() function to get a list of immunizations.
  3. Group by patient: Use the group_by() function to group the immunizations by patient_id, counting the number of vaccines for each patient.
  4. Order by count: Use the order_by() function to sort the grouped list in ascending order based on the vaccine count.
  5. Limit the result: Use the limit() function to select the patient with the least vaccines.
  6. Join with patient data: Use the join() function to match the selected result with the patient’s information.
  7. Select relevant fields: Use the select() function to extract only the necessary fields, such as the patient’s first name, last name, and vaccine count.

JSON representation

The LLM outputs this plan as a structured JSON, which makes it straightforward to parse and execute in the next stage. The JSON format helps make sure that the plan is clear, unambiguous, and ready for programmatic execution.

The following is an example of what the JSON might look like:

{
    "role": "assistant",
    "content": [
        {
            "toolUse": {
                "toolUseId": "tooluse_example_id",
                "name": "execute_plan",
                "input": {
                    "plans": [
                        {
                            "function_name": "get_patients",
                            "parameters": [],
                            "evidence_number": 1
                        },
                        {
                            "function_name": "get_immunization",
                            "parameters": [],
                            "evidence_number": 2
                        },
                        {
                            "function_name": "group_by",
                            "parameters": [
                                "list",
                                "group_key",
                                "aggregation_key",
                                "aggregation"
                            ],
                            "parameter_values": [
                                "#E2",
                                "patient_id",
                                null,
                                "COUNT"
                            ],
                            "evidence_number": 3
                        },
                        {
                            "function_name": "order_by",
                            "parameters": [
                                "list",
                                "key",
                                "value"
                            ],
                            "parameter_values": [
                                "#E3",
                                "count",
                                "ASCENDING"
                            ],
                            "evidence_number": 4
                        },
                        {
                            "function_name": "limit",
                            "parameters": [
                                "list",
                                "k"
                            ],
                            "parameter_values": [
                                "#E4",
                                1
                            ],
                            "evidence_number": 5
                        },
                        {
                            "function_name": "join",
                            "parameters": [
                                "a",
                                "b",
                                "left_key",
                                "right_key",
                                "how"
                            ],
                            "parameter_values": [
                                "#E5",
                                "#E1",
                                "patient_id",
                                "id",
                                "INNER"
                            ],
                            "evidence_number": 6
                        },
                        {
                            "function_name": "select",
                            "parameters": [
                                "list",
                                "keys"
                            ],
                            "parameter_values": [
                                "#E6",
                                [
                                    "first",
                                    "last",
                                    "count"
                                ]
                            ],
                            "evidence_number": 7
                        }
                    ]
                }
            }
        }
    ]
}

Execute

In the Execute stage, the structured plan generated by the LLM in the previous step is programmatically carried out to produce the final output. The JSON blueprint from the planning stage is parsed, and each function call described in the plan is executed sequentially.

The process begins with data retrieval, such as accessing patient records or immunization data, using predefined API functions such as get_patients() or get_immunization(). These initial function calls generate intermediate results, which are stored as evidence and referenced in subsequent steps.

The plan typically involves a series of data transformation functions, such as group_by() to aggregate data, filter() to refine results, and order_by() to sort data. Each function is executed with the specific parameters outlined in the JSON plan, progressively refining the data to answer the query.

As each function is executed, its output is passed to the subsequent function in the sequence. This chain of function calls culminates in a final step, often involving a select() function to extract the most relevant information, such as a patient’s name and vaccine count.

Error handling in the Execute stage is crucial for the reliability and robustness of the entire process. As the LLM’s plan is executed, various issues can arise, including empty datasets, invalid parameters, or mismatched data types during function calls such as join() or filter(). To address these potential challenges, the system incorporates error-checking mechanisms at each step, enabling it to detect and respond to anomalies efficiently. If a function returns an unexpected result or encounters an issue, the system can feed the error back to the LLM, enabling it to regenerate the plan with the necessary adjustments. This approach not only mitigates execution failures but also enhances the overall user experience by delivering accurate and reliable results, even in the face of unexpected challenges.
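
The following is a minimal sketch of such an executor, assuming a hypothetical functions registry that maps names such as get_patients and group_by to Python callables; the evidence map resolves the #E references produced by earlier steps, and any failure is returned to the caller so it can be sent back to the LLM for plan regeneration.

def execute_plan(plans, functions):
    """Sequentially execute a JSON plan, resolving #E evidence references."""
    evidence = {}  # maps "#E<n>" -> intermediate result

    def resolve(value):
        # Replace evidence placeholders such as "#E3" with previously stored results
        if isinstance(value, str) and value.startswith("#E"):
            return evidence[value]
        return value

    for step in plans:
        name = step["function_name"]
        args = [resolve(v) for v in step.get("parameter_values", [])]
        try:
            result = functions[name](*args)
        except Exception as err:
            # Surface the failing step so the caller can send it back to the LLM
            return {"error": f"Step {step['evidence_number']} ({name}) failed: {err}"}
        evidence[f"#E{step['evidence_number']}"] = result

    # The last step's evidence holds the final answer
    return {"result": evidence[f"#E{plans[-1]['evidence_number']}"]}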

Summary

This post explores the application of LLMs in executing complex analytical queries through an API, with a specific focus on Amazon Bedrock. Traditionally, business users rely on data professionals to retrieve and present data, but LLMs can now offer a streamlined approach that enables direct query responses by using predefined API tools. To illustrate this capability, we use the Synthetic Patient Generation dataset and present a solution structured around two primary stages: Plan and Execute.

In the Plan stage, the LLM is provided with API function signatures, which it uses to generate a structured, logical sequence of steps to answer the query. This plan is output as structured JSON, providing clarity and facilitating seamless execution. In the Execute stage, the system programmatically carries out the plan by sequentially executing each function call. Robust error-handling mechanisms are integrated to identify potential issues and, if necessary, relay errors back to the LLM for plan regeneration.

To demonstrate this process, we present a use case where the system identifies the patient with the fewest vaccines by retrieving, grouping, and sorting the data, and then presenting the final result. This example showcases the LLM’s ability to extend beyond mere text-based responses, providing actionable and context-aware outputs that can significantly enhance business decision-making processes.

Conclusion

This post highlights the efficacy of LLMs in expanding their functionality to deliver practical, data-driven solutions that have the potential to revolutionize business analytics and decision-making workflows.


About the Authors

Bruno Klein is a Senior Machine Learning Engineer with AWS Professional Services Analytics Practice. He helps customers implement big data and analytics solutions. Outside of work, he enjoys spending time with family, traveling, and trying new food.

Rushabh Lokhande is a Senior Data & ML Engineer with AWS Professional Services Analytics Practice. He helps customers implement big data, machine learning, and analytics solutions. Outside of work, he enjoys spending time with family, reading, running, and playing golf.

Mohammad Arbabshirani, PhD, is a Sr. Data Science Manager at AWS Professional Services. He specializes in helping customers accelerate business outcomes on AWS through the application of machine learning and generative AI. He has 12 years of experience across the full life cycle of machine learning, computer vision, and data science, from sales support to end-to-end solution delivery, especially in the healthcare and life sciences vertical. Currently, Mohammad leads a team of data scientists, machine learning engineers, and data architects, focusing on delivering cutting-edge ML solutions for customers. His background includes extensive research in neuroimaging and medical imaging. Outside of his professional endeavors, Mohammad enjoys tennis, soccer, and instrumental music.

Read More

Introducing AWS MCP Servers for code assistants (Part 1)


We’re excited to announce the open source release of AWS MCP Servers for code assistants — a suite of specialized Model Context Protocol (MCP) servers that bring Amazon Web Services (AWS) best practices directly to your development workflow. Our specialized AWS MCP servers combine deep AWS knowledge with agentic AI capabilities to accelerate development across key areas. Each AWS MCP Server focuses on a specific domain of AWS best practices, working together to provide comprehensive guidance throughout your development journey.

This post is the first in a series covering AWS MCP Servers. In this post, we walk through how these specialized MCP servers can dramatically reduce your development time while incorporating security controls, cost optimizations, and AWS Well-Architected best practices into your code. Whether you’re an experienced AWS developer or just getting started with cloud development, you’ll discover how to use AI-powered coding assistants to tackle common challenges such as complex service configurations, infrastructure as code (IaC) implementation, and knowledge base integration. By the end of this post, you’ll understand how to start using AWS MCP Servers to transform your development workflow and deliver better solutions, faster.

If you want to get started right away, skip ahead to the section “From concept to working code in minutes.”

AI is transforming how we build software, creating opportunities to dramatically accelerate development while improving code quality and consistency. Today’s AI assistants can understand complex requirements, generate production-ready code, and help developers navigate technical challenges in real time. This AI-driven approach is particularly valuable in cloud development, where developers need to orchestrate multiple services while maintaining security, scalability, and cost-efficiency.

Developers need code assistants that understand the nuances of AWS services and best practices. Specialized AI agents can address these needs by:

  • Providing contextual guidance on AWS service selection and configuration
  • Helping enforce compliance with security best practices and regulatory requirements
  • Promoting efficient resource utilization and cost-effective solutions
  • Automating repetitive implementation tasks with AWS-specific patterns

This approach means developers can focus on innovation while AI assistants handle the undifferentiated heavy lifting of coding. Whether you’re using Amazon Q, Amazon Bedrock, or other AI tools in your workflow, AWS MCP Servers complement and enhance these capabilities with deep AWS-specific knowledge to help you build better solutions faster.

Model Context Protocol (MCP) is a standardized open protocol that enables seamless interaction between large language models (LLMs), data sources, and tools. This protocol allows AI assistants to use specialized tooling and to access domain-specific knowledge by extending the model’s capabilities beyond its built-in knowledge—all while keeping sensitive data local. Through MCP, general-purpose LLMs can now seamlessly access relevant knowledge beyond initial training data and be effectively steered towards desired outputs by incorporating specific context and best practices.
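
To make this concrete, the following is a minimal sketch of an MCP server built with the FastMCP helper from the MCP Python SDK, in the same style as the servers configured later in this post. The server name, tool, and recommendation values are illustrative assumptions, not part of the AWS MCP Servers.

# Minimal MCP server exposing one tool over standard input/output
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("best-practices-helper")  # hypothetical server name

@mcp.tool()
def lambda_memory_recommendation(workload: str) -> str:
    """Return a simple (hypothetical) memory recommendation for a Lambda workload."""
    defaults = {"api": "512 MB", "batch": "1024 MB"}
    return defaults.get(workload, "128 MB")

if __name__ == "__main__":
    mcp.run()  # serve requests from the MCP client over stdio

An MCP-compatible client can launch this script as a subprocess, discover the lambda_memory_recommendation tool, and let the model call it whenever a prompt needs that knowledge.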

Accelerate building on AWS

What if your AI assistant could instantly access deep AWS knowledge, understanding every AWS service, best practice, and architectural pattern? With MCP, we can transform general-purpose LLMs into AWS specialists by connecting them to specialized knowledge servers. This opens up exciting new possibilities for accelerating cloud development while maintaining security and following best practices.

Build on AWS in a fraction of the time, with best practices automatically applied from the first line of code. Skip hours of documentation research and immediately access ready-to-use patterns for complex services such as Amazon Bedrock Knowledge Bases. Our MCP Servers will help you write well-architected code from the start, implement AWS services correctly the first time, and deploy solutions that are secure, observable, and cost-optimized by design. Transform how you build on AWS today.

  • Enforce AWS best practices automatically – Write well-architected code from the start with built-in security controls, proper observability, and optimized resource configurations
  • Cut research time dramatically – Stop spending hours reading documentation. Our MCP Servers provide contextually relevant guidance for implementing AWS services correctly, addressing common pitfalls automatically
  • Access ready-to-use patterns instantly – Use pre-built AWS CDK constructs, Amazon Bedrock Agents schema generators, and Amazon Bedrock Knowledge Bases integration templates that follow AWS best practices from the start
  • Optimize cost proactively – Prevent over-provisioning as you design your solution by getting cost-optimization recommendations and generating a comprehensive cost report to analyze your AWS spending before deployment

To turn this vision into reality and make AWS development faster, more secure, and more efficient, we’ve created AWS MCP Servers—a suite of specialized servers that bring AWS best practices directly to your development workflow. They combine deep AWS knowledge with AI capabilities to accelerate development across key areas. Each AWS MCP Server focuses on a specific domain of AWS best practices, working together to provide comprehensive guidance throughout your development journey.

Overview of domain-specific MCP Servers for AWS development

Our specialized MCP Servers are designed to cover distinct aspects of AWS development, each bringing deep knowledge to specific domains while working in concert to deliver comprehensive solutions:

  • Core – The foundation server that provides AI processing pipeline capabilities and serves as a central coordinator. It helps provide clear plans for building AWS solutions and can federate to other MCP servers as needed.
  • AWS Cloud Development Kit (AWS CDK) – Delivers AWS CDK knowledge with tools for implementing best practices, security configurations with cdk-nag, Powertools for AWS Lambda integration, and specialized constructs for generative AI services. It makes sure infrastructure as code (IaC) follows AWS Well-Architected principles from the start.
  • Amazon Bedrock Knowledge Bases – Enables seamless access to Amazon Bedrock Knowledge Bases so developers can query enterprise knowledge with natural language, filter results by data source, and use reranking for improved relevance.
  • Amazon Nova Canvas – Provides image generation capabilities using Amazon Nova Canvas through Amazon Bedrock, enabling the creation of visuals from text prompts and color palettes—perfect for mockups, diagrams, and UI design concepts.
  • Cost – Analyzes AWS service costs and generates comprehensive cost reports, helping developers understand the financial implications of their architectural decisions and optimize for cost-efficiency.

Prerequisites

To complete the solution, you need to have the following prerequisites in place:

  • uv package manager
  • Python, installed using uv python install 3.13
  • AWS credentials with appropriate permissions
  • An MCP-compatible LLM client (such as Anthropic’s Claude for Desktop, Cline, Amazon Q CLI, or Cursor)

From concept to working code in minutes

You can download the AWS MCP Servers from GitHub or install them from PyPI. Here’s how to get started using your favorite code assistant with MCP support.

To install and configure the MCP servers, add the following to your MCP client’s settings file:

# Install and setup the MCP servers
{
  "mcpServers": {
    "awslabs.core-mcp-server": {
      "command": "uvx",
      "args": [
        "awslabs.core-mcp-server@latest"
      ],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR",
        "MCP_SETTINGS_PATH": "path to your mcp server settings"
      },
      "autoApprove": [],
      "disabled": false
    },
    "awslabs.bedrock-kb-retrieval-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.bedrock-kb-retrieval-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "your-aws-profile",
        "AWS_REGION": "us-east-1"
      }
    },
    "awslabs.cdk-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.cdk-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      }
    },
    "awslabs.cost-analysis-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.cost-analysis-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "your-aws-profile",
        "FASTMCP_LOG_LEVEL": "ERROR"
      }
    },
    "awslabs.nova-canvas-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.nova-canvas-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "your-aws-profile",
        "AWS_REGION": "us-east-1"
      }
    }
  }
}

AWS MCP Servers in action

Here’s how AWS MCP servers transform the development experience:

Developer: “I need to build an AI-powered chatbot using Amazon Bedrock that can answer questions from our company’s knowledge base. I also want to add a tool for the chatbot to call our internal API.”

Core: “I’ll help you build an Amazon Bedrock Knowledge Bases chatbot with API integration. Let’s create an architecture that uses Amazon Bedrock Agents with a custom action group to call your internal API.”

Core generates a comprehensive architecture diagram showing the knowledge base integration, Amazon Bedrock Agents configuration with action groups, API connectivity, and data flow between components.

AWS CDK: “Here’s the infrastructure code for your chatbot with the Amazon Bedrock Agents action group. I’ve included proper IAM roles, security controls, and Lambda Powertools for observability.”

// CDK code with Well-Architected best practices
 const knowledgeBase = new BedrockKnowledgeBase(this, 'CompanyKB', {
   embeddingModel: BedrockFoundationModel.TITAN_EMBED_TEXT_V1,
   vectorStore: new OpenSearchServerlessVectorStore(this, 'VectorStore', {
     encryption: OpenSearchEncryption.KMS,
     ebs: OpenSearchEbsOptions.provisioned(100, OpenSearchVolumeType.GP3)
   })
 });

 // Action Group for API integration
 const actionGroupSchema = new BedrockAgentSchema(this, 'ApiActionSchema', {
   schemaPath: 'schemas/api-action-schema.json',
   // Schema will be auto-generated from your Lambda code
   generateFromLambda: {
     lambdaPath: 'lambda/api-actions.py',
     outputPath: 'schemas/api-action-schema.json'
   }
 });

 const apiActionGroup = new BedrockAgentActionGroup(this, 'ApiActionGroup', {
   actionGroupName: 'InternalApiActions',
   description: 'Actions to interact with internal API',
   actionGroupExecutor: new BedrockAgentLambdaExecutor(apiActionsLambda),
   actionGroupSchema: actionGroupSchema
 });

The CDK MCP Server generates complete AWS CDK code to deploy the entire solution. It automatically runs cdk-nag to identify potential security issues and provides remediation steps for each finding, making sure that the infrastructure follows AWS Well-Architected best practices.
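
For reference, the following is a minimal sketch of how cdk-nag is typically attached to a Python CDK app manually (the CDK MCP Server automates this kind of check for you); the app and stack names are illustrative assumptions.

import aws_cdk as cdk
from cdk_nag import AwsSolutionsChecks

app = cdk.App()
stack = cdk.Stack(app, "ChatbotStack")  # hypothetical stack name

# Apply the AWS Solutions rule pack to every construct in the app;
# findings surface as warnings or errors when you run `cdk synth`
cdk.Aspects.of(app).add(AwsSolutionsChecks(verbose=True))

app.synth()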

Amazon Bedrock Knowledge Bases retrieval: “I’ve configured the optimal settings for your knowledge base queries, including proper reranking for improved relevance.”

Amazon Bedrock Knowledge Bases MCP Server demonstrates how to structure queries to the knowledge base for maximum relevance, provides sample code for filtering by data source, and shows how to integrate the knowledge base responses with the chatbot interface.

Amazon Nova Canvas: “To enhance your chatbot’s capabilities, I’ve created visualizations that can be generated on demand when users request data explanations.”

Amazon Nova Canvas MCP server generates sample images showing how Amazon Nova Canvas can create charts, diagrams, and visual explanations based on knowledge base content, making complex information more accessible to users.

Cost Analysis: “Based on your expected usage patterns, here’s the estimated monthly cost breakdown and optimization recommendations.”

The Cost Analysis MCP Server generates a detailed cost analysis report showing projected expenses for each AWS service, identifies cost optimization opportunities such as reserved capacity for Amazon Bedrock, and provides specific recommendations to reduce costs without impacting performance.

With AWS MCP Servers, what would typically take days of research and implementation is completed in minutes, with better quality, security, and cost-efficiency than manual development in that same time.

Best practices for MCP-assisted development

To maximize the benefits of MCP assisted development while maintaining security and code quality, developers should follow these essential guidelines:

  • Always review generated code for security implications before deployment
  • Use MCP Servers as accelerators, not replacements for developer judgment and expertise
  • Keep MCP Servers updated with the latest AWS security best practices
  • Follow the principle of least privilege when configuring AWS credentials
  • Run security scanning tools on generated infrastructure code

Coming up in the series

This post introduced the foundations of AWS MCP Servers and how they accelerate AWS development through specialized, AWS-specific MCP Servers. In upcoming posts, we’ll dive deeper into:

  • Detailed walkthroughs of each MCP server’s capabilities
  • Advanced patterns for integrating AWS MCP Servers into your development workflow
  • Real-world case studies showing AWS MCP Servers’ impact on development velocity
  • How to extend AWS MCP Servers with your own custom MCP servers

Stay tuned to learn how AWS MCP Servers can transform your specific AWS development scenarios and help you build better solutions faster. Visit our GitHub repository or PyPI to explore example implementations and get started today.


About the Authors

Jimin Kim is a Prototyping Architect on the AWS Prototyping and Cloud Engineering (PACE) team, based in Los Angeles. With specialties in Generative AI and SaaS, she loves helping her customers succeed in their business. Outside of work, she cherishes moments with her wife and three adorable calico cats.

Pranjali Bhandari is part of the Prototyping and Cloud Engineering (PACE) team at AWS, based in the San Francisco Bay Area. She specializes in Generative AI, distributed systems, and cloud computing. Outside of work, she loves exploring diverse hiking trails, biking, and enjoying quality family time with her husband and son.

Laith Al-Saadoon is a Principal Prototyping Architect on the Prototyping and Cloud Engineering (PACE) team. He builds prototypes and solutions using generative AI, machine learning, data analytics, IoT & edge computing, and full-stack development to solve real-world customer challenges. In his personal time, Laith enjoys the outdoors–fishing, photography, drone flights, and hiking.

Paul Vincent is a Principal Prototyping Architect on the AWS Prototyping and Cloud Engineering (PACE) team. He works with AWS customers to bring their innovative ideas to life. Outside of work, he loves playing drums and piano, talking with others through Ham radio, all things home automation, and movie nights with the family.

Justin Lewis leads the Emerging Technology Accelerator at AWS. Justin and his team help customers build with emerging technologies like generative AI by providing open source software examples to inspire their own innovation. He lives in the San Francisco Bay Area with his wife and son.

Anita Lewis is a Technical Program Manager on the AWS Emerging Technology Accelerator team, based in Denver, CO. She specializes in helping customers accelerate their innovation journey with generative AI and emerging technologies. Outside of work, she enjoys competitive pickleball matches, perfecting her golf game, and discovering new travel destinations.

Read More

Harness the power of MCP servers with Amazon Bedrock Agents


AI agents extend large language models (LLMs) by interacting with external systems, executing complex workflows, and maintaining contextual awareness across operations. Amazon Bedrock Agents enables this functionality by orchestrating foundation models (FMs) with data sources, applications, and user inputs to complete goal-oriented tasks through API integration and knowledge base augmentation. However, in the past, connecting these agents to diverse enterprise systems has created development bottlenecks, with each integration requiring custom code and ongoing maintenance—a standardization challenge that slows the delivery of contextual AI assistance across an organization’s digital ecosystem. This is a problem that you can solve by using Model Context Protocol (MCP), which provides a standardized way for LLMs to connect to data sources and tools.

Today, MCP provides agents with standardized access to an expanding list of tools that they can use to accomplish a variety of tasks. Over time, MCP can promote better discoverability of agents and tools through marketplaces, enable agents to share context and common workspaces for better interaction, and scale agent interoperability across the industry.

In this post, we show you how to build an Amazon Bedrock agent that uses MCP to access data sources to quickly build generative AI applications. Using Amazon Bedrock Agents, your agent can be assembled on the fly with MCP-based tools as in this example:

InlineAgent(
    foundation_model="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    instruction="You are a friendly assistant for resolving user queries",
    agent_name="SampleAgent",
    action_groups=[
        ActionGroup(
            name="SampleActionGroup",
            mcp_clients=[mcp_client_1, mcp_client_2],
        )
    ],
).invoke(input_text="Convert 11am from NYC time to London time")

We showcase an example of building an agent to understand your Amazon Web Services (AWS) spend by connecting to AWS Cost Explorer, Amazon CloudWatch, and Perplexity AI through MCP. You can use the code referenced in this post to connect your agents to other MCP servers to address challenges for your business. We envision a world where agents have access to an ever-growing list of MCP servers that they can use for accomplishing a wide variety of tasks.

Model Context Protocol

Developed by Anthropic as an open protocol, MCP provides a standardized way to connect AI models to virtually any data source or tool. Using a client-server architecture, MCP enables developers to expose their data through lightweight MCP servers while building AI applications as MCP clients that connect to these servers. Through this architecture, MCP enables users to build more powerful, context-aware AI agents that can seamlessly access the information and tools they need. Whether you’re connecting to external systems or internal data stores or tools, you can now use MCP to interface with all of them in the same way. The client-server architecture of MCP enables your agent to access new capabilities as the MCP server updates without requiring any changes to the application code.

MCP architecture

MCP uses a client-server architecture that contains the following components and is shown in the following figure:

  • Host: An MCP host is a program or AI tool that requires access to data through the MCP protocol, such as Claude Desktop, an integrated development environment (IDE), or any other AI application.
  • Client: Protocol clients that maintain one-to-one connections with servers.
  • Server: Lightweight programs that expose specific capabilities through the standardized protocol.
  • Local data sources: Your databases, local data sources, and services that MCP servers can securely access.
  • Remote services: External systems available over the internet through APIs that MCP servers can connect to.
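
The following is a minimal sketch of the client side of this architecture using the MCP Python SDK, launching a server as a local subprocess and listing its tools over stdio; the server command and package name are illustrative placeholders.

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Hypothetical local server launched as a subprocess and reached over stdio
server_params = StdioServerParameters(command="uvx", args=["some-mcp-server@latest"])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()                  # MCP handshake
            tools = await session.list_tools()          # discover server capabilities
            print([tool.name for tool in tools.tools])

asyncio.run(main())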

Let’s walk through how to set up Amazon Bedrock agents that take advantage of MCP servers.

Using MCP with Amazon Bedrock agents

In this post, we provide a step-by-step guide for how to connect your favorite MCP servers with Amazon Bedrock agents as Action Groups that an agent can use to accomplish tasks provided by the user. The AgentInlineSDK provides a straightforward way to create inline agents, containing a built-in MCP client implementation that provides you with direct access to tools delivered by an MCP server.

As part of creating an agent, the developer creates an MCP client specific to each MCP server that requires agent communication. When invoked, the agent determines which tools are needed for the user’s task; if MCP server tools are required, it uses the corresponding MCP client to request tool execution from that server. The user code doesn’t need to be aware of the MCP protocol because that’s handled by the MCP client provided by the InlineAgent code repository.

To orchestrate this workflow, you take advantage of the return control capability of Amazon Bedrock Agents. The following diagram illustrates the end-to-end flow of an agent handling a request that uses two tools. In the first flow, a Lambda-based action is taken, and in the second, the agent uses an MCP server.

Use case: transform how you manage your AWS spend across different AWS services including Amazon Bedrock

To show how an Amazon Bedrock agent can use MCP servers, let’s walk through a sample use case. Imagine asking questions like “Help me understand my Bedrock spend over the last few weeks” or “What were my EC2 costs last month across regions and instance types?” and getting a human-readable analysis of the data instead of raw numbers on a dashboard. The system interprets your intent and delivers precisely what you need—whether that’s detailed breakdowns, trend analyses, visualizations, or cost-saving recommendations. This is useful because what you’re interested in is insights rather than data. You can accomplish this using two MCP servers: a custom-built MCP server for retrieving the AWS spend data and an open source MCP server from Perplexity AI to interpret the data. You add these two MCP servers as action groups in an inline Amazon Bedrock agent. This gives you an AI agent that can transform the way you manage your AWS spend. All the code for this post is available in the GitHub repository.

Let’s walk through how this agent is created using inline agents. You can use inline agents to define and configure Amazon Bedrock agents dynamically at runtime. They provide greater flexibility and control over agent capabilities, enabling users to specify FMs, instructions, action groups, guardrails, and knowledge bases as needed without relying on pre-configured control plane settings. It’s worth noting that you can also orchestrate this behavior without inline agents by using RETURN_CONTROL with the InvokeAgent API.

MCP components in Amazon Bedrock Agents

  1. Host: This is the Amazon Bedrock inline agent. This agent adds MCP clients as action groups that can be invoked through RETURN_CONTROL when the user asks an AWS spend-related question.
  2. Client: You create two clients that establish one-to-one connections with their respective servers: a cost explorer client with specific cost server parameters and a Perplexity AI client with Perplexity server parameters.
  3. Servers: You create two MCP servers that each run locally on your machine and communicate to your application over standard input/output (alternatively, you could also configure the client to talk to remote MCP servers).
    1. An MCP server that uses Cost Explorer and Amazon CloudWatch Logs (for Amazon Bedrock model invocation log data) to retrieve the AWS spend data.
    2. Perplexity AI MCP server to interpret the AWS spend data.
  4. Data sources: The MCP servers talk to remote data sources such as the Cost Explorer API, CloudWatch Logs, and the Perplexity AI search API.

Prerequisites

You need the following prerequisites to get started implementing the solution in this post:

  1. An AWS account
  2. Familiarity with FMs and Amazon Bedrock
  3. Install AWS Command Line Interface (AWS CLI) and set up credentials
  4. Python 3.11 or later
  5. AWS Cloud Development Kit (AWS CDK) CLI
  6. Enable model access for Anthropic’s Claude 3.5 Sonnet v2
  7. You need to have your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY so that you can set them using environment variables for the server
  8. The two MCP servers run as Docker containers, so you need to have Docker installed and running on your computer

The MCP servers run locally on your computer and need to access AWS services and the Perplexity API. You can read more about AWS credentials in Manage access keys for IAM users. Make sure that your credentials include AWS Identity and Access Management (IAM) read access to Cost Explorer and CloudWatch. You can do this by using the AWSBillingReadOnlyAccess and CloudWatchReadOnlyAccess managed IAM policies. You can get the Perplexity API key from the Perplexity Sonar API page.

Steps to run

With the prerequisites in place, you’re ready to implement the solution.

  1. Navigate to the InlineAgent GitHub repository.
  2. Follow the setup steps.
  3. Navigate to the cost_explorer_agent folder, which contains the code for this post.
    cd examples/mcp/cost_explorer_agent

  4. Create a .env file in the cost_explorer_agent directory based on the provided example file.
    AWS_ACCESS_KEY_ID=
    AWS_SECRET_ACCESS_KEY=
    AWS_REGION=
    BEDROCK_LOG_GROUP_NAME=
    PERPLEXITY_API_KEY=

  5. Build the aws-cost-explorer-mcp server:
    git clone https://github.com/aarora79/aws-cost-explorer-mcp-server.git
    cd aws-cost-explorer-mcp-server/
    docker build -t aws-cost-explorer-mcp .

  6. You’re now ready to create an agent that can invoke these MCP servers to provide insights into your AWS spend. You can do this by running the python main.py command. The output will look like the following example. The agent connects with the two MCP servers and accesses their respective tools. Next, the agent lays out a plan to use specific tools sequentially, uses code interpreter to generate a chart showing cost distribution, and uses Cost Explorer and Perplexity API to provide information about Amazon Bedrock and the AWS account’s cost expenditure.
    Connected to server with tools: ['get_bedrock_daily_usage_stats', 'get_bedrock_hourly_usage_stats', 'get_ec2_spend_last_day', 'get_detailed_breakdown_by_day']
    Connected to server with tools: ['perplexity_ask']
    
    Thought: Let me break this down into parts:
    1. First, I'll ask about Amazon Bedrock using Perplexity
    2. Then I'll get the cost breakdown for the last 7 days
    3. Finally, I'll create a visualization of the spending data
    
    Tool use: perplexity_ask with these inputs: messages[[{role=user, content=What is Amazon Bedrock? Provide a concise explanation.}] (array)]
    
    Tool use: get_detailed_breakdown_by_day with these inputs: days[7 (integer)]
    
    Thought: Let me analyze the data and create a visualization of the top services by cost. I'll use Python to process the data and create a bar chart.
    
    Code interpreter:
    Generated code
    ```
    import pandas as pd
    import matplotlib.pyplot as plt
    ...
    ```
    
    Code interpreter output:
    aws_costs.png (image/png)
    Directory output created successfully
    
    Thought: Now I can provide a comprehensive answer about Amazon Bedrock and the cost analysis. Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a single API. Based on the cost analysis of the last 7 days, your top spending services were:
    
    1. Amazon Neptune: $24
    2. Amazon SageMaker: $567
    3. EC2 (Compute + Other): $454
    4. Amazon OpenSearch Service: $34
    5. Amazon Bedrock: $1235
    6. Amazon Q: $178

Implementation details

Now that you understand the output produced by an agent, let’s lift the curtain and review some of the important pieces of code that produce the output.

  1. Creating MCP clients: config.py defines the two MCP clients that talk to your two MCP servers.
    1. Server parameters are defined for the cost explorer and Perplexity clients. The solution uses StdioServerParameters, which configures how the client should communicate over standard input/output (stdio) streams. This contains the parameters required by the server to access the required data through APIs.
      # Cost server parameters
      cost_server_params = StdioServerParameters(
          command="/usr/local/bin/docker",
          args=[
              "run",
              "-i",
              "--rm",
              "-e",
              "AWS_ACCESS_KEY_ID",
              "-e",
              "AWS_SECRET_ACCESS_KEY",
              "-e",
              "AWS_REGION",
              "-e",
              "BEDROCK_LOG_GROUP_NAME",
              "-e",
              "stdio",
              "aws-cost-explorer-mcp:latest",
          ],
          env={
              "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY_ID,
              "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY,
              "AWS_REGION": AWS_REGION,
              "BEDROCK_LOG_GROUP_NAME": BEDROCK_LOG_GROUP_NAME,
          },
      )
      
      # Perplexity server parameters
      perplexity_server_params = StdioServerParameters(
          command="/usr/local/bin/docker",
          args=["run", "-i", "--rm", "-e", "PERPLEXITY_API_KEY", "mcp/perplexity-ask"],
          env={"PERPLEXITY_API_KEY": PERPLEXITY_API_KEY},
      )

    2. In main.py, the MCP server parameters are imported and used to create your two MCP clients.
      cost_explorer_mcp_client = await MCPClient.create(server_params=cost_server_params)
      perplexity_mcp_client = await MCPClient.create(server_params=perplexity_server_params)

  2. Configure agent action group: main.py creates the action group that combines the MCP clients into a single interface that the agent can access. This enables the agent to ask your application to invoke either of these MCP servers as needed through return of control.
    # Create action group with both MCP clients
    cost_action_group = ActionGroup(
        name="CostActionGroup",
        mcp_clients=[cost_explorer_mcp_client, perplexity_mcp_client]
    )

  3. Inline agent creation: The inline agent can be created with the following specifications:
    1. Foundation model: Configure your choice of FM to power your agent. This can be any model provided on Amazon Bedrock. This example uses Anthropic’s Claude 3.5 Sonnet model.
    2. Agent instruction: Provide instructions to your agent that contain the guidance and steps for orchestrating responses to user queries. These instructions anchor the agent’s approach to handling various types of queries.
    3. Agent name: Name of your agent.
    4. Action groups: Define the action groups that your agent can access. These can include single or multiple action groups, with each group having access to multiple MCP clients or AWS Lambda functions. As an option, you can configure your agent to use Code Interpreter to generate, run, and test code for your application.
# Create and invoke the inline agent
await InlineAgent(
    foundation_model="us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    instruction="""You are a friendly assistant that is responsible for resolving user queries.
    
    You have access to search, cost tool and code interpreter. 
    
    """,
    agent_name="cost_agent",
    action_groups=[
        cost_action_group,
        {
            "name": "CodeInterpreter",
            "builtin_tools": {
                "parentActionGroupSignature": "AMAZON.CodeInterpreter"
            },
        },
    ],
).invoke(
    input_text="<user-query-here>"
)

You can use this example to build an inline agent on Amazon Bedrock that establishes connections with different MCP servers and groups their clients into a single action group for the agent to access.

Conclusion

The Anthropic MCP protocol offers a standardized way of connecting FMs to data sources, and now you can use this capability with Amazon Bedrock Agents. In this post, you saw an example of combining the power of Amazon Bedrock and MCP to build an application that offers a new perspective on understanding and managing your AWS spend.

Organizations can now offer their teams natural, conversational access to complex financial data while enhancing responses with contextual intelligence from sources like Perplexity. As AI continues to evolve, the ability to securely connect models to your organization’s critical systems will become increasingly valuable. Whether you’re looking to transform customer service, streamline operations, or gain deeper business insights, the Amazon Bedrock and MCP integration provides a flexible foundation for your next AI innovation. You can dive deeper on this MCP integration by exploring our code samples.

Here are some examples of what you can build by connecting your Amazon Bedrock Agents to MCP servers:

  • A multi-data source agent that retrieves data from different data sources such as Amazon Bedrock Knowledge Bases, SQLite, or even your local filesystem.
  • A developer productivity assistant agent that integrates with Slack and GitHub MCP servers.
  • A machine learning experiment tracking agent that integrates with the Opik MCP server from Comet ML for managing, visualizing, and tracking machine learning experiments directly within development environments.

What business challenges will you tackle with these powerful new capabilities?


About the authors

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, the flagship generative AI offering from AWS for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.

Eashan Kaushik is a Specialist Solutions Architect AI/ML at Amazon Web Services. He is driven by creating cutting-edge generative AI solutions while prioritizing a customer-centric approach to his work. Before this role, he obtained an MS in Computer Science from NYU Tandon School of Engineering. Outside of work, he enjoys sports, lifting, and running marathons.

Madhur Prashant is an AI and ML Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington, D.C.

Andy Palmer is a Director of Technology for AWS Strategic Accounts. His teams provide Specialist Solutions Architecture skills across a number of speciality domain areas, including AIML, generative AI, data and analytics, security, network, and open source software. Andy and his team have been at the forefront of guiding our most advanced customers through their generative AI journeys and helping to find ways to apply these new tools to both existing problem spaces and net new innovations and product experiences.

Read More

Generate compliant content with Amazon Bedrock and ConstitutionalChain


Generative AI has emerged as a powerful tool for content creation, offering several key benefits that can significantly enhance the efficiency and effectiveness of content production processes such as creating marketing materials, generating images, and moderating content. Constitutional AI and LangGraph‘s reflection mechanisms represent two complementary approaches to ensuring AI systems behave ethically: Anthropic embeds principles during training, whereas LangGraph applies them at inference time through reflection and self-correction mechanisms. By using LangGraph’s Constitutional AI, content creators can streamline their workflow while maintaining high standards of user-defined compliance and ethical integrity. This method not only reduces the need for extensive human oversight but also enhances the transparency and accountability of the AI content generation process.

In this post, we explore practical strategies for using Constitutional AI to produce compliant content efficiently and effectively using Amazon Bedrock and LangGraph to build ConstitutionalChain for rapid content creation in highly regulated industries like finance and healthcare. Although AI offers significant productivity benefits, maintaining compliance with strict regulations is crucial. Manual validation of AI-generated content for regulatory adherence can be time-consuming and challenging. We also provide an overview of how Insagic, a Publicis Groupe company, integrated this concept into their existing healthcare marketing workflow using Amazon Bedrock. Insagic is a next-generation insights and advisory business that combines data, design, and dialogues to deliver actionable insights and transformational intelligence for healthcare marketers. It uses expertise from data scientists, behavior scientists, and strategists to drive better outcomes in the healthcare industry.

Understanding Constitutional AI

Constitutional AI is designed to align large language models (LLMs) with human values and ethical considerations. It works by integrating a set of predefined rules, principles, and constraints into the LLM’s core architecture and training process. This approach makes sure that the LLM operates within specified ethical and legal parameters, much like how a constitution governs a nation’s laws and actions.

The key benefits of Constitutional AI for content creation include:

  • Ethical alignment – Content generated using Constitutional AI is inherently aligned with predefined ethical standards
  • Legal compliance – The LLM is designed to operate within legal frameworks, reducing the risk of producing non-compliant content
  • Transparency – The principles guiding the LLM’s decision-making process are clearly defined and can be inspected
  • Reduced human oversight – By embedding ethical guidelines into the LLM, the need for extensive human review is significantly reduced

Let’s explore how you can harness the power of Constitutional AI to generate compliant content for your organization.

Solution overview

For this solution, we use Amazon Bedrock Knowledge Bases to store a repository of healthcare documents. We employ a Retrieval Augmented Generation (RAG) approach, first retrieving relevant context and then synthesizing an answer based on that context, to generate articles based on the repository. We then use the open source orchestration framework LangGraph and ConstitutionalChain to generate, critique, and revise prompts in an Amazon SageMaker notebook and develop an agentic workflow to generate compliant content. The following diagram illustrates this architecture.

This implementation demonstrates a sophisticated agentic workflow that not only generates responses based on a knowledge base but also employs a reflection technique to examine its outputs through ethical principles, allowing it to refine and improve its outputs. We upload a sample set of mental health documents to Amazon Bedrock Knowledge Bases and use those documents to write an article on mental health using a RAG-based approach. Later, we define a constitutional principle with a custom Diversity, Equity, and Inclusion (DEI) principle, specifying how to critique and revise responses for inclusivity.

Prerequisites

To deploy the solution, you need the following prerequisites:

Create an Amazon Bedrock knowledge base

To demonstrate this capability, we download a mental health article from the following GitHub repo and store it in Amazon S3. We then use Amazon Bedrock Knowledge Bases to index the articles. By default, Amazon Bedrock uses Amazon OpenSearch Serverless as a vector database. For full instructions to create an Amazon Bedrock knowledge base with Amazon S3 as the data source, see Create a knowledge base in Amazon Bedrock Knowledge Bases.

    1. On the Amazon Bedrock console, create a new knowledge base.
    2. Provide a name for your knowledge base and create a new IAM service role.
    3. Choose Amazon S3 as the data source and provide the S3 bucket storing the mental health article.
    4. Choose Amazon Titan Text Embeddings v2 as the embeddings model and OpenSearch Serverless as the vector store.
    5. Choose Create Knowledge Base.

Import statements and set up an Amazon Bedrock client

Follow the instructions provided in the README file in the GitHub repo. Clone the GitHub repo to make a local copy. We recommend running this code in a SageMaker JupyterLab environment. The following code imports the necessary libraries, including Boto3 for AWS services, LangChain components, and Streamlit. It sets up an Amazon Bedrock client and configures Anthropic’s Claude 3 Haiku model with specific parameters.

import boto3
from langchain_aws import ChatBedrock
…

bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
llm = ChatBedrock(client=bedrock_runtime, model_id="anthropic.claude-3-haiku-20240307-v1:0")
…..

Define Constitutional AI components

Next, we define a Critique class to structure the output of the critique process. Then we create prompt templates for critique and revision. Lastly, we set up chains using LangChain for generating responses, critiques, and revisions.

# LangChain Constitutional chain migration to LangGraph

class Critique(TypedDict):
    """Generate a critique, if needed."""

    critique_needed: Annotated[bool, ..., "Whether or not a critique is needed."]
    critique: Annotated[str, ..., "If needed, the critique."]

critique_prompt = ChatPromptTemplate.from_template(
    "Critique this response according to the critique request. "
…
)

revision_prompt = ChatPromptTemplate.from_template(
    "Revise this response according to the critique and reivsion request.nn"
    ….
)
chain = llm | StrOutputParser()
critique_chain = critique_prompt | llm.with_structured_output(Critique)
revision_chain = revision_prompt | llm | StrOutputParser()

Define a State class and refer to the Amazon Bedrock Knowledge Bases retriever

We define a LangGraph State class to manage the conversation state, including the query, principles, responses, and critiques:

# LangGraph State

class State(TypedDict):
    query: str
    constitutional_principles: List[ConstitutionalPrinciple]

Next, we set up an Amazon Bedrock Knowledge Bases retriever to extract the relevant information. We refer to the Amazon Bedrock knowledge base we created earlier to create an article based on mental health documents. Make sure to update the knowledge base ID in the following code with the knowledge base you created in previous steps:

#-----------------------------------------------------------------
# Amazon Bedrock KnowledgeBase

from langchain_aws.retrievers import AmazonKnowledgeBasesRetriever

retriever = AmazonKnowledgeBasesRetriever(
knowledge_base_id="W3NMIJXLUE", # Change it to your Knowledge base ID
…
)
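
As a quick check before wiring the retriever into the graph, you can invoke it directly; the following is a minimal usage sketch, and the query text is an illustrative assumption.

# Retrieve passages relevant to a sample query from the knowledge base
docs = retriever.invoke("What are common approaches to supporting mental health?")
for doc in docs:
    print(doc.metadata.get("score"), doc.page_content[:200])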

Create LangGraph nodes and a LangGraph graph along with constitutional principles

The next section of code integrates graph-based workflow orchestration, ethical principles, and a user-friendly interface to create a sophisticated Constitutional AI model. The following diagram illustrates the workflow.

Workflow of start, retrieval augmented generation, critique and revise, and end.

It uses a StateGraph to manage the flow between RAG and critique/revision nodes, incorporating a custom DEI principle to guide the LLM’s responses. The system is presented through a Streamlit application, which provides an interactive chat interface where users can input queries and view the LLM’s initial responses, critiques, and revised answers. The application also features a sidebar displaying a graph visualization of the workflow and a description of the applied ethical principle. This comprehensive approach makes sure that the LLM’s outputs are not only knowledge-based but also ethically aligned by using customizable constitutional principles that guide a reflection flow (critique and revise), all while maintaining a user-friendly experience with features like chat history management and a clear chat option.
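
To make the orchestration concrete, the following is a simplified, standalone sketch of how such a two-node StateGraph is typically wired in LangGraph. The node bodies are placeholders (the actual implementation calls the Amazon Bedrock retriever and the critique and revision chains shown earlier), and the state fields and node names are illustrative assumptions rather than the repo’s exact code.

from typing import List, TypedDict
from langgraph.graph import StateGraph, END

class SketchState(TypedDict):
    query: str
    principles: List[str]
    response: str

def rag_node(state: SketchState) -> dict:
    # Placeholder for: retrieve context from the knowledge base and draft a response
    return {"response": f"Draft article for: {state['query']}"}

def critique_and_revise_node(state: SketchState) -> dict:
    # Placeholder for: critique the draft against each principle and revise it
    revised = state["response"] + " (revised for: " + ", ".join(state["principles"]) + ")"
    return {"response": revised}

workflow = StateGraph(SketchState)
workflow.add_node("rag", rag_node)
workflow.add_node("critique_and_revise", critique_and_revise_node)
workflow.set_entry_point("rag")
workflow.add_edge("rag", "critique_and_revise")
workflow.add_edge("critique_and_revise", END)

app = workflow.compile()
print(app.invoke({"query": "workplace stress", "principles": ["DEI Principle"], "response": ""}))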

Streamlit application

The Streamlit application component of this code creates an interactive and user-friendly interface for the Constitutional AI model. It sets up a side pane that displays a visualization of the LLM’s workflow graph and provides a description of the DEI principle being applied. The main interface features a chat section where users can input their queries and view the LLM’s responses.

# ------------------------------------------------------------------------
# Streamlit App

# Clear Chat History function
def clear_screen():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

with st.sidebar:
    st.subheader('Constitutional AI Demo')
…..
    ConstitutionalPrinciple(
        name="DEI Principle",
        critique_request="Analyze the content for any lack of diversity, equity, or inclusion. Identify specific instances where the text could be more inclusive or representative of diverse perspectives.",
        revision_request="Rewrite the content by incorporating critiques to be more diverse, equitable, and inclusive. Ensure representation of various perspectives and use inclusive language throughout."
    )
    """)
    st.button('Clear Screen', on_click=clear_screen)

# Store LLM generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "How may I assist you today?"}]

# Chat Input - User Prompt 
if prompt := st.chat_input():
….

    with st.spinner(f"Generating..."):
        ….
    with st.chat_message("assistant"):
        st.markdown("**[initial response]**")
….
        st.session_state.messages.append({"role": "assistant", "content": "[revised response] " + generation['response']})

The application maintains a chat history, displaying both user inputs and LLM responses, including the initial response, any critiques generated, and the final revised response. Each step of the LLM’s process is clearly labeled and presented to the user. The interface also includes a Clear Screen button to reset the chat history. When processing a query, the application shows a loading spinner and displays the runtime, providing transparency into the LLM’s operation. This comprehensive UI design allows users to interact with the LLM while observing how constitutional principles are applied to refine the LLM’s outputs.

Test the solution using the Streamlit UI

In the Streamlit application, when a user inputs a query, the application initiates the process by creating and compiling the graph defined earlier. It then streams the execution of this graph, which includes the RAG and critique/revise steps. During this process, the application displays real-time updates for each node’s execution, showing the user what’s happening behind the scenes. The system measures the total runtime, providing transparency about the processing duration. When it’s complete, the application presents the results in a structured manner within the chat interface. It displays the initial LLM-generated response, followed by any critiques made based on the constitutional principles, and finally shows the revised response that incorporates these ethical considerations. This step-by-step presentation allows users to see how the LLM’s response evolves through the constitutional AI process, from initial generation to ethical refinement. As mentioned in the GitHub README file, to run the Streamlit application, use the following code:

pip install -r requirements.txt
streamlit run main.py

For details on using a Jupyter proxy to access the Streamlit application, refer to Build Streamlit apps in Amazon SageMaker Studio.

Modify the Studio URL, replacing lab? with proxy/8501/.

Chat interface showing the RAG and critique and revise steps.

How Insagic uses Constitutional AI to generate compliant content

Insagic uses real-world medical data to help brands understand people as patients and patients as people, enabling them to deliver actionable insights in the healthcare marketing space. Although generating deep insights in the health space can yield profound dividends, it must be done with consideration for compliance and the personal nature of health data. By defining federal guidelines as constitutional principles, Insagic makes sure that the content delivered by generative AI complies with federal guidelines for healthcare marketing.

Clean up

When you have finished experimenting with this solution, clean up your resources to prevent AWS charges from being incurred:

  1. Empty the S3 buckets.
  2. Delete the SageMaker notebook instance.
  3. Delete the Amazon Bedrock knowledge base.

Conclusion

This post demonstrated how to implement a sophisticated generative AI solution using Amazon Bedrock and LangGraph to generate compliant content. You can also integrate this workflow to generate responses based on a knowledge base and apply ethical principles to critique and revise its outputs, all within an interactive web interface. Insagic is looking at more ways to incorporate this into existing workflows by defining custom principles to achieve compliance goals.

You can expand this concept further by incorporating Amazon Bedrock Guardrails. Amazon Bedrock Guardrails and LangGraph Constitutional AI can create a comprehensive safety system by operating at different levels. Amazon Bedrock provides API-level content filtering and safety boundaries, and LangGraph implements constitutional principles in reasoning workflows. Together, they enable multi-layered protection through I/O filtering, topic restrictions, ethical constraints, and logical validation steps in AI applications.
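
As a rough illustration of that layering, the following sketch attaches an Amazon Bedrock guardrail to a model call through the Converse API; the guardrail identifier and version are placeholders you would replace with your own, and the LangGraph critique and revise flow shown earlier would still run on top of this filtered output.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Write a short article on mental health."}]}],
    # Placeholder guardrail ID and version; create the guardrail in Amazon Bedrock first
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",
        "guardrailVersion": "1",
    },
)

print(response["output"]["message"]["content"][0]["text"])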

Try out the solution for your own use case, and leave your feedback in the comments.


About the authors

Sriharsh Adari is a Senior Solutions Architect at Amazon Web Services (AWS), where he helps customers work backwards from business outcomes to develop innovative solutions on AWS. Over the years, he has helped multiple customers on data platform transformations across industry verticals. His core areas of expertise include Technology Strategy, Data Analytics, and Data Science. In his spare time, he enjoys playing sports, binge-watching TV shows, and playing Tabla.

David Min is a Senior Partner Sales Solutions Architect at Amazon Web Services (AWS) specializing in Generative AI, where he helps customers transform their businesses through innovative AI solutions. Throughout his career, David has helped numerous organizations across industries bridge the gap between cutting-edge AI technology and practical business applications, focusing on executive engagement and successful solution adoption.

Stephen Garth is a Data Scientist at Insagic, where he develops advanced machine learning solutions, including LLM-powered automation tools and deep clustering models for actionable consumer insights. With a strong background spanning software engineering, healthcare data science, and computational research, he is passionate about applying his expertise in AI-driven analytics and large-scale data processing to drive solutions.

Chris Cocking specializes in scalable enterprise application design using multiple programming languages. With nearly 20 years of experience, he excels in LAMP and IIS environments, SEO strategies, and most recently designing agentic systems. Outside of work, Chris is an avid bassist and music lover, which helps fuel his creativity and problem-solving skills.

Read More

Minimize generative AI hallucinations with Amazon Bedrock Automated Reasoning checks

Minimize generative AI hallucinations with Amazon Bedrock Automated Reasoning checks

Foundation models (FMs) and generative AI are transforming enterprise operations across industries. McKinsey & Company’s recent research estimates generative AI could contribute up to $4.4 trillion annually to the global economy through enhanced operational efficiency, productivity growth of 0.1% to 0.6% annually, improved customer experience through personalized interactions, and accelerated digital transformation.

Today, organizations struggle with AI hallucination when moving generative AI applications from experimental to production environments. Model hallucination, where AI systems generate plausible but incorrect information, remains a primary concern. The 2024 Gartner CIO Generative AI Survey highlights three major risks: reasoning errors from hallucinations (59% of respondents), misinformation from bad actors (48%), and privacy concerns (44%).

To improve factual accuracy of large language model (LLM) responses, AWS announced Amazon Bedrock Automated Reasoning checks (in gated preview) at AWS re:Invent 2024. Through logic-based algorithms and mathematical validation, Automated Reasoning checks validate LLM outputs against domain knowledge encoded in the Automated Reasoning policy to help prevent factual inaccuracies. Automated reasoning checks is part of Amazon Bedrock Guardrails, a comprehensive framework that also provides content filtering, personally identifiable information (PII) redaction, and enhanced security measures. Together, these capabilities enable organizations to implement reliable generative AI safeguards—with Automated Reasoning checks addressing factual accuracy while other Amazon Bedrock Guardrails features help protect against harmful content and safeguard sensitive information.

In this post, we discuss how to help prevent generative AI hallucinations using Amazon Bedrock Automated Reasoning checks.

Automated Reasoning overview

Automated Reasoning is a specialized branch of computer science that uses mathematical proof techniques and formal logical deduction to verify compliance with rules and requirements with absolute certainty under given assumptions. As organizations face increasing needs to verify complex rules and requirements with mathematical certainty, automated reasoning techniques offer powerful capabilities. For example, AWS customers have direct access to automated reasoning-based features such as IAM Access Analyzer, S3 Block Public Access, or VPC Reachability Analyzer.

Unlike probabilistic approaches prevalent in machine learning, Automated Reasoning relies on formal mathematical logic to provide definitive guarantees about what can and can’t be proven. This approach mirrors the rigors of auditors verifying financial statements or compliance officers validating regulatory requirements, but with mathematical precision. By using rigorous logical frameworks and theorem-proving methodologies, Automated Reasoning can conclusively determine whether statements are true or false under given assumptions. This makes it exceptionally valuable for applications that demand high assurance and need to deliver unambiguous conclusions to their users.

The following workflow illustrates solver-based formal verification, showing both the process flow and algorithm for verifying formal system properties through logical analysis and SAT/SMT solvers.

One of the widely used Automated Reasoning techniques is SAT/SMT solving, which involves encoding a representation of rules and requirements into logical formulas. A logical formula is a mathematical expression that uses variables and logical operators to represent conditions and relationships. After the rules and requirements are encoded into these formulas, specialized tools known as solvers are applied to compute solutions that satisfy these constraints. These solvers determine whether the formulas can be satisfied—whether there exist values for variables that make the formulas true.

This process starts with two main inputs: a formal representation of the system (like code or a policy) expressed as logical formulas, and a property to analyze (such as whether certain conditions are possible or requirements can be met). The solver can return one of three possible outcomes:

  • Satisfiable – The solver finds an assignment of values that makes the formulas true, proving that the system can satisfy the given requirements. The solver provides this assignment, which can serve as a concrete example of correct behavior.
  • Unsatisfiable – The solver proves that no assignment of values makes all the formulas true, meaning the requirements can’t be met. This often comes with information about which constraints are in conflict, helping identify incorrect assumptions in the system.
  • Unknown – In some cases, the solver might not be able to determine satisfiability within reasonable computational limits, or the encoding might not contain enough information to reach a conclusion.

This technique makes sure that you either get confirmation that the specific property holds (with a concrete example), proof that it can’t be satisfied (with information on conflicting constraints), or an indication that the problem needs to be reformulated or analyzed differently.
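
To make the encode-and-solve pattern concrete, the following is a minimal sketch using the open source Z3 solver’s Python bindings (pip install z3-solver). It is only an illustration, not the implementation behind Automated Reasoning checks: one leave-policy rule and one claim extracted from a hypothetical response are encoded as Boolean formulas and checked for satisfiability.

# Illustration of SAT solving with the open source Z3 solver (pip install z3-solver).
# This is not the Automated Reasoning checks implementation; it only shows how a policy
# rule and a claimed answer can be encoded as logical formulas and checked by a solver.
from z3 import Bool, Implies, Not, Solver, sat

is_full_time = Bool("is_full_time")
eligible_for_loap = Bool("eligible_for_loap")

solver = Solver()
# Policy rule: only full-time employees are eligible for LoAP.
solver.add(Implies(eligible_for_loap, is_full_time))
# Claim extracted from a hypothetical response: a part-time employee is eligible.
solver.add(Not(is_full_time), eligible_for_loap)

if solver.check() == sat:
    print("Satisfiable: the claim is consistent with the rule:", solver.model())
else:
    print("Unsatisfiable: the claim contradicts the policy rule")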

Key features of Automated Reasoning checks

Automated Reasoning checks offer the following key features:

  • Mathematical validation framework – The feature verifies LLM outputs using mathematical logical deduction. Unlike probabilistic methods, it uses sound mathematical approaches to provide definitive guarantees about system behaviors within defined parameters.
  • Policy-based knowledge representation – Organizations can create Automated Reasoning policies that encode their rules, procedures, and guidelines into structured, mathematical formats. Organizations can upload documents like PDFs containing HR guidelines or operational workflows, which are then automatically converted into formal logic structures. Policy changes are automatically versioned with unique Amazon Resource Names (ARNs), allowing for change tracking, auditing, and rollback capabilities to maintain consistent policy enforcement.
  • Domain expert enablement – The feature is designed to empower domain experts, such as HR personnel or operational managers, to directly encode their knowledge without technical intermediaries. This makes sure that business rules and policies are accurately captured and maintained by those who understand them best.
  • Natural language to logic translation – The system uses two complementary approaches: LLMs handle natural language understanding, and a symbolic reasoning engine performs mathematical validation. This hybrid architecture allows users to input policies in plain language while maintaining mathematically rigorous verification.
  • Explainable validation results – Each validation check produces detailed findings that indicate whether content is Valid, Invalid, or No Data. The feature provides clear explanations for its decisions, including extracted factual statements, and suggested corrections for invalid content.
  • Interactive testing environment – Users can access a chat playground on the Amazon Bedrock console to test and refine policies in real time. The feature supports both interactive testing through the Amazon Bedrock console and automated testing through API integrations, with the ability to export test cases in JSON format for integration into continuous testing pipelines or documentation workflows.
  • Seamless AWS integration – The feature integrates directly with Amazon Bedrock Guardrails and can be used alongside other configurable guardrails like Contextual Grounding checks. It can be accessed through both the Amazon Bedrock console and APIs, making it flexible for various implementation needs.

These features combine to create a powerful framework that helps organizations maintain factual accuracy in their AI applications while providing transparent and mathematically sound validation processes.

Solution overview

Now that we understand the key features of Automated Reasoning checks, let’s examine how this capability works within Amazon Bedrock Guardrails. The following section provides a comprehensive overview of the architecture and demonstrates how different components work together to promote factual accuracy and help prevent hallucinations in generative AI applications.

Automated Reasoning checks in Amazon Bedrock Guardrails provides an end-to-end solution for validating AI model outputs using mathematically sound principles. This automated process uses formal logic and mathematical proofs to verify responses against established policies, offering definitive validation results that can significantly improve the reliability of your AI applications.

The following solution architecture follows a systematic workflow that enables rigorous validation of model outputs.

The workflow consists of the following steps:

  1. Source documents (such as HR guidelines or operational procedures) are uploaded to the system. These documents, along with optional intent descriptions, are processed to create structured rules and variables that form the foundation of an Automated Reasoning policy.
  2. Subject matter experts review and inspect the created policy to verify accurate representation of business rules. Each validated policy is versioned and assigned a unique ARN for tracking and governance purposes.
  3. The validated Automated Reasoning policy is associated with Amazon Bedrock Guardrails, where specific policy versions can be selected for implementation. This integration enables automated validation of generative AI outputs.
  4. When the generative AI application produces a response, Amazon Bedrock Guardrails triggers the Automated Reasoning checks. The system creates logical representations of both the input question and the application’s response, evaluating them against the established policy rules.
  5. The Automated Reasoning check provides detailed validation results, including whether statements are Valid, Invalid, or No Data. For each finding, it explains which rules and variables were considered, and provides suggestions for making invalid statements valid.

With this solution architecture in place, organizations can confidently deploy generative AI applications, knowing that responses will be automatically validated against their established policies using mathematically sound principles.

Prerequisites

To use Automated Reasoning checks in Amazon Bedrock, make sure you have met the following prerequisites:

  • An active AWS account
  • Access permission through your AWS Account Manager, because Automated Reasoning checks is currently in gated preview
  • Confirmation of AWS Regions where Automated Reasoning checks is available

Input dataset

For this post, we examine a sample Paid Leave of Absence (LoAP) policy document as our example dataset. This policy document contains detailed guidelines covering employee eligibility criteria, duration limits, application procedures, and benefits coverage for paid leave. It’s an ideal example to demonstrate how Automated Reasoning checks can validate AI-generated responses against structured business policies, because it contains clear rules and conditions that can be converted into logical statements. The document’s mix of quantitative requirements (such as minimum tenure and leave duration) and qualitative conditions (like performance status and approval processes) makes it particularly suitable for showcasing the capabilities of automated reasoning validation.

The following screenshot shows an example of our policy document.

Start an Automated Reasoning check using the Amazon Bedrock console

The first step is to encode your knowledge—in this case, the sample LoAP policy—into an Automated Reasoning policy. Complete the following steps to initiate an Automated Reasoning check using the Amazon Bedrock console:

  1. On the Amazon Bedrock console, choose Automated Reasoning Preview under Safeguards in the navigation pane.
  2. Choose Create policy.

  3. Provide a policy name and policy description.

  4. Upload your source document. The source content can’t be modified after creation and must not exceed 6,000 characters with limitations on table sizes and image processing.
  5. Include a description of the intent of the Automated Reasoning policy you’re creating. For the sample policy, you can use the following intent:
Create a logical model of the Leave of Absence, Paid (LoAP) policy in this document.
Employees will ask questions about what are the eligibility requirements for the program,
whether they are allowed to take LOAP and for how long, duration and benefits during the
time off, and return to work. 
Below is an example question:
QUESTION: I am a temporary contractor working in operations. Am I eligible for LOAP?
ANSWER: No, only full-time employees are eligible for LoAP.

The policy creation process takes a few minutes to complete. After the policy is created, you can edit or remove the generated rules and variables, or add new ones.

The policy document version is outlined in the details section along with the intent description and build status.

Next, you create a guardrail in Amazon Bedrock by configuring as many filters as you need.

  1. On the Amazon Bedrock console, choose Guardrails under Safeguards in the navigation pane.
  2. Choose Create guardrail.

  3. Provide guardrail details such as a name and an optional description.

  4. Add an Automated Reasoning check by choosing Enable Automated Reasoning policy, and choose the policy name and version.
  5. Choose Next and complete the creation of the guardrail.

  6. Navigate back to the Automated Reasoning section of the Amazon Bedrock console and open the newly created policy. You can use the test playground and input sample questions and answers that represent real user interactions with your LLM.
  7. Choose the guardrail you created, then choose Submit to evaluate how your policy handles these exchanges.

After submitting, you’ll be presented with one or more findings. A finding contains a set of facts that were extracted from the input Q&A and are analyzed independently. Each finding includes four key components:

  • Validation results – Shows the outcome of Automated Reasoning checks. The system determines these results by evaluating extracted variable assignments against your defined policy rules.
  • Applied rules – Displays the specific rules from your policy that were used to reach the validation conclusion.
  • Extracted variables – Lists the variables that were identified and used in the validation process.
  • Suggestions – Shows variable assignments that would make invalid responses valid, or for valid responses, identifies necessary assumptions. These can be used to generate feedback for your LLM.

Finally, you can use the feedback suggestions to improve your LLM’s responses.

  1. Collect rules from valid results with suggestions and invalid results.
  2. Feed these collected variables and rules back to your LLM to revise its original response.
  3. Refine your policy:
    1. Edit incorrect rules using natural language.
    2. Improve variable descriptions when Automated Reasoning checks fail to assign values.
    3. For effective variable descriptions, include both technical definitions and common user expressions. For example, for a variable named is_full_time, "works more than 20 hours per week" is technically correct because it’s a quote from the source policy, but won’t help Automated Reasoning checks understand what users mean when they say “part-time.” Instead, use "works more than 20 hours per week; set to true if user says 'full-time' and false if user says 'part-time'".

Start an Automated Reasoning check using Python SDK and APIs

First, you need to create an Automated Reasoning policy from your documents using the Amazon Bedrock console as outlined in the previous section. Next, you can use the policy created with the ApplyGuardrail API to validate your generative AI application.

To use the Python SDK for validation using Automated Reasoning checks, follow these steps:

  1. First, set up the required configurations:
import boto3
import botocore
import os
import json

# Configuration parameters
DEFAULT_GUARDRAIL_NAME = "<YOUR_GUARDRAIL_NAME>"  # e.g., "my_policy_guardrail"
DEFAULT_AR_POLICY_VERSION = "1"

# AWS configuration
region = "us-west-2"
ar_policy = "<YOUR_AR_POLICY_ID>"  # e.g., "ABC123DEF456"
model_id = "<YOUR_MODEL_ID>"  # e.g., "anthropic.claude-3-haiku-20240307-v1:0"
  2. Before using Amazon Bedrock with Automated Reasoning policies, you will need to load the required service models. After being allowlisted for Amazon Bedrock access, you will receive two model files along with their corresponding version information. The following is a Python script to help you load these service models:
def add_service_model(model_file, service_name, version):
    """
    Adds a service model to the AWS configuration directory.
    
    Args:
        model_file (str): Path to the model file
        service_name (str): Name of the AWS service
        version (str): Service model version
    """
    # Configure paths
    source = f"models/{model_file}"  # Your downloaded model files directory
    dest_dir = os.path.expanduser(f"~/.aws/models/{service_name}/{version}")
    dest_file = f"{dest_dir}/service-2.json"

    try:
        # Create directory and copy model file
        os.makedirs(dest_dir, exist_ok=True)
        with open(source) as f:
            model = json.load(f)
        with open(dest_file, 'w') as f:
            json.dump(model, f, indent=2)
        print(f"Successfully added model for {service_name}")
        return True
    except Exception as e:
        print(f"Error adding {service_name} model: {e}")
        return False

def main():
    # Define your model files and versions
    # Replace with your actual model information provided by AWS
    models = {
        '<bedrock-model-file>.json': ('bedrock', '<bedrock-version>'),
        '<runtime-model-file>.json': ('bedrock-runtime', '<runtime-version>')
    }
    
    # Load each model
    for model_file, (service_name, version) in models.items():
        add_service_model(model_file, service_name, version)

if __name__ == "__main__":
    main()

  3. After you set up the service models, initialize the AWS clients for both Amazon Bedrock and Amazon Bedrock Runtime services. These clients will be used to interact with the models and apply guardrails.
# Initialize AWS clients
boto_session = boto3.Session(region_name=region)
runtime_client = boto_session.client("bedrock-runtime")
bedrock_client = boto_session.client("bedrock")
  4. Before applying Automated Reasoning policies, you need to either locate an existing guardrail or create a new one. The following code first attempts to find a guardrail by name, and if not found, creates a new guardrail with the specified Automated Reasoning policy configuration. This makes sure you have a valid guardrail to work with before proceeding with policy enforcement.
def find_guardrail_id(client, name) -> tuple[str, str]:
    """
    Finds the ID and version of a guardrail by its name.
    
    Args:
        client: The Bedrock client object
        name (str): Name of the guardrail to find
    
    Returns:
        tuple[str, str]: Guardrail ID and version if found, None otherwise
    """
    next_token = None
    while True:
        # List existing guardrails, handling pagination
        if next_token is None:
            resp = client.list_guardrails()
        else:
            resp = client.list_guardrails(nextToken=next_token)

        # Search for matching guardrail
        for g in resp["guardrails"]:
            if g["name"] == name:
                return g["id"], g["version"]

        # Handle pagination
        if "nextToken" in resp and resp["nextToken"] != "":
            next_token = resp["nextToken"]
        else:
            break
    return None, None

# Find or create guardrail with AR policy
try:
    # First, try to find existing guardrail
    guardrail_id, guardrail_version = find_guardrail_id(
        bedrock_client, DEFAULT_GUARDRAIL_NAME)
    
    # If not found, create new guardrail
    if guardrail_id is None:
        create_resp = bedrock_client.create_guardrail(
            name=DEFAULT_GUARDRAIL_NAME,
            description="Automated Reasoning checks demo guardrail",
            automatedReasoningPolicyConfig={
                "policyIdentifier": ar_policy,
                "policyVersion": DEFAULT_AR_POLICY_VERSION
            },
            blockedInputMessaging='Input is blocked',
            blockedOutputsMessaging='Output is blocked',
        )
        guardrail_id = create_resp["guardrailId"]
        guardrail_version = create_resp["version"]
        print(f"✓ Created new guardrail: {guardrail_id}")
    else:
        print(f"✓ Found existing guardrail: {guardrail_id}")
        
except botocore.exceptions.ClientError as e:
    print(f"✗ Error managing guardrail: {str(e)}")
    raise
  5. When testing guardrails with Automated Reasoning policies, you need to properly format your input data. The following code shows how to structure a sample question and answer pair for validation:
def create_sample_input():
    """
    Creates a formatted sample input for guardrail validation.
    
    The format requires both the query and response to be properly structured
    with appropriate qualifiers.
    
    Returns:
        list: Formatted input for guardrail validation
    """
    sample_query = "I am a part-time employee, am I eligible for LoAP?"
    sample_response = "Yes, part time employees are allowed to use LoAP"
    
    return [
        {
            "text": {
                "text": sample_query,
                "qualifiers": ["query"]
            }
        },
        {
            "text": {
                "text": sample_response,
                "qualifiers": ["guard_content"]
            }
        }
    ]

# Example usage
guardrail_input = create_sample_input()

print(json.dumps(guardrail_input, indent=2))
  6. Now that you have your formatted input data, you can apply the guardrail with Automated Reasoning policies to validate the content. The following code sends the input to Amazon Bedrock Guardrails and returns the validation results:
guardrails_output = runtime_client.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion=guardrail_version,
    source="OUTPUT",
    content=guardrail_input,
)
  7. After applying guardrails, you need to extract and analyze the Automated Reasoning assessment results. The following code shows how to process the guardrail output:
# Extract Automated Reasoning assessment
ar_assessment = None
for assessment in guardrails_output["assessments"]:
    if "automatedReasoningPolicy" in assessment:
        ar_assessment = assessment["automatedReasoningPolicy"]["findings"]
        break

if ar_assessment is None:
    print("No Automated Reasoning assessment found")
else:
    print("Automated Reasoning Assessment Results:")
    print(json.dumps(ar_assessment, indent=2))

    # Process any policy violations
    for finding in ar_assessment:
        if finding["result"] == "INVALID":
            print("nPolicy Violations Found:")
            # Print violated rules
            for rule in finding.get("rules", []):
                print(f"Rule: {rule['description']}")
            
            # Print suggestions if any
            if "suggestions" in finding:
                print("nSuggested Corrections:")
                for suggestion in finding["suggestions"]:
                    print(f"- {suggestion}")

The output will look something like the following:

{
    "result": "INVALID",
    "assignments": [...],
    "suggestions": [...],
    "rules": [
        {
            "identifier": "<IDENTIFIER>",
            "description": "An employee is eligible for LoAP if and only if..."
        }
    ]
}

When a response violates AR policies, the system identifies which rules were violated and provides information about the conflicts. The feedback from the AR policy validation can be routed back to improve the model’s output, promoting compliance while maintaining response quality.
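
As a final step, the following sketch shows one way to route that feedback back to the model, reusing the runtime_client and ar_assessment variables from the previous steps. The helper name, prompt wording, and model ID are illustrative rather than a prescribed interface.

# Hypothetical sketch: turn Automated Reasoning findings into a revision prompt.
# Reuses runtime_client and ar_assessment from the earlier steps; the helper name,
# prompt wording, and model ID are illustrative.
def revise_with_feedback(original_answer, findings,
                         model_id="anthropic.claude-3-haiku-20240307-v1:0"):
    feedback_lines = []
    for finding in findings:
        if finding["result"] != "INVALID":
            continue
        for rule in finding.get("rules", []):
            feedback_lines.append(f"Violated rule: {rule['description']}")
        for suggestion in finding.get("suggestions", []):
            feedback_lines.append(f"Suggestion: {suggestion}")

    prompt = (
        "Revise the answer so that it complies with the policy feedback below.\n"
        f"Original answer: {original_answer}\n"
        "Policy feedback:\n" + "\n".join(feedback_lines)
    )
    response = runtime_client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

# Example usage with the sample Q&A validated earlier
if ar_assessment:
    revised_answer = revise_with_feedback(
        "Yes, part time employees are allowed to use LoAP", ar_assessment)
    print(revised_answer)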

Possible use cases

Automated Reasoning checks can be applied across various industries to promote accuracy, compliance, and reliability in AI-generated responses while maintaining industry-specific standards and regulations. Although we have tested these checks across multiple applications, we continue to explore additional potential use cases. The following table provides some applications across different sectors.

Industry Use Cases
Healthcare
  • Validate AI-generated treatment recommendations against clinical care protocols and guidelines
  • Verify medication dosage calculations and check for potential drug interactions
  • Make sure patient education materials align with medical best practices
  • Validate clinical documentation for regulatory compliance
Financial Services
  • Verify investment recommendations against regulatory requirements and risk policies
  • Validate customer communications for compliance with financial regulations
  • Verify that credit decision explanations meet fairness and transparency guidelines
  • Check transaction processing against anti-fraud and anti-money laundering policies
Travel and Hospitality
  • Validate booking and ticketing policies for accuracy
  • Verify loyalty program benefit calculations follow established rules
  • Verify travel documentation requirements and restrictions
  • Validate pricing and refund calculations
Insurance
  • Verify claim processing decisions against policy terms
  • Validate coverage explanations for accuracy and completeness
  • Make sure that risk assessment recommendations follow underwriting guidelines
  • Check policy documentation for regulatory compliance
Energy and Utilities
  • Validate maintenance scheduling against equipment specifications
  • Verify emergency response protocols for different scenarios
  • Make sure that field operation instructions follow safety guidelines
  • Check grid management decisions against operational parameters
Manufacturing
  • Validate quality control procedures against industry standards
  • Verify production scheduling against capacity and resource constraints
  • Make sure that safety protocols are followed in operational instructions
  • Check inventory management decisions against supply chain policies

Best practices for implementation

Successfully implementing Automated Reasoning checks requires careful attention to detail and a systematic approach to achieve optimal validation accuracy and reliable results. The following are some key best practices:

  • Document preparation – Use structured text-based PDF documents. Content should be limited to 6,000 characters. Avoid complex formatting that could interfere with the logical model generation.
  • Intent description engineering – Create precise policy intents using a clear format. The intent should comprehensively cover expected use cases and potential edge cases. For example:
    Create a logical model for [USE CASE] with policy rules. 
    Users will ask questions about [SPECIFIC TOPICS].
    Example Q&A: [INCLUDE SAMPLE].

  • Policy validation – Review the generated rules and variables to make sure they accurately capture your business logic and policy requirements. Regular audits of these rules help maintain alignment with current business policies.
  • Comprehensive testing – Develop a diverse set of sample Q&As in the test playground to evaluate different validation scenarios (valid, valid with suggestions, and invalid responses). Include edge cases and complex scenarios to provide robust validation coverage.
  • Iterative improvement – Regularly update rules and LLM applications based on validation feedback, paying special attention to suggested variables and invalid results to enhance response accuracy. Maintain a feedback loop for continuous refinement.
  • Version control management – Implement a systematic approach to policy versioning, maintaining detailed documentation of changes and conducting proper testing before deploying new versions. This helps track policy evolution and facilitates rollbacks if needed.
  • Error handling strategy – Develop a comprehensive plan for handling different validation results, including specific procedures for managing invalid responses and incorporating suggested improvements into the response generation process.
  • Runtime optimization – Understand and monitor the two-step validation process (fact extraction and logic validation) to achieve optimal performance. Regularly review validation results to identify patterns that might indicate needed improvements in variable descriptions or rule definitions.
  • Feedback integration – Establish a systematic process for collecting and analyzing validation feedback, particularly focusing on cases where NO_DATA is returned or when factual claims are incorrectly extracted. Use this information to continuously refine variable descriptions and policy rules.

Conclusion

Amazon Bedrock Automated Reasoning checks represent a significant advancement in formally verifying the outputs of generative AI applications. By combining rigorous mathematical validation with a user-friendly interface, this feature addresses one of the most critical challenges in AI deployment: maintaining factual consistency and minimizing hallucinations. The solution’s ability to validate AI-generated responses against established policies using formal logic provides organizations with a powerful framework for building trustworthy AI applications that can be confidently deployed in production environments.

The versatility of Automated Reasoning checks, demonstrated through various industry use cases and implementation approaches, makes it a valuable tool for organizations across sectors. Whether implemented through the Amazon Bedrock console or programmatically using APIs, the feature’s comprehensive validation capabilities, detailed feedback mechanisms, and integration with existing AWS services enable organizations to establish quality control processes that scale with their needs. The best practices outlined in this post provide a foundation for organizations to maximize the benefits of this technology while maintaining high standards of accuracy.

As enterprises continue to expand their use of generative AI, the importance of automated validation mechanisms becomes increasingly critical. We encourage organizations to explore Amazon Bedrock Automated Reasoning checks and use its capabilities to build more reliable and accurate AI applications. To help you get started, we’ve provided detailed implementation guidance, practical examples, and a Jupyter notebook with code snippets in our GitHub repository that demonstrate how to effectively integrate this feature into your generative AI development workflow. Through systematic validation and continuous refinement, organizations can make sure that their AI applications deliver consistent, accurate, and trustworthy results.


About the Authors

Adewale Akinfaderin is a Sr. Data Scientist–Generative AI, Amazon Bedrock, where he contributes to cutting edge innovations in foundational models and generative AI applications at AWS. His expertise is in reproducible and end-to-end AI/ML methods, practical implementations, and helping global customers formulate and develop scalable solutions to interdisciplinary problems. He has two graduate degrees in physics and a doctorate in engineering.

Nafi Diallo is a Sr. Applied Scientist in the Automated Reasoning Group and holds a PhD in Computer Science. She is passionate about using automated reasoning to ensure the security of computer systems, improve builder productivity, and enable the development of trustworthy and responsible AI workloads. She worked for more than 5 years in the AWS Application Security organization, helping build scalable API security testing solutions and shifting security assessment left.

Read More

AWS App Studio introduces a prebuilt solutions catalog and cross-instance Import and Export

AWS App Studio introduces a prebuilt solutions catalog and cross-instance Import and Export

AWS App Studio is a generative AI-powered service that uses natural language to build business applications, empowering a new set of builders to create applications in minutes. With App Studio, technical professionals such as IT project managers, data engineers, enterprise architects, and solution architects can quickly develop applications tailored to their organization’s needs—without requiring deep software development skills. Common use cases range from inventory management and approval workflows to content management and operational portals, and beyond – App Studio adapts to streamline a wide variety of business applications.

Since the general availability of App Studio in November 2024, customers across diverse industries have adopted it to build scalable, enterprise-grade applications, transforming their development processes and accelerating time-to-market. App Studio customers, including both enterprises and system integrators, have shared the need for portability and reusability across App Studio instances. Based on their experience, two areas of interests emerged:

  • Getting started – New customers and builders asked to learn and explore the product through readily available examples and patterns that explain application building possibilities in App Studio.
  • Optimizing time to value – Teams often validate use cases in a sandbox before moving to production. This highlights an interest in a more efficient approach to share and deploy applications across multiple App Studio instances.

Today, App Studio announced two new features to accelerate application building:

  • Prebuilt solutions catalog – Featuring a set of practical examples and common patterns (like S3 and Bedrock integration) to accelerate getting started and enable deployment of applications from the catalog to production environments in less than 15 minutes.
  • Cross-instance Import and Export – Enabling straightforward and self-service migration of App Studio applications across AWS Regions and AWS accounts.

In this post, we walk through how to use the prebuilt solutions catalog to get started quickly and how to use the Import and Export feature to move applications across App Studio instances.

Prerequisites

To follow along with this post, you should have the following prerequisites:

  • Access to App Studio. For more information, see Setting up and signing in to App Studio.
  • Optional: Review App Studio concepts to familiarize yourself with important App Studio concepts.
  • Optional: An understanding of basic web development concepts, such as JavaScript syntax.
  • Optional: Familiarity with AWS services.

Prebuilt solutions catalog

App Studio is introducing a prebuilt solutions catalog to accelerate the way builders approach application building. This resource offers a diverse collection of prebuilt applications that can be seamlessly imported into your App Studio instance, serving as both a learning tool and a rapid deployment solution. By providing access to proven patterns and prebuilt solutions, App Studio significantly reduces the initial setup time for builders, enabling you to move from concept to production in less than 15 minutes.

The catalog includes a variety of practical use cases including a Product Adoption Tracker to manage customer feedback, track feature requests, and summarize meeting notes with AI. To import the Product Adoption Tracker, navigate to the prebuilt solutions catalog, copy an import code, and follow the import instructions in the next section.

Import an application

You now have the ability to import an App Studio application from a different App Studio instance. Importing applications is available to all builders and admins.

Complete the following steps to import an App Studio application:

  1. Sign in and launch the App Studio instance where you want to import an application.
  2. Choose My applications in the navigation pane.
  3. Choose the dropdown menu next to Create app and choose Import app.
  4. Enter an import code from the prebuilt app catalog or that you generated by the export process outlined in the next section, and choose Import. Depending on the application size, you might need to wait a few seconds for the import to finish.
  5. After completion, the application will be imported to your development environment. You can explore the debug panel at the bottom of the page to understand which custom connectors need to be connected to automations and entities.

Now that we have successfully imported an application, let’s walk through how we can export our own applications to a different App Studio instance.

Export an application

You now have the ability to export an App Studio application to a different App Studio instance. Generating an application export creates a static snapshot with all artifacts needed to recreate the application—automations, components, and entities. After importing, you will need to reconnect custom connectors to automations and entities.

Application security and control are maintained through a robust permissions system. Only authorized application owners and co-owners can generate application exports and restrict which App Studio instances can import a given application. If needed, application owners can revoke access by deactivating the import link at any time.

To export an App Studio application, complete the following steps:

    1. Sign in to the App Studio instance that you want to export an application from.
    2. Choose My applications in the navigation pane.
    3. Choose the dropdown menu next to Edit and choose Export.
    4. To restrict which App Studio instances can import this application, configure application import permissions:
      • Anyone with the import code can import this application – Grant import permissions to all instances. Only select this option if you want anyone with the import code to have access to import your application.
      • Only specified App Studio instances can import this application – Provide the specific instance IDs that can import the application (multiple instances can be separated by commas). To find your instance ID, navigate to your instance’s account settings by choosing Account settings on the App Studio console.
    5. Choose Generate import code to generate a unique import code.
    6. Two additional options for managing import codes are available to application owners and co-owners after the application has been exported at least once:
      • Generate new import code – When you make updates to this application, you will need to generate a new import code by choosing Generate new code. Generating a new code invalidates the old code, but will not automatically refresh existing imported applications.
      • Delete import code – To stop application import access, choose this option. Deleting the import code will invalidate the current code and prevent subsequent import attempts. Applications previously created using this code will continue to work.

Considerations

The following are some key considerations for using the prebuilt solutions catalog and importing and exporting applications across App Studio instances:

  • There is no cost associated with importing and exporting applications, including importing applications from the prebuilt solutions catalog.
  • Applications cannot be imported into the same instance, but you can achieve a similar result of replicating functionality within an instance by duplicating apps, components, and pages.
  • There are no limits on the number of applications you can import or export. The maximum number of applications in an App Studio instance is subject to service quotas.

Conclusion

Jumpstart your app building workflow with App Studio’s prebuilt solutions catalog and Import and Export features. Effortlessly migrate applications across App Studio instances, collaborate with teams, and transfer applications to clients. Start using App Studio’s prebuilt solutions catalog and Import and Export features today – we’re excited to see how you will use these features to accelerate your application building journey.

To learn more about App Studio, explore more features on the App Studio page. Get started with App Studio in the AWS Management Console. Experience the App Studio workshop for hands-on learning, and join the conversation in the #aws-app-studio channel in the AWS Developers Slack workspace.

Read more about App Studio

Watch App Studio demos


About the Authors

Umesh Kalaspurkar is a Principal Solutions Architect at AWS based in New York, bringing over two decades of expertise in digital transformation and innovation across both enterprise and startup environments. He specializes in designing solutions that help organizations overcome their most pressing challenges. When not architecting cloud solutions, Umesh cherishes time spent with his children, carving down ski slopes, and exploring new destinations around the world.

Samit Kumbhani is an AWS Senior Solutions Architect in the New York City area with over 18 years of experience. He currently partners with independent software vendors (ISVs) to build highly scalable, innovative, and secure cloud solutions. Outside of work, Samit enjoys playing cricket, traveling, and biking.

Haoran (Hao) Su is a Senior Technical Account Manager in New York City with over 8 years of experience with the cloud. He collaborates with Software, Internet and Model providers (SWIM) and Digitally Native Businesses (DNB) to improve their financial and operational efficiency, and architectural resiliency. Outside of work, Hao enjoys international traveling, exercising, and streaming.

Anshika Tandon is a Senior Product Manager – Technical at AWS with a decade of experience building AI and B2B SaaS products from concept to launch. She excels in cross-functional product leadership, focusing on delivering measurable business value through strategic initiatives. A global citizen having lived in 10 cities and visited 26 countries, Anshika balances her professional life with interests in skiing, travel, and performing in improv comedy shows.

Alex (Tao) Jia is a Senior Product Marketing Manager at AWS, focusing on generative AI. With 15+ years in tech marketing, she drives products from concept to scale, shaping positioning, fostering adoption, and leading global go-to-market strategies. She has worked with enterprises and ISVs, reaching millions of developers. Outside work, Alex enjoys exploring technology’s impact on humanity through books, research, and conversations.

Read More

Build a generative AI enabled virtual IT troubleshooting assistant using Amazon Q Business

Build a generative AI enabled virtual IT troubleshooting assistant using Amazon Q Business

Today’s organizations face a critical challenge with the fragmentation of vital information across multiple environments. As businesses increasingly rely on diverse project management and IT service management (ITSM) tools such as ServiceNow, Atlassian Jira and Confluence, employees find themselves navigating a complex web of systems to access crucial data.

This isolated approach leads to several challenges for IT leaders, developers, program managers, and new employees. For example:

  • Inefficiency: Employees need to access multiple systems independently to gather data insights and remediation steps during incident troubleshooting
  • Lack of integration: Information is isolated across different environments, making it difficult to get a holistic view of ITSM activities
  • Time-consuming: Searching for relevant information across multiple systems is time-consuming and reduces productivity
  • Potential for inconsistency: Using multiple systems increases the risk of inconsistent data and processes across the organization.

Amazon Q Business is a fully managed, generative artificial intelligence (AI) powered assistant that can address challenges such as inefficient, inconsistent information access within an organization by providing 24/7 support tailored to individual needs. It handles a wide range of tasks such as answering questions, providing summaries, generating content, and completing tasks based on data in your organization. Amazon Q Business offers over 40 data source connectors that connect to your enterprise data sources and help you create a generative AI solution with minimal configuration. Amazon Q Business also supports over 50 actions across popular business applications and platforms. Additionally, Amazon Q Business offers enterprise-grade data security, privacy, and built-in guardrails that you can configure.

This blog post explores an innovative solution that harnesses the power of generative AI to bring value to your organization and ITSM tools with Amazon Q Business.

Solution overview

The solution architecture shown in the following figure demonstrates how to build a virtual IT troubleshooting assistant by integrating with multiple data sources such as Atlassian Jira, Confluence, and ServiceNow. This solution helps streamline information retrieval, enhance collaboration, and significantly boost overall operational efficiency, offering a glimpse into the future of intelligent enterprise information management.

Reference Architecture to build a generative AI-enabled virtual IT troubleshooting assistant using Amazon Q Business

This solution integrates with ITSM tools such as ServiceNow Online and project management software such as Atlassian Jira and Confluence using the Amazon Q Business data source connectors. You can use a data source connector to combine data from different places into a central index for your Amazon Q Business application. For this demonstration, we use the Amazon Q Business native index and retriever. We also configure an application environment and grant access to users to interact with an application environment using AWS IAM Identity Center for user management. Then, we provision subscriptions for IAM Identity Center users and groups.

Authorized users interact with the application environment through a web experience. You can share the web experience endpoint URL with your users so they can open the URL and authenticate themselves to start chatting with the generative AI application powered by Amazon Q Business.

Deployment

Start by setting up the architecture and data needed for the demonstration.

  1. We’ve provided an AWS CloudFormation template in our GitHub repository that you can use to set up the environment for this demonstration. If you don’t have existing Atlassian Jira, Confluence, and ServiceNow accounts, follow these steps to create trial accounts for the demonstration.
  2. When step 1 is complete, open the Amazon Q Business console. On the Applications tab, open your application to see the data sources. See Best practices for data source connector configuration in Amazon Q Business to understand best practices.
  3. To improve retrieved results and customize the end-user chat experience, use Amazon Q to map document attributes from your data sources to fields in your Amazon Q index. Choose the Atlassian Jira, Confluence Cloud, and ServiceNow Online links to learn more about their document attributes and field mappings. Select the data source to edit its configurations under Actions, and select the fields that are important for your search needs. Repeat the process for all of the data sources. The following figure is an example of some of the Atlassian Jira project field mappings that we selected.
  4. Sync mode lets you choose how to update your index when your data source content changes, and Sync run schedule sets how often Amazon Q Business synchronizes your index with the data source. For this demonstration, we set Sync mode to Full Sync and Frequency to Run on demand. Save your changes and choose Sync Now to start syncing data sources (a programmatic alternative is sketched after this list). When you initiate a sync, Amazon Q crawls the data source to extract relevant documents and syncs them to the Amazon Q index, making them searchable.
  5. After syncing data sources, you can configure the metadata controls in Amazon Q Business. An Amazon Q Business index has fields that you can map your document attributes to. After the index fields are mapped to document attributes and are search-enabled, admins can use the index fields to boost results from specific sources, and end users can filter and scope their chat results to specific data. Boosting chat responses based on document attributes helps you rank authoritative sources higher than other sources in your application environment. See Boosting chat responses using metadata boosting to learn more about metadata boosting and metadata controls. The following figure is an example of some of the metadata controls that we selected.
  6. For the purposes of the demonstration, use the Amazon Q Business web experience. Select your application under Applications and then choose the Deployed URL link in the web experience settings.
  7. Enter the username, password, and multi-factor authentication (MFA) response for the user that you created previously in IAM Identity Center to sign in to the Amazon Q Business web experience generative AI assistant.
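
As referenced in step 4, you can also start an on-demand sync programmatically. The following is a minimal sketch using the boto3 qbusiness client; the Region and the application, index, and data source IDs are placeholders that you can copy from the Amazon Q Business console.

# Minimal sketch: start an on-demand data source sync with the boto3 qbusiness client.
# The Region and IDs are placeholders; copy the real values from the Amazon Q Business console.
import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

response = qbusiness.start_data_source_sync_job(
    applicationId="<YOUR_APPLICATION_ID>",
    indexId="<YOUR_INDEX_ID>",
    dataSourceId="<YOUR_DATA_SOURCE_ID>",
)
print(f"Started sync job: {response['executionId']}")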

Demonstration

Now that you’ve signed in to the Amazon Q Business web experience generative AI assistant (shown in the previous figure), let’s try some natural language queries.

IT leaders: You’re an IT leader and your team is working on a critical project that needs to hit the market quickly. You can now ask questions in natural language to Amazon Q Business to get answers based on your company data.

Developers: Developers often want information such as the tasks assigned to them, specific task details, or issues in a particular subsegment. They can now get these questions answered from Amazon Q Business without necessarily signing in to either Atlassian Jira or Confluence.

Project and program managers: Project and program managers can monitor the activities or developments in their projects or programs from Amazon Q Business without having to contact various teams to get individual status updates.

New employees or business users: A newly hired employee who’s looking for information to get started on a project or a business user who needs tech support can use the generative AI assistant to get the information and support they need.
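
If you prefer to query the application programmatically instead of through the web experience, a minimal sketch using the qbusiness ChatSync API is shown below. It assumes the calling identity is authorized for the application; the Region, application ID, and question are placeholders.

# Minimal sketch: ask the Amazon Q Business application a question with the ChatSync API.
# Assumes the calling identity is authorized for the application; IDs are placeholders.
import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

response = qbusiness.chat_sync(
    applicationId="<YOUR_APPLICATION_ID>",
    userMessage="Which Jira issues assigned to me are still open for the payments project?",
)
print(response["systemMessage"])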

Benefits and outcomes

From the demonstrations, you saw that various users, whether they are leaders, managers, developers, or business users, can benefit from using a generative AI solution like our virtual IT assistant built using Amazon Q Business. It removes the undifferentiated heavy lifting of having to navigate multiple solutions and cross-reference multiple items and data points to get answers. Amazon Q Business can use generative AI to provide responses with actionable insights in just a few seconds. Now, let’s dive deeper into some of the additional benefits that this solution provides.

  • Increased efficiency: Centralized access to information from ServiceNow, Atlassian Jira, and Confluence saves time and reduces the need to switch between multiple systems.
  • Enhanced decision-making: Comprehensive data insights from multiple systems leads to better-informed decisions in incident management and problem-solving for various users across the organization.
  • Faster incident resolution: Quick access to enterprise data sources and knowledge bases, along with AI-assisted remediation steps, can significantly reduce mean time to resolution (MTTR) for high-priority cases.
  • Improved knowledge management: Access to Confluence’s architectural documents and other knowledge bases such as ServiceNow’s Knowledge Articles promotes better knowledge sharing across the organization. Users can now get responses based on information from multiple systems.
  • Seamless integration and enhanced user experience: Better integration between ITSM processes, project management, and software development streamlines operations. This is helpful for organizations and teams that incorporate agile methodologies.
  • Cost savings: Reduction in time spent searching for information and resolving incidents can lead to significant cost savings in IT operations.
  • Scalability: Amazon Q Business can grow with the organization, accommodating future needs and additional data sources as required. Organizations can create more Amazon Q Business applications and share purpose-built Amazon Q Business apps within their organizations to manage repetitive tasks.

Clean up

After completing your exploration of the virtual IT troubleshooting assistant, delete the CloudFormation stack from your AWS account. This action terminates all resources created during deployment of this demonstration and prevents unnecessary costs from accruing in your AWS account.

Conclusion

By integrating Amazon Q Business with enterprise systems, you can create a powerful virtual IT assistant that streamlines information access and improves productivity. The solution presented in this post demonstrates how combining AI capabilities with existing enterprise systems can create unified ITSM solutions and more efficient, user-friendly experiences.

We provide the sample virtual IT assistant using an Amazon Q Business solution as open source—use it as a starting point for your own solution and help us make it better by contributing fixes and features through GitHub pull requests. Visit the GitHub repository to explore the code, choose Watch to be notified of new releases, and check the README for the latest documentation updates.

Learn more:

For expert assistance, AWS Professional Services, AWS Generative AI partner solutions, and AWS Generative AI Competency Partners are here to help.

We’d love to hear from you. Let us know what you think in the comments section, or use the issues forum in the GitHub repository.


About the Authors

Jasmine Rasheed Syed is a Senior Customer Solutions Manager at AWS, focused on accelerating time to value for customers on their cloud journey by adopting best practices and mechanisms to transform their business at scale. Jasmine is a seasoned, results-oriented leader with 20+ years of progressive experience in Insurance, Retail & CPG, with an exemplary track record spanning Business Development, Cloud/Digital Transformation, Delivery, Operational & Process Excellence, and Executive Management.

Suprakash Dutta is a Sr. Solutions Architect at Amazon Web Services. He focuses on digital transformation strategy, application modernization and migration, data analytics, and machine learning. He is part of the AI/ML community at AWS and designs generative AI and Intelligent Document Processing (IDP) solutions.

Joshua Amah is a Partner Solutions Architect at Amazon Web Services, specializing in supporting SI partners with a focus on AI/ML and generative AI technologies. He is passionate about guiding AWS Partners in using cutting-edge technologies and best practices to build innovative solutions that meet customer needs. Joshua provides architectural guidance and strategic recommendations for both new and existing workloads.

Brad King is an Enterprise Account Executive at Amazon Web Services specializing in translating complex technical concepts into business value and making sure that clients achieve their digital transformation goals efficiently and effectively through long term partnerships.

Joseph Mart is an AI/ML Specialist Solutions Architect at Amazon Web Services (AWS). His core competence and interests lie in machine learning applications and generative AI. Joseph is a technology addict who enjoys guiding AWS customers on architecting their workload in the AWS Cloud. In his spare time, he loves playing soccer and visiting nature.

Process formulas and charts with Anthropic’s Claude on Amazon Bedrock

Research papers and engineering documents often contain a wealth of information in the form of mathematical formulas, charts, and graphs. Navigating these unstructured documents to find relevant information can be a tedious and time-consuming task, especially when dealing with large volumes of data. However, by using Anthropic’s Claude on Amazon Bedrock, researchers and engineers can now automate the indexing and tagging of these technical documents. This enables the efficient processing of content, including scientific formulas and data visualizations, and the population of Amazon Bedrock Knowledge Bases with appropriate metadata.

Amazon Bedrock is a fully managed service that provides a single API to access and use various high-performing foundation models (FMs) from leading AI companies. It offers a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI practices. Anthropic’s Claude 3 Sonnet offers best-in-class vision capabilities compared to other leading models. It can accurately transcribe text from imperfect images—a core capability for retail, logistics, and financial services, where AI might glean more insights from an image, graphic, or illustration than from text alone. The latest of Anthropic’s Claude models demonstrate a strong aptitude for understanding a wide range of visual formats, including photos, charts, graphs and technical diagrams. With Anthropic’s Claude, you can extract more insights from documents, process web UIs and diverse product documentation, generate image catalog metadata, and more.

In this post, we explore how you can use these multi-modal generative AI models to streamline the management of technical documents. By extracting and structuring the key information from the source materials, the models can create a searchable knowledge base that allows you to quickly locate the data, formulas, and visualizations you need to support your work. With the document content organized in a knowledge base, researchers and engineers can use advanced search capabilities to surface the most relevant information for their specific needs. This can significantly accelerate research and development workflows, because professionals no longer have to manually sift through large volumes of unstructured data to find the references they need.

Solution overview

This solution demonstrates the transformative potential of multi-modal generative AI when applied to the challenges faced by scientific and engineering communities. By automating the indexing and tagging of technical documents, these powerful models can enable more efficient knowledge management and accelerate innovation across a variety of industries.

In addition to Anthropic’s Claude on Amazon Bedrock, the solution uses the following services:

  • Amazon SageMaker JupyterLab – The SageMaker JupyterLab application is a web-based interactive development environment (IDE) for notebooks, code, and data. The JupyterLab application’s flexible and extensive interface can be used to configure and arrange machine learning (ML) workflows. We use JupyterLab to run the code for processing formulas and charts.
  • Amazon Simple Storage Service (Amazon S3) – Amazon S3 is an object storage service built to store and protect any amount of data. We use Amazon S3 to store the sample documents that are used in this solution.
  • AWS Lambda – AWS Lambda is a compute service that runs code in response to triggers such as changes in data, changes in application state, or user actions. Because services such as Amazon S3 and Amazon Simple Notification Service (Amazon SNS) can directly trigger a Lambda function, you can build a variety of real-time serverless data-processing systems.

The solution workflow contains the following steps:

  1. Split the PDF into individual pages and save them as PNG files.
  2. For each page:
    1. Extract the original text.
    2. Render the formulas in LaTeX.
    3. Generate a semantic description of each formula.
    4. Generate an explanation of each formula.
    5. Generate a semantic description of each graph.
    6. Generate an interpretation for each graph.
    7. Generate metadata for the page.
  3. Generate metadata for the full document.
  4. Upload the content and metadata to Amazon S3.
  5. Create an Amazon Bedrock knowledge base.

The following diagram illustrates this workflow.

Prerequisites

  1. If you’re new to AWS, you first need to create and set up an AWS account.
  2. Additionally, in your account under Amazon Bedrock, request access to anthropic.claude-3-5-sonnet-20241022-v2:0 if you don’t have it already.

Deploy the solution

Complete the following steps to set up the solution:

  1. Launch the AWS CloudFormation template by choosing Launch Stack (this creates the stack in the us-east-1 AWS Region):

Launch CloudFormation stack

  2. When the stack deployment is complete, open the Amazon SageMaker AI console.
  3. Choose Notebooks in the navigation pane.
  4. Locate the notebook claude-scientific-docs-notebook and choose Open JupyterLab.
  5. In the notebook, navigate to notebooks/process_scientific_docs.ipynb.
  6. Choose conda_python3 as the kernel, then choose Select.
  7. Walk through the sample code.

Explanation of the notebook code

In this section, we walk through the notebook code.

Load data

We use example research papers from arXiv to demonstrate the capability outlined here. arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

We download the documents and store them under a samples folder locally. Multi-modal generative AI models work well with text extraction from image files, so we start by converting the PDF to a collection of images, one for each page.
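As a minimal sketch of that conversion (assuming the open source pdf2image library and its poppler dependency are installed; the PDF path is illustrative and not the notebook’s exact code), the PDF-to-PNG step could look like the following:

from pdf2image import convert_from_path

pdf_path = "./samples/2003.10304/2003.10304.pdf"  # illustrative path to a downloaded paper
pages = convert_from_path(pdf_path, dpi=200)      # one PIL image per page

for i, page in enumerate(pages):
    # File names follow the page_<n>.png convention used later in the notebook
    page.save(f"./samples/2003.10304/page_{i}.png", "PNG")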

Get metadata from formulas

After the image documents are available, you can use Anthropic’s Claude to extract formulas and metadata with the Amazon Bedrock Converse API. Additionally, you can use the Amazon Bedrock Converse API to obtain an explanation of the extracted formulas in plain language. By combining the formula and metadata extraction capabilities of Anthropic’s Claude with the conversational abilities of the Amazon Bedrock Converse API, you can create a comprehensive solution for processing and understanding the information contained within the image documents.
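The notebook calls the model through a helper function named stream_conversation. Its exact implementation lives in the notebook; the following is a minimal sketch of how such a helper could look using the Amazon Bedrock Runtime converse_stream API. The client setup, the return shape, and the assumption that the model ID from the prerequisites can be invoked directly in your Region are ours for illustration, not the notebook’s.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"  # model requested in the prerequisites

def stream_conversation(message, file_paths, model_id=MODEL_ID):
    """Send a text prompt plus one or more PNG pages to the model and stream the reply."""
    # Attach each PNG page as an image content block alongside the text prompt
    content = [{"text": message}]
    for path in file_paths:
        with open(path, "rb") as f:
            content.append({"image": {"format": "png", "source": {"bytes": f.read()}}})

    response = bedrock_runtime.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": content}],
    )

    # Accumulate streamed text deltas; capture token usage from the final metadata event
    text, usage = "", {}
    for event in response["stream"]:
        if "contentBlockDelta" in event:
            text += event["contentBlockDelta"]["delta"].get("text", "")
        elif "metadata" in event:
            usage = event["metadata"].get("usage", {})

    return usage, {"content": text}

The code that follows uses this helper in exactly that way: it passes a prompt and a page image, then renders the returned markdown.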

We start with the following example PNG file.

We use the following request prompt:

sample_prompt = """

Evaluate this page line by line. 
For each line, if it is a formula, convert this math expression to latex format. 
Next describe the formula in plain language Be sure to enclose Latex formulas in double dollar sign for example: $$ <math expression> $$ Use markdown syntax to format your output
"""

file = "./samples/2003.10304/page_2.png"

display(Image(filename=file, width=600))
output, result = stream_conversation(message=sample_prompt, file_paths=[file])
response_text = result["content"]
display(Markdown(response_text))
print(output)

We get the following response, which shows the extracted formula converted to LaTeX format and described in plain language, enclosed in double dollar signs.

Get metadata from charts

Another useful capability of multi-modal generative AI models is the ability to interpret graphs and generate summaries and metadata. The following is an example of how you can obtain metadata of the charts and graphs using simple natural language conversation with models. We use the following graph.

We provide the following request:

sample_prompt = f"""
You are a data scientist expert who has perfect vision and pay a lot of attention to details. 
interpret the graph on this page
provide the answer in markdown format """

file = "./samples/2003.10304/page_5.png"

display(Image(filename=file, width=600))
output, result = stream_conversation(message=sample_prompt, file_paths=[file])
response_text = result["content"]
display(Markdown(response_text))
print(output)

The response provides an interpretation of the graph, explaining the color-coded lines and suggesting that, overall, the DSC model is performing well on the training data, achieving a high Dice coefficient of around 0.98. However, the lower and fluctuating validation Dice coefficient indicates potential overfitting and room for improvement in the model’s generalization performance.

Generate metadata

Using natural language processing, you can generate metadata for the paper to aid in searchability.

We use the following request:

sample_prompt = f"""
Generate a metadata json object for this research paper. 

{{
"title": "",
"authors":  [],
"institutions": [],
"topics": [],
"funding-sources": [],
"algorithms":[],
"data_sets":[]
}}
"""

file = './samples/2003.10304/page_0.png'

We get the following response, a metadata JSON object describing the paper.

{
  "title": "Attention U-Net Based Adversarial Architectures for Chest X-ray Lung Segmentation",
  "authors": ["Gusztáv Gaál", "Balázs Maga", "András Lukács"],
  "institutions": ["AI Research Group, Institute of Mathematics, Eötvös Loránd University, Budapest, Hungary"],
  "topics": ["Chest X-ray segmentation", "Medical imaging", "Deep learning", "Computer-aided detection", "Lung segmentation"],
  "funding-sources": [],
  "algorithms": ["U-Net", "Adversarial architectures", "Fully Convolutional Neural Networks (FCN)", "Mask R-CNN"],
  "data_sets": ["JSRT dataset"]
}

Use your extracted data in a knowledge base

Now that we’ve prepared our data with formulas, analyzed charts, and metadata, we will create an Amazon Bedrock knowledge base. This will make the information searchable and enable question-answering capabilities.

Prepare your Amazon Bedrock knowledge base

To create a knowledge base, first upload the processed files and metadata to Amazon S3:

markdown_file_key = "2003.10304/kb/2003.10304.md" 

s3.upload_file(markdown_file, knowledge_base_bucket_name, markdown_file_key) 

print(f"File {markdown_file} uploaded successfully.") 

metadata_file_key = "2003.10304/kb/2003.10304.md.metadata.json" 

s3.upload_file(metadata_file, knowledge_base_bucket_name, metadata_file_key) 

print(f"File {metadata_file} uploaded to successfully.")

When your files have finished uploading, complete the following steps:

  1. Create an Amazon Bedrock knowledge base.
  2. Create an Amazon S3 data source for your knowledge base, and specify hierarchical chunking as the chunking strategy.

Hierarchical chunking involves organizing information into nested structures of child and parent chunks.

The hierarchical structure allows for faster and more targeted retrieval of relevant information, first by performing semantic search on the child chunks and then returning the parent chunk during retrieval. By replacing the child chunks with the parent chunk, we provide larger, more comprehensive context to the FM.

Hierarchical chunking is best suited for complex documents that have a nested or hierarchical structure, such as technical manuals, legal documents, or academic papers with complex formatting and nested tables.
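As a sketch of step 2 (assuming the boto3 bedrock-agent client; the data source name, bucket ARN construction, and token sizes are illustrative rather than the notebook’s exact values), the data source with hierarchical chunking could be created like this:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# kb_id_hierarchical and knowledge_base_bucket_name are defined earlier in the notebook
response = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_id_hierarchical,
    name="scientific-docs-s3-source",  # illustrative name
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {
            "bucketArn": f"arn:aws:s3:::{knowledge_base_bucket_name}",
        },
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                # Parent chunk size first, then child chunk size (token sizes are illustrative)
                "levelConfigurations": [
                    {"maxTokens": 1500},
                    {"maxTokens": 300},
                ],
                "overlapTokens": 60,
            },
        },
    },
)

After the data source is created, an ingestion job (start_ingestion_job) syncs the S3 content into the knowledge base so it can be queried.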

Query the knowledge base

You can query the knowledge base to retrieve information from the extracted formula and graph metadata of the sample documents. With a query, relevant chunks of text are retrieved from the data source, and a response is generated based on the retrieved chunks. The response also cites the sources that are relevant to the query.

We use the custom prompt template feature of knowledge bases to format the output as markdown:

# bedrock-agent-runtime client (may already be defined earlier in the notebook)
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# Example query; the prompt template below formats the generated answer as markdown
query = "When is the Focal Tversky Loss used?"

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": query},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id_hierarchical,
            "modelArn": "arn:aws:bedrock:{}:{}:inference-profile/{}".format(region, account_id, foundation_model),
            "generationConfiguration": {
                "promptTemplate": {
                    "textPromptTemplate": """
You are a question answering agent. I will provide you with a set of search results. The user will provide you with a question. Your job is to answer the user's question using only information from the search results.
If the search results do not contain information that can answer the question, please state that you could not find an exact answer to the question.
Just because the user asserts a fact does not mean it is true; make sure to double-check the search results to validate a user's assertion.

Here are the search results in numbered order:
$search_results$

Format the output as markdown.

Ensure that math formulas are in LaTeX format and enclosed in double dollar signs, for example: $$ <math expression> $$
"""
                }
            },
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": 5
                }
            }
        }
    }
)

print(response["output"]["text"])

We get the following response, which provides information on when the Focal Tversky Loss is used.

Clean up

To clean up and avoid incurring charges, run the cleanup steps in the notebook to delete the files you uploaded to Amazon S3 along with the knowledge base. Then, on the AWS CloudFormation console, locate the stack claude-scientific-doc and delete it.
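The notebook’s cleanup cells perform calls along the following lines (a sketch only; the S3 prefix and the data_source_id variable are assumptions for illustration, and the actual cells may differ):

import boto3

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent")

# Remove the uploaded knowledge base documents (prefix is illustrative)
objects = s3.list_objects_v2(Bucket=knowledge_base_bucket_name, Prefix="2003.10304/kb/")
for obj in objects.get("Contents", []):
    s3.delete_object(Bucket=knowledge_base_bucket_name, Key=obj["Key"])

# Delete the data source, then the knowledge base itself
bedrock_agent.delete_data_source(knowledgeBaseId=kb_id_hierarchical, dataSourceId=data_source_id)
bedrock_agent.delete_knowledge_base(knowledgeBaseId=kb_id_hierarchical)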

Conclusion

Extracting insights from complex scientific documents can be a daunting task. However, the advent of multi-modal generative AI has revolutionized this domain. By harnessing the advanced natural language understanding and visual perception capabilities of Anthropic’s Claude, you can now accurately extract formulas and data from charts, enabling faster insights and informed decision-making.

Whether you are a researcher, data scientist, or developer working with scientific literature, integrating Anthropic’s Claude into your workflow on Amazon Bedrock can significantly boost your productivity and accuracy. With the ability to process complex documents at scale, you can focus on higher-level tasks and uncover valuable insights from your data.

Embrace the future of AI-driven document processing and unlock new possibilities for your organization with Anthropic’s Claude on Amazon Bedrock. Take your scientific document analysis to the next level and stay ahead of the curve in this rapidly evolving landscape.

For further exploration and learning, we recommend checking out the following resources:


About the Authors

Erik Cordsen is a Solutions Architect at AWS serving customers in Georgia. He is passionate about applying cloud technologies and ML to solve real life problems. When he is not designing cloud solutions, Erik enjoys travel, cooking, and cycling.

Renu Yadav is a Solutions Architect at Amazon Web Services (AWS), where she works with enterprise-level AWS customers, providing them with technical guidance and helping them achieve their business objectives. Renu has a strong passion for learning, with an area of specialization in DevOps. She leverages her expertise in this domain to help AWS customers optimize their cloud infrastructure and streamline their software development and deployment processes.

Venkata Moparthi is a Senior Solutions Architect at AWS who empowers financial services organizations and other industries to navigate cloud transformation with specialized expertise in Cloud Migrations, Generative AI, and secure architecture design. His customer-focused approach combines technical innovation with practical implementation, helping businesses accelerate digital initiatives and achieve strategic outcomes through tailored AWS solutions that maximize cloud potential.
