Revolutionizing earth observation with geospatial foundation models on AWS


Emerging transformer-based vision models for geospatial data—also called geospatial foundation models (GeoFMs)—offer a new and powerful technology for mapping the earth’s surface at a continental scale, providing stakeholders with the tooling to detect and monitor surface-level ecosystem conditions such as forest degradation, natural disaster impact, crop yield, and many others.

GeoFMs represent an emerging research field and are a type of pre-trained vision transformer (ViT) specifically adapted to geospatial data sources. GeoFMs offer immediate value without training: they excel as embedding models for geospatial similarity search and ecosystem change detection. With minimal labeled data, GeoFMs can be fine-tuned for custom tasks such as land surface classification, semantic segmentation, or pixel-level regression. Many leading models are available under permissive licenses, making them accessible to a wide audience. Examples include SatVision-Base, Prithvi-100M, SatMAE, and Clay (used in this solution).

In this post, we explore how Clay Foundation’s Clay foundation model, available on Hugging Face, can be deployed for large-scale inference and fine-tuning on Amazon SageMaker. For illustrative purposes, we focus on a deforestation use case from the Amazon rainforest, one of the most biodiverse ecosystems in the world. Given the strong evidence that the Amazon forest system could soon be reaching a tipping point, it presents an important domain of study and a high-impact application area for GeoFMs, for example, through early detection of forest degradation. However, the solution presented here generalizes to a wide range of geospatial use cases. It also comes with ready-to-deploy code samples to help you get started quickly with deploying GeoFMs in your own applications on AWS.

Let’s dive in!

Solution overview

At the core of our solution is a GeoFM. Architecturally, GeoFMs build on the ViT architecture first introduced in the seminal 2020 research paper An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale. To account for the specific properties of geospatial data (multiple channels ranging from ultraviolet to infrared, varying electromagnetic spectrum coverage, and spatio-temporal nature of data), GeoFMs incorporate several architectural innovations such as variable input size (to capture multiple channels) or the addition of positional embeddings that capture spatio-temporal aspects such as seasonality and location on earth. The pre-training of these models is conducted on unlabeled geospatial data sampled from across the globe using masked autoencoders (MAE) as self-supervised learners. Sampling from global-scale data helps ensure that diverse ecosystems and surface types are represented appropriately in the training set. The result is a set of general-purpose models that can be used for three core use cases:

  • Geospatial similarity search: Quickly map diverse surface types with semantic geospatial search using the embeddings to find similar items (such as deforested areas).
  • Embedding-based change detection: Analyze a time series of geospatial embeddings to identify surface disruptions over time for a specific region.
  • Custom geospatial machine learning: Fine-tune a specialized regression, classification, or segmentation model for geospatial machine learning (ML) tasks. While this requires a certain amount of labeled data, overall data requirements are typically much lower compared to training a dedicated model from the ground up.

The general solution flow is shown in the following diagram. Note that this flow diagram is highly abstracted and omits certain architectural details for reasons of clarity. For a full architecture diagram demonstrating how the flow can be implemented on AWS, see the accompanying GitHub repository. This repository also contains detailed deployment instructions to get you started quickly with applying GeoFMs to your own use cases.

End-to-end AWS GeoFM workflow showing data ingestion, model processing, embeddings generation, and specialized geospatial applications

  1. Retrieve and process satellite imagery for GeoFM inference or training: The first step is to get the raw geospatial data into a format that’s consumable by the GeoFM. This entails breaking down the large raw satellite imagery into equally-sized 256×256 pixel chips (the size that the model expects) and normalizing pixel values, among other data preparation steps required by the GeoFM that you choose. This routine can be conducted at scale using an Amazon SageMaker AI processing job.
  2. Retrieve model weights and deploy the GeoFM: Next, retrieve the open weights of the GeoFM from a model registry of your choice (Hugging Face in this example) and deploy the model for inference. The best deployment option ultimately depends on how the model is consumed. If you need to generate embeddings asynchronously, use a SageMaker AI processing job or batch transform job. For real-time inference, consider deploying to a SageMaker AI real-time endpoint, which can be configured to auto-scale with demand, allowing for large-scale inference. In this example, we use a SageMaker AI processing job with a custom Docker image for generating embeddings in batch.
  3. Generate geospatial embeddings: The GeoFM is an encoder-only model, meaning that it outputs an embedding vector. During inference, you perform a forward pass of the pre-processed satellite image chip through the GeoFM. This produces the corresponding embedding vector, which can be thought of as a compressed representation of the information contained in the image. This process is equivalent to using text embedding models for RAG use cases or similar.

The generated geospatial embeddings can be used largely as-is for two key use cases: geospatial similarity search and ecosystem change detection.

  1. Run similarity search on the embeddings to identify semantically similar images: The GeoFM embeddings reside in the same vector space. This allows us to find similar items by locating vectors that are close to a given query point. A common high-performance search algorithm for this is approximate nearest neighbor (ANN). For scalability and search performance, we index the embedding vectors in a vector database.
  2. Analyze time-series of embeddings for break points that indicate change: Instead of looking for similarity between embedding vectors, you can also look for distance. Doing this for a specific region and across time lets you pinpoint specific times where change occurs. This allows you to use embeddings for surface change detection over time, a very common use case in geospatial analytics.

Optionally, you can also fine-tune a model on top of the GeoFM.

  1. Train a custom head and run inference: To fine-tune a model, you add a custom (and typically lightweight) head on top of the GeoFM and fine-tune it on an (often small) labeled dataset. The GeoFM weights remain frozen and are not retrained. The custom head takes the GeoFM-generated embedding vectors as input and produces classification masks, pixel-level regression results, or simply a class per image, depending on the use case.

We explore the key steps of this workflow in the next sections. For additional details on the implementation—including how to build a high-quality user interface with Solara—see the accompanying GitHub repository.

Geospatial data processing and embedding generation

Our four-stage data processing pipeline transforms raw satellite imagery into analysis-ready vector embeddings that power advanced geospatial analytics. This orchestrated workflow uses Amazon SageMaker AI Pipelines to create a robust, reproducible, and scalable processing architecture. The end-to-end solution can process Earth observation data for a selected region of interest, with built-in flexibility to adapt to different use cases. In this example, we use Sentinel-2 imagery from the Amazon Registry of Open Data for monitoring deforestation in the Brazilian rainforest. However, our pipeline architecture is designed to work with other satellite image providers and resolutions (such as NAIP at 1 m/pixel, or Maxar and Planet Labs imagery at sub-1 m/pixel resolution).

Pipeline architecture overview

AWS GeoFM pipeline visualization showing completed preprocessing, generation, processing and consolidation steps

The SageMaker pipeline consists of four processing steps, shown in the preceding figure. Each step builds on the outputs of the previous one, with intermediate results stored in Amazon Simple Storage Service (Amazon S3).

  1. Pre-process satellite tiles: Divides the satellite imagery into chips. We chose a chip size of 256×256 pixels, as expected by Clay v1. For Sentinel-2 imagery at 10 m/pixel, this corresponds to an area of 2.56 × 2.56 km.
  2. Generate embeddings: Creates 768-dimensional vector representations for the chips using the Clay v1 model.
  3. Process embeddings: Performs dimensionality reduction and computes similarity metrics (for downstream analyses).
  4. Consolidate and index: Consolidates outputs and loads the embedding vectors into a vector store.
# Pipeline definition (simplified)
pipeline = Pipeline(
    name=f"EmbeddingPipeline-{aoi_name}",
    steps=[preprocess_step, embeddings_step, postprocessing_step, consolidation_step],
    parameters=[bucket_name_param, aoi_name_param, s2_grid_id_param, 
               chip_size_param, batch_size_param, s2_bands_param],
    sagemaker_session=pipeline_session
)
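
To register and run the pipeline, you can use the standard SageMaker Pipelines workflow, as in the following minimal sketch; the parameter names passed to start() are assumptions and will differ depending on how the pipeline parameters are defined in the repository.

# Register (or update) the pipeline and start an execution – illustrative only
import sagemaker

role = sagemaker.get_execution_role()  # execution role assumed by the pipeline steps

pipeline.upsert(role_arn=role)         # create or update the pipeline definition in SageMaker

execution = pipeline.start(
    parameters={                       # parameter names are assumptions, not the repository's exact names
        "AoiName": "mato-grosso",
        "ChipSize": 256,
    }
)
execution.wait()                       # block until the pipeline run finishes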

Step 1: Satellite data acquisition and chipping

The pipeline starts by accessing Sentinel-2 multispectral satellite imagery through the AWS Open Data program from S3 buckets. This imagery provides 10-meter resolution across multiple spectral bands including RGB (visible light) and NIR (near-infrared), which are critical for environmental monitoring.

This step filters out chips that have excessive cloud cover and divides large satellite scenes into manageable 256×256 pixel chips, which enables efficient parallel processing and creates uniform inputs for the foundation model. This step also runs on a SageMaker AI Processing job with a custom Docker image optimized for geospatial operations.

# Chip generation – Simplified version
import logging
import os

from joblib import Parallel, delayed, parallel_backend

logger = logging.getLogger(__name__)

# Note: process_single_chip (the per-chip helper) is defined elsewhere in the processing script
def generate_chips_from_xarray(ds, chip_size, aoi_name, s2_scene_id):
    """Generates image chips of specified size from xarray dataset
       using Area of Interest (AOI) name and Sentinel-2 scene id"""
    try:
        # Calculate the number of parallel jobs: available CPU count, capped at 16
        n_jobs = min(os.cpu_count(), 16)
        logger.info(f"Using {n_jobs} parallel jobs for chip generation")
        
        # Generate all x,y coordinates for processing
        coordinates = [(x, y) 
                    for x in range(0, ds.sizes['x'], chip_size)
                    for y in range(0, ds.sizes['y'], chip_size)]
        
        logger.info(f"Generating {len(coordinates)} chips from scene")
        
        # Process chips in parallel
        with parallel_backend('threading', n_jobs=n_jobs):
            results = Parallel(verbose=1)(
                delayed(process_single_chip)(
                    x, y, ds, chip_size, s2_scene_id
                ) for x, y in coordinates
            )
        
        # Filter out None results (e.g., chips with too many cloud pixels)
        valid_results = [r for r in results if r is not None]
        
        logger.info(f"Successfully generated {len(valid_results)} valid chips")
        
        return valid_results
    except Exception as e:
        logger.error(f"Error in chip generation for scene {s2_scene_id}: {str(e)}")
        raise

For each chip, this step generates the following outputs (a sketch of the per-chip helper follows this list):

  • NetCDF datacubes (.netcdf) containing the full multispectral information
  • RGB thumbnails (.png) for visualization
  • Rich metadata (.parquet) with geolocation, timestamps, and other metadata
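
The per-chip work is handled by the process_single_chip helper referenced in the code above. Its full implementation is in the GitHub repository; the following is a simplified sketch under assumed details (the cloud-mask band name, cloud threshold, and output path are illustrative, not the repository's actual values).

# Simplified sketch of the per-chip helper (illustrative; see the repository for the real version)
def process_single_chip(x, y, ds, chip_size, s2_scene_id, max_cloud_fraction=0.3):
    """Cuts one chip out of the scene, filters cloudy chips, and writes the outputs."""
    # Slice a chip_size x chip_size window out of the xarray dataset
    chip = ds.isel(x=slice(x, x + chip_size), y=slice(y, y + chip_size))

    # Skip incomplete chips at the scene edges
    if chip.sizes["x"] < chip_size or chip.sizes["y"] < chip_size:
        return None

    # Skip chips with excessive cloud cover (assumes a boolean cloud mask band named "cloud_mask")
    cloud_fraction = float(chip["cloud_mask"].mean())
    if cloud_fraction > max_cloud_fraction:
        return None

    # Persist the multispectral datacube for the embedding step
    chip_id = f"{s2_scene_id}_{x}_{y}"
    chip.to_netcdf(f"/opt/ml/processing/output/chips/{chip_id}.netcdf")

    # Return metadata used to build the chip-level Parquet file
    return {
        "chip_id": chip_id,
        "x": x,
        "y": y,
        "cloud_fraction": cloud_fraction,
    }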

Step 2: Embedding generation using a Clay foundation model

The second step transforms the preprocessed image chips into vector embeddings using the Clay v1 foundation model. This is the most computationally intensive part of the pipeline, using multiple GPU instances (ml.g5.xlarge) to efficiently process the satellite imagery.

For each chip, this step:

  • Accesses the NetCDF datacube from Amazon S3
  • Normalizes the spectral bands according to the Clay v1 model’s input requirements
  • Generates both patch-level and class token (CLS) embeddings
  • Stores the embeddings as NumPy arrays (.npy) alongside the original data on S3 as intermediate store

While Clay can use all Sentinel-2 spectral bands, our implementation uses RGB and NIR as input bands to generate a 768-dimensional embedding, which provides excellent results in our examples. Customers can easily adapt the input bands based on their specific use cases. These embeddings encapsulate high-level features such as vegetation patterns, urban structures, water bodies, and land use characteristics—without requiring explicit feature engineering.
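
Conceptually, the embedding step reduces to a forward pass of each normalized chip through the frozen encoder. The following is an illustrative sketch only; load_clay_encoder and the returned output structure are placeholders for the model-specific loading and inference code in the repository, not the Clay library's actual API.

import numpy as np
import torch

# Placeholder: stands in for the repository's Clay v1 checkpoint loading code
encoder = load_clay_encoder("clay-v1-base.ckpt")
encoder.eval()

def embed_chip(chip_array: np.ndarray) -> np.ndarray:
    """Runs one normalized chip (bands x 256 x 256) through the encoder and returns its CLS embedding."""
    with torch.no_grad():
        x = torch.from_numpy(chip_array).unsqueeze(0).float()   # add a batch dimension
        patch_embeddings, cls_embedding = encoder(x)             # assumed output structure
    return cls_embedding.squeeze(0).cpu().numpy()                # 768-dimensional vector, saved as .npy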

Step 3: Embedding processing and analysis

The third step analyzes the embeddings to extract meaningful insights, particularly for time-series analysis. Running on high-memory instances, this step:

  1. Performs dimensionality reduction on the embeddings using principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) (to be used later for change detection)
  2. Computes cosine similarity between embeddings over time (an alternative for change detection)
  3. Identifies significant changes in the embeddings that might indicate surface changes
  4. Saves processed embeddings in Parquet format for efficient querying

The output includes processed embedding files that contain both the original high-dimensional vectors and their reduced representations, along with computed similarity metrics.

For change detection applications, this step establishes a baseline for each geographic location and calculates deviations from this baseline over time. These deviations, captured as vector distances, provide a powerful indicator of surface changes like deforestation, urban development, or natural disasters.
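
As an illustration of the processing in this step, the following sketch reduces a stack of chip embeddings with PCA and computes cosine similarity between consecutive observations; the number of components and variable names are assumptions, not the pipeline's exact configuration.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

# embeddings: array of shape (n_observations, 768), ordered by acquisition date for one chip location
def process_embeddings(embeddings: np.ndarray, n_components: int = 2):
    # Reduce dimensionality for visualization and change detection
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(embeddings)  # shape (n_observations, n_components)

    # Cosine similarity between each observation and the previous one
    sims = cosine_similarity(embeddings[1:], embeddings[:-1]).diagonal()

    return reduced, sims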

Step 4: Consolidation and vector database integration

The final pipeline step consolidates the processed embeddings into a unified dataset and loads them into vector databases optimized for similarity search. The outputs include consolidated embedding files, GeoJSON grid files for visualization, and configuration files for frontend applications.

The solution supports two vector database options; see the accompanying GitHub repository for the specific implementations. Both options provide efficient ANN search capabilities, enabling sub-second query performance. The choice between them depends on the scale of deployment, integration requirements, and operational preferences.

With this robust data processing and embedding generation foundation in place, let’s explore the real-world applications enabled by the pipeline, beginning with geospatial similarity search.

Geospatial similarity search

Organizations working with Earth observation data have traditionally struggled with efficiently identifying specific landscape patterns across large geographic regions. Traditional Earth observation analysis requires specialized models trained on labeled datasets for each target feature. This approach forces organizations into a lengthy process of data collection, annotation, and model training before obtaining results.

In contrast, the GeoFM-powered similarity search converts satellite imagery into 768-dimensional vector embeddings that capture the semantic essence of landscape features, eliminating the need for manual feature engineering and computation of specialized indices like NDVI or NDWI.

This capability uses the Clay foundation model’s pre-training on diverse global landscapes to understand complex relationships between features without explicit programming. The result is an intuitive image-to-image search capability where users can select a reference area—such as early-stage deforestation or wildfire damage—and instantly find similar patterns across vast territories in seconds rather than weeks.

Similarity search implementation

Our implementation provides a streamlined workflow for finding similar geographic areas using the embeddings generated by the data processing pipeline. The search process involves:

  1. Reference area selection: Users select a reference chip representing a search term (for example, a deforested patch, urban development, or agricultural field)
  2. Search parameters: Users specify the number of results and a similarity threshold
  3. Vector search execution: The system retrieves similar chips using cosine similarity between embeddings (a simplified sketch follows this list)
  4. Result visualization: Matching chips are highlighted on the map
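
The following is a simplified, in-memory version of the search logic, assuming the chip embeddings and their IDs have already been loaded from the consolidated Parquet files; in the deployed solution, this query runs against the vector database instead.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def find_similar_chips(query_embedding, embeddings, chip_ids, top_k=10, threshold=0.9):
    """Returns the chip IDs whose embeddings are most similar to the reference chip."""
    # Cosine similarity between the reference embedding and every indexed chip
    scores = cosine_similarity(query_embedding.reshape(1, -1), embeddings)[0]

    # Rank by similarity, keep the top_k results above the user-defined threshold
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(chip_ids[i], float(scores[i])) for i in ranked if scores[i] >= threshold]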

Let’s dive deeper on a real-world application, taking our running example of detecting deforestation in the Mato Grosso region of the Brazilian Amazon. Traditional monitoring approaches often detect forest loss too late—after significant damage has already occurred. The Clay-powered similarity search capability offers a new approach by enabling early detection of emerging deforestation patterns before they expand into large-scale clearing operations.

Using a single reference chip showing the initial signs of forest degradation—such as selective logging, small clearings, or new access roads—analysts can instantly identify similar patterns across vast areas of the Amazon rainforest. As demonstrated in the following example images, the system effectively recognizes the subtle signatures of early-stage deforestation based on a single reference image. This capability enables environmental protection agencies and conservation organizations to deploy resources precisely, improving anti-deforestation efforts by addressing threats before they escalate into major forest loss. While a single reference chip led to good results in our examples, alternative approaches exist, such as an average vector strategy, which uses embeddings from multiple reference images to improve the similarity search results.

Geospatial search interface displaying deforestation patterns with reference image, parameters, and top matches

Ecosystem change detection

Unlike similarity search, change detection focuses on measuring the distance between embedding vectors over time. The core assumption is that the more distant two embedding vectors are from each other, the more dissimilar the underlying satellite imagery is. Applied to a single region over time, this lets you pinpoint so-called change points—periods where significant and long-lasting change in surface conditions occurred.

Our solution implements a timeline view of Sentinel-2 satellite observations from 2018 to present. Each observation point corresponds to a unique satellite image, allowing for detailed temporal analysis. Because embedding vectors are high-dimensional, we use the previously computed PCA (and optionally t-SNE) to reduce them to a single dimension for visualization purposes.

Let’s review a compelling example from our analysis of deforestation in the Amazon. The following image is a timeseries plot of geospatial embeddings (first principal component) for a single 256×256 pixel chip. Cloudy images and major outliers have been removed.

Environmental change analysis graph for Chip ID 39_18 showing progression with satellite imagery snapshots from 2018-2024

Points clustered closely on the y-axis indicate similar ground conditions; sudden and persistent discontinuities in the embedding values signal significant change. Here’s what the analysis shows:

  • Stable forest conditions from 2018 through 2020
  • A significant discontinuity in embedding values during 2021. Closer review of the underlying satellite imagery shows clear evidence of forest clearing and conversion to agricultural fields
  • Further transformation visible in 2024 imagery

Naturally, we need a way to automate the process of change detection so that it can be applied at scale. Given that we do not typically have extensive changepoint training datasets, we need an unsupervised approach that works without labeled data. The intuition behind unsupervised change detection is the following: identify what normal looks like, then highlight large enough deviations from normal and flag them as change points; after a change point has occurred, characterize the new normal and repeat the process.

The following function performs harmonic regression analysis on the embeddings timeseries data, specifically designed to model yearly seasonality patterns. The function fits a harmonic regression with a specified frequency (default 365 days for annual patterns) to the embedding data of a baseline period (the year 2018 in this example). It then generates predictions and calculates error metrics (absolute and percentage deviations). Large deviations from the normal seasonal pattern indicate change and can be automatically flagged using thresholding.

import datetime

import matplotlib.dates as mdates
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures


def fit_harmonic_regression(data, outliers, date_col="date", y_col="patch_emb_pca_1", 
                          baseline_start_date=datetime.datetime(2018,1,1), 
                          monitoring_start_date=datetime.datetime(2019,1,1), 
                          deg=3, reg=0.001, avg_days_yr=365):
    """
    Fits a harmonic regression model to time series data using Lasso regularization.
    
    Parameters:
    - data: DataFrame containing the time series data
    - outliers: List of dates to exclude from the fitting process
    - date_col: Name of the column containing dates (default: 'date')
    - y_col: Name of the column containing target values (default: 'patch_emb_pca_1')
    - baseline_start_date: Start date for the baseline period (default: Jan 1, 2018)
    - monitoring_start_date: Start date for the monitoring period (default: Jan 1, 2019)
    - deg: Degree of polynomial features (default: 3)
    - reg: Regularization strength for Lasso (default: 0.001)
    - avg_days_yr: Number of days in a year (default: 365)
    """

    # Convert date column to datetime format and create numerical representation
    data[date_col] = pd.to_datetime(data[date_col])
    data["date_numerical"] = data[date_col].apply(lambda x: mdates.date2num(x))
    t_full = data["date_numerical"]
    y_full = data[y_col]
    
    # Filter data for model fitting:
    # - Remove outliers
    # - Keep only data between baseline_start_date and monitoring_start_date
    t_fitting = data[
        (~data[date_col].isin(outliers)) & 
        (data[date_col] >= baseline_start_date) & 
        (data[date_col] < monitoring_start_date)
    ]["date_numerical"]
    
    y_fitting = data[
        (~data[date_col].isin(outliers)) & 
        (data[date_col] >= baseline_start_date) & 
        (data[date_col] < monitoring_start_date)
    ][y_col]
    
    # Create design matrix for harmonic regression
    # Convert time to angular frequency (2π/period)
    w = 2 * np.pi / avg_days_yr
    
    # Generate polynomial features from sine and cosine transformations
    poly = PolynomialFeatures(deg)
    # Create feature matrix for fitting period
    X_fitting = poly.fit_transform(np.column_stack((
        np.sin(w*t_fitting), 
        np.cos(w*t_fitting)
    )))
    # Create feature matrix for all data points
    X_full = poly.fit_transform(np.column_stack((
        np.sin(w*t_full), 
        np.cos(w*t_full)
    )))

    # Initialize and fit Lasso regression model
    lasso_model = Lasso(alpha=reg)
    lasso_model.fit(X_fitting, y_fitting)
    
    # Generate predictions for all time points
    y_fit = lasso_model.predict(X_full)
    
    # Calculate error metrics
    absolute_deviation = np.abs(y_full - y_fit)
    percentage_deviation = (absolute_deviation / np.abs(y_fit)) * 100
    
    # Compile results into a DataFrame
    df = pd.DataFrame()
    df["date"] = data[date_col]
    df["date_numerical"] = data["date_numerical"] 
    df[f"{y_col}_true"] = data[y_col]          # Original values
    df[f"{y_col}_pred"] = y_fit                # Predicted values
    df[f"{y_col}_abs_error"] = absolute_deviation    # Absolute error
    df[f"{y_col}_perc_error"] = percentage_deviation # Percentage error
    # Add time-based features
    df["year"] = df["date"].apply(lambda x: x.year)
    df["month"] = df["date"].apply(lambda x: x.month)
    df["year_month"] = df.apply(
        lambda x: "{}_{}".format(str(x.year), str(x.month)), 
        axis=1
    )
    # Return the fitted model, polynomial transformer, and results DataFrame
    return lasso_model, poly, df

When we apply this to the chips across an area of observation and define a threshold on the maximum deviation from the fitted harmonic regression, we can automatically map change intensity, allowing analysts to quickly zoom in on problematic areas.
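
To make the thresholding step concrete, the following sketch flags change points from the DataFrame returned by fit_harmonic_regression; the 30 percent threshold, the monitoring start date, and the input variables are illustrative assumptions, not calibrated values.

import datetime

# Fit the baseline model and score all observations for one chip
model, poly, results = fit_harmonic_regression(
    data=chip_timeseries,              # embeddings time series for one chip (assumed loaded)
    outliers=cloudy_dates,             # dates excluded as outliers (assumed available)
)

# Flag observations in the monitoring period that deviate strongly from the seasonal baseline
monitoring = results[results["date"] >= datetime.datetime(2019, 1, 1)]
change_points = monitoring[monitoring["patch_emb_pca_1_perc_error"] > 30]  # threshold is illustrative

print(f"Detected {len(change_points)} potential change points")
print(change_points[["date", "patch_emb_pca_1_perc_error"]].head())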

Change detection interface displaying satellite imagery with high to low intensity changes highlighted

While this method performs well in our analyses, it is also quite rigid in that it requires careful tuning of error thresholds and the definition of a baseline period. More sophisticated approaches are available, ranging from general-purpose time-series analyses that automate baseline definition and change point detection using recursive methods (for example, Gaussian processes) to specialized algorithms for geospatial change detection (for example, LandTrendr and Continuous Change Detection and Classification (CCDC)).

In sum, our approach to change detection demonstrates the power of geospatial embedding vectors in tracking environmental changes over time, providing valuable insights for land use monitoring, environmental protection, and urban planning applications.

GeoFM fine-tuning for your custom use case

Fine-tuning is a specific implementation of transfer learning, in which a pre-trained foundation model is adapted to specific tasks through targeted additional training on specialized labeled datasets. For GeoFMs, these tasks can target agriculture, disaster monitoring, or urban analysis. The model retains its broad spatial understanding while developing expertise for particular regions, ecosystems, or analytical tasks. This approach significantly reduces computational and data requirements compared to building specialized models from scratch, without sacrificing accuracy. Fine-tuning typically involves preserving the pre-trained Clay encoder—which has already learned rich representations of spectral patterns, spatial relationships, and temporal dynamics from massive amounts of satellite imagery—while attaching and training a specialized task-specific head.

For pixel-wise prediction tasks—such as land use segmentation—the specialized head is typically a decoder architecture, whereas for class-level outputs (classification tasks) the head can be as basic as a multilayer perceptron network. Training focuses exclusively on the new head, which takes the feature representations from the model’s frozen encoder and transforms them into the desired output—for segmentation, a full-resolution map in which each pixel is classified according to its land use type.

Our segmentation framework combines the pre-trained Clay encoder with an efficient convolutional decoder, converting Clay’s rich understanding of satellite imagery into detailed land use maps. The lightweight decoder uses convolutional layers and pixel shuffle upsampling to take the feature representations from Clay’s frozen encoder and gradually transform them back to full-resolution images, where each pixel is classified according to its land use type. By freezing the encoder and only training the compact decoder, the model achieves a good balance between computational efficiency and segmentation accuracy.
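
The following is a minimal sketch of such a segmentation head, assuming the frozen encoder outputs a 16×16 grid of 768-dimensional patch embeddings for a 256×256 chip; the layer sizes, number of classes, and encoder interface are illustrative, not the repository's exact architecture.

import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    """Lightweight convolutional decoder on top of frozen GeoFM patch embeddings."""
    def __init__(self, embed_dim: int = 768, num_classes: int = 9):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Conv2d(embed_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 64 * 16, kernel_size=3, padding=1),  # 1024 = 64 channels * 4^2 for PixelShuffle(4)
            nn.PixelShuffle(4),   # 16x16 -> 64x64 spatial, 64 channels
            nn.Conv2d(64, num_classes * 16, kernel_size=3, padding=1),
            nn.PixelShuffle(4),   # 64x64 -> 256x256 spatial, num_classes channels
        )

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (batch, 256 patches, 768) from the frozen encoder
        b, n, d = patch_embeddings.shape
        h = w = int(n ** 0.5)                                    # 16x16 patch grid
        x = patch_embeddings.transpose(1, 2).reshape(b, d, h, w)
        return self.decoder(x)                                   # (batch, num_classes, 256, 256)

# Training outline: only the head's parameters are optimized; the encoder stays frozen
head = SegmentationHead()
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()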

We applied this segmentation architecture to a labeled land use land cover (LULC) dataset from Impact Observatory, hosted on the Amazon Registry of Open Data. For illustrative purposes, we again focused on our running example from Brazil’s Mato Grosso region. We trained the decoder head for 10 epochs, which took 17 minutes total, and tracked intersection over union (IoU) and F1 score as segmentation accuracy metrics. After just one training epoch, the model already achieved 85.7% validation IoU. With the full 10 epochs completed, performance increased to an impressive 92.4% IoU and 95.6% F1 score. In the following image, we show ground truth satellite imagery (upper) and the model’s predictions (lower). The visual comparison highlights how accurately this approach can classify different land use categories.

Comparison showing land-use classification prediction against ground-truth data

Conclusion

Novel GeoFMs provide an encouraging new approach to geospatial analytics. Through their extensive pre-training, these models have incorporated a deep implicit understanding of geospatial data and can be used out-of-the-box for high-impact use cases such as similarity search or change detection. They can also serve as the basis for specialized models using a fine-tuning process that is significantly less data-hungry (fewer labeled samples needed) and has lower compute requirements than training a dedicated model from scratch.

In this post, we have shown how you can deploy a state-of-the-art GeoFM (Clay) on AWS and have explored one specific use case – monitoring deforestation in the Amazon rainforest – in greater detail. The same approach is applicable to a large variety of industry use cases. For example, insurance companies can use a similar approach to ours to assess damage after natural disasters, including hurricanes, floods, or fires, and keep track of their insured assets. Agricultural organizations can use GeoFMs for crop type identification, crop yield prediction, or other use cases. We also envision high-impact use cases in industries like urban planning, emergency and disaster response, supply chain and global trade, sustainability and environmental modeling, and many others. To get started applying GeoFMs to your own earth observation use case, check out the accompanying GitHub repository, which has the prerequisites and a step-by-step walkthrough to run it on your own area of interest.


About the Authors

Dr. Karsten Schroer is a Senior Machine Learning (ML) Prototyping Architect at AWS, focused on helping customers leverage artificial intelligence (AI), ML, and generative AI technologies. With deep ML expertise, he collaborates with companies across industries to design and implement data- and AI-driven solutions that generate business value. Karsten holds a PhD in applied ML.

Bishesh Adhikari is a Senior ML Prototyping Architect at AWS with over a decade of experience in software engineering and AI/ML. Specializing in GenAI, LLMs, NLP, CV, and GeoSpatial ML, he collaborates with AWS customers to build solutions for challenging problems through co-development. His expertise accelerates customers’ journey from concept to production, tackling complex use cases across various industries. In his free time, he enjoys hiking, traveling, and spending time with family and friends.

Dr. Iza Moise is a Senior Machine Learning (ML) Prototyping Architect at AWS, with expertise in both traditional ML and advanced techniques like foundation models and vision transformers. She focuses on applied ML across diverse scientific fields, publishing and reviewing at Amazon’s internal ML conferences. Her strength lies in translating theoretical advances into practical solutions that deliver measurable impact through thoughtful implementation.

Read More

Create an agentic RAG application for advanced knowledge discovery with LlamaIndex, and Mistral in Amazon Bedrock


Agentic Retrieval Augmented Generation (RAG) applications represent an advanced approach in AI that integrates foundation models (FMs) with external knowledge retrieval and autonomous agent capabilities. These systems dynamically access and process information, break down complex tasks, use external tools, apply reasoning, and adapt to various contexts. They go beyond simple question answering by performing multi-step processes, making decisions, and generating complex outputs.

In this post, we demonstrate an example of building an agentic RAG application using the LlamaIndex framework. LlamaIndex is a framework that connects FMs with external data sources. It helps ingest, structure, and retrieve information from databases, APIs, PDFs, and more, enabling agentic and RAG capabilities for AI applications.

This application serves as a research tool, using the Mistral Large 2 FM on Amazon Bedrock to generate responses for the agent flow. The example application interacts with well-known websites, such as arXiv, GitHub, TechCrunch, and DuckDuckGo, and can access knowledge bases containing documentation and internal knowledge.

This application can be further expanded to accommodate broader use cases requiring dynamic interaction with internal and external APIs, as well as the integration of internal knowledge bases to provide more context-aware responses to user queries.

Solution overview

This solution uses the LlamaIndex framework to build an agent flow with two main components: AgentRunner and AgentWorker. The AgentRunner serves as an orchestrator that manages conversation history, creates and maintains tasks, executes task steps, and provides a user-friendly interface for interactions. The AgentWorker handles the step-by-step reasoning and task execution.

For reasoning and task planning, we use Mistral Large 2 on Amazon Bedrock. You can use other text generation FMs available from Amazon Bedrock. For the full list of supported models, see Supported foundation models in Amazon Bedrock. The agent integrates with GitHub, arXiv, TechCrunch, and DuckDuckGo APIs, while also accessing internal knowledge through a RAG framework to provide context-aware answers.

In this solution, we present two options for building the RAG framework:

  • Document integration with Amazon OpenSearch Serverless – The first option involves using LlamaIndex to programmatically load and process documents. It splits the documents into chunks using various chunking strategies and then stores these chunks in an Amazon OpenSearch Serverless vector store for future retrieval.
  • Document integration with Amazon Bedrock Knowledge Bases – The second option uses Amazon Bedrock Knowledge Bases, a fully managed service that handles the loading, processing, and chunking of documents. This service can quickly create a new vector store on your behalf with a few configurations and clicks. You can choose from Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL-Compatible Edition Serverless, and Amazon Neptune Analytics. Additionally, the solution includes a document retrieval rerank feature to enhance the relevance of the responses.

You can select the RAG implementation option that best suits your preference and developer skill level.

The following diagram illustrates the solution architecture.

AWS Agent architecture diagram illustrating AgentRunner management system and AgentWorker's integrated development toolkit

In the following sections, we present the steps to implement the agentic RAG application. You can also find the sample code in the GitHub repository.

Prerequisites

The solution has been tested in the AWS Region us-west-2. Complete the following steps before proceeding:

  1. Set up the following resources:
    1. Create an Amazon SageMaker domain.
    2. Create a SageMaker domain user profile.
    3. Launch Amazon SageMaker Studio, select JupyterLab, and create a space.
    4. Select the instance t3.medium and the image SageMaker Distribution 2.3.1, then run the space.
  2. Request model access:
    1. On the Amazon Bedrock console, choose Model access in the navigation pane.
    2. Choose Modify model access.
    3. Select the models Mistral Large 2 (24.07), Amazon Titan Text Embeddings V2, and Rerank 1.0 from the list, and request access to these models.
  3. Configure AWS Identity and Access Management (IAM) permissions:
    1. In the SageMaker console, go to the SageMaker user profile details and find the execution role that the SageMaker notebook uses. It should look like AmazonSageMaker-ExecutionRole-20250213T123456.
  4. In the IAM console, create an inline policy for this execution role so that your role can perform the following actions:
    1. Access to Amazon Bedrock services including:
      • Reranking capabilities
      • Retrieving information
      • Invoking models
      • Listing available foundation models
    2. IAM permissions to:
      • Create policies
      • Attach policies to roles within your account
    3. Full access to Amazon OpenSearch Serverless service
  5. Run the following command in the JupyterLab notebook terminal to download the sample code from GitHub:
git init
git remote add origin https://github.com/aws-samples/mistral-on-aws.git
git sparse-checkout init
git sparse-checkout set "notebooks/mistral-llamaindex-agentic-rag"
git pull origin main
  6. Finally, install the required Python packages by running the following command in the terminal:
cd mistral-llamaindex-agentic-rag
pip install -r requirements.txt

Initialize the models

Initialize the FM used for orchestrating the agentic flow with Amazon Bedrock Converse API. This API provides a unified interface for interacting with various FMs available on Amazon Bedrock. This standardization simplifies the development process, allowing developers to write code one time and seamlessly switch between different models without adjusting for model-specific differences. In this example, we use the Mistral Large 2 model on Amazon Bedrock.

Next, initialize the embedding model from Amazon Bedrock, which is used for converting document chunks into embedding vectors. For this example, we use Amazon Titan Text Embeddings V2. See the following code:

# Initialise and configure the BedrockConverse LLM with the Mistral Large 2 model and set it as the default in Settings

from llama_index.llms.bedrock_converse import BedrockConverse
from llama_index.core import Settings
llm = BedrockConverse(model="mistral.mistral-large-2407-v1:0", max_tokens = 2048)
Settings.llm = BedrockConverse(model="mistral.mistral-large-2407-v1:0", max_tokens = 2048)

# Initialise and configure the embedding model with Amazon Titan Text Embeddings V2, and set it as the default in Settings

from llama_index.embeddings.bedrock import BedrockEmbedding
embed_model = BedrockEmbedding(model_name="amazon.titan-embed-text-v2:0")
Settings.embed_model = BedrockEmbedding(model_name="amazon.titan-embed-text-v2:0")

Integrate API tools

Implement two functions to interact with the GitHub and TechCrunch APIs. The APIs shown in this post don’t require credentials. To provide clear communication between the agent and the foundation model, follow Python function best practices, including:

  • Type hints for parameter and return value validation
  • Detailed docstrings explaining function purpose, parameters, and expected returns
  • Clear function descriptions

The following code sample shows the function that integrates with the GitHub API. After the function is created, use the FunctionTool.from_defaults() method to wrap the function as a tool and integrate it seamlessly into the LlamaIndex workflow.

See the code repository for the full code samples of the function that integrates with the TechCrunch API.

# Define a function to search GitHub repositories by topic, sorting by stars or update date, and return top results
import requests

from llama_index.core.tools import FunctionTool
def github_search(topic: str, num_results: int = 3, sort_by: str = "stars") -> list:
    """
    Retrieve a specified number of GitHub repositories based on a given topic, 
    ranked by the specified criteria.

    This function uses the GitHub API to search for repositories related to a 
    specific topic or keyword. The results can be sorted by the number of stars 
    (popularity) or the most recent update, with the most relevant repositories 
    appearing first according to the chosen sorting method.

    Parameters:
    -----------
    topic : str
        The topic or keyword to search for in GitHub repositories.
        The topic cannot contain blank spaces.
    num_results : int, optional
        The number of repository results to retrieve. Defaults to 3.
    sort_by : str, optional
        The criterion for sorting the results. Options include:
        - 'stars': Sort by the number of stars (popularity).
        - 'updated': Sort by the date of the last update (most recent first).
        Defaults to 'stars'.

    Returns:
    --------
    list
        A list of dictionaries, where each dictionary contains information 
        about a repository. Each dictionary includes:
        - 'html_url': The URL of the repository.
        - 'description': A brief description of the repository.
        - 'stargazers_count': The number of stars (popularity) the repository has.
    """

    url = f"https://api.github.com/search/repositories?q=topic:{topic}&sort={sort_by}&order=desc"
    response = requests.get(url).json()
    code_repos = [
        {
            'html_url': item['html_url'],
            'description': item['description'],
            'stargazers_count': item['stargazers_count'],
        }
        for item in response['items'][:num_results]
    ]
    return code_repos

github_tool = FunctionTool.from_defaults(fn=github_search)

For arXiv and DuckDuckGo integration, we use LlamaIndex’s pre-built tools instead of creating custom functions. You can explore other available pre-built tools in the LlamaIndex documentation to avoid duplicating existing solutions.

# Import and configure the ArxivToolSpec and DuckDuckGoSearchToolSpec from LlamaIndex prebuilt tools

from llama_index.tools.arxiv import ArxivToolSpec
from llama_index.tools.duckduckgo import DuckDuckGoSearchToolSpec

arxiv_tool = ArxivToolSpec()
search_tool = DuckDuckGoSearchToolSpec()

api_tools = arxiv_tool.to_tool_list() + search_tool.to_tool_list()

# Consolidate all tools into one list (news_tool is the TechCrunch tool from the repository's full code sample)
api_tools.extend([news_tool, github_tool])

RAG option 1: Document integration with Amazon OpenSearch Serverless

Next, programmatically build the RAG component using LlamaIndex to load, process, and chunk documents, and store the embedding vectors in Amazon OpenSearch Serverless. This approach offers greater flexibility for advanced scenarios, such as loading various file types (including .epub and .ppt) and selecting advanced chunking strategies based on file types (such as HTML, JSON, and code).

Before moving forward, you can download some PDF documents for testing from the AWS website using the following command, or you can use your own documents. The following documents are AWS guides that help in choosing the right generative AI service (such as Amazon Bedrock or Amazon Q) based on use case, customization needs, and automation potential. They also assist in selecting AWS machine learning (ML) services (such as SageMaker) for building models, using pre-trained AI, and using cloud infrastructure.

# download test documents from below links
!mkdir -p docs  # create the target folder if it doesn't exist
!wget -O docs/genai_on_aws.pdf "https://docs.aws.amazon.com/pdfs/decision-guides/latest/generative-ai-on-aws-how-to-choose/generative-ai-on-aws-how-to-choose.pdf?did=wp_card&trk=wp_card#guide"
!wget -O docs/ml_on_aws.pdf "https://docs.aws.amazon.com/pdfs/decision-guides/latest/machine-learning-on-aws-how-to-choose/machine-learning-on-aws-how-to-choose.pdf?did=wp_card&trk=wp_card#guide"

Load the PDF documents using SimpleDirectoryReader() in the following code. For a full list of supported file types, see the LlamaIndex documentation.

# use Llamaindex to load documents 
from llama_index.core import SimpleDirectoryReader
loader = SimpleDirectoryReader('docs/')
documents = loader.load_data()

Next, create an Amazon OpenSearch Serverless collection as the vector database. Check the utils.py file for details on the create_collection() function.

# Create Amazon OpenSearch Serverless collection 
from utils import *
import sagemaker 
import random

region_name = "us-west-2"
suffix = random.randrange(1, 500)
collection_name = "llamaindex-blog-"+str(suffix)
notebook_execution_role = sagemaker.get_execution_role()
endpoint = create_collection(collection_name, notebook_execution_role)

After you create the collection, create an index to store embedding vectors:

## create an index in the collection
index_name = "pdf-rag"
create_index(index_name, endpoint, emb_dim=1024)

Next, use the following code to implement a document search system using LlamaIndex integrated with Amazon OpenSearch Serverless. It first sets up AWS authentication to securely access OpenSearch Service, then configures a vector client that can handle 1024-dimensional embeddings (specifically designed for the Amazon Titan Embedding V2 model). The code processes input documents by breaking them into manageable chunks of 1,024 tokens with a 20-token overlap, converts these chunks into vector embeddings, and stores them in the OpenSearch Serverless vector index. You can select a different or more advanced chunking strategy by modifying the transformations parameter in the VectorStoreIndex.from_documents() method. For more information and examples, see the LlamaIndex documentation.

import boto3
from llama_index.vector_stores.opensearch import OpensearchVectorStore, OpensearchVectorClient
from opensearchpy import RequestsHttpConnection, AWSV4SignerAuth
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.node_parser import SentenceSplitter

## integrate Amazon OpenSearch Serverless collection and index to llamaindex 

dim = 1024 # Amazon Titan Embedding V2 model dimension 
service = 'aoss'
credentials = boto3.Session().get_credentials()
awsauth = AWSV4SignerAuth(credentials, region_name, service)

client = OpensearchVectorClient(
    endpoint, 
    index_name, 
    dim, 
    embedding_field="vector", 
    text_field="chunk",
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# initialise vector store and save document chunks to the vector store 
vector_store = OpensearchVectorStore(client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents, 
    storage_context=storage_context,
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)]
)

You can add a reranking step in the RAG pipeline, which improves the quality of information retrieved by making sure that the most relevant documents are presented to the language model, resulting in more accurate and on-topic responses:

from llama_index.postprocessor.bedrock_rerank import AWSBedrockRerank
reranker = AWSBedrockRerank(
    top_n=3,
    model_id="amazon.rerank-v1:0",#  another rerank model option is: cohere.rerank-v3-5:0
    region_name="us-west-2",
)
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker],
)

Use the following code to test the RAG framework. You can compare results by enabling or disabling the reranker model.

response = query_engine.query(
    "In which situation should I use Amazon Bedrock over Amazon SageMaker?",
)

Next, convert the vector store into a LlamaIndex QueryEngineTool, which requires a tool name and a comprehensive description. This tool is then combined with other API tools to create an agent worker that executes tasks in a step-by-step manner. The code initializes an AgentRunner to orchestrate the entire workflow, analyzing text inputs and generating responses. The system can be configured to support parallel tool execution for improved efficiency.

# create QueryEngineTool based on the OpenSearch vector store 

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner
oss_tool = QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="oss_guide_tool",
            description="""
            These decision guides help users select appropriate AWS machine learning and generative AI services based on specific needs. 
            They cover pre-built solutions, customizable platforms, and infrastructure options for ML workflows, 
            while outlining how generative AI can automate processes, personalize content, augment data, reduce costs, 
            and enable faster experimentation in various business contexts.""",
        ),
    )

all_tools = api_tools +[oss_tool]

agent_worker = FunctionCallingAgentWorker.from_tools(
    all_tools, 
    llm=llm, 
    verbose=True, # Set verbose=True to display the full trace of steps. 
    system_prompt = system_prompt,
    # allow_parallel_tool_calls = True  # Uncomment this line to allow multiple tool invocations
)
agent = AgentRunner(agent_worker)
response = agent.chat(text_input)

You have now completed building the agentic RAG application using LlamaIndex and Amazon OpenSearch Serverless. You can test the chatbot application with your own questions. For example, ask about the latest news and features regarding Amazon Bedrock, or inquire about the latest papers and most popular GitHub repositories related to generative AI.
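
For example, a test interaction might look like the following (the question text is just an illustration):

# Example interaction with the agent (illustrative question)
response = agent.chat(
    "What are the most popular GitHub repositories related to generative AI, "
    "and when should I use Amazon Bedrock instead of Amazon SageMaker?"
)
print(response)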

RAG option 2: Document integration with Amazon Bedrock Knowledge Bases

In this section, you use Amazon Bedrock Knowledge Bases to build the RAG framework. You can create an Amazon Bedrock knowledge base on the Amazon Bedrock console or follow the provided notebook example to create it programmatically. Create a new Amazon Simple Storage Service (Amazon S3) bucket for the knowledge base, then upload the previously downloaded files to this S3 bucket. You can select different embedding models and chunking strategies that work better for your data. After you create the knowledge base, remember to sync the data. Data synchronization might take a few minutes.

To enable your newly created knowledge base to invoke the rerank model, you need to modify its permissions. First, open the Amazon Bedrock console and locate the service role that matches the one shown in the following screenshot.

Amazon Bedrock console showing Knowledge Base setup with execution role, vector store configuration, and data source controls

Choose the role and add the following IAM permission policy as an inline policy. This additional authorization grants your knowledge base the necessary permissions to invoke the rerank model on Amazon Bedrock.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0"
        },
        {
            "Effect": "Allow",
            "Action": "bedrock:Rerank",
            "Resource": "*"
        }
    ]
}

Use the following code to integrate the knowledge base into the LlamaIndex framework. Specific configurations can be provided in the retrieval_config parameter, where numberOfResults is the maximum number of retrieved chunks from the vector store, and overrideSearchType has two valid values: HYBRID and SEMANTIC. In the rerankingConfiguration, you can optionally provide a rerank modelConfiguration and numberOfRerankedResults to sort the retrieved chunks by relevancy scores and select only the defined number of results. For the full list of available configurations for retrieval_config, refer to the Retrieve API documentation.

# Configure a knowledge base retriever using AmazonKnowledgeBasesRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.retrievers.bedrock import AmazonKnowledgeBasesRetriever

# maximum number of relevant text chunks that will be retrieved
# If you need quick, focused answers: lower numbers (1-3)
# If you need detailed, comprehensive answers: higher numbers (5-10)
top_k = 10

# search mode options: HYBRID, SEMANTIC
# HYBRID search combines the strengths of semantic search and keyword search 
# Balances semantic understanding with exact matching
# https://docs.llamaindex.ai/en/stable/examples/retrievers/bedrock_retriever/
search_mode = "HYBRID"

kb_retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=knowledge_base_id,
    retrieval_config={
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "overrideSearchType": search_mode,
            'rerankingConfiguration': {
                'bedrockRerankingConfiguration': {
                    'modelConfiguration': {
                        'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0'
                    },
                    'numberOfRerankedResults': 3
                },
                'type': 'BEDROCK_RERANKING_MODEL'
            }
        },
        
    }
)
kb_engine = RetrieverQueryEngine(retriever=kb_retriever)

Like the first option, you can create the knowledge base as a QueryEngineTool in LlamaIndex and combine it with other API tools. Then, you can create a FunctionCallingAgentWorker using these combined tools and initialize an AgentRunner to interact with them. By using this approach, you can chat with and take advantage of the capabilities of the integrated tools.

# Create a query tool for Bedrock Knowledge Base
kb_tool = QueryEngineTool(
        query_engine=kb_engine,
        metadata=ToolMetadata(
            name="kb_tool",
            description="""
            These decision guides help users select appropriate AWS machine learning and generative AI services based on specific needs. 
            They cover pre-built solutions, customizable platforms, and infrastructure options for ML workflows, 
            while outlining how generative AI can automate processes, personalize content, augment data, reduce costs, 
            and enable faster experimentation in various business contexts.""",
        ),
    )

# Update the agent to include all API tools and the Knowledge Base tool.
all_tools = api_tools +[kb_tool]

agent_worker = FunctionCallingAgentWorker.from_tools(
    all_tools, 
    llm=llm, 
    verbose=True, # Set verbose=True to display the full trace of steps. 
    system_prompt = system_prompt,
    # allow_parallel_tool_calls = True  # Uncomment this line to allow multiple tool invocations
)
agent = AgentRunner(agent_worker)
response = agent.chat(text_input)

Now you have built the agentic RAG solution using LlamaIndex and Amazon Bedrock Knowledge Bases.

Clean up

When you finish experimenting with this solution, use the following steps to clean up the AWS resources to avoid unnecessary costs:

  1. In the Amazon S3 console, delete the S3 bucket and data created for this solution.
  2. In the OpenSearch Service console, delete the collection that was created for storing the embedding vectors.
  3. In the Amazon Bedrock Knowledge Bases console, delete the knowledge base you created.
  4. In the SageMaker console, navigate to your domain and user profile, and launch SageMaker Studio to stop or delete the JupyterLab instance.

Conclusion

This post demonstrated how to build a powerful agentic RAG application using LlamaIndex and Amazon Bedrock that goes beyond traditional question answering systems. By integrating Mistral Large 2 as the orchestrating model with external APIs (GitHub, arXiv, TechCrunch, and DuckDuckGo) and internal knowledge bases, you’ve created a versatile technology discovery and research tool.

We showed you two complementary approaches to implement the RAG framework: a programmatic implementation using LlamaIndex with Amazon OpenSearch Serverless, providing maximum flexibility for advanced use cases, and a managed solution using Amazon Bedrock Knowledge Bases that simplifies document processing and storage with minimal configuration. You can try out the solution using the following code sample.

For more relevant information, see Amazon Bedrock, Amazon Bedrock Knowledge Bases, Amazon OpenSearch Serverless, and Use a reranker model in Amazon Bedrock. Refer to Mistral AI in Amazon Bedrock to see the latest Mistral models that are available on both Amazon Bedrock and AWS Marketplace.


About the Authors

Ying Hou, PhD, is a Sr. Specialist Solution Architect for Gen AI at AWS, where she collaborates with model providers to onboard the latest and most intelligent AI models onto AWS platforms. With deep expertise in Gen AI, ASR, computer vision, NLP, and time-series forecasting models, she works closely with customers to design and build cutting-edge ML and GenAI applications. Outside of architecting innovative AI solutions, she enjoys spending quality time with her family, getting lost in novels, and exploring the UK’s national parks.

Preston Tuggle is a Sr. Specialist Solutions Architect with the Third-Party Model Provider team at AWS. He focuses on working with model providers across Amazon Bedrock and Amazon SageMaker, helping them accelerate their go-to-market strategies through technical scaling initiatives and customer engagement.

Read More

Text-to-image basics with Amazon Nova Canvas

Text-to-image basics with Amazon Nova Canvas

AI image generation has emerged as one of the most transformative technologies in recent years, revolutionizing how you create and interact with visual content. Amazon Nova Canvas is a generative model in the suite of Amazon Nova creative models that enables you to generate realistic and creative images from plain text descriptions.

This post serves as a beginner’s guide to using Amazon Nova Canvas. We begin with the steps to get set up on Amazon Bedrock. Amazon Bedrock is a fully managed service that hosts leading foundation models (FMs) for various use cases such as text, code, and image generation; summarization; question answering; and custom use cases that involve fine-tuning and Retrieval Augmented Generation (RAG). In this post, we focus on the Amazon Nova image generation models available in AWS Regions in the US, in particular, the Amazon Nova Canvas model. We then provide an overview of the image generation process (diffusion) and dive deep into the input parameters for text-to-image generation with Amazon Nova Canvas.

Get started with image generation on Amazon Bedrock

Complete the following steps to get set up with access to Amazon Nova Canvas and the image playground:

  1. Create an AWS account if you don’t have one already.
  2. Open the Amazon Bedrock console as an AWS Identity and Access Management (IAM) administrator or appropriate IAM user.
  3. Confirm and choose one of the Regions where the Amazon Nova Canvas model is available (for example, US East (N. Virginia)).
  4. In the navigation pane, choose Model access under Bedrock configurations.
    Expandable Bedrock configurations menu with Model access and Settings items
  5. Under What is Model access, choose Modify model access or Enable specific models (if not yet activated).
    Amazon Bedrock model access explanation with permissions, terms, and quota link for foundation models
  6. Select Nova Canvas, then choose Next.
    Amazon Bedrock model access editor displaying filtered view of Nova Canvas image model with access controls
  7. On the Review and submit page, choose Submit.
    Final review interface for Amazon Bedrock Nova Canvas model access request with edit and navigation options
  8. Refresh the Base models list. If you see the Amazon Nova Canvas model in the Access Granted status, you are ready to proceed with the next steps.
    Nova Canvas model status row showing granted access
  9. In the navigation pane, choose Image / Video under Playgrounds.
    Playgrounds menu showing Chat/Text and Image/Video options
  10. Choose Select model, then choose Amazon and Nova Canvas. Then choose Apply.
    Model selection dialog with categories, available models list, and inference options for AI image generation

You are all set up to start generating images with Amazon Nova Canvas on Amazon Bedrock. The following screenshot shows an example of our playground.

Nova Canvas image generation interface showing configuration panel and two example flower vase photos

Understanding the generation process

Amazon Nova Canvas uses diffusion-based approaches to generate images:

  • Starting point – The process begins with random noise (a pure static image).
  • Iterative denoising – The model gradually removes noise in steps, guided by your prompts. The amount of noise to remove at each step is learned during training. For instance, to generate an image of a cat, the model is trained on many cat images into which noise is iteratively added until they become pure noise. By learning how much noise was added at each step, the model effectively learns the reverse process: starting from a noisy image and iteratively subtracting noise to arrive at the image of a cat (see the illustrative sketch after this list).
  • Text conditioning – The text prompt serves as the conditioning that guides the image generation process. The prompt is encoded as a numerical vector and compared against similar vectors in a text-image embedding space; guided by these vectors, the noisy image is transformed into an image that captures the input prompt.
  • Image conditioning – In addition to text prompts, Amazon Nova Canvas also accepts images as inputs.
  • Safety and fairness – To comply with safety and fairness goals, both the prompt and the generated output image go through filters. If no filter is triggered, the final image is returned.
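
To make the iterative denoising idea concrete, the following toy sketch is purely illustrative and is not the actual Amazon Nova Canvas implementation: it starts from random noise and repeatedly subtracts a predicted noise component. In the real model, that prediction is learned during training and conditioned on your prompt; here it is faked with a simple pull toward a placeholder target.

import numpy as np

#Purely illustrative sketch of the reverse-diffusion idea described above.
#This is NOT the Amazon Nova Canvas implementation; the real model uses a
#learned denoiser conditioned on the encoded text prompt.
def toy_reverse_diffusion(shape=(64, 64, 3), steps=50, seed=0):
    rng = np.random.default_rng(seed)
    image = rng.normal(size=shape)            #Start from pure noise
    target = np.zeros(shape)                  #Placeholder for "the image the prompt describes"
    for step in range(steps):
        predicted_noise = image - target      #A real model predicts this from training
        image = image - predicted_noise / (steps - step)  #Remove a learned amount of noise per step
    return image

denoised = toy_reverse_diffusion()
print(denoised.std())  #The noise shrinks toward the target as the steps progress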

Prompting fundamentals

Image generation begins with effective prompting—the art of crafting text descriptions that guide the model toward your desired output. Well-constructed prompts include specific details about subject, style, lighting, perspective, mood, and composition, and work better when structured as image captions rather than a command or conversation. For example, rather than saying “generate an image of a mountain,” a more effective prompt might be “a majestic snow-capped mountain peak at sunset with dramatic lighting and wispy clouds, photorealistic style.” Refer to Amazon Nova Canvas prompting best practices for more information about prompting.

Let’s address the following prompt elements and observe their impact on the final output image:

  • Subject descriptions (what or who is in the image) – In the following example, we use the prompt “a cat sitting on a chair.”

Striped cat with bright eyes resting on wooden dining chair in warm lighting

  • Style references (photography, oil painting, 3D render) – In the following examples, we use the prompts “A cat sitting on a chair, oil painting style” and then “A cat sitting on a chair, anime style.”

Digital artwork of cat resting on wooden chair, painted in soft brushstrokes with warm golden tones

Stylized animation of tabby cat resting peacefully on wooden armchair, bathed in warm window light

  • Compositional elements and technical specifications (foreground, background, perspective, lighting) – In the following examples, we use the prompts “A cat sitting on a chair, mountains in the background,” and “A cat sitting on a chair, sunlight from the right low angle shot.”

Cat sitting on wooden chair with snow-capped mountains in background

Detailed portrait of alert tabby cat on wooden chair, backlit by golden afternoon sunlight

Positive and negative prompts

Positive prompts tell the model what to include. These are the elements, styles, and characteristics you want to observe in the final image. Avoid the use of negation words like “no,” “not,” or “without” in your prompt. Amazon Nova Canvas has been trained on image-caption pairs, and captions rarely describe what isn’t in an image. Therefore, the model has never learned the concept of negation. Instead, use negative prompts to specify elements to exclude from the output.

Negative prompts specify what to avoid. Common negative prompts include “blurry,” “distorted,” “low quality,” “poor anatomy,” “bad proportions,” “disfigured hands,” or “extra limbs,” which help models avoid typical generation artifacts.

In the following examples, we first use the prompt “An aerial view of an archipelago,” then we refine the prompt as “An aerial view of an archipelago. Negative Prompt: Beaches.”

Aerial view of tropical islands with turquoise waters and white beach

Aerial view of forested islands scattered across calm ocean waters

The balance between positive and negative prompting creates a defined creative space for the model to work within, often resulting in more predictable and desirable outputs.
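
For reference, the positive and negative prompts map onto separate fields in the API request (the same request shape used throughout this post). The following minimal sketch shows the archipelago example with its negative prompt; the remaining generation parameters are illustrative values, not recommendations.

import json

#Minimal sketch: positive and negative prompts go into separate fields of
#textToImageParams. The imageGenerationConfig values here are illustrative.
body = json.dumps({
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": "An aerial view of an archipelago",  #What to include
        "negativeText": "Beaches"                    #What to avoid (no negation words in the main prompt)
    },
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "height": 1024,
        "width": 1024,
        "cfgScale": 6.5,
        "seed": 12
    }
})
#Pass `body` to invoke_model with modelId='amazon.nova-canvas-v1:0', exactly as
#shown in the full code example later in this post.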

Image dimensions and aspect ratios

Amazon Nova Canvas is trained on 1:1, portrait, and landscape resolutions, with generation tasks having a maximum output resolution of 4.19 million pixels (for example, 2048×2048 or 2816×1536). For editing tasks, the image should be no more than 4,096 pixels on its longest side, have an aspect ratio between 1:4 and 4:1, and have a total pixel count of 4.19 million or smaller. Understanding these dimensional limits helps you avoid stretched or distorted results, particularly for specialized composition needs.
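
As an illustrative convenience (this helper is not part of the Amazon Bedrock API), you can pre-check requested dimensions against the limits described above before submitting a generation or editing request:

#Illustrative helper reflecting the limits described above; thresholds are
#taken from the text of this post and may change over time.
MAX_GENERATION_PIXELS = 4_190_000   #~4.19 million total pixels
MAX_EDIT_LONG_SIDE = 4096           #Longest side for editing tasks
MIN_ASPECT, MAX_ASPECT = 1 / 4, 4   #1:4 to 4:1 for editing tasks

def check_dimensions(width, height, editing=False):
    """Return a list of problems with the requested dimensions (empty if none)."""
    problems = []
    if width * height > MAX_GENERATION_PIXELS:
        problems.append(f"{width}x{height} exceeds the ~4.19M total pixel limit")
    if editing:
        if max(width, height) > MAX_EDIT_LONG_SIDE:
            problems.append(f"longest side exceeds {MAX_EDIT_LONG_SIDE} pixels")
        aspect = width / height
        if not MIN_ASPECT <= aspect <= MAX_ASPECT:
            problems.append(f"aspect ratio {aspect:.2f} is outside the 1:4 to 4:1 range")
    return problems

print(check_dimensions(1280, 720))                 #[] -> within limits
print(check_dimensions(4096, 2048, editing=True))  #Flags the total pixel limit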

Classifier-free guidance scale

The classifier-free guidance (CFG) scale controls how strictly the model follows your prompt:

  • Low values (1.1–3) – More creative freedom for the AI, potentially more aesthetic, but low contrast and less prompt-adherent results
  • Medium values (4–7) – Balanced approach, typically recommended for most generations
  • High values (8–10) – Strict prompt adherence, which can produce more precise results but sometimes at the cost of natural aesthetics and increased color saturation

In the following examples, we use the prompt “Cherry blossoms, bonsai, Japanese style landscape, high resolution, 8k, lush greens in the background.”

The first image with CFG 2 captures some elements of cherry blossoms and bonsai. The second image with CFG 8 adheres more to the prompt with a potted bonsai, more pronounced cherry blossom flowers, and lush greens in the background.

Miniature cherry blossom tree with pink blooms cascading over moss-covered rock near peaceful pond

Cherry blossom bonsai with curved trunk and pink flowers in traditional pot against green landscape

Think of the CFG scale as adjusting how literally the model takes your instructions versus how much artistic interpretation it applies.
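
The following minimal sketch (parameter values are illustrative) builds two otherwise identical request bodies that differ only in cfgScale, holding the seed fixed so that differences in the output reflect guidance strength rather than random variation:

import json

#Illustrative sketch: same prompt and seed, two CFG values, so any visual
#difference is attributable to guidance strength.
prompt = "Cherry blossoms, bonsai, Japanese style landscape, high resolution, 8k, lush greens in the background"

def build_request(cfg_scale, seed=7):
    return json.dumps({
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "height": 1024,
            "width": 1024,
            "cfgScale": cfg_scale,  #The only parameter that varies
            "seed": seed
        }
    })

low_cfg_request = build_request(cfg_scale=2.0)   #More creative freedom
high_cfg_request = build_request(cfg_scale=8.0)  #Stricter prompt adherence
#Each request body is passed to invoke_model as in the full code example below.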

Seed values and reproducibility

Every image generation begins with a randomization seed—essentially a starting number that determines initial conditions:

  • Seeds are typically represented as long integers (for example, 1234567890)
  • Using the same seed, prompt, and parameters reproduces identical images every time
  • Saving seeds allows you to revisit successful generations or create variations on promising results
  • Seed values have no inherent quality; they are simply different starting points

Reproducibility through seed values is essential for professional workflows, allowing refined iterations on the prompt or other input parameters to clearly see their effect, rather than completely random generations. The following images are generated using two slightly different prompts (“A portrait of a girl smiling” vs. “A portrait of a girl laughing”), while holding the seed value and all other parameters constant.
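
As a minimal illustrative sketch (the seed values below are arbitrary examples), you can sweep a handful of explicit seeds for the same prompt and record each seed in the output filename, so any promising result can be regenerated exactly later by reusing its seed and parameters:

import json

#Illustrative sketch: explore a prompt across a few explicit seeds and keep the
#seed in the output filename so the exact generation can be reproduced later.
prompt = "A portrait of a girl smiling"

for seed in (1234567890, 42, 2025):   #Arbitrary example seeds
    body = json.dumps({
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "height": 1024,
            "width": 1024,
            "cfgScale": 6.5,
            "seed": seed
        }
    })
    #Call invoke_model with this body as shown in the code example below, then
    #save the decoded image as f"portrait_seed_{seed}.png" so the generation can
    #be recreated by reusing the same seed and parameters.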

All preceding images in this post have been generated using the text-to-image (TEXT_IMAGE) task type of Amazon Nova Canvas, available through the Amazon Bedrock InvokeModel API. The following is the API request and response structure for image generation:

#Request Structure
{
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": string,         #Positive Prompt
        "negativeText": string  #Negative Prompt
    },
    "imageGenerationConfig": {
        "width": int,           #Image Resolution Width
        "height": int,          #Image Resolution Width
        "quality": "standard" | "premium",   #Image Quality
        "cfgScale": float,      #Classifer Free Guidance Scale
        "seed": int,            #Seed value
        "numberOfImages": int   #Number of images to be generated (max 5)
    }
}
#Response Structure
{
    "images": "images": string[], #list of Base64 encoded images
    "error": string
}

Code example

This solution can also be tested locally with a Python script or a Jupyter notebook. For this post, we use an Amazon SageMaker AI notebook using Python (v3.12). For more information, see Run example Amazon Bedrock API requests using an Amazon SageMaker AI notebook. For instructions to set up your SageMaker notebook instance, refer to Create an Amazon SageMaker notebook instance. Make sure the instance is set up in the same Region where Amazon Nova Canvas access is enabled. For this post, we create a Region variable to match the Region where Amazon Nova Canvas is enabled (us-east-1). You must modify this variable if you’ve enabled the model in a different Region. The following code demonstrates text-to-image generation by invoking the Amazon Nova Canvas v1.0 model using Amazon Bedrock. To understand the API request and response structure for different types of generations, parameters, and more code examples, refer to Generating images with Amazon Nova.

import base64  #For encoding/decoding base64 data
import io  #For handling byte streams
import json  #For JSON processing
import boto3  #AWS SDK for Python
from PIL import Image  #Python Imaging Library for image processing
from botocore.config import Config  #For AWS client configuration

#Create a variable to fix the region to where Nova Canvas is enabled
region = "us-east-1"

#Setup an Amazon Bedrock runtime client
client = boto3.client(service_name='bedrock-runtime', region_name=region, config=Config(read_timeout=300))

#Set the content type and accept headers for the API call
accept = "application/json"
content_type = "application/json"

#Define the prompt for image generation
prompt = """A cat sitting on a chair, mountains in the background, low angle shot."""

#Create the request body with generation parameters
api_request = json.dumps({
        "taskType": "TEXT_IMAGE",  #Specify text-to-image generation
        "textToImageParams": {
            "text": prompt  
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,   #Generate one image
            "height": 720,        #Image height in pixels
            "width": 1280,         #Image width in pixels
            "cfgScale": 7.0,       #CFG Scale
            "seed": 0              #Seed number for generation
        }
})
#Call the Bedrock model to generate the image
response = client.invoke_model(body=api_request, modelId='amazon.nova-canvas-v1:0',
                               accept=accept, contentType=content_type)
        
#Parse the JSON response
response_json = json.loads(response.get("body").read())

#Extract the base64-encoded image from the response
base64_image = response_json.get("images")[0]
#Convert the base64 string to ASCII bytes
base64_bytes = base64_image.encode('ascii')
#Decode the base64 bytes to get the actual image bytes
image_data = base64.b64decode(base64_bytes)

#Convert bytes to an image object
output_image = Image.open(io.BytesIO(image_data))
#Display the image
output_image.show()
#Save the image to current working directory
output_image.save('output_image.png')

Clean up

When you have finished testing this solution, clean up your resources to prevent AWS charges from being incurred:

  1. Back up the Jupyter notebooks in the SageMaker notebook instance.
  2. Shut down and delete the SageMaker notebook instance.

Cost considerations

Consider the following costs from the solution deployed on AWS:

  • You will incur charges for generative AI inference on Amazon Bedrock. For more details, refer to Amazon Bedrock pricing.
  • You will incur charges for your SageMaker notebook instance. For more details, refer to Amazon SageMaker pricing.

Conclusion

This post introduced you to AI image generation, and then provided an overview of accessing image models available on Amazon Bedrock. We then walked through the diffusion process and key parameters with examples using Amazon Nova Canvas. The code template and examples demonstrated in this post aim to get you familiar with the basics of Amazon Nova Canvas and get started with your AI image generation use cases on Amazon Bedrock.

For more details on text-to-image generation and other capabilities of Amazon Nova Canvas, see Generating images with Amazon Nova. Give it a try and let us know your feedback in the comments.


About the Author

Arjun Singh is a Sr. Data Scientist at Amazon, experienced in artificial intelligence, machine learning, and business intelligence. He is a visual person and deeply curious about generative AI technologies in content creation. He collaborates with customers to build ML and AI solutions to achieve their desired outcomes. He graduated with a Master’s in Information Systems from the University of Cincinnati. Outside of work, he enjoys playing tennis, working out, and learning new skills.

Read More

Real-world applications of Amazon Nova Canvas for interior design and product photography

Real-world applications of Amazon Nova Canvas for interior design and product photography

As AI image generation becomes increasingly central to modern business workflows, organizations are seeking practical ways to implement this technology for specific industry challenges. Although the potential of AI image generation is vast, many businesses struggle to effectively apply it to their unique use cases.

In this post, we explore how Amazon Nova Canvas can solve real-world business challenges through advanced image generation techniques. We focus on two specific use cases that demonstrate the power and flexibility of this technology:

  • Interior design – Image conditioning with segmentation helps interior designers rapidly iterate through design concepts, dramatically reducing the time and cost associated with creating client presentations
  • Product photography – Outpainting enables product photographers to create diverse environmental contexts for products without extensive photo shoots

Whether you’re an interior design firm looking to streamline your visualization process or a retail business aiming to reduce photography costs, this post can help you use the advanced features of Amazon Nova Canvas to achieve your specific business objectives. Let’s dive into how these powerful tools can transform your image generation workflow.

Prerequisites

You should have the following prerequisites:

Interior design

An interior design firm has the following problem: Their designers spend hours creating photorealistic designs for client presentations, needing multiple iterations of the same room with different themes and decorative elements. Traditional 3D rendering is time-consuming and expensive. To solve this problem, you can use the image conditioning (segmentation) features of Amazon Nova Canvas to rapidly iterate on existing room photos. The condition image is analyzed to identify prominent content shapes, resulting in a segmentation mask that guides the generation. The generated image closely follows the layout of the condition image while allowing the model to have creative freedom within the bounds of each content area.

The following images show examples of the initial input, a segmentation mask based on the input, and output based on two different prompts.

Input image of a living room – Cozy living room featuring stone fireplace, mounted TV, and comfortable seating arrangement
Segmentation mask of living room – AI-generated semantic segmentation map of a living room, with objects labeled in different colors
Prompt: “A minimalistic living room” – Minimalist living room featuring white furniture, dark wood accents, and marble-look floors
Prompt: “A coastal beach themed living room” – Coastal-themed living room with ocean view and beach-inspired decor

This post demonstrates how to maintain structural integrity while transforming interior elements, so you can generate multiple variations in minutes with simple prompting and input images. The following code block presents the API request structure for image conditioning with segmentation. Parameters to perform these transformations are passed to the model through the API request. Make sure that the output image has the same dimensions as the input image to avoid distorted results.

{
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "conditionImage": string (Base64 encoded image), #Original living room
        "controlMode": "SEGMENTATION", 
        "controlStrength": float, #Specify how closely to follow the condition       #image (0.0-1.0; Default: 0.7).
        "text": string, #A minimalistic living room
        "negativeText": string
    },
    "imageGenerationConfig": {
        "width": int,
        "height": int,
        "quality": "standard" | "premium",
        "cfgScale": float,
        "seed": int,
        "numberOfImages": int
    }
}

The taskType object determines the type of operation being performed and has its own set of parameters, and the imageGenerationConfig object contains general parameters common to all task types (except background removal). To learn more about the request/response structure for different types of generations, refer to Request and response structure for image generation.

The following Python code demonstrates an image conditioning generation by invoking the Amazon Nova Canvas v1.0 model on Amazon Bedrock:

import base64  #For encoding/decoding base64 data
import io  #For handling byte streams
import json  #For JSON operations
import boto3  #AWS SDK for Python
from PIL import Image  #Python Imaging Library for image processing
from botocore.config import Config  #For AWS client configuration
#Create a variable to fix the region to where Nova Canvas is enabled 
region = "us-east-1"

#Create Bedrock client with 300 second timeout
bedrock = boto3.client(service_name='bedrock-runtime', region_name=region,
        config=Config(read_timeout=300))

#Original living room image in current working directory
input_image_path = "Original Living Room.jpg"

#Read and encode the image
def prepare_image(image_path):
    with open(image_path, 'rb') as image_file:
        image_data = image_file.read()
        base64_encoded = base64.b64encode(image_data).decode('utf-8')
    return base64_encoded

#Get the base64 encoded image
input_image = prepare_image(input_image_path)

#Set the content type and accept headers for the API call
accept = "application/json"
content_type = "application/json"

#Prepare the request body
api_request = json.dumps({
       "taskType": "TEXT_IMAGE",  #Type of generation task
       "textToImageParams": {
             "text": "A minimalistic living room",  #Prompt
             "negativeText": "bad quality, low res",  #What to avoid
             "conditionImage": input_image,  #Base64 encoded original living room
             "controlMode": "SEGMENTATION"  #Segmentation mode
            },
       "imageGenerationConfig": {
             "numberOfImages": 1,  #Generate one image
             "height": 1024,  #Image height, same as the input image
             "width": 1024,  #Image width, same as the input image
             "seed": 0, #Modify seed value to get variations on the same prompt
             "cfgScale": 7.0  #Classifier Free Guidance scale
            }
})

#Call the model to generate image
response = bedrock.invoke_model(body=api_request, modelId='amazon.nova-canvas-v1:0', accept=accept, contentType=content_type)

#Parse the response body
response_json = json.loads(response.get("body").read())

#Extract and decode the base64 image
base64_image = response_json.get("images")[0]  #Get first image
base64_bytes = base64_image.encode('ascii')  #Convert to ASCII
image_data = base64.b64decode(base64_bytes)  #Decode base64 to bytes

#Display the generated image
output_image = Image.open(io.BytesIO(image_data))
output_image.show()
#Save the image to current working directory
output_image.save('output_image.png')

Product photography

A sports footwear company has the following problem: They need to showcase their versatile new running shoes in multiple environments (running track, outdoors, and more), requiring expensive location shoots and multiple photography sessions for each variant. To solve this problem, you can use Amazon Nova Canvas to generate diverse shots from a single product photo. Outpainting can be used to replace the background of an image. You can instruct the model to preserve parts of the image by providing a mask prompt, for example, “Shoes.” A mask prompt is a natural language description of the objects in your image that should not be changed during outpainting. You can then generate the shoes in different backgrounds with new prompts.

The following images show examples of the initial input, a mask created for “Shoes,” and output based on two different prompts.

Studio photo of running shoes – Stylized product photo of performance sneaker with contrasting navy/white upper and orange details
Mask created for “Shoes” – Black silhouette of an athletic sneaker in profile view
Prompt: “Product photoshoot of sports shoes placed on a running track outdoor” – Athletic running shoe with navy and orange colors on red running track
Prompt: “Product photoshoot of sports shoes on rocky terrain, forest background” – Athletic shoe photographed on rocky surface with forest background

Instead of using a mask prompt, you can input a mask image, which defines the areas of the image to preserve. The mask image must be the same size as the input image. Areas to be edited are shaded pure white and areas to preserve are shaded pure black. Outpainting mode is a parameter to define how the mask is treated. Use DEFAULT to transition smoothly between the masked area and the non-masked area. This mode is generally better when you want the new background to use similar colors as the original background. However, you can get a halo effect if your prompt calls for a new background that is significantly different than the original background. Use PRECISE to strictly adhere to the mask boundaries. This mode is generally better when you’re making significant changes to the background.
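
If you take the mask image route, the following minimal sketch builds a black-and-white mask with Pillow and Base64-encodes it for the maskImage field. The bounding box coordinates are placeholders; in practice you would derive the preserved region from your own product cutout or annotation.

import base64  #For encoding the mask as Base64
import io  #For an in-memory byte buffer
from PIL import Image, ImageDraw  #Pillow for building the mask image

#Illustrative sketch: the mask must be the same size as the input image, with
#pure white over areas to edit and pure black over areas to preserve.
input_image = Image.open("Shoes.png")
mask = Image.new("RGB", input_image.size, color=(255, 255, 255))  #White = editable background
draw = ImageDraw.Draw(mask)
draw.rectangle([300, 400, 900, 800], fill=(0, 0, 0))  #Black = preserve the shoes (placeholder box)

buffer = io.BytesIO()
mask.save(buffer, format="PNG")
mask_image_b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")
#Pass mask_image_b64 as "maskImage" (instead of "maskPrompt") in the OUTPAINTING
#request shown in the following code block.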

This post demonstrates how to use outpainting to capture product accuracy, and then turn one studio photo into different environments seamlessly. The following code illustrates the API request structure for outpainting:

{
    "taskType": "OUTPAINTING",
    "outPaintingParams": {
        "image": string (Base64 encoded image),
        "maskPrompt": string, #Shoes
        "maskImage": string, #Base64 encoded image
        "outPaintingMode": "DEFAULT" | "PRECISE", 
        "text": string,  #Product photoshoot of sports shoes on rocky terrain
        "negativeText": string
    },
    "imageGenerationConfig": {
        "numberOfImages": int,
        "quality": "standard" | "premium",
        "cfgScale": float,
        "seed": int
    }
}

The following Python code demonstrates an outpainting-based background replacement by invoking the Amazon Nova Canvas v1.0 model on Amazon Bedrock. For more code examples, see Code examples.

import base64  #For encoding/decoding base64 data
import io  #For handling byte streams
import json  #For JSON operations
import boto3  #AWS SDK for Python
from PIL import Image  #Python Imaging Library for image processing
from botocore.config import Config  #For AWS client configuration
#Create a variable to fix the region to where Nova Canvas is enabled 
region = "us-east-1"

#Create Bedrock client with 300 second timeout
bedrock = boto3.client(service_name='bedrock-runtime', region_name=region,
        config=Config(read_timeout=300))

#Original studio image of shoes in current working directory
input_image_path = "Shoes.png"

#Read and encode the image
def prepare_image(image_path):
    with open(image_path, 'rb') as image_file:
        image_data = image_file.read()
        base64_encoded = base64.b64encode(image_data).decode('utf-8')
    return base64_encoded

#Get the base64 encoded image
input_image = prepare_image(input_image_path)

#Set the content type and accept headers for the API call
accept = "application/json"
content_type = "application/json"

#Prepare the request body
api_request = json.dumps({
        "taskType": "OUTPAINTING",
        "outPaintingParams": {
             "image": input_image,
             "maskPrompt": "Shoes", 
             "outPaintingMode": "DEFAULT", 
             "text": "Product photoshoot of sports shoes placed on a running track outdoor",
             "negativeText": "bad quality, low res"
            },
        "imageGenerationConfig": {
             "numberOfImages": 1,
             "seed": 0, #Modify seed value to get variations on the same prompt
             "cfgScale": 7.0
            }
})

#Call the model to generate image
response = bedrock.invoke_model(body=api_request, modelId='amazon.nova-canvas-v1:0', accept=accept, contentType=content_type)

#Parse the response body
response_json = json.loads(response.get("body").read())

#Extract and decode the base64 image
base64_image = response_json.get("images")[0]  #Get first image
base64_bytes = base64_image.encode('ascii')  #Convert to ASCII
image_data = base64.b64decode(base64_bytes)  #Decode base64 to bytes

#Display the generated image
output_image = Image.open(io.BytesIO(image_data))
output_image.show()
#Save the image to current working directory
output_image.save('output_image.png')

Clean up

When you have finished testing this solution, clean up your resources to prevent AWS charges from being incurred:

  1. Back up the Jupyter notebooks in the SageMaker notebook instance.
  2. Shut down and delete the SageMaker notebook instance.

Cost considerations

Consider the following costs from the solution deployed on AWS:

  • You will incur charges for generative AI inference on Amazon Bedrock. For more details, refer to Amazon Bedrock pricing.
  • You will incur charges for your SageMaker notebook instance. For more details, refer to Amazon SageMaker pricing.

Conclusion

In this post, we explored practical implementations of Amazon Nova Canvas for two high-impact business scenarios. You can now generate multiple design variations or diverse environments in minutes rather than hours. With Amazon Nova Canvas, you can significantly reduce costs associated with traditional visual content creation. Refer to Generating images with Amazon Nova to learn about the other capabilities supported by Amazon Nova Canvas.

As next steps, begin with a single use case that closely matches your business needs. Use our provided code examples as a foundation and adapt them to your specific requirements. After you’re familiar with the basic implementations, explore combining multiple techniques and scale gradually. Don’t forget to track time savings and cost reductions to measure ROI. Contact your AWS account team for enterprise implementation guidance.


About the Author

Arjun Singh is a Sr. Data Scientist at Amazon, experienced in artificial intelligence, machine learning, and business intelligence. He is a visual person and deeply curious about generative AI technologies in content creation. He collaborates with customers to build ML/AI solutions to achieve their desired outcomes. He graduated with a Master’s in Information Systems from the University of Cincinnati. Outside of work, he enjoys playing tennis, working out, and learning new skills.

Read More

Independent evaluations demonstrate Nova Premier’s safety

Independent evaluations demonstrate Nova Premier’s safety



In both black-box stress testing and red-team exercises, Nova Premier comes out on top.

Conversational AI

May 29, 03:22 PM

AI safety is a priority at Amazon. Our investment in safe, transparent, and responsible AI (RAI) includes collaboration with the global community and policymakers. We are members of and collaborate with organizations such as the Frontier Model Forum, the Partnership on AI, and other forums organized by government agencies such as the National Institute of Standards and Technology (NIST). Consistent with Amazon’s endorsement of the Korea Frontier AI Safety Commitments, we published our Frontier Model Safety Framework earlier this year.

Amazon Nova Premier’s guardrails help prevent generation of unsafe content.

During the development of the Nova Premier model, we conducted a comprehensive evaluation to assess its performance and safety. This included testing on both internal and public benchmarks, as well as internal automated and third-party red-teaming exercises. Once the final model was ready, we prioritized obtaining unbiased, third-party evaluations of the model’s robustness against RAI controls. In this post, we outline the key findings from these evaluations, demonstrating the strength of our testing approach and Nova Premier’s standing as a safe model. Specifically, we cover our evaluations with two third-party evaluators: PRISM AI and ActiveFence.

Evaluation of Nova Premier against PRISM AI

PRISM Eval’s Behavior Elicitation Tool (BET) dynamically and systematically stress-tests AI models’ safety guardrails. The methodology focuses on measuring how many adversarial attempts (steps) it takes to get a model to generate harmful content across several key risk dimensions. The central metric is “steps to elicit”: the number of increasingly sophisticated prompting attempts required before a model generates an inappropriate response. A higher number of steps indicates stronger safety measures, as the model is more resistant to manipulation. The PRISM risk dimensions (inspired by the MLCommons AI Safety Benchmarks) include CBRNE weapons, violent crimes, non-violent crimes, defamation, and hate, among several others.

Using the BET Eval tool and its V1.0 metric, which is tailored toward non-reasoning models, we compared the recently released Nova models (Pro and Premier) to the latest models in the same class: Claude (3.5 v2 and 3.7 non-reasoning) and Llama4 Maverick, all available through Amazon Bedrock. PRISM BET conducts black-box evaluations (where model developers don’t have access to the test prompts) of models integrated with their API. The evaluation conducted with BET Eval MAX, PRISM’s most comprehensive and aggressive testing suite, revealed significant variations in safety against malicious instructions. Nova models demonstrated superior overall safety performance, with an average of 43 steps for Premier and 52 steps for Pro, compared to 37.7 for Claude 3.5 v2 and fewer than 12 steps for the other models in the comparison set (namely, 9.9 for Claude 3.7, 11.5 for Claude 3.7 thinking, and 6.5 for Maverick). This higher step count suggests that, on average, Nova’s safety guardrails are more sophisticated and harder to circumvent through adversarial prompting. The figure below presents the number of steps per harm category evaluated through BET Eval MAX.

Results of tests using PRISM’s BET Eval MAX testing suite.

The PRISM evaluation provides valuable insights into the relative safety of different Amazon Bedrock models. Nova’s strong performance, particularly in hate speech and defamation resistance, represents meaningful progress in AI safety. However, the results also highlight the ongoing challenge of building truly robust safety measures into AI systems. As the field continues to evolve, frameworks like BET will play an increasingly important role in benchmarking and improving AI safety. As a part of this collaboration, Nicolas Miailhe, CEO of PRISM Eval, said, “It’s incredibly rewarding for us to see Nova outperforming strong baselines using the BET Eval MAX; our aim is to build a long-term partnership toward safer-by-design models and to make BET available to various model providers.” Organizations deploying AI systems should carefully consider these safety metrics when selecting models for their applications.

Manual red teaming with ActiveFence

The AI safety & security company ActiveFence benchmarked Nova Premier on Bedrock on prompts distributed across Amazons eight core RAI categories. ActiveFence also evaluated Claude 3.7 (non-reasoning mode) and GPT 4.1 API on the same set. The flag rate on Nova Premier was lower than that on the other two models, indicating that Nova Premier is the safest of the three.

Model                        3P flag rate (lower is better)
Nova Premier                 12.0%
Sonnet 3.7 (non-reasoning)   20.6%
GPT-4.1 API                  22.4%

“Our role is to think like an adversary but act in service of safety,” said Guy Paltieli from ActiveFence. “By conducting a blind stress test of Nova Premier under realistic threat scenarios, we helped evaluate its security posture in support of Amazon’s broader responsible-AI goals, ensuring the model could be deployed with greater confidence.”

These evaluations conducted with PRISM and ActiveFence give us confidence in the strength of our guardrails and our ability to protect our customers’ safety when they use our models. While these evaluations demonstrate strong safety performance, we recognize that AI safety is an ongoing challenge requiring continuous improvement. These assessments represent a point-in-time snapshot, and we remain committed to regular testing and enhancement of our safety measures. No AI system can guarantee perfect safety in all scenarios, which is why we maintain monitoring and response systems after deployment.

Acknowledgments: Vincent Ponzo, Elyssa Vincent

Research areas: Conversational AI

Tags: Responsible AI

Read More

Part 3: Building an AI-powered assistant for investment research with multi-agent collaboration in Amazon Bedrock and Amazon Bedrock Data Automation

Part 3: Building an AI-powered assistant for investment research with multi-agent collaboration in Amazon Bedrock and Amazon Bedrock Data Automation

In the financial services industry, analysts need to switch between structured data (such as time-series pricing information), unstructured text (such as SEC filings and analyst reports), and audio/visual content (earnings calls and presentations). Each format requires different analytical approaches and specialized tools, creating workflow inefficiencies. On top of this, analysts face intense time pressure from rapidly evolving industry conditions and narrow decision windows. Delayed analysis can mean missed opportunities or failure to identify emerging risks, with potentially significant financial consequences.

AI assistants boost productivity by automating routine data collection and processing tasks, surfacing relevant insights, and allowing analysts to focus on higher-value activities. However, a single AI agent struggles to handle complex, multistep investment research workflows that span many specialized tasks. This is where multi-agent collaboration provides an advantage. By creating specialized AI subagents that excel at specific tasks and using them together under a coordinated framework with an AI supervisor agent, we can address the full complexity of investment research workflows. A supervisor agent can intelligently decompose complex queries, delegate specialized tasks to subagents, and synthesize their outputs into comprehensive answers, similar to how a research team functions in the real world. The benefits of this approach are substantial: distributed problem-solving and specialization, improved accuracy through specialized expertise, enhanced scalability by adding new agent capabilities without rebuilding the entire system, and greater transparency because the reasoning process of each specialized agent can be tracked and verified.

Amazon Bedrock Agents uses the reasoning of foundation models (FMs), APIs, and data to break down user requests, gather relevant information, and efficiently complete tasks. With the multi-agent collaboration capability of Amazon Bedrock, you can build, deploy, and manage multiple AI agents working together on complex multistep tasks that require specialized skills. Amazon Bedrock Data Automation (BDA) enables the generation of useful insights from unstructured multimodal content such as documents, images, audio, and video for your AI-powered applications, and it can be used as a parser when setting up a knowledge base for Retrieval Augmented Generation (RAG) workflows.

In this post, we walk through how to build a multi-agent investment research assistant using the multi-agent collaboration capability of Amazon Bedrock. Our solution demonstrates how a team of specialized AI agents can work together to analyze financial news, evaluate stock performance, optimize portfolio allocations, and deliver comprehensive investment insights—all orchestrated through a unified, natural language interface. We use BDA as a parser when setting up a knowledge base for RAG workflows with multi-modal data. Amazon Nova understanding models are used as the large language models (LLMs) associated with the supervisor agents and subagents.

Check out the video that demonstrates the solution.

Results from these agents are demonstrative, not to be taken as financial advice.

Solution overview

The multi-agent investment research assistant is composed of a supervisor agent and three subagents (quantitative analysis agent, news agent, and smart summarizer agent) working together in a coordinated framework, as shown in the following diagram.

The supervisor agent uses the instructions you provide to understand the structure and role of each collaborator agent. The supervisor agent orchestrates the overall investment research process by breaking down user prompts, delegating subtasks to specialized subagents, and consolidating their outputs to generate the final response. Each subagent is configured to carry out specialized tasks as follows:

  • Quantitative analysis agent
    • Functions – Powered by stock-query and portfolio-optimization action groups.
    • Responsibilities – Query and analyze historical stock data and build optimized portfolio allocations based on user inputs such as stock tickers or investment amount.
  • News agent
    • Functions – Powered by the web_search action group and the financial_analysis_KB knowledge base.
    • Responsibilities – Search and retrieve relevant financial data such as earnings reports and filings from the knowledge base for context. If information isn’t present in the knowledge base, construct a web query. Handles queries related to analyzing company financials, performance drivers, and management commentary.
  • Smart summarizer agent
    • Functions – Uses LLM capabilities with specific prompt instructions, no custom action group.
    • Responsibilities – Takes in output from other subagents, such as recent news and financial data, and synthesizes the information into structured investment insights. Summarizes long texts or documents such as earnings reports into concise insights.

Multi-agent collaboration works as follows:

  1. User prompts the multi-agent investment research assistant with a high-level research query.
  2. Supervisor agent breaks the query into subtasks and orchestrates and invokes the relevant subagents in parallel or sequentially as needed:
    • News agent to retrieve financial reports or data
    • Quantitative analysis agent for stock data analysis
    • Smart summarizer agent to summarize lengthy texts
  3. Supervisor agent consolidates the subagent outputs along with the original query context.
  4. Supervisor agent passes the consolidated information to the LLM to generate the final research insights.

This architecture uses the strengths of the different specialized agents to comprehensively address the various aspects of investment research, and the supervisor agent provides the intelligent orchestration.

Technical architecture

The following diagram is the technical architecture.

Some of the key components and the associated codes of the technical architecture are:

  • Amazon Bedrock Data Automation (BDA) for processing data – The following invokes a BDA job asynchronously using the invoke_data_automation_async() method from the bda-runtime client in boto3:
    bda_response = bda_runtime_client.invoke_data_automation_async(
                inputConfiguration={
                    "s3Uri": s3_uri_input
                },
                outputConfiguration={
                    "s3Uri": s3_uri_output
                },
            )

  • Defining and building subagents with tools – The following creates a subagent using Agent.create. It defines two Lambda tools—one for fetching historical stock data and another for running portfolio optimization—then registers them with the subagent along with its role, goal, instructions, and the underlying FM.
    quantitative_analysis_agent = Agent.create(
        name="quantitative_analysis_agent",
        role="Financial Data Collector",
        goal="Retrieve real-time and historic stock prices as well as optimizing a portfolio given tickers.",
        instructions="""Specialist in real-time financial data extraction and portfolio optimization.
                            ...""",
        tools=[
            # Stock Data Lookup Tool
            {
                "code": stock_data_lookup_arn,
                "definition": {
                    "name": "stock_data_lookup",
                    "description": "Gets the 1-month stock price history for a given stock ticker, formatted as JSON.",
                    "parameters": {
                        "ticker": {"description": "The ticker to retrieve price history for", "type": "string", "required": True}
                    },
                },
            },
        ],
        llm=LLM,
    )

  • Defining and building the supervisor agent (multi-agent investment research assistant) – The following creates a supervisor agent using SupervisorAgent.create. It defines the collaborator agents (subagents), and registers them with the supervisor agent along with its role, goal, instructions, and the underlying FM.
    investment_research_assistant = SupervisorAgent.create(
        "investment_research_assistant",
        role="Investment Research Assistant",
        goal="A seasoned investment research expert responsible for orchestrating subagents ...",
        collaboration_type="SUPERVISOR",
        instructions=f"""You are a Investment Research Assistant, a financial research supervisor overseeing multiple specialized agents. Your goal is to coordinate and synthesize their outputs to create a structured stock investment analysis.
                    ...""",
        collaborator_agents=[
            {
                "agent": "news_agent",
                "instructions": f"Always check the knowledge base (ID: {kb_id}) first. Use this collaborator for finding news and analyzing specific documents."
            },
            {
                "agent": "quantitative_analysis_agent",
                "instructions": "Use this collaborator for retrieving stock price history and performing portfolio optimization."
            },
            {
                "agent": "smart_summarizer_agent",
                "instructions": "Use this collaborator for synthesizing stock trends, financial data, and generating structured investment insights."
            }
        ],
        collaborator_objects=[news_agent, quantitative_analysis_agent, smart_summarizer_agent],
        llm=LLM,
    )

  • Invoking the supervisor agent (multi-agent investment research assistant) – The following uses the invoke method to pass in the defined request as a prompt and get response from the supervisor agent, using the multi-agent collaboration architecture.
    request = """What's AMZN stock price doing over the last week
                   and relate that to recent news"""
    print(f"Request:n{request}n")
    result = investment_research_assistant.invoke(
        request,
        enable_trace=True,
        trace_level="core",
    )
    print(f"Final answer:n{result}")

Prerequisites

To deploy this solution, attach the appropriate permissions for Amazon Bedrock, AWS Lambda, Amazon OpenSearch Serverless, Amazon Simple Storage Service (Amazon S3), and AWS Identity and Access Management (IAM) to your execution role. You also need to deploy two AWS CloudFormation stacks: web_search and stock_data.

For more information on the permission policy, CloudFormation stacks, and setting up the environment, refer to the README.md in the investment_research_agent repository on GitHub.

Dive deeper into the solution

To dive deeper into the solution and the code shown in this post, refer to the investment research assistant agent GitHub repo. The repo contains instructions for the end-to-end solution, including setting up the supervisor agent and subagents, associated action groups, unstructured data (earnings reports PDF files and earnings call audio files), and structured data (stocks time series).

In the appendix at the end of this post, we share different questions asked by a financial analyst to the supervisor agent, the subagent invoked, and the answers from the supervisor agent.

Cleanup

To clean up this solution, make sure to delete all agents, knowledge bases, and associated Amazon OpenSearch Serverless instances. Delete both the web_search and stock_data stacks. This will make sure that no charges are incurred after you’re finished testing this solution. Read the associated notebook in the GitHub repository for more instructions on cleaning up the agents.

Conclusion

The multi-agent investment research assistant built using Amazon Bedrock Agents with its multi-agent collaboration feature represents how financial professionals can use AI to enhance their analytical capabilities. By using specialized agents—each focusing on distinct aspects of financial analysis—orchestrated by a supervisor agent, we’ve created a system that can process diverse data types, apply appropriate analytical techniques, and synthesize comprehensive insights in response to complex investment queries. The result is an AI assistant that can handle multistep financial analysis workflows that would otherwise overwhelm a single-agent system.

The application of multi-agent collaboration extends broadly across the financial sector, from equity research and portfolio management to risk assessment and compliance monitoring. A complex financial workflow that requires multiple specialized skills and data sources can benefit from this architectural approach.

The implementation uses key AWS services, including Amazon Bedrock for foundation models, the multi-agent collaboration capability of Amazon Bedrock Agents, Amazon S3 for document storage, Amazon OpenSearch Serverless for vector search capabilities, and Lambda functions for specialized actions. This serverless architecture provides scalability, and the knowledge base integration provides context from financial documents to ground the AI’s responses in factual data.

For more information:

  • Get started building your own multi-agent investment research assistant by accessing our complete solution code and implementation guide in the GitHub repo.
  • Explore how to customize the solution for your specific investment research needs by modifying the subagents’ specialized tools and action groups and by bringing your data securely to Amazon Bedrock Knowledge Bases.
  • You can also explore and run the Amazon Bedrock multi-agent collaboration workshop with AWS specialists or on your own.

Read more about single agent solutions for investment research with multi-modal data:

Read the blog post on Unlocking complex problem-solving with multi-agent collaboration on Amazon Bedrock for more details on benefits, design, and evaluation of multi-agent collaboration for enterprise applications.


About the Authors

Sovik Kumar Nath is an AI/ML and Generative AI senior solutions architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He holds dual master’s degrees from the University of South Florida and the University of Fribourg, Switzerland, and a bachelor’s degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.

Lucas Banerji is an AI/ML and GenAI specialist Solutions Architect at AWS. He is passionate about building agentic AI systems and exploring the frontier of what’s possible with intelligent automation. Lucas holds a degree in Computer Science from the University of Virginia. Outside of work, he enjoys running, practicing Muay Thai, and traveling the world.

Mohan Musti is a Principal Technical Account Manager based in Dallas. Mohan helps customers architect and optimize applications on AWS. Mohan holds a degree in Computer Science and Engineering from JNT University, India. In his spare time, he enjoys spending time with his family and camping.


Appendix

The following examples show questions asked by a financial analyst to the AI-powered assistant for investment research, the subagents and tools invoked, the supervisor agent’s intermediate chain-of-thought steps, and the assistant’s responses, based on structured and unstructured data.

Example 1 – Analyze the company’s financial health and performance
  • Question asked by the financial analyst – Analyze Amazon’s financial health based on the 2024 10-K report. Calculate important financial ratios. Limit to 5 sentences.
  • Subagents, action groups, and/or knowledge bases invoked – news_agent (financial_analysis_kb), smart_summarizer_agent
  • Intermediate steps from the supervisor agent’s chain-of-thought reasoning – 1) I need to specifically request the 2024 10-K report information from the knowledge base. Let me ask the news agent for this specific report. 2) Since the knowledge base contains recent financial information and forward-looking guidance, I’ll provide a comprehensive summary of Amazon’s financial metrics and projections. 3) Now I’ll ask the analyst agent to analyze this information and calculate key financial ratios.
  • Multi-agent investment research assistant response – Based on Amazon’s recent financial data, the company demonstrates strong financial health with total assets of $624.9B and a robust cash position of $78.8B, while generating $143.3B in net sales and $59.2B in net income. Key profitability metrics show improvement with a 46% gross margin, 4.4% operating margin, and 14.8% return on equity, while efficiency ratios remain healthy with inventory turnover at 13.2x and asset turnover at 1.4x. Liquidity ratios (current ratio: 1.06, quick ratio: 0.88) are adequate though tight, but are supported by strong operating cash flow of $54.3B, representing a 38% year-over-year increase. The company maintains a balanced leverage profile with a debt-to-equity ratio of 0.98 and a solid interest coverage ratio of 8.5x, indicating good debt management. Despite ongoing high capital expenditures, Amazon’s improving margins, strong cash generation, and efficient operations suggest a healthy financial position with sustainable growth potential.

Example 2 – Analyze the company’s financial performance from its earnings reports
  • Question asked by the financial analyst – What were the main drivers of AWS’s revenue growth in the Q1 2023 quarter? Please limit your answers to 5 sentences.
  • Subagents, action groups, and/or knowledge bases invoked – news_agent (financial_analysis_kb), smart_summarizer_agent
  • Intermediate steps from the supervisor agent’s chain-of-thought reasoning – 1) I need to get information about AWS’s Q1 2023 earnings report. I’ll ask the news agent to check the knowledge base for this specific information. 2) Based on the search results, I can provide a comprehensive answer about AWS’s Q1 2023 performance and growth drivers. The information comes directly from their earnings reports. 3) I’ll synthesize this information into a concise response focusing on the main drivers and send it to the user.
  • Multi-agent investment research assistant response – AWS’s Q1 2023 revenue growth was primarily driven by continued cloud adoption, achieving net sales of $21.4 billion and a 16% year-over-year growth rate. However, the growth was tempered by customers across all segments optimizing their cloud spending in response to economic uncertainty. The business maintained strong fundamentals with an annualized sales run rate exceeding $85 billion, though momentum showed signs of slowing with April growth rates declining by approximately 500 basis points compared to Q1.

Example 3 – Analyze the company’s management team and their plans for the future from its earnings reports
  • Question asked by the financial analyst – What are Amazon’s top priorities for 2024? What are the biggest risks facing the company? Please limit your answers to 5 sentences.
  • Subagents, action groups, and/or knowledge bases invoked – news_agent (financial_analysis_kb and web_search), smart_summarizer_agent
  • Intermediate steps from the supervisor agent’s chain-of-thought reasoning – 1) I’ll ask the news agent to check the knowledge base for recent information about Amazon’s strategic priorities and risk factors for 2024. 2) Let me get additional recent news and risk factors by asking the news agent to search for supplementary information. 3) Based on the recent news searches, I can now provide a comprehensive overview of the significant risks currently facing Amazon.
  • Multi-agent investment research assistant response – Amazon’s top priorities for 2024 include expanding international stores, growing grocery and B2B retail segments, scaling the “Buy with Prime” program, advancing healthcare initiatives, and developing low Earth orbit satellite capabilities for improved connectivity. The company faces several significant risks, including substantial AI-related capital expenditure pressures (expected $75 billion in 2024), increasing regulatory scrutiny from the FTC and competition authorities, potential impacts from new Chinese import tariffs, and intensifying competition in both cloud computing and retail sectors. Amazon is balancing these ambitious growth initiatives with cost optimization efforts, while viewing their strategic investments as potentially transformative for their long-term future.

Example 4 – Build an optimal portfolio using the collected data
  • Question asked by the financial analyst – Optimize my portfolio with AAA, BBB, and CCC [anonymized tickers]
  • Subagents, action groups, and/or knowledge bases invoked – quantitative_analysis_agent (stock_data_lookup and portfolio tool)
  • Intermediate steps from the supervisor agent’s chain-of-thought reasoning – 1) I need to get historical stock data for these three stocks before running portfolio optimization. I’ll ask the stock data agent for this information. 2) Now that I have retrieved the historical price data for all three stocks, I’ll perform portfolio optimization using this data. 3) I’ll synthesize this information into a clear, comprehensive response for the user.
  • Multi-agent investment research assistant response – Based on portfolio optimization analysis, here is the recommended allocation: Company A (AAA): 61.04%, Company B (BBB): 32.54%, Company C (CCC): 6.42%. This allocation optimizes risk-adjusted returns based on recent market data. Please note that these allocations are based on mathematical optimization and historical data. Consider your personal investment goals, risk tolerance, and market conditions when making investment decisions.

Example 5 – Look up news and relate it to stock data
  • Question asked by the financial analyst – What was happening to AMZN stock in November 2024? Relate that to news in the same timeframe.
  • Subagents, action groups, and/or knowledge bases invoked – news_agent (web_search), quantitative_analysis_agent (stock_data_lookup), and smart_summarizer_agent
  • Intermediate steps from the supervisor agent’s chain-of-thought reasoning – 1) I’ll need to get both recent stock price data and news, then synthesize them together. I’ll request both simultaneously to be efficient. 2) I’ll now synthesize the stock price movement with the recent news developments to provide a comprehensive analysis.
  • Multi-agent investment research assistant response – November 2024 stock price movement: trading range of $161.43 (low) to $206.21 (high); overall trend of significant upward momentum with notable volatility; key technical levels at resistance of $206 and support of $190. Key news impact: the Q4 2024 guidance released October 31 (projected net sales of $181.5–188.5 billion, expected growth of 7–11% year over year) served as a major catalyst for the stock’s movement. Price-news correlation: the strong upward movement (~27% range) appears to be primarily driven by the positive Q4 guidance; investor confidence was bolstered by the projected sales growth; the wide guidance range reflected optimism around the holiday shopping season; and higher trading volumes were observed during price movements, indicating strong market participation. The stock’s performance suggests that investors responded positively to Amazon’s forward-looking guidance, particularly regarding holiday season expectations. The volatility in price movements indicates active trading around these news events, with the market generally maintaining an optimistic outlook on Amazon’s near-term prospects.

Read More

A generative AI prototype with Amazon Bedrock transforms life sciences and the genome analysis process

A generative AI prototype with Amazon Bedrock transforms life sciences and the genome analysis process

It takes biopharma companies over 10 years, at a cost of over $2 billion and with a failure rate of over 90%, to deliver a new drug to patients. The Market to Molecule (M2M) value stream process, which biopharma companies must apply to bring new drugs to patients, is resource-intensive, lengthy, and highly risky. Nine out of ten biopharma companies are AWS customers, and helping them streamline and transform the M2M processes can help deliver drugs to patients faster, reduce risk, and bring value to our customers.

Pharmaceutical companies are taking a new approach to drug discovery, looking for variants in the human genome and linking them to diseases. This genetic validation approach can improve the success ratio in the M2M value stream process by focusing on the root cause of disease and the gene variants.

As depicted in the following M2M value stream diagram, the Research process (and the Basic Research sub-process) is critical to downstream processes where linking the gene variant to a disease occurs, and is instrumental in defining the target molecule. This can be a critical step in expediting and reducing the cost of delivering a new drug to patients.

To transform the M2M value stream process, our customer has been working on associating genes with diseases by using their large dataset of over 2 million sequenced exomes (genes that are expressed into proteins). To accomplish this, the customer’s clinical scientists have to develop methods to navigate through the enormous dataset by using online genome browsers, a mechanical data-first experience that doesn’t fully meet the needs of users. Starting with a search query to get results, the typical interactions of navigating levels, filtering, waiting, and repeating the search can be time-consuming and tedious. Simplifying the UI from the traditional human browser to a conversational AI assistant can enhance the user experience in the clinical research process.

Generative AI is a promising next step in leading this change. As generative AI began to make a significant impact in healthcare and life sciences, this use case was primed for experimentation. In collaboration with the customer, AWS built a generative AI-enabled gene assistant that lets scientists pose a question, or a series of questions, giving them more flexibility and agility in exploring the genome and aiming to save researchers countless hours of work. Identifying variants and their potential correlation with diseases can be done more efficiently using words rather than filters, settings, and buttons. A more streamlined research process can help increase the likelihood of new breakthroughs.

This post explores deploying a text-to-SQL pipeline using generative AI models and Amazon Bedrock to ask natural language questions to a genomics database. We demonstrate how to implement an AI assistant web interface with AWS Amplify and explain the prompt engineering strategies adopted to generate the SQL queries. Finally, we present instructions to deploy the service in your own AWS account. Amazon Bedrock is a fully managed service that provides access to large language models (LLMs) and other foundation models (FMs) from leading AI companies through a single API, allowing you to use it instantly without much effort, saving developers valuable time. We used the AWS HealthOmics variant stores to store the Variant Call Format (VCF) files with omics data. A VCF file is typically the output of a bioinformatics pipeline. VCFs encode Single Nucleotide Polymorphisms (SNPs) and other structural genetic variants. The format is further described on the 1000 Genomes project website. We used the AWS HealthOmics – End to End workshop to deploy the variants and annotation stores.

Although this post focuses on a text-to-SQL approach to an omics database, the generative AI approaches discussed here can be applied to a variety of complex schemas of relational databases.

Text-to-SQL for genomics data

Text-to-SQL is a task in natural language processing (NLP) that automatically converts natural language text into SQL queries. This involves translating the written text into a structured representation and using it to generate an accurate SQL query that can run on a database. The task is difficult because human language is flexible, ambiguous, and context-dependent, whereas SQL is rigidly structured.

Before LLMs for text-to-SQL, user queries had to be preprocessed to match specific templates, which were then used to rephrase the queries. This approach was use case-specific and required data preparation and manual work. Now, with LLMs, the text-to-SQL task has undergone a major transformation. LLMs continue to showcase key performance improvements in generating valid SQL queries from natural language questions. Because they are pre-trained on massive datasets, LLMs can identify the relationships between words in language and accurately predict the ones that should come next.

However, although LLMs have remarkable performance in many text-to-SQL problems, they have limitations that lead to hallucinations. This post describes the main approaches used to overcome these limitations.

There are two key strategies to achieve high accuracy in text-to-SQL services:

  • Prompt engineering – The prompt is structured to annotate different components, such as pointing to columns and schemas, and then instructing the model on which type of SQL to create. These annotations act as instructions that guide the model in formatting the SQL output correctly. For example, a prompt might contain annotations showing specific table columns and guiding the model to generate a SQL query. This approach allows for more control over the model’s output by explicitly specifying the desired structure and format of the SQL query.
  • Fine-tuning – You can start with a pre-trained model on a large general text corpus and then proceed with an instruction-based fine-tuning with labeled examples to improve the model’s performance on text-to-SQL tasks. This process adapts the model to the target task by directly training it on the end task, but it requires a substantial number of text-SQL examples.

This post focuses on the prompt engineering strategy for SQL generation. AWS customers deploy prompt engineering strategies first because they’re efficient in returning high-quality results and require a less complex infrastructure and process. For more details and best practices on when to follow each approach, refer to Best practices to build generative AI applications on AWS.

We experimented with prompt engineering using chain-of-thought and tree-of-thought approaches to improve the reasoning and SQL generation capabilities. Chain-of-thought prompting guides the LLM to break down a problem into a series of intermediate reasoning steps, explicitly expressing its thought process before arriving at a definitive answer. Tree-of-thought prompting extends this idea with a branching structure, in which each node represents a sub-task, sub-question, or intermediate step in the overall problem-solving process. We describe how we applied both techniques, along with the prompt we used, later in this post.

Solution overview

The following architecture depicts the solution and AWS services we used to accomplish the prototype.

The workflow consists of the following steps:

  1. A scientist submits a natural language question or request to a chat web application connected through Amplify and integrated with an AWS AppSync GraphQL API.
  2. The request is submitted to Amazon API Gateway, which transfers the request to an AWS Lambda function that contains the text-to-SQL implementation. We recommend implementing a second helper Lambda function to fetch variant data, gene names, or ClinVar-listed diseases, to simplify the user experience and facilitate the SQL generation process.
  3. The text-to-SQL Lambda function receives the natural language request, merges the input question with the prompt template, and submits to Amazon Bedrock to generate the SQL.
    • Our implementation also adds a step to simplify the incoming history into a single request. We submit a request to Amazon Bedrock to transform the historical inputs from that user session into a simplified natural language request. This step is optional.
  4. With the generated SQL, the Lambda function submits the query to Amazon Athena to retrieve the genomic data from the Amazon Simple Storage Service (Amazon S3) bucket.
    • If successful, the Lambda function updates the user session stored in Amazon DynamoDB through an AWS AppSync request. That change will automatically appear on the UI that is subscribed to changes to the session table.
    • If an error occurs, the code attempts to regenerate the SQL query, passing the returned error back to the model and asking it to fix the mistake. The Lambda function then reruns the regenerated SQL against Athena and returns the result (a minimal sketch of this flow follows the list).
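The following is a minimal sketch of how such a text-to-SQL Lambda handler could be structured, assuming the Amazon Bedrock Converse API and the Athena boto3 client; the prompt template, model ID, database name, and output location are placeholders rather than the exact values used in the project.

import json
import time
import boto3

bedrock = boto3.client("bedrock-runtime")
athena = boto3.client("athena")

# Placeholder prompt template; the real template carries the full schema description
PROMPT_TEMPLATE = """You are a SQL assistant for a genomics database.
{schema_description}
Question: {question}
Return the Athena SQL inside <SQL_QUERY> tags."""

def generate_sql(question, schema_description, error=None):
    # Merge the user question (and any prior Athena error) into the prompt template
    prompt = PROMPT_TEMPLATE.format(schema_description=schema_description, question=question)
    if error:
        prompt += f"\nThe previous query failed with: {error}. Produce a corrected query."
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # placeholder model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    text = response["output"]["message"]["content"][0]["text"]
    return text.split("<SQL_QUERY>")[-1].split("</SQL_QUERY>")[0].strip()

def run_athena(sql):
    # Submit the generated SQL to Athena and poll until the query finishes
    execution = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "variants_db"},                 # placeholder database
        ResultConfiguration={"OutputLocation": "s3://my-query-results/"},  # placeholder bucket
    )
    query_id = execution["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if status["State"] != "SUCCEEDED":
        raise RuntimeError(status.get("StateChangeReason", "Athena query failed"))
    return athena.get_query_results(QueryExecutionId=query_id)

def handler(event, context):
    question = event["question"]
    schema = event["schema_description"]
    sql = generate_sql(question, schema)
    try:
        results = run_athena(sql)
    except RuntimeError as err:
        # Single retry: ask the model to fix the SQL using the returned Athena error
        sql = generate_sql(question, schema, error=str(err))
        results = run_athena(sql)
    return {"statusCode": 200, "body": json.dumps(results["ResultSet"]["Rows"])}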

Generative AI approaches to text-to-SQL

We tested the following prompt-engineering strategies:

  • LLM SQL agents
  • LLM with Retrieval Augmented Generation (RAG) to detect tables and columns of interest
  • Prompt engineering with full description of tables and columns of interest
  • Prompt engineering with chain-of-thought and tree-of-thought approaches
  • Prompt engineering with a dynamic few-shot approach

We didn’t achieve good results with SQL agents. We experimented with LangChain SQL agents, but it was difficult for the agent to use contextual information from the dataset to generate accurate and syntactically correct SQL. A big challenge with omics data is that certain columns are arrays of structs or maps. At the time of building this project, the agents were incapable of detecting these nuances and failed to generate relevant SQL.

We experimented with a RAG approach to retrieve relevant tables and columns, given a user question, and then prompted the LLM to generate a SQL query using only those tables and columns. A motivation behind this experiment is that a RAG approach can deal well with hundreds or thousands of columns or tables. However, this approach also didn’t return good results: it returned too many irrelevant variables for each SQL generation.

The next three approaches were successful, and we used them in combination to get the highest accuracy on syntactically correct SQL generation.

A first prompt idea we tested was to provide a full description of the main tables and columns to be used in the SQL generation, given a user question. In the following example, we show a snapshot of the prompts used to describe the 1000 Genomes variants tables. The goal of the prompt with database table and column descriptions is to teach the LLM how to use the schema to generate queries. We approached it as if teaching a new developer who will write queries against that database, with examples of SQL queries that show how to extract the correct dataset, filter the data, and use only the most relevant columns.

<table>
       <table_name>
       variants
       </table_name>
       <table_description>
       This table contains information about genetic variants.
       </table_description>
       <column>
              <column_name>contigname</column_name>
              <column_description>
This column specifies the name of the contig (a contiguous sequence of DNA) or chromosome where the variant is located. It is typically prefixed with "chr". If the user asks for variants at chromosome 22, use `chr22` to access variants in this table.
              </column_description>
              <example_use>
                       select *
                       from variants
                       where contigname = 'chr22'
                      and start between 45509414 and 45509418;
              </example_use>
       </column>
       <column>
              <column_name>start</column_name>
              <column_description>
                      The start position of the variant on the chromosome. This should
                      be used to compose the primary key of the variant, along with the
                      following tables: `contigname`, `end`, `referenceallele`, `alternatealleles`.
              </column_description>
              <example_use>
                      SELECT * FROM variants WHERE start > 100000 and end < 200000;
              </example_use>
       </column>
</table>

The team also created a prompt that used the concept of chain-of-thought, and its evolution, tree-of-thought, to improve the reasoning and SQL generation capabilities.

The chain-of-thought prompting technique encourages LLMs to break down a problem into a series of intermediate steps, explicitly expressing their thought process before arriving at a definitive answer or output. This approach takes inspiration from the way humans often break down problems into smaller, manageable parts.

Through the use of prompts, we compelled the LLM to generate a chain-of-thought, letting the LLM articulate its reasoning process and produce more accurate and understandable outputs. This technique has the potential to improve performance on tasks that require multi-step reasoning, such as SQL generation from open-ended natural language questions. This approach presented excellent results with the FM that we tested.

As a next step in our experimentation, we used the tree-of-thought technique to generate even better results than the chain-of-thought approach. The tree-of-thought approach introduces a more structured and branching approach to the reasoning process. Instead of a linear chain, we prompt the LLM to generate a tree-like structure, where each node represents a sub-task, sub-question, or intermediate step in the overall problem-solving process. The following example presents how we used these two approaches in the prompt template:

Imagine three different experts are answering this question. All experts will write down 1 step 
of their thinking, then share it with the group. Then all experts will go on to the next step, etc. 
If any expert realises they're wrong at any point then they leave. Each of the three experts should 
explain their thinking along with the generated SQL statement. Your final step is to review the 
generated SQL code for syntax errors. Pay close attention to any use of the UNNEST function - it 
MUST be immediately followed by 'AS t(unpacked)' rather than 'AS t' . If you find a syntax error 
with the generated SQL, produce a corrected version within <SQL_FIXED> tags. Only produce 
the <SQL_FIXED> code if you find a syntax problem in the <SQL_QUERY> tags.

Finally, we tested a few-shot and a dynamic few-shot approach. The few-shot approach is a prompting technique used in prompt engineering for LLMs. It involves providing the LLM with a few examples or demonstrations, along with the input prompt, to guide the model’s generation or output. In the few-shot setting, the prompt comprises the following:

  • An instruction or task description
  • A few examples or demonstrations of the desired output, given a specific input
  • The new input for which the LLM will generate an output

By exposing the LLM to these examples, the model better recognizes patterns and infers the underlying rules or mappings between the input and the desired output.

The dynamic few-shot approach extends the few-shot prompting technique. It introduces the concept of dynamically generating or selecting the examples or demonstrations used in the prompt, based on the specific input or context. In this approach, instead of providing a fixed set of examples, the prompt generation process involves the following (a minimal sketch follows the list):

  • Analyzing the input or context
  • Creating embeddings of the examples and of the input, and retrieving or generating relevant examples or demonstrations tailored to the specific input by applying a semantic search
  • Constructing the prompt with the selected examples and the input
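The following is a minimal sketch of this dynamic selection step, assuming an Amazon Titan text embeddings model on Amazon Bedrock and a small in-memory example store; the model ID, the example pairs, and the top-k selection are illustrative assumptions, not the project's exact implementation.

import json
import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")

# Illustrative question/SQL pairs; in practice these come from a curated example store
EXAMPLES = [
    {"question": "List variants on chromosome 22",
     "sql": "SELECT * FROM variants WHERE contigname = 'chr22';"},
    {"question": "Count variants between positions 100000 and 200000",
     "sql": "SELECT count(*) FROM variants WHERE start > 100000 AND end < 200000;"},
]

def embed(text):
    # Titan text embeddings via the Bedrock InvokeModel API (model ID is an assumption)
    body = json.dumps({"inputText": text})
    response = bedrock.invoke_model(modelId="amazon.titan-embed-text-v2:0", body=body)
    return np.array(json.loads(response["body"].read())["embedding"])

def select_examples(question, k=1):
    # Rank stored examples by cosine similarity to the incoming question
    q = embed(question)
    scored = []
    for example in EXAMPLES:
        e = embed(example["question"])
        similarity = float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e)))
        scored.append((similarity, example))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [example for _, example in scored[:k]]

def build_prompt(question):
    # Construct the few-shot prompt from the dynamically selected examples plus the new input
    shots = "\n".join(f"Q: {ex['question']}\nSQL: {ex['sql']}" for ex in select_examples(question))
    return f"{shots}\nQ: {question}\nSQL:"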

Conclusion

This post demonstrated how to implement a text-to-SQL solution to democratize access to omics data for users who aren’t data analytics specialists. The approach used HealthOmics and Amazon Bedrock to generate SQL based on natural language queries. This approach has the potential to provide access to omics data to a larger audience than what is available today.

The code is available in the accompanying GitHub repo. The deployment instructions for the HealthOmics variants and annotation store can be found in the AWS HealthOmics – End to End workshop. The deployment instructions for the text-to-SQL project are available in the README file.

We would like to acknowledge Thomaz Silva and Saeed Elnaj for their contributions to this blog. It couldn’t have been done without them.


About the Authors

Ganesh Raam Ramadurai is a Senior Technical Program Manager at Amazon Web Services (AWS), where he leads the PACE (Prototyping and Cloud Engineering) team. He specializes in delivering innovative, AI/ML and Generative AI-driven prototypes that help AWS customers explore emerging technologies and unlock real-world business value. With a strong focus on experimentation, scalability, and impact, Ganesh works at the intersection of strategy and engineering—accelerating customer innovation and enabling transformative outcomes across industries.

Jeff Harman is a Senior Prototyping Architect on the Amazon Web Services (AWS) Prototyping and Cloud Engineering team, he specializes in developing innovative solutions that leverage AWS’s cloud infrastructure to meet complex business needs. Jeff Harman is a seasoned technology professional with over three decades of experience in software engineering, enterprise architecture, and cloud computing. Prior to his tenure at AWS, Jeff held various leadership roles at Webster Bank, including Vice President of Platform Architecture for Core Banking, Vice President of Enterprise Architecture, and Vice President of Application Architecture. During his time at Webster Bank, he was instrumental in driving digital transformation initiatives and enhancing the bank’s technological capabilities. He holds a Master of Science degree from the Rochester Institute of Technology, where he conducted research on creating a Java-based, location-independent desktop environment—a forward-thinking project that anticipated the growing need for remote computing solutions. Based in Unionville, Connecticut, Jeff continues to be a driving force in the field of cloud computing, applying his extensive experience to help organizations harness the full potential of AWS technologies.

Kosal Sen is a Design Technologist on the Amazon Web Services (AWS) Prototyping and Cloud Engineering team. Kosal specializes in creating solutions that bridge the gap between technology and actual human needs. As an AWS Design Technologist, that means building prototypes on AWS cloud technologies, and ensuring they bring empathy and value into the real world. Kosal has extensive experience spanning design, consulting, software development, and user experience. Prior to AWS, Kosal held various roles where he combined technical skillsets with human-centered design principles across enterprise-scale projects.

Read More

Gemma 3 27B model now available on Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

Gemma 3 27B model now available on Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

We are excited to announce the availability of Gemma 3 27B Instruct models through Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. With this launch, developers and data scientists can now deploy Gemma 3, a 27-billion-parameter language model, along with its specialized instruction-following versions, to help accelerate building, experimentation, and scalable deployment of generative AI solutions on AWS.

In this post, we show you how to get started with Gemma 3 27B Instruct on both Amazon Bedrock Marketplace and SageMaker JumpStart, and how to use the model’s powerful instruction-following capabilities in your applications.

Overview of Gemma 3 27B

Gemma 3 27B is a high-performance, open-weight, multimodal language model by Google designed to handle both text and image inputs with efficiency and contextual understanding. It introduces a redesigned attention architecture, enhanced multilingual support, and extended context capabilities. With its optimized memory usage and support for large input sequences, it is well-suited for complex reasoning tasks, long-form interactions, and vision-language applications. With 27 billion parameters and training on up to 6 trillion tokens of text, these models are optimized for tasks requiring advanced reasoning, multilingual capabilities, and instruction following. According to Google, Gemma3 27B Instruct models are ideal for developers, researchers, and businesses looking to build generative AI applications such as chatbots, virtual assistants, and automated content generation tools. The following are its key features:

  • Multimodal input – Processes text, images, and short videos for unified reasoning across modalities
  • Long context support – Handles up to 128,000 tokens, enabling seamless processing of long documents, conversations, and multimedia transcripts
  • Multilingual support – Offers out-of-the-box support for over 35 languages, with pre-training exposure to more than 140 languages in total
  • Function calling – Facilitates building agentic workflows by using natural‐language interfaces to APIs
  • Memory-efficient inference – Offers architectural updates that reduce KV-cache usage and introduce QK-norm for faster and more accurate outputs

Key use cases for Gemma3, as described by Google, include:

  • Q&A and summarization – Processing and condensing long documents or articles
  • Visual understanding – Image captioning, object identification, visual Q&A, and document understanding
  • Multilingual applications – Building AI assistants and tools across over 140 languages
  • Document processing – Analyzing multi-page articles or extracting information from large texts
  • Automated workflows – Using function calling to create AI agents that can interact with other systems

There are two primary methods for deploying Gemma 3 27B on AWS. The first approach involves using Amazon Bedrock Marketplace, which offers a streamlined way of accessing Amazon Bedrock APIs (Invoke and Converse) and tools such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, Amazon Bedrock Flows, Amazon Bedrock Guardrails, and model evaluation. The second approach is using SageMaker JumpStart, a machine learning (ML) hub with foundation models (FMs), built-in algorithms, and pre-built ML solutions. You can deploy pre-trained models using either the Amazon SageMaker console or SDK.

Deploy Gemma 3 27B Instruct on Amazon Bedrock Marketplace

Amazon Bedrock Marketplace offers access to over 150 specialized FMs, including Gemma 3 27B Instruct.

Prerequisites

To try the Gemma 3 27B Instruct model using Amazon Bedrock Marketplace, you need the following:

  • An AWS account that will contain all your AWS resources
  • Access to accelerated instances (GPUs) for hosting the large language models (LLMs)

Deploy the model

To deploy the model using Amazon Bedrock Marketplace, complete the following steps:

  1. On the Amazon Bedrock console, under Foundation models in the navigation pane, select Model catalog.
  2. Filter for Gemma as the provider and choose Gemma 3 27B Instruct.

Information about Gemma3’s features, costs, and setup instructions can be found on its model overview page. This resource includes integration examples, API documentation, and programming samples. The model excels at a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. You can also access deployment guidelines and license details to begin implementing Gemma3 into your projects.

  3. Review the model details, pricing, and deployment guidelines, and choose Deploy to start the deployment process.

  4. For Endpoint name, enter an endpoint name (between 1–50 alphanumeric characters) or leave it as the default name that is pre-populated.
  5. For Number of instances, enter a number of instances (between 1–100).
  6. Select your preferred instance type, with GPU-powered options like ml.g5.48xlarge being particularly well-suited for running Gemma 3 efficiently.

Although default configurations are typically sufficient for basic needs, you have the option to customize security features such as virtual private cloud (VPC) networking, role-based permissions, and data encryption. These advanced settings might require adjustment for production environments to maintain compliance with your organization’s security protocols.

Prior to deploying Gemma 3, verify that your AWS account has sufficient quota allocation for ml.g5.48xlarge instances. A quota set to 0 will trigger deployment failures, as shown in the following screenshot.

To request a quota increase, open the AWS Service Quotas console and search for SageMaker. Locate ml.g5.48xlarge for endpoint usage and choose Request quota increase, then specify your required limit value.
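If you prefer to verify the current quota programmatically before deploying, a minimal sketch using the boto3 Service Quotas client might look like the following; the substring match on the quota name is an assumption and may need adjusting to the exact quota string in your account.

import boto3

quotas = boto3.client("service-quotas")

# Page through SageMaker quotas and print anything related to ml.g5.48xlarge endpoint usage
paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "ml.g5.48xlarge" in quota["QuotaName"] and "endpoint" in quota["QuotaName"].lower():
            print(quota["QuotaName"], quota["Value"])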

  7. While the deployment is in progress, you can choose Managed deployments in the navigation pane to monitor the deployment status.
  8. When deployment is complete, you can test Gemma 3’s capabilities directly in the Amazon Bedrock playground by selecting the managed deployment and choosing Open in playground.

You can now use the playground to interact with Gemma 3.

For detailed steps and example code for invoking the model using Amazon Bedrock APIs, refer to Submit prompts and generate response using the API and the following code:

import boto3
bedrock_runtime = boto3.client("bedrock-runtime")
# Replace with the endpoint ARN of your Amazon Bedrock Marketplace managed deployment
endpoint_arn = "arn:aws:sagemaker:us-east-2:061519324070:endpoint/endpoint-quick-start-3t7kp"
response = bedrock_runtime.converse(
    modelId=endpoint_arn,
    messages=[
        {
            "role": "user",
            "content": [{"text": "What is Amazon doing in the field of generative AI?"}]
        }
    ],
    inferenceConfig={
        "maxTokens": 256,
        "temperature": 0.1,
        "topP": 0.999
    }
)
print(response["output"]["message"]["content"][0]["text"])

Deploy Gemma 3 27B Instruct with SageMaker JumpStart

SageMaker JumpStart offers access to a broad selection of publicly available FMs. These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can use state-of-the-art model architectures—such as language models, computer vision models, and more—without having to build them from scratch.

With SageMaker JumpStart, you can deploy models in a secure environment. The models can be provisioned on dedicated SageMaker inference instances and can be isolated within your VPC. After deploying an FM, you can further customize and fine-tune it using the extensive capabilities of Amazon SageMaker AI, including SageMaker inference for deploying models and container logs for improved observability. With SageMaker AI, you can streamline the entire model deployment process.

There are two ways to deploy the Gemma 3 model using SageMaker JumpStart:

  • Through the user-friendly SageMaker JumpStart interface
  • Using the SageMaker Python SDK for programmatic deployment

We examine both deployment methods to help you determine which approach aligns best with your requirements.

Prerequisites

To try the Gemma 3 27B Instruct model in SageMaker JumpStart, you need the following prerequisites:

Deploy the model through the SageMaker JumpStart UI

SageMaker JumpStart provides a user-friendly interface for deploying pre-built ML models with just a few clicks. Through the SageMaker JumpStart UI, you can select, customize, and deploy a wide range of models for various tasks such as image classification, object detection, and natural language processing, without the need for extensive coding or ML expertise.

  1. On the SageMaker AI console, choose Studio in the navigation pane.
  2. First-time users will be prompted to create a domain.
  3. On the SageMaker Studio console, choose JumpStart in the navigation pane.

The model browser displays available models, with details like the provider name and model capabilities.

  4. Search for Gemma 3 to view the Gemma 3 model card. Each model card shows key information, including:
    • Model name
    • Provider name
    • Task category (for example, Text Generation)
    • The Bedrock Ready badge (if applicable), indicating that this model can be registered with Amazon Bedrock, so you can use Amazon Bedrock APIs to invoke the model

  5. Choose the model card to view the model details page.

The model details page includes the following information:

    • The model name and provider information
    • The Deploy button to deploy the model
    • About and Notebooks tabs with detailed information. The About tab includes important details, such as:
        • Model description
        • License information
        • Technical specifications
        • Usage guidelines

Before you deploy the model, we recommend that you review the model details and license terms to confirm compatibility with your use case.

  6. Choose Deploy to proceed with deployment.
  7. For Endpoint name, enter an endpoint name (between 1–50 alphanumeric characters) or leave it as default.
  8. For Instance type, choose an instance type (default: ml.g5.48xlarge).
  9. For Initial instance count, enter the number of instances (default: 1).

Selecting appropriate instance types and counts is crucial for cost and performance optimization. Monitor your deployment to adjust these settings as needed. Under Inference type, Real-time inference is selected by default. This is optimized for sustained traffic and low latency.

  10. Review all configurations for accuracy. For this model, we strongly recommend adhering to SageMaker JumpStart default settings and making sure that network isolation remains in place.
  11. Choose Deploy to deploy the model.

The deployment process can take several minutes to complete.

Deploy the model programmatically using the SageMaker Python SDK

To use Gemma 3 with the SageMaker Python SDK, first make sure you have installed the SDK and set up your AWS permissions and environment correctly. The following is a code example showing how to programmatically deploy and run inference with Gemma 3:

import sagemaker
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker import Session, image_uris
import boto3
# Initialize SageMaker session
session = sagemaker.Session()
role = sagemaker.get_execution_role()
# Specify model parameters
model_id = "huggingface-vlm-gemma-3-27b-instruct"  # or "huggingface-llm-gemma-2b" for the smaller version
instance_type = "ml.g5.48xlarge"  # Choose appropriate instance based on your needs
# Create and deploy the model
model = JumpStartModel(
    model_id=model_id,
    role=role,
    instance_type=instance_type,
    model_version="*",  # Latest version
)
# Deploy the model
predictor = model.deploy(
    initial_instance_count=1,
    accept_eula=True  # Required for deploying foundation models
)

Run inference using the SageMaker API

With your Gemma 3 model successfully deployed as a SageMaker endpoint, you’re now ready to start making predictions. The SageMaker SDK provides a straightforward way to interact with your model endpoint for inference tasks. The following code demonstrates how to format your input and make API calls to the endpoint. The code handles both sending requests to the model and processing its responses, making it straightforward to integrate Gemma 3 into your applications.

import json
import boto3
# Initialize AWS session (ensure your AWS credentials are configured)
session = boto3.Session()
sagemaker_runtime = session.client("sagemaker-runtime")
# Define the SageMaker endpoint name (replace with your deployed endpoint name)
endpoint_name = "hf-vlm-gemma-3-27b-instruct-2025-05-07-18-09-16-221"

payload = {
    "inputs": "What is Amazon doing in the field of generative AI?",
    "parameters": {
        "max_new_tokens": 256,
        "temperature": 0.1,
        "top_p": 0.9,
        "return_full_text": False
    }
}

# Run inference
try:
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload)
    )
    # Parse the response
    result = json.loads(response["Body"].read().decode("utf-8"))
    generated_text = result[0]["generated_text"].strip()
    print("Generated Response:")
    print(generated_text)
except Exception as e:
    print(f"Error during inference: {e}")

Clean up

To avoid incurring ongoing charges for AWS resources used during exploration of Gemma3 27B Instruct models, it’s important to clean up deployed endpoints and associated resources. Complete the following steps:

  1. Delete SageMaker endpoints:
    1. On the SageMaker console, in the navigation pane, choose Endpoints under Inference.
    2. Select the endpoint associated with the Gemma3 27B Instruct model (for example, gemma3-27b-instruct-endpoint).
    3. Choose Delete and confirm the deletion. This stops the endpoint and prevents further compute charges.
  2. Delete SageMaker models (if applicable):
    1. On the SageMaker console, choose Models under Inference.
    2. Select the model associated with your endpoint and choose Delete.
  3. Verify Amazon Bedrock Marketplace resources:
    1. On the Amazon Bedrock console, choose Model catalog in the navigation pane.
    2. Make sure no additional endpoints are running for the Gemma3 27B Instruct model deployed through Amazon Bedrock Marketplace.

Always verify that all endpoints are deleted after experimentation to optimize costs. Refer to the Amazon SageMaker documentation for additional guidance on managing resources.
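If you deployed programmatically, you can also clean up from code. The following is a minimal sketch using the SageMaker boto3 client; the endpoint name is the example value from the earlier inference snippet and should be replaced with your own.

import boto3

sagemaker_client = boto3.client("sagemaker")

# Replace with your endpoint name from the earlier deployment
endpoint_name = "hf-vlm-gemma-3-27b-instruct-2025-05-07-18-09-16-221"

# Look up the endpoint configuration before deleting the endpoint itself
endpoint = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint["EndpointConfigName"])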

Conclusion

The availability of Gemma3 27B Instruct models in Amazon Bedrock Marketplace and SageMaker JumpStart empowers developers, researchers, and businesses to build cutting-edge generative AI applications with ease. With their high performance, multilingual capabilities and efficient deployment on AWS infrastructure, these models are well-suited for a wide range of use cases, from conversational AI to code generation and content automation. By using the seamless discovery and deployment capabilities of SageMaker JumpStart and Amazon Bedrock Marketplace, you can accelerate your AI innovation while benefiting from the secure, scalable, and cost-effective AWS Cloud infrastructure.

We encourage you to explore the Gemma3 27B Instruct models today by visiting the SageMaker JumpStart console or Amazon Bedrock Marketplace. Deploy the model and experiment with sample prompts to meet your specific needs. For further learning, explore the AWS Machine Learning Blog, the SageMaker JumpStart GitHub repository, and the Amazon Bedrock documentation. Start building your next generative AI solution with Gemma3 27B Instruct models and unlock new possibilities with AWS!


About the Authors

Santosh Vallurupalli is a Sr. Solutions Architect at AWS. Santosh specializes in networking, containers, and migrations, and enjoys helping customers in their journey of cloud adoption and building cloud-based solutions for challenging issues. In his spare time, he likes traveling, watching Formula1, and watching The Office on repeat.

Aravind Singirikonda is an AI/ML Solutions Architect at AWS. He works with AWS customers in the healthcare and life sciences domain to provide guidance and technical assistance, helping them improve the value of their AI/ML solutions when using AWS.

Pawan Matta is a Sr. Solutions Architect at AWS. He works with AWS customers in the gaming industry and guides them to deploy highly scalable, performant architectures. His area of focus is management and governance. In his free time, he likes to play FIFA and watch cricket.

Ajit Mahareddy is an experienced Product and Go-To-Market (GTM) leader with over 20 years of experience in product management, engineering, and GTM. Prior to his current role, Ajit led product management building AI/ML products at leading technology companies, including Uber, Turing, and eHealth. He is passionate about advancing generative AI technologies and driving real-world impact with generative AI.

Read More

Building a multimodal RAG based application using Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases

Building a multimodal RAG based application using Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases

Organizations today deal with vast amounts of unstructured data in various formats including documents, images, audio files, and video files. Often these documents are quite large, creating significant challenges such as slower processing times and increased storage costs. Extracting meaningful insights from these diverse formats in the past required complex processing pipelines and significant development effort. Before generative AI, organizations had to rely on multiple specialized tools, custom-built solutions, and extensive manual review processes, making it time-consuming and error-prone to process and analyze these documents at scale. Generative AI technologies are revolutionizing this landscape by offering powerful capabilities to automatically process, analyze, and extract insights from these diverse document formats, significantly reducing manual effort while improving accuracy and scalability.

With Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases, you can now build powerful multimodal RAG applications with minimal effort. Amazon Bedrock Data Automation provides automated workflows for efficiently processing various file formats at scale, while Amazon Bedrock Knowledge Bases creates a unified, searchable repository that can understand natural language queries. Together, they enable organizations to efficiently process, organize, and retrieve information from their multimodal content, transforming how they manage and use their unstructured data.

In this post, we walk through building a full-stack application that processes multimodal content using Amazon Bedrock Data Automation, stores the extracted information in an Amazon Bedrock knowledge base, and enables natural language querying through a RAG-based Q&A interface.

Real world use cases

The integration of Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases enables powerful solutions for processing large volumes of unstructured data across various industries such as:

  • In healthcare, organizations deal with extensive patient records including medical forms, diagnostic images, and consultation recordings. Amazon Bedrock Data Automation automatically extracts and structures this information, while Amazon Bedrock Knowledge Bases enables medical staff to use natural language queries like “What was the patient’s last blood pressure reading?” or “Show me the treatment history for diabetes patients.”
  • Financial institutions process thousands of documents daily, from loan applications to financial statements. Amazon Bedrock Data Automation extracts key financial metrics and compliance information, while Amazon Bedrock Knowledge Bases allows analysts to ask questions like “What are the risk factors mentioned in the latest quarterly reports?” or “Show me all loan applications with high credit scores.”
  • Legal firms handle vast case files with court documents, evidence photos, and witness testimonies. Amazon Bedrock Data Automation processes these diverse sources, and Amazon Bedrock Knowledge Bases lets lawyers query “What evidence was presented about the incident on March 15?” or “Find all witness statements mentioning the defendant.”
  • Media companies can use this integration for intelligent contextual ad placement. Amazon Bedrock Data Automation processes video content, subtitles, and audio to understand scene context, dialogue, and mood, while simultaneously analyzing advertising assets and campaign requirements. Amazon Bedrock Knowledge Bases then enables sophisticated queries to match ads with appropriate content moments, such as “Find scenes with positive outdoor activities for sports equipment ads” or “Identify segments discussing travel for tourism advertisements.” This intelligent contextual matching offers more relevant and effective ad placements while maintaining brand safety.

These examples demonstrate how the extraction capabilities of Amazon Bedrock Data Automation combined with the natural language querying of Amazon Bedrock Knowledge Bases can transform how organizations interact with their unstructured data.

Solution overview

This comprehensive solution demonstrates the advanced capabilities of Amazon Bedrock for processing and analyzing multimodal content (documents, images, audio files, and video files) through three key components: Amazon Bedrock Data Automation, Amazon Bedrock Knowledge Bases, and foundation models available through Amazon Bedrock. Users can upload various types of content including audio files, images, videos, or PDFs for automated processing and analysis.

When you upload content, Amazon Bedrock Data Automation processes it using either standard or custom blueprints to extract valuable insights. The extracted information is stored as JSON in an Amazon Simple Storage Service (Amazon S3) bucket, while job status is tracked through Amazon EventBridge and maintained in Amazon DynamoDB. The solution performs custom parsing of the extracted JSON to create knowledge base-compatible documents, which are then stored and indexed in Amazon Bedrock Knowledge Bases.

Through an intuitive user interface, the solution displays both the uploaded content and its extracted information. Users can interact with the processed data through a Retrieval Augmented Generation (RAG)-based Q&A system, powered by Amazon Bedrock foundation models. This integrated approach enables organizations to efficiently process, analyze, and derive insights from diverse content formats while using a robust and scalable infrastructure deployed using the AWS Cloud Development Kit (AWS CDK).

Architecture

Architecture diagram

The preceding architecture diagram illustrates the flow of the solution:

  1. Users interact with the frontend application, authenticating through Amazon Cognito
  2. API requests are handled by Amazon API Gateway and AWS Lambda functions
  3. Files are uploaded to an S3 bucket for processing
  4. Amazon Bedrock Data Automation processes the files and extracts information
  5. EventBridge manages the job status and triggers post-processing
  6. Job status is stored in DynamoDB and processed content is stored in Amazon S3
  7. A Lambda function parses the processed content and indexes it in Amazon Bedrock Knowledge Bases (a minimal sketch follows this list)
  8. A RAG-based Q&A system uses Amazon Bedrock foundation models to answer user queries
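Step 7 can be implemented as a Lambda function that converts the Amazon Bedrock Data Automation output JSON into a plain-text document, writes it to the knowledge base data source bucket, and starts an ingestion job. The following is a minimal sketch; the event shape, bucket name, knowledge base ID, data source ID, and JSON field names are placeholders, not the exact implementation in the repository.

import json
import boto3

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent")

# Placeholder identifiers; the real values come from the deployed stack
KB_BUCKET = "my-knowledge-base-source-bucket"
KNOWLEDGE_BASE_ID = "KB123EXAMPLE"
DATA_SOURCE_ID = "DS123EXAMPLE"

def handler(event, context):
    # Assumes the incoming event carries the S3 location of the BDA result JSON
    bucket = event["detail"]["output_bucket"]
    key = event["detail"]["output_key"]

    bda_result = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

    # Flatten the extracted fields into a plain-text document the knowledge base can index
    fields = bda_result.get("inference_result", {})  # field name is an assumption
    document = "\n".join(f"{name}: {value}" for name, value in fields.items()) or json.dumps(bda_result)

    doc_key = key.replace(".json", ".txt")
    s3.put_object(Bucket=KB_BUCKET, Key=doc_key, Body=document.encode("utf-8"))

    # Trigger a sync so Amazon Bedrock Knowledge Bases re-indexes the data source
    bedrock_agent.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
    )
    return {"statusCode": 200}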

Prerequisites

Backend

For the backend, you need to have the following prerequisites:

To use the Q&A feature, make sure that you enable access to the Amazon Bedrock foundation models that you’re planning to use, in the required AWS Regions.

  • For models in the dropdown list marked On demand, enable model access in the Region where you deployed this stack.
  • For models in the dropdown list marked CRIS, enable model access in every Region used by the system-defined inference profile (cross-Region inference). For instance, to use Amazon Nova Pro - CRIS US, make sure you enable access to the Amazon Nova Pro model in every Region used by this inference profile: US East (N. Virginia) us-east-1, US West (Oregon) us-west-2, and US East (Ohio) us-east-2.
  • The models used in this solution include:
    • Anthropic’s Claude 3.5 Sonnet v2.0
    • Amazon Nova Pro v1.0
    • Anthropic’s Claude 3.7 Sonnet v1.0

Frontend

For the frontend, you need to have the following prerequisites:

  • Node/npm: v18.12.1
  • The deployed backend.
  • At least one user added to the appropriate Amazon Cognito user pool (required for authenticated API calls).

Everything you need is provided as open source code in our GitHub repository.

git clone https://github.com/aws-samples/generative-ai-cdk-constructs-samples.git

Deployment guide

This sample application codebase is organized into these key folders:

samples/bedrock-bda-media-solution

├── backend # Backend architecture CDK project
├── images # Images used for documentation
└── frontend # Frontend sample application

Deploy the backend

Use the following steps to deploy the backend AWS resources:

  • If you haven’t already done so, clone this repository:
    git clone https://github.com/aws-samples/generative-ai-cdk-constructs-samples.git

  • Enter the backend directory
    cd samples/multimodal-rag/backend

  • Create a virtualenv on MacOS and Linux:
    python3 -m venv .venv

  • Activate the virtualenv
    source .venv/bin/activate

  • After the virtualenv is activated, you can install the required dependencies.
    pip install -r requirements.txt

  • Bootstrap CDK. Bootstrapping is the process of preparing your AWS environment for use with the AWS CDK.
    cdk bootstrap

  • Run the AWS CDK Toolkit to deploy the backend stack with the runtime resources.
    cdk deploy

To help protect against unintended changes that affect your security posture, the AWS CDK Toolkit prompts you to approve security-related changes before deploying them. You need to answer yes to deploy the stack.

After the backend is deployed, you need to create a user. First, use the AWS CLI to locate the Amazon Cognito user pool ID:

$ aws cloudformation describe-stacks \
    --stack-name BDAMediaSolutionBackendStack \
    --query "Stacks[0].Outputs[?contains(OutputKey, 'UserPoolId')].OutputValue"

[
    "<region>_a1aaaA1Aa"
]

You can then go to the Amazon Cognito page in the AWS Management Console, search for the user pool, and add users.
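Alternatively, you can create the user programmatically. The following is a minimal sketch with boto3; the user pool ID, user name, and temporary password are placeholders.

import boto3

cognito = boto3.client("cognito-idp")

# Placeholder values; use the user pool ID returned by the CloudFormation output above
cognito.admin_create_user(
    UserPoolId="<region>_a1aaaA1Aa",
    Username="demo-user@example.com",
    TemporaryPassword="ChangeMe123!",
    UserAttributes=[{"Name": "email", "Value": "demo-user@example.com"}],
)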

Deploy the frontend

The repository provides a demo frontend application. With this, you can upload and review media files processed by the backend application. To deploy the UI, follow these steps:

  • Enter the frontend directory
    cd samples/multimodal-rag/frontend

  • Create a .env file by duplicating the included example.env and replace the property values with the values retrieved from the MainBackendStack outputs.
VITE_REGION_NAME=<BDAMediaSolutionBackendStack.RegionName>
VITE_COGNITO_USER_POOL_ID=<BDAMediaSolutionBackendStack.CognitoUserPoolId>
VITE_COGNITO_USER_POOL_CLIENT_ID=<BDAMediaSolutionBackendStack.CognitoUserPoolClientId>
VITE_COGNITO_IDENTITY_POOL_ID=<BDAMediaSolutionBackendStack.CognitoIdentityPoolId>
VITE_API_GATEWAY_REST_API_ENDPOINT=<BDAMediaSolutionBackendStack.ApiGatewayRestApiEndpoint>
VITE_APP_NAME="Bedrock BDA Multimodal Media Solution"
VITE_S3_BUCKET_NAME=<BDAMediaSolutionBackendStack.BDAInputBucket>

You can run the following script if you want to automate the preceding step:

./generate-dev-env.sh
  • Install the dependencies
    npm install

  • Start the web application
    npm run dev

A URL like http://localhost:5173/ will be displayed, so you can open the web application from your browser. Sign in to the application with the user profile you created in Amazon Cognito.

Set up Amazon Bedrock Data Automation

Before processing files, you need to set up an Amazon Bedrock Data Automation project and configure extraction patterns. The solution provides a control plane interface, shown in the following figure, where you can:

  • View existing Amazon Bedrock Data Automation projects in your account
  • Create new projects and blueprints
  • Select the appropriate project for processing

Setup bda

For specific documentation on how Amazon Bedrock Data Automation works, see How Bedrock Data Automation works.

After deciding the project to use, select it from the dropdown list in the list projects operation card. The selected project will be used for file processing.

Process multimodal content

To begin, go to the home page of the frontend application, shown in the following screenshot, and choose Choose file near the top right corner. Select a file. A tooltip will appear when you hover over the button, displaying the file requirements supported by Amazon Bedrock Data Automation. The application supports various file types that Amazon Bedrock Data Automation can process:

  1. PDF files
  2. Images
  3. Audio files
  4. Video files

Process multimodal content

For ready-to-use sample files, see the back-end/samples folder.

When you upload a file

The following process is triggered when a file is uploaded:

  1. The file is stored in an S3 bucket
  2. An Amazon Bedrock Data Automation job is initiated through the backend API
  3. The job status is tracked and updated in DynamoDB
  4. Extracted information is made available through the UI after processing completes

BDA analysis results

The processing time varies depending on the size of the file. You can check the status of processing tasks by choosing the refresh button. After a job is completed, you can select the file name in the table on the Home page to access the file details.

You can access the job details Amazon Bedrock Data Automation produced by navigating through the tabs on the right side of the screen. The Standard and Custom Output tabs provide details on the extracted information from Amazon Bedrock Data Automation.

Ask questions about your uploaded document

The Q&A tab will provide a chatbot to ask questions about the documents processed. You can select an Amazon Bedrock foundation model from the dropdown list and ask a question. Currently, the following models are supported:

  • Anthropic’s Claude 3.5 Sonnet v2.0
  • Amazon Nova Pro v1.0
  • Anthropic’s Claude 3.7 Sonnet v1.0

In the following image, an Amazon Bedrock foundation model is used to ask questions against the Amazon Bedrock knowledge base. Each processed document has been ingested and stored in the vector store.

bda-qa
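Under the hood, a RAG-based Q&A flow like this can be expressed as a single RetrieveAndGenerate call against the knowledge base. The following is a minimal sketch; the knowledge base ID, model ARN, and question are placeholders, not the application's exact implementation.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What topics are discussed in the uploaded video?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
        },
    },
)
print(response["output"]["text"])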

Clean up

Delete the stack to avoid unexpected charges.

  1. First make sure to remove data from the S3 buckets created for this solution.
  2. Run cdk destroy.
  3. Delete the S3 buckets.
  4. Delete the logs created for this solution by the different services in Amazon CloudWatch Logs.

Conclusion

The integration of Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases represents a significant leap forward in how organizations can process and derive value from their multimodal content. This solution not only demonstrates the technical implementation but also showcases the transformative potential of combining automated content processing with intelligent querying capabilities. By using the AWS serverless architecture and the power of foundation models, you can now build scalable, cost-effective solutions that turn your unstructured data into actionable insights.

At the time of writing, this solution is available in the following AWS Regions: US East (N. Virginia), and US West (Oregon).


About the authors

Author - Lana Zhang Lana Zhang is a Senior Solutions Architect in the AWS World Wide Specialist Organization AI Services team, specializing in AI and generative AI with a focus on use cases including content moderation and media analysis. She’s dedicated to promoting AWS AI and generative AI solutions, demonstrating how generative AI can transform classic use cases by adding business value. She assists customers in transforming their business solutions across diverse industries, including social media, gaming, ecommerce, media, advertising, and marketing.

Alain Krok is a Senior Solutions Architect with a passion for emerging technologies. His experience includes designing and implementing IIoT solutions for the oil and gas industry and working on robotics projects. He enjoys pushing the limits and indulging in extreme sports when he’s not designing software.

Dinesh Sajwan is a Senior Prototyping Architect at AWS. He thrives on working with cutting-edge technologies and leverages his expertise to solve complex business challenges. His diverse technical background enables him to develop innovative solutions across various domains. When not exploring new technologies, he enjoys spending quality time with his family and indulging in binge-watching his favorite shows.

Read More

Tailoring foundation models for your business needs: A comprehensive guide to RAG, fine-tuning, and hybrid approaches

Tailoring foundation models for your business needs: A comprehensive guide to RAG, fine-tuning, and hybrid approaches

Foundation models (FMs) have revolutionized AI capabilities, but adopting them for specific business needs can be challenging. Organizations often struggle with balancing model performance, cost-efficiency, and the need for domain-specific knowledge. This blog post explores three powerful techniques for tailoring FMs to your unique requirements: Retrieval Augmented Generation (RAG), fine-tuning, and a hybrid approach combining both methods. We dive into the advantages, limitations, and ideal use cases for each strategy.

AWS provides a suite of services and features to simplify the implementation of these techniques. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Knowledge Bases provides native support for RAG, streamlining the process of enhancing model outputs with domain-specific information. Amazon Bedrock also offers native features for model customizations through continued pre-training and fine-tuning. In addition, you can use Amazon Bedrock Custom Model Import to bring and use your customized models alongside existing FMs through a single serverless, unified API. Use Amazon Bedrock Model Distillation to use smaller, faster, more cost-effective models that deliver use-case specific accuracy that is comparable to the most advanced models in Amazon Bedrock.

For this post, we used Amazon SageMaker AI for the fine-tuning and hybrid approaches to maintain more control over the fine-tuning script and to try different fine-tuning methods, and we used Amazon Bedrock Knowledge Bases for the RAG approach, as shown in Figure 1.

To help you make informed decisions, we provide ready-to-use code in our GitHub repo that uses these AWS services to experiment with RAG, fine-tuning, and hybrid approaches. You can evaluate their performance on your specific use case and dataset, and choose the approach that best fits to effectively customize FMs for your business needs.

Figure 1: Architecture diagram for RAG, fine-tuning, and hybrid approaches

Retrieval Augmented Generation

RAG is a cost-effective way to enhance AI capabilities by connecting existing models to external knowledge sources. For example, an AI-powered customer service chatbot using RAG can answer questions about current product features by first checking the product documentation knowledge base. When a customer asks a question, the system retrieves the specific details from the product knowledge base before composing its response, helping to make sure that the information is accurate and up to date.

A RAG approach gives AI models access to external knowledge sources for better responses and has two main steps: retrieval, which finds the relevant information from connected data sources, and generation, in which an FM composes an answer based on the retrieved information.
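To make these two steps concrete, the following minimal sketch (Python with boto3) shows how a single Amazon Bedrock Knowledge Bases call can perform both retrieval and generation. The knowledge base ID, model ARN, and question are placeholders you would replace with your own values.

import boto3

# The bedrock-agent-runtime client exposes the Knowledge Bases runtime APIs
client = boto3.client("bedrock-agent-runtime")

# RetrieveAndGenerate performs both RAG steps in one call:
# retrieval from the knowledge base, then generation with the chosen FM
response = client.retrieve_and_generate(
    input={"text": "What are the safety instructions for the hydraulic press?"},  # hypothetical question
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KNOWLEDGE_BASE_ID",  # placeholder
            # Placeholder model ARN; check which models are available in your Region
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/meta.llama3-1-8b-instruct-v1:0",
        },
    },
)

print(response["output"]["text"])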

Fine-tuning

Fine-tuning is a powerful way to customize FMs for specific tasks or domains using additional training data. In fine-tuning, you adjust the model’s parameters using a smaller, labeled dataset relevant to the target domain.

For example, to build an AI-powered customer service chatbot, you can fine-tune an existing FM using your own dataset to handle questions about a company’s product features. By training the model on historical customer interactions and product specifications, the fine-tuned model learns the context and the company’s messaging tone to provide more accurate responses.

If the company launches a new product, the model should be fine-tuned again with new data to update its knowledge and maintain relevance. Fine-tuning helps make sure that the model can deliver precise, context-aware responses. However, it requires more computational resources and time compared to RAG, because the model itself needs to be retrained with the new data.

Hybrid approach

The hybrid approach combines the strengths of RAG and fine-tuning to deliver highly accurate, context-aware responses. Consider an example: a company frequently updates the features of its products. They want to customize their FM using internal data, but keeping the model updated with changes in the product catalog is challenging. Because product features change monthly, keeping the model up to date would be costly and time-consuming.

By adopting a hybrid approach, the company can reduce costs and improve efficiency. They can fine-tune the model every couple of months to keep it aligned with the company’s overall tone. Meanwhile, RAG can retrieve the latest product information from the company’s knowledge base, helping to make sure that responses are up-to-date. Fine-tuning the model also enhances RAG’s performance during the generation phase, leading to more coherent and contextually relevant responses. If you want to further improve the retrieval phase, you can customize the embedding model, use a different search algorithm, or explore other retrieval optimization techniques.

The following sections provide the background for dataset creation and the implementation of the three different approaches.

Prerequisites

To deploy the solution, you need the prerequisites described in the Readme file of the GitHub repo.

Dataset description

For the proof-of-concept, we created two synthetic datasets using Anthropic’s Claude 3 Sonnet on Amazon Bedrock.

Product catalog dataset

This dataset is your primary knowledge source in Amazon Bedrock. We created a product catalog that consists of 15 fictitious manufacturing products by prompting Anthropic’s Claude 3 Sonnet with example product catalogs. You should create your dataset in .txt format. The format in the example for this post has the following fields:

  • Product names
  • Product descriptions
  • Safety instructions
  • Configuration manuals
  • Operation instructions

Training and test datasets

We use the same product catalog we created for the RAG approach as training data to run domain adaptation fine-tuning.

The test dataset consists of question-and-answer pairs about the product catalog dataset created earlier. We used the code in the Question-Answer Dataset section of the Jupyter notebook to generate the test dataset.
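As a sketch of how such pairs can be generated, the snippet below prompts Anthropic’s Claude 3 Sonnet on Amazon Bedrock for question-and-answer pairs; the prompt, file path, and output format are assumptions rather than the notebook’s exact code.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Assumed local path to a catalog entry; the notebook drives the actual generation
catalog_entry = open("data/product_catalog.txt").read()[:4000]
prompt = (
    "Generate five question-and-answer pairs about the following product catalog entry. "
    "Return only a JSON list of objects with 'question' and 'answer' keys.\n\n" + catalog_entry
)

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # verify the model ID in your Region
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)

# Assumes the model returns valid JSON; add parsing guards in practice
qa_pairs = json.loads(response["output"]["message"]["content"][0]["text"])
print(qa_pairs[0])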

Implementation

We implemented three different approaches: RAG, fine-tuning, and hybrid. See the Readme file for instructions to deploy the whole solution.

RAG

The RAG approach uses Amazon Bedrock Knowledge Bases and consists of two main parts: infrastructure setup, followed by context retrieval and response generation.

To set up the infrastructure:

  1. Update the config file with your required data (details in the Readme)
  2. Run the following commands in the infrastructure folder:
cd infrastructure
./prepare.sh
cdk bootstrap aws://<<ACCOUNT_ID>>/<<REGION>>
cdk synth
cdk deploy --all

Context retrieval and response generation (a minimal sketch follows these steps):

  1. The system finds relevant information by searching the knowledge base with the user’s question
  2. It then sends both the user’s question and the retrieved information to the Meta Llama 3.1 8B model on Amazon Bedrock
  3. The LLM then generates a response based on the user’s question and the retrieved information
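The following sketch illustrates these three steps with boto3. The knowledge base ID and prompt template are illustrative, and the exact Meta Llama 3.1 8B model ID can vary by Region and version.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")

question = "How do I configure the pressure sensor?"  # hypothetical user question

# Step 1: search the knowledge base for the most relevant chunks
retrieval = agent_runtime.retrieve(
    knowledgeBaseId="YOUR_KNOWLEDGE_BASE_ID",  # placeholder
    retrievalQuery={"text": question},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
context = "\n".join(result["content"]["text"] for result in retrieval["retrievalResults"])

# Step 2: send the question and the retrieved context to Meta Llama 3.1 8B on Amazon Bedrock
prompt = f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
response = bedrock_runtime.converse(
    modelId="meta.llama3-1-8b-instruct-v1:0",  # verify the model ID in your Region
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)

# Step 3: read the generated answer
print(response["output"]["message"]["content"][0]["text"])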

Fine-tuning

We used Amazon SageMaker JumpStart to fine-tune the Meta Llama 3.1 8B Instruct model using the domain adaptation method for five epochs. You can adjust the following parameters in the config.py file (a minimal JumpStart sketch follows the list):

  • Fine-tuning method: You can change the fine-tuning method in the config file; the default is domain_adaptation.
  • Number of epochs: Adjust the number of epochs in the config file according to your data size.
  • Fine-tuning template: Change the template based on your use case. The current one prompts the LLM to answer a customer question.
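As a rough sketch of what the JumpStart fine-tuning job looks like, the snippet below assumes the JumpStart model ID, hyperparameter names, and S3 path shown; in the solution these values come from config.py, so treat this as illustrative rather than the exact implementation.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Assumed JumpStart model ID for Meta Llama 3.1 8B Instruct; verify it in the JumpStart catalog
estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-1-8b-instruct",
    instance_type="ml.g5.12xlarge",
    environment={"accept_eula": "true"},  # Meta Llama models require accepting the EULA
)

# Domain adaptation fine-tuning for 5 epochs (hyperparameter names can differ by model version)
estimator.set_hyperparameters(instruction_tuned="False", epoch="5")

# The training channel points to the product catalog .txt data (placeholder S3 path)
estimator.fit({"training": "s3://your-bucket/product-catalog/"})

# Deploy the fine-tuned model to a real-time endpoint for inference
predictor = estimator.deploy()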

Hybrid

The hybrid approach combines RAG and fine-tuning, and uses the following high-level steps:

  1. Retrieve the most relevant context based on the user’s question from the Knowledge Base
  2. The fine-tuned model generates answers using the retrieved context

You can customize the prompt template in the config.py file.
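A minimal sketch of the hybrid flow might look like the following. The endpoint name, knowledge base ID, and request payload format are assumptions that depend on how your fine-tuned model was deployed.

import json
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
sagemaker_runtime = boto3.client("sagemaker-runtime")

question = "What maintenance does the conveyor unit require?"  # hypothetical user question

# Step 1: retrieve the most relevant context from the knowledge base
retrieval = agent_runtime.retrieve(
    knowledgeBaseId="YOUR_KNOWLEDGE_BASE_ID",  # placeholder
    retrievalQuery={"text": question},
)
context = "\n".join(result["content"]["text"] for result in retrieval["retrievalResults"])

# Step 2: let the fine-tuned model on the SageMaker endpoint generate the answer
payload = {
    "inputs": f"Context:\n{context}\n\nQuestion: {question}\nAnswer:",
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},  # assumed payload schema
}
response = sagemaker_runtime.invoke_endpoint(
    EndpointName="YOUR_FINETUNED_ENDPOINT",  # placeholder
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))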

Evaluation

For this example, we use three evaluation metrics to measure performance. You can modify src/evaluation.py to implement your own metrics.
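As an illustration of plugging in such a metric, the following sketch uses the open-source bert-score package; the example answers are hypothetical and this is not the repository’s exact implementation.

from bert_score import score

# Generated answers from one approach and the corresponding reference (correct) answers
candidates = ["The sensor supports an operating range of 0 to 10 bar."]   # hypothetical output
references = ["The pressure sensor operates between 0 and 10 bar."]       # hypothetical ground truth

# BERTScore compares contextual embeddings; we keep the F1 component as the metric
precision, recall, f1 = score(candidates, references, lang="en")
print(f"Average BERTScore F1: {f1.mean().item():.4f}")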

Each metric helps you understand different aspects of how well each of the approaches works:

  • BERTScore: BERTScore tells you how similar the generated answers are to the correct answers using cosine similarity of contextual token embeddings. It calculates precision, recall, and F1 measure; we used the F1 measure as the evaluation score.
  • LLM evaluator score: We use different language models from Amazon Bedrock to score the responses from the RAG, fine-tuning, and hybrid approaches. Each evaluator receives both the correct answer and the generated answer and gives a score between 0 and 1 (closer to 1 indicates higher similarity) for each generated answer. We then calculate the final score by averaging all the evaluation scores. The process is shown in the following figure.

Figure 2: LLM evaluator method

  • Inference latency: Response times are important in applications like chatbots, so depending on your use case, this metric might be important in your decision. For each approach, we averaged the time it took to receive a full response for each sample.
  • Cost analysis: To do a full cost analysis, we made the following assumptions:
    • We used one OpenSearch compute unit (OCU) for indexing and another for search for the documents indexed for RAG. See OpenSearch Serverless pricing for more details.
    • We assume an application with 1,000 users, each making 10 requests per day with an average of 2,000 input tokens and 1,000 output tokens. See Amazon Bedrock pricing for more details.
    • We used an ml.g5.12xlarge instance for fine-tuning and for hosting the fine-tuned model. The fine-tuning job took 15 minutes to complete. See SageMaker AI pricing for more details.
    • For the fine-tuning and hybrid approaches, we assume that the model instance is up 24/7, which might vary according to your use case.
    • The cost calculation is done for one month.

Based on those assumptions, the cost associated with each of the three approaches is calculated as follows (a worked example with illustrative prices follows the list):

  • For RAG:
    • OpenSearch Serverless monthly cost = cost of 1 OCU per hour * 2 OCUs * 24 hours * 30 days
    • Meta Llama 3.1 8B invocation cost = 1,000 users * 10 requests per day * (price per input token * 2,000 + price per output token * 1,000) * 30 days
  • For fine-tuning:
    • Fine-tuning job cost = (number of minutes used for the fine-tuning job / 60) * hourly cost of an ml.g5.12xlarge instance
    • Hosting cost = hourly cost of an ml.g5.12xlarge instance * 24 hours * 30 days
  • For hybrid:
    • OpenSearch Serverless monthly cost = cost of 1 OCU per hour * 2 OCUs * 24 hours * 30 days
    • Fine-tuning job cost = (number of minutes used for the fine-tuning job / 60) * hourly cost of an ml.g5.12xlarge instance
    • Hosting cost = hourly cost of an ml.g5.12xlarge instance * 24 hours * 30 days
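To make the arithmetic concrete, the following sketch reproduces the calculation with illustrative unit prices; actual prices vary by Region and change over time, so check the pricing pages referenced above before relying on these numbers.

# Illustrative unit prices (assumptions; verify against the current AWS pricing pages)
OCU_PER_HOUR = 0.24                # OpenSearch Serverless, per OCU-hour
G5_12XLARGE_PER_HOUR = 7.09        # ml.g5.12xlarge, per hour
LLAMA_INPUT_PER_1K = 0.00022       # Meta Llama 3.1 8B input tokens, per 1,000 tokens
LLAMA_OUTPUT_PER_1K = 0.00022      # Meta Llama 3.1 8B output tokens, per 1,000 tokens

USERS, REQUESTS_PER_DAY, DAYS = 1_000, 10, 30
INPUT_TOKENS, OUTPUT_TOKENS = 2_000, 1_000
FINE_TUNING_MINUTES = 15

opensearch = OCU_PER_HOUR * 2 * 24 * DAYS                              # ~$346 per month
bedrock = USERS * REQUESTS_PER_DAY * DAYS * (
    INPUT_TOKENS / 1_000 * LLAMA_INPUT_PER_1K
    + OUTPUT_TOKENS / 1_000 * LLAMA_OUTPUT_PER_1K
)                                                                      # ~$198 per month
fine_tuning_job = FINE_TUNING_MINUTES / 60 * G5_12XLARGE_PER_HOUR      # ~$1.77 for the training job
hosting = G5_12XLARGE_PER_HOUR * 24 * DAYS                             # ~$5,105 per month

print(f"RAG:         ~${opensearch + bedrock:,.0f}")
print(f"Fine-tuning: ~${fine_tuning_job + hosting:,.0f}")
print(f"Hybrid:      ~${opensearch + fine_tuning_job + hosting:,.0f}")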

Results

You can find detailed evaluation results in two places in the code repository. The individual scores for each sample are in the JSON files under data/output, and a summary of the results is in summary_results.csv in the same folder.

The results in the following table show:

  • How each approach (RAG, fine-tuning, and hybrid) performs
  • Their scores from both BERTScore and LLM evaluators
  • The cost analysis for each method, calculated for the US East Region
Approach | Average BERTScore | Average LLM evaluator score | Average inference time (seconds) | Cost per month (US East Region)
RAG | 0.8999 | 0.8200 | 8.336 | ~$350 + $198 ≈ $548
Fine-tuning | 0.8660 | 0.5556 | 4.159 | ~$1.77 + $5,105 ≈ $5,107
Hybrid | 0.8908 | 0.8556 | 17.700 | ~$350 + $1.77 + $5,105 ≈ $5,457

Note that the costs for both the fine-tuning and hybrid approaches can decrease significantly depending on the traffic pattern if you configure the SageMaker real-time inference endpoint to scale down to zero instances when not in use.

Clean up

Follow the cleanup section in the Readme file to avoid paying for unused resources.

Conclusion

In this post, we showed you how to implement and evaluate three powerful techniques for tailoring FMs to your business needs: RAG, fine-tuning, and a hybrid approach combining both methods. We provided ready-to-use code to help you experiment with these approaches and make informed decisions based on your specific use case and dataset.

The results in this example were specific to the dataset that we used. For that dataset, RAG outperformed fine-tuning and achieved comparable results to the hybrid approach with a lower cost, but fine-tuning led to the lowest latency. Your results will vary depending on your dataset.

We encourage you to test these approaches using our code as a starting point:

  1. Add your own datasets in the data folder
  2. Fill out the config.py file
  3. Follow the rest of the Readme instructions to run the full evaluation

About the Authors

Idil Yuksel is a Working Student Solutions Architect at AWS, pursuing her MSc. in Informatics with a focus on machine learning at the Technical University of Munich. She is passionate about exploring application areas of machine learning and natural language processing. Outside of work and studies, she enjoys spending time in nature and practicing yoga.

Karim Akhnoukh is a Senior Solutions Architect at AWS working with customers in the financial services and insurance industries in Germany. He is passionate about applying machine learning and generative AI to solve customers’ business challenges. Besides work, he enjoys playing sports, aimless walks, and good quality coffee.
