Improve RAG accuracy with fine-tuned embedding models on Amazon SageMaker


Retrieval Augmented Generation (RAG) is a popular paradigm that provides additional knowledge to large language models (LLMs) from an external source of data that wasn’t present in their training corpus.

RAG provides additional knowledge to the LLM through its input prompt, and its architecture typically consists of the following components:

  • Indexing: Prepare a corpus of unstructured text, parse and chunk it, and then embed each chunk and store it in a vector database.
  • Retrieval: Retrieve context relevant to answering a question from the vector database using vector similarity. Use prompt engineering to provide this additional context to the LLM along with the original question. The LLM will then use the original question and the context from the vector database to generate an answer based on data that wasn’t part of its training corpus.

Challenges in RAG accuracy

Pre-trained embedding models are typically trained on large, general-purpose datasets like Wikipedia or web-crawl data. While these models capture a broad range of semantic relationships and can generalize well across various tasks, they might struggle to accurately represent domain-specific concepts and nuances. This limitation can lead to suboptimal performance when using these pre-trained embeddings for specialized tasks or domains, such as legal, medical, or technical domains. Furthermore, pre-trained embeddings might not effectively capture the contextual relationships and nuances that are specific to a particular task or domain. For example, in the legal domain, the same term can have different meanings or implications depending on the context, and these nuances might not be adequately represented in a general-purpose embedding model.

To address the limitations of pre-trained embeddings and improve the accuracy of RAG systems for specific domains or tasks, it’s essential to fine tune the embedding model on domain-specific data. By fine tuning the model on data that is representative of the target domain or task, the model can learn to capture the relevant semantics, jargon, and contextual relationships that are crucial for that domain.

Domain-specific embeddings can significantly improve the quality of vector representations, leading to more accurate retrieval of relevant context from the vector database. This, in turn, enhances the performance of the RAG system in terms of generating more accurate and relevant responses.

This post demonstrates how to use Amazon SageMaker to fine tune a Sentence Transformers embedding model and deploy it with an Amazon SageMaker endpoint. The code from this post and more examples are available in the GitHub repo. For more information about fine tuning Sentence Transformers models, see the Sentence Transformers training overview.

Fine tuning embedding models using SageMaker

SageMaker is a fully managed machine learning service that simplifies the entire machine learning workflow, from data preparation and model training to deployment and monitoring. It provides a seamless and integrated environment that abstracts away the complexities of infrastructure management, allowing developers and data scientists to focus solely on building and iterating on their machine learning models.

One of the key strengths of SageMaker is its native support for popular open source frameworks such as TensorFlow, PyTorch, and Hugging Face Transformers. This integration lets you train and deploy models with these frameworks while taking advantage of their capabilities and their extensive ecosystem of libraries and tools.

SageMaker also offers a range of built-in algorithms for common use cases like computer vision, natural language processing, and tabular data, making it easy to get started with pre-built models for various tasks. SageMaker also supports distributed training and hyperparameter tuning, allowing for efficient and scalable model training.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Steps to fine tune embedding models on Amazon SageMaker

In the following sections, we use JupyterLab in SageMaker Studio to walk through the steps of data preparation, creating a training script, training the model, and deploying it as a SageMaker endpoint.

We will fine tune the embedding model sentence-transformers/all-MiniLM-L6-v2, an open source Sentence Transformers model that was fine tuned on a dataset of 1B sentence pairs. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search. To fine tune it, we use the Amazon Bedrock FAQs, a dataset of question and answer pairs, with the MultipleNegativesRankingLoss function.

The Losses section of the Sentence Transformers documentation lists the loss functions you can use to fine tune embedding models on your training data. The choice of loss function plays a critical role when fine tuning the model, because it determines how well the embedding model will work for the specific downstream task.

The MultipleNegativesRankingLoss function is recommended when your training data contains only positive pairs, for example, pairs of similar texts such as paraphrases, duplicate questions, queries and responses, or (source_language, target_language) translations.

In our case, considering that we’re using Amazon Bedrock FAQs as training data, which consists of pairs of questions and answers, the MultipleNegativesRankingLoss function could be a good fit.
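To make the idea concrete, the following pure-Python sketch computes an in-batch-negatives ranking loss of the kind MultipleNegativesRankingLoss implements: each question's own answer is its positive, every other answer in the batch serves as a negative, and the loss is a softmax cross-entropy over scaled cosine similarities. The scale of 20 reflects the library's default as we understand it; the function names and toy vectors are illustrative, and this is a teaching sketch rather than the library's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives ranking loss (sketch): for anchor i, positives[i]
    is the target and every other positive in the batch is a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarity of anchor i to every candidate in the batch
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable softmax cross-entropy with target index i
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-D embeddings: each question points roughly toward its own answer
questions = [[1.0, 0.1], [0.1, 1.0]]
answers = [[0.9, 0.2], [0.2, 0.9]]
print(mnr_loss(questions, answers))        # low: correct pairs rank first
print(mnr_loss(questions, answers[::-1]))  # higher: pairs deliberately mismatched
```

Because the negatives come from the same batch, a larger batch_size gives each question more negatives to rank against.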

The following code snippet demonstrates how to load a training dataset from a JSON file, prepare the data for training, and then fine tune the pre-trained model. After fine tuning, the updated model is saved.

The EPOCHS variable determines the number of times the model will iterate over the entire training dataset during the fine-tuning process. A higher number of epochs typically leads to better convergence and potentially improved performance but might also increase the risk of overfitting if not properly regularized.

In this example, we have a small training set consisting of only 100 records. As a result, we’re using a high value for the EPOCHS parameter. Typically, in real-world scenarios, you would have a much larger training set. In such cases, the EPOCHS value should be a single- or two-digit number to avoid overfitting the model to the training data.

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
import json

def load_data(path):
    """Load the dataset from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data

# Each record is expected to contain a positive pair under the "sentence1" and "sentence2" keys
dataset = load_data("training.json")


# Load the pre-trained model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Convert the dataset to the required format
train_examples = [InputExample(texts=[data["sentence1"], data["sentence2"]]) for data in dataset]

# Create a DataLoader object; with MultipleNegativesRankingLoss, the other pairs in each batch act as negatives
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=8)

# Define the loss function
train_loss = losses.MultipleNegativesRankingLoss(model)

EPOCHS = 100

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=EPOCHS,
    show_progress_bar=True,
)

# Save the fine-tuned model (safe_serialization=False writes pytorch_model.bin instead of model.safetensors)
model.save("opt/ml/model/", safe_serialization=False)
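For a sense of scale: with 100 records and batch_size=8, the DataLoader above yields ceil(100 / 8) = 13 batches per epoch (the last one partial, because drop_last defaults to False), so EPOCHS=100 amounts to roughly 1,300 training steps. A small helper, ours rather than part of the original walkthrough, makes the arithmetic explicit:

```python
import math


def total_training_steps(num_examples, batch_size, epochs, drop_last=False):
    """Number of optimizer steps for a simple single-DataLoader training loop."""
    if drop_last:
        batches_per_epoch = num_examples // batch_size
    else:
        batches_per_epoch = math.ceil(num_examples / batch_size)
    return batches_per_epoch * epochs


# Values from the example above: 100 records, batch_size=8, EPOCHS=100
print(total_training_steps(100, 8, 100))  # 1300
```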

To deploy and serve the fine-tuned embedding model for inference, we create an inference.py Python script that serves as the entry point. This script implements two functions, model_fn and predict_fn, which the SageMaker Hugging Face Inference Toolkit calls in place of its default handlers to load the model and to run predictions.

The model_fn function is responsible for loading the fine-tuned embedding model and the associated tokenizer. The predict_fn function takes input sentences, tokenizes them using the loaded tokenizer, and computes their sentence embeddings using the fine-tuned model. To obtain a single vector representation for each sentence, it performs mean pooling over the token embeddings followed by normalization of the resulting embedding. Finally, predict_fn returns the normalized embeddings as a list, which can be further processed or stored as required.

%%writefile opt/ml/model/inference.py

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

def mean_pooling(model_output, attention_mask):
    # First element of model_output contains all token embeddings
    token_embeddings = model_output[0]
    # Zero out padding tokens, then average over the remaining tokens
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


def model_fn(model_dir, context=None):
    # Load the fine-tuned model and tokenizer from the extracted model.tar.gz
    # (the archive stores them in a "model" subdirectory)
    tokenizer = AutoTokenizer.from_pretrained(f"{model_dir}/model")
    model = AutoModel.from_pretrained(f"{model_dir}/model")
    return model, tokenizer

def predict_fn(data, model_and_tokenizer, context=None):
    # Unpack model and tokenizer
    model, tokenizer = model_and_tokenizer

    # Read the sentences from the "inputs" field and tokenize them
    sentences = data.pop("inputs", data)
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Perform mean pooling
    sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])

    # L2-normalize embeddings
    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)

    # Return a JSON-serializable dictionary: a single vector for a string input,
    # or one vector per sentence for a list input
    embeddings = sentence_embeddings.tolist()
    return {"vectors": embeddings[0] if isinstance(sentences, str) else embeddings}
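To see what the pooling and normalization steps in predict_fn do, here is a small pure-Python sketch on toy numbers. It mirrors the mean_pooling function and the F.normalize call in inference.py, but it is an illustration only and is not part of the deployed script; the helper names are ours.

```python
import math


def masked_mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            summed = [s + x for s, x in zip(summed, vector)]
            count += 1
    # Mirror torch.clamp(..., min=1e-9) to avoid dividing by zero
    return [s / max(count, 1e-9) for s in summed]


def l2_normalize(vector):
    """Scale a vector to unit length, like F.normalize(p=2)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, 1e-12) for x in vector]


tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is a padding token
mask = [1, 1, 0]
pooled = masked_mean_pool(tokens, mask)  # [2.0, 3.0]: padding row is ignored
print(l2_normalize(pooled))
```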

After creating the inference.py script, we package it together with the fine-tuned embedding model into a single model.tar.gz file. This compressed file can then be uploaded to an S3 bucket, making it accessible for deployment as a SageMaker endpoint.

import os
import tarfile

import boto3

model_dir = "opt/ml/model"
model_tar_path = "model.tar.gz"

# Package the model artifacts and inference.py under a top-level "model" directory
with tarfile.open(model_tar_path, "w:gz") as tar:
    tar.add(model_dir, arcname=os.path.basename(model_dir))

# Get the Region name and the account ID from STS (Security Token Service)
session = boto3.Session()
region_name = session.region_name
account_id = session.client("sts").get_caller_identity()["Account"]

# Default SageMaker bucket name; the bucket must already exist
# (sagemaker.Session().default_bucket() creates it)
bucket_name = f"sagemaker-{region_name}-{account_id}"
s3_key = "model_trained_embedding/model.tar.gz"
model_path = f"s3://{bucket_name}/{s3_key}"

# Upload the archive to Amazon S3
s3 = session.client("s3")
with open(model_tar_path, "rb") as f:
    s3.upload_fileobj(f, bucket_name, s3_key)
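On the endpoint, SageMaker extracts model.tar.gz into /opt/ml/model, and because the tar.add call above uses arcname="model", the weights land in /opt/ml/model/model. That is why model_fn loads from f"{model_dir}/model". The following offline sketch checks the archive layout before uploading; the list_top_level_dirs helper is ours, and the placeholder files in a temporary directory stand in for the real model.

```python
import os
import tarfile
import tempfile


def list_top_level_dirs(tar_path):
    """Return the set of top-level entries inside a tar.gz archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return {name.split("/")[0] for name in tar.getnames()}


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for opt/ml/model with a couple of placeholder files
    model_dir = os.path.join(tmp, "opt", "ml", "model")
    os.makedirs(model_dir)
    for name in ("config.json", "inference.py"):
        with open(os.path.join(model_dir, name), "w") as f:
            f.write("{}")
    tar_path = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(model_dir, arcname=os.path.basename(model_dir))
    top_level = list_top_level_dirs(tar_path)

print(top_level)  # {'model'}
```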

Finally, we can deploy our fine-tuned model in a SageMaker endpoint.

from sagemaker.huggingface.model import HuggingFaceModel
import sagemaker

# Create a Hugging Face Model object
huggingface_model = HuggingFaceModel(
    model_data=model_path,                    # S3 path to your trained model archive
    role=sagemaker.get_execution_role(),      # IAM role with permissions to create an endpoint
    transformers_version="4.26",              # Transformers version used
    pytorch_version="1.13",                   # PyTorch version used
    py_version="py39",                        # Python version used
    entry_point="opt/ml/model/inference.py",  # Custom inference script
)

# Deploy the model to a SageMaker endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

After the deployment is completed, you can find the deployed endpoint in the Amazon SageMaker console by choosing Inference in the navigation pane, and then choosing Endpoints.

You have multiple options to invoke your endpoint. For example, in your SageMaker JupyterLab space, you can invoke it with the following code snippet:

# Example request: you always need to define "inputs"
data = {
    "inputs": "Are Agents fully managed?"
}

# Send the request
predictor.predict(data)

The endpoint returns a dictionary whose vectors key holds the embedding of the text passed in inputs:

{'vectors': [0.04694557189941406,
-0.07266131788492203,
-0.058242443948984146,
....,
]}
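Outside JupyterLab, for example from an application, you could call the same endpoint through the SageMaker Runtime API. The sketch below uses boto3's invoke_endpoint with a JSON body that carries the same "inputs" key. The embed_text helper and its runtime_client parameter are our own additions, so that the function can be exercised without AWS.

```python
import json


def embed_text(endpoint_name, text, runtime_client=None):
    """Return the embedding vector for text from the deployed endpoint.

    runtime_client defaults to boto3.client("sagemaker-runtime"); passing
    your own client (or a stub) makes the function testable offline.
    """
    if runtime_client is None:
        import boto3  # imported lazily so the helper can be tested without AWS
        runtime_client = boto3.client("sagemaker-runtime")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"inputs": text}),
    )
    # The body is a streaming object; read it and decode the JSON payload
    return json.loads(response["Body"].read())["vectors"]


# Usage (requires AWS credentials and the running endpoint):
# vector = embed_text(predictor.endpoint_name, "Are Agents fully managed?")
```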

To illustrate the impact of fine tuning, we can compare the cosine similarity scores between two semantically related sentences using both the original pre-trained model and the fine-tuned model. A higher cosine similarity score indicates that the two sentences are more semantically similar, because their embeddings are closer in the vector space.
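For reference, cosine similarity divides the dot product of two embeddings by the product of their L2 norms, giving a value between -1 and 1. Because predict_fn L2-normalizes every embedding, vectors returned by the endpoint have unit norm, so for them cosine similarity reduces to the dot product:

```latex
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert_2 \, \lVert \mathbf{v} \rVert_2},
\qquad
\lVert \mathbf{u} \rVert_2 = \lVert \mathbf{v} \rVert_2 = 1 \;\Longrightarrow\; \cos(\mathbf{u}, \mathbf{v}) = \mathbf{u} \cdot \mathbf{v}
```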

Let’s consider the following pair of sentences:

  • What are agents, and how can they be used?
  • Agents for Amazon Bedrock are fully managed capabilities that automatically break down tasks, create an orchestration plan, securely connect to company data through APIs, and generate accurate responses for complex tasks like automating inventory management or processing insurance claims.

These sentences are related to the concept of agents in the context of Amazon Bedrock, although with different levels of detail. By generating embeddings for these sentences using both models and calculating their cosine similarity, we can evaluate how well each model captures the semantic relationship between them.
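A minimal sketch of this comparison, written against any encoder that maps a list of sentences to vectors. With sentence-transformers installed, the encode methods of the pre-trained model and of the fine-tuned model saved to opt/ml/model/ could serve as the two encoders. The helper names are ours, and the scores you get depend on your data and training run.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def compare_encoders(encoders, sentence_a, sentence_b):
    """Return {encoder_name: cosine similarity of the two sentences}."""
    scores = {}
    for name, encode in encoders.items():
        vec_a, vec_b = encode([sentence_a, sentence_b])
        scores[name] = cosine_similarity(vec_a, vec_b)
    return scores


# Usage sketch (downloads the base model; assumes the fine-tuned model
# was saved to opt/ml/model/ by the training snippet):
# from sentence_transformers import SentenceTransformer
# encoders = {
#     "pre-trained": SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode,
#     "fine-tuned": SentenceTransformer("opt/ml/model/").encode,
# }
# print(compare_encoders(encoders, question, answer))
```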

The original pre-trained model returns a similarity score of only 0.54.

The fine-tuned model returns a similarity score of 0.87.

We can observe how the fine-tuned model was able to identify a much higher semantic similarity between the concepts of agents and Agents for Amazon Bedrock when compared to the pre-trained model. This improvement is attributed to the fine-tuning process, which exposed the model to the domain-specific language and concepts present in the Amazon Bedrock FAQs data, enabling it to better capture the relationship between these terms.

Clean up

To avoid future charges in your account, delete the resources you created in this walkthrough. The SageMaker endpoint and the SageMaker JupyterLab instance will incur charges as long as the instances are active, so when you’re done delete the endpoint and resources that you created while running the walkthrough.
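If you deployed with the SageMaker Python SDK as shown above, the Predictor object can remove the model and endpoint it created. In this minimal sketch (the clean_up wrapper is ours), delete_model() runs first, since it locates the models through the endpoint configuration, and delete_endpoint(delete_endpoint_config=True) then removes the endpoint and its configuration. The model.tar.gz in Amazon S3 and your JupyterLab space are separate resources that you need to delete or stop on their own.

```python
def clean_up(predictor):
    """Delete the SageMaker model(s) and the endpoint behind a Predictor."""
    # Delete the model first, while the endpoint configuration still exists
    predictor.delete_model()
    # Then delete the endpoint together with its endpoint configuration
    predictor.delete_endpoint(delete_endpoint_config=True)


# clean_up(predictor)  # run once you no longer need the endpoint
```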

Conclusion

In this blog post, we have explored the importance of fine tuning embedding models to improve the accuracy of RAG systems in specific domains or tasks. We discussed the limitations of pre-trained embeddings, which are trained on general-purpose datasets and might not capture the nuances and domain-specific semantics required for specialized domains or tasks.

We highlighted the need for domain-specific embeddings, which can be obtained by fine tuning the embedding model on data representative of the target domain or task. This process allows the model to capture the relevant semantics, jargon, and contextual relationships that are crucial for accurate vector representations and, consequently, better retrieval performance in RAG systems.

We then demonstrated how to fine tune embedding models on Amazon SageMaker using the popular Sentence Transformers library.

By fine tuning embeddings on domain-specific data using SageMaker, you can unlock the full potential of RAG systems, enabling more accurate and relevant responses tailored to your specific domain or task. This approach can be particularly valuable in domains like legal, medical, or technical fields where capturing domain-specific nuances is crucial for generating high-quality and trustworthy outputs.

This and more examples are available in the GitHub repo. Try it out today using the Set up for single users (Quick setup) on Amazon SageMaker and let us know what you think in the comments.


About the Authors

Ennio Emanuele Pastore is a Senior Architect on the AWS GenAI Labs team. He is an enthusiast of everything related to new technologies that have a positive impact on businesses and general livelihood. He helps organizations in achieving specific business outcomes by using data and AI and accelerating their AWS Cloud adoption journey.


How BRIA AI used distributed training in Amazon SageMaker to train latent diffusion foundation models for commercial use


This post is co-written with Bar Fingerman from BRIA AI.

This post explains how BRIA AI quickly and economically trained BRIA AI 2.0, a high-resolution (1024×1024) text-to-image diffusion model, on a dataset comprising petabytes of licensed images. Amazon SageMaker training jobs and the Amazon SageMaker distributed training libraries took on the undifferentiated heavy lifting associated with infrastructure management. SageMaker helps you build, train, and deploy machine learning (ML) models for your use cases with fully managed infrastructure, tools, and workflows.

BRIA AI is a pioneering platform specializing in responsible and open generative artificial intelligence (AI) for developers, offering advanced models exclusively trained on licensed data from partners such as Getty Images, DepositPhotos, and Alamy. BRIA AI caters to major brands, animation and gaming studios, and marketing agencies with its multimodal suite of generative models. Emphasizing ethical sourcing and commercial readiness, BRIA AI’s models are source-available, secure, and optimized for integration with various tech stacks. By addressing foundational challenges in data procurement, continuous model training, and seamless technology integration, BRIA AI aims to be the go-to platform for creative AI application developers.

You can also find the BRIA AI 2.0 model for image generation on AWS Marketplace.

This blog post discusses how BRIA AI worked with AWS to address the following key challenges:

  • Achieving out-of-the-box operational excellence for large model training
  • Reducing time-to-train by using data parallelism
  • Maximizing GPU utilization with efficient data loading
  • Reducing model training cost (by paying only for net training time)

Importantly, BRIA AI was able to use SageMaker while keeping its existing Hugging Face Accelerate (Accelerate) software stack intact. As a result, moving to SageMaker training didn’t require changes to BRIA AI’s model implementation or training code. Later, BRIA AI was able to evolve its software stack on SageMaker along with its model training.

Training pipeline architecture


BRIA AI’s training pipeline consists of two main components:

Data preprocessing:

  • Data contributors upload licensed raw image files to BRIA AI’s Amazon Simple Storage Service (Amazon S3) bucket.
  • An image preprocessing pipeline built with Amazon Simple Queue Service (Amazon SQS) and AWS Lambda functions generates missing image metadata and packages the training data into large WebDataset files, enabling efficient streaming directly from an S3 bucket and data sharding across GPUs (see the Challenge 3 section). WebDataset is a PyTorch-based library (its datasets are PyTorch IterableDataset objects), so it fits well with Accelerate.

Model training:

  • SageMaker training jobs manage the training cluster and run the distributed training itself.
  • Data is streamed from Amazon S3 to the training instances using SageMaker fast file mode.

Pre-training challenges and solutions

Pre-training foundation models is a challenging task. Challenges include cost, performance, orchestration, monitoring, and the engineering expertise needed throughout the weeks-long training process.

The four challenges we faced were:

Challenge 1: Achieving out-of-the-box operational excellence for large model training

To orchestrate the training cluster and recover from failures, BRIA AI relies on the resiliency features of SageMaker training jobs. These include cluster health checks, built-in retries, and job resiliency. Before your job starts, SageMaker runs GPU health checks and verifies NVIDIA Collective Communications Library (NCCL) communication on GPU instances, replacing faulty instances (if necessary) to make sure your training script starts running on a healthy cluster of instances. You can also configure SageMaker to automatically retry training jobs that fail with a SageMaker internal server error (ISE). As part of retrying a job, SageMaker replaces instances that encountered unrecoverable GPU errors with fresh instances, reboots the healthy instances, and starts the job again. This results in faster restarts and workload completion. By using AWS Deep Learning Containers, the BRIA AI workload benefited from the SageMaker SDK automatically setting the environment variables needed to tune NCCL for AWS Elastic Fabric Adapter (EFA) networking based on well-known best practices. This helps maximize workload throughput.

To monitor the training cluster, BRIA AI used the built-in SageMaker integration with Amazon CloudWatch Logs (application logs) and CloudWatch metrics (CPU, GPU, and networking metrics).

Challenge 2: Reducing time-to-train by using data parallelism

BRIA AI needed to train a Stable Diffusion 2.0 model from scratch on a petabyte-scale licensed image dataset. Training on a single GPU could take a few months. To meet its deadlines, BRIA AI used data parallelism with a SageMaker training job on 16 p4de.24xlarge instances, reducing the total training time to under two weeks. Distributed data parallel training speeds up the training of large models by splitting the data across many devices that train in parallel, while regularly synchronizing gradients to keep a consistent shared model. It uses the combined computing power of many devices. BRIA AI used a cluster of four p4de.24xlarge instances (each with 8 NVIDIA A100 80GB GPUs) to achieve a throughput of 1.8 iterations per second for an effective batch size of 2048 (batch=8, bf16, accumulate=2).
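In data-parallel training, the effective (global) batch size is the product of the number of instances, GPUs per instance, per-GPU batch size, and gradient-accumulation steps. A quick check with the figures quoted above (8 GPUs per p4de.24xlarge, batch=8, accumulate=2): an effective batch of 2048 corresponds to 128 GPUs, that is, 16 instances. Assuming one iteration is one optimizer step over the full effective batch, 1.8 iterations per second works out to roughly 3,686 images per second. The helper below is illustrative.

```python
def effective_batch_size(instances, gpus_per_instance, per_gpu_batch, grad_accum_steps):
    """Samples contributing to each optimizer step in data-parallel training."""
    return instances * gpus_per_instance * per_gpu_batch * grad_accum_steps


# p4de.24xlarge has 8 GPUs; batch=8 and accumulate=2 are the values quoted above
global_batch = effective_batch_size(16, 8, 8, 2)
print(global_batch)  # 2048

# Assumption: one "iteration" is one optimizer step over the full global batch
images_per_second = 1.8 * global_batch
print(images_per_second)
```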

p4de.24xlarge instances provide 600 GB per second of peer-to-peer GPU communication with NVIDIA NVSwitch and 400 gigabits per second (Gbps) of instance networking with support for EFA and NVIDIA GPUDirect RDMA (remote direct memory access).

Note: You can now also use p5.48xlarge instances (8x NVIDIA H100 80GB GPUs) with 3200 Gbps of networking between instances using EFA 2.0 (not used in this pre-training by BRIA AI).

Accelerate is a library that enables the same PyTorch code to be run across a distributed configuration with minimal code adjustments.

BRIA AI used Accelerate for small-scale training outside the cloud. When it was time to scale out training in the cloud, BRIA AI was able to keep using Accelerate, thanks to its built-in integration with SageMaker and the Amazon SageMaker distributed data parallel library (SMDDP). SMDDP is purpose-built for AWS infrastructure and reduces communication overhead in two ways:

  • The library optimizes AllReduce, a key operation during distributed training that’s responsible for a large portion of communication overhead, by efficiently overlapping it with the backward pass for optimal GPU usage.
  • The library performs optimized node-to-node communication by fully utilizing the AWS network infrastructure and Amazon Elastic Compute Cloud (Amazon EC2) instance topology (optimal bandwidth use with balanced fusion buffer).
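Conceptually, AllReduce leaves every worker holding the same aggregated gradient, so all model replicas apply identical updates and stay in sync. The following toy simulation, which is not how SMDDP is implemented, trains a one-parameter linear model on two data shards and averages the shard gradients in a stand-in all_reduce_mean function. With equal-sized shards, the averaged gradient equals the full-batch gradient.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for AllReduce: every worker receives the average of all inputs."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # true w = 2
shards = [data[0::2], data[1::2]]  # two workers with equal-sized shards
weights = [0.0, 0.0]               # each worker holds its own replica of w
lr = 0.05

for step in range(50):
    # Each worker computes a gradient on its own shard only
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    # Synchronize gradients so every replica applies the same update
    synced = all_reduce_mean(grads)
    weights = [w - lr * g for w, g in zip(weights, synced)]

print(weights)  # both replicas converge to the same value, close to 2.0
```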

Note that SageMaker training supports many open source distributed training libraries, such as Fully Sharded Data Parallel (FSDP) and DeepSpeed. BRIA AI used FSDP on SageMaker in other training workloads; in those workloads, the ShardingStrategy.SHARD_GRAD_OP sharding strategy let BRIA AI achieve an optimal batch size and accelerate its training process.

Challenge 3: Achieving efficient data loading

The BRIA AI dataset included hundreds of millions of images that needed to be delivered from storage onto GPUs for processing. Efficiently accessing this large amount of data across a training cluster presents several challenges:

  • The data might not fit into the storage of a single instance.
  • Downloading the multi-terabyte dataset to each training instance is time consuming while the GPUs sit idle.
  • Copying millions of small image files from Amazon S3 can become a bottleneck because of the accumulated round-trip latency of fetching each object from Amazon S3.
  • The data needs to be split correctly between instances.

BRIA AI addressed these challenges by using SageMaker fast file input mode, which provided the following out-of-the-box features:

  • Streaming: Instead of copying data when training starts, or using an additional distributed file system, we chose to stream data directly from Amazon S3 to the training instances using SageMaker fast file mode. This allows training to start immediately without waiting for downloads. Streaming also reduces the need to fit datasets into instance storage.
  • Data distribution: Fast file mode was configured to shard the dataset files between multiple instances using S3DataDistributionType=ShardedByS3Key.
  • Local file access: Fast file mode provides a local POSIX filesystem interface to data in Amazon S3. This allowed BRIA AI’s data loader to access remote data as if it were local.
  • Packaging files into large containers: Using millions of small image and metadata files adds overhead when streaming data from object storage like Amazon S3. To reduce this overhead, BRIA AI packed multiple files into large TAR file containers (2–5 GB), which can be efficiently streamed from S3 to the instances using fast file mode. Specifically, BRIA AI used WebDataset for efficient local data loading, with a policy of no data loading synchronization between instances: each GPU loads random batches using a fixed seed. This policy helps eliminate bottlenecks and maintains fast, deterministic data loading performance.
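A minimal sketch of the packing step, assuming WebDataset's convention that files sharing a basename up to the first dot (for example 000001.jpg and 000001.json) form one sample. The pack_into_shards helper, the shard-NNNNN.tar naming, and the tiny size limit are illustrative; BRIA AI's pipeline ran on AWS Lambda and targeted 2–5 GB shards. Sizes are approximate because TAR header overhead is ignored.

```python
import os
import tarfile
import tempfile
from collections import defaultdict


def pack_into_shards(src_dir, out_dir, max_shard_bytes):
    """Group files by basename into samples and write them into TAR shards,
    starting a new shard when the next sample would exceed max_shard_bytes.
    Returns the list of shard paths written."""
    samples = defaultdict(list)
    for name in sorted(os.listdir(src_dir)):
        samples[name.split(".", 1)[0]].append(name)  # "000001.jpg" -> "000001"

    os.makedirs(out_dir, exist_ok=True)
    shard_paths, tar, shard_size = [], None, 0
    for key in sorted(samples):
        files = samples[key]
        sample_size = sum(os.path.getsize(os.path.join(src_dir, f)) for f in files)
        # Open a new shard if none is open or this sample would overflow it
        if tar is None or shard_size + sample_size > max_shard_bytes:
            if tar is not None:
                tar.close()
            path = os.path.join(out_dir, f"shard-{len(shard_paths):05d}.tar")
            tar = tarfile.open(path, "w")
            shard_paths.append(path)
            shard_size = 0
        # Keep all files of a sample adjacent so a streaming reader can regroup them
        for f in files:
            tar.add(os.path.join(src_dir, f), arcname=f)
        shard_size += sample_size
    if tar is not None:
        tar.close()
    return shard_paths


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw")
    os.makedirs(src)
    for i in range(4):
        for ext in ("jpg", "json"):
            with open(os.path.join(src, f"{i:06d}.{ext}"), "wb") as f:
                f.write(b"x" * 100)  # 100-byte placeholder files
    shards = pack_into_shards(src, os.path.join(tmp, "shards"), max_shard_bytes=450)
    shard_members = []
    for path in shards:
        with tarfile.open(path) as t:
            shard_members.append(t.getnames())

print(shard_members)  # two shards, each holding two complete samples
```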

For more on data loading considerations, see the blog post Choose the best data source for your Amazon SageMaker training job.

Challenge 4: Paying only for net training time

Pre-training large models is not a continuous process. Model training often requires intermittent stops for evaluation and adjustments. For instance, the model might stop converging and need adjustments, or you might want to pause training to test the model, refine data, or troubleshoot issues. These pauses result in extended periods where the GPU cluster is idle. With SageMaker training jobs, BRIA AI was able to pay only for the duration of its active training time. This allowed BRIA AI to train models at a lower cost and with greater efficiency.

BRIA AI’s training strategy increases the image resolution in three steps for optimal model convergence:

  1. Initial training at 256×256 on a 32-GPU cluster
  2. Progressive refinement at 512×512 on a 64-GPU cluster
  3. Final training at 1024×1024 on a 128-GPU cluster

The compute required differed at each step because of the tradeoffs applied, such as the batch size per resolution, the per-GPU upper limit, and gradient accumulation. The tradeoff is between cost savings and model coverage.

BRIA AI’s cost calculations were facilitated by maintaining a consistent iteration per second rate, which allowed for accurate estimation of training time. This enabled precise determination of the required number of iterations and calculation of the training compute cost per hour.
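The same reasoning can be written down directly: at a steady rate, training time is the number of iterations divided by the iteration rate, and with SageMaker training jobs the compute cost scales with that active time. In the sketch below, the helper names are ours and price_per_instance_hour is a placeholder you would replace with the on-demand rate for your instance type and Region; 1.8 iterations per second is the rate quoted earlier for BRIA AI's cluster.

```python
def training_hours(total_iterations, iterations_per_second):
    """Wall-clock hours needed at a steady iteration rate."""
    return total_iterations / iterations_per_second / 3600


def training_cost(total_iterations, iterations_per_second, instances, price_per_instance_hour):
    """Compute cost when you pay only for active training time.
    price_per_instance_hour is a placeholder, not a published price."""
    hours = training_hours(total_iterations, iterations_per_second)
    return hours * instances * price_per_instance_hour


# At 1.8 it/s, one hour of training covers 1.8 * 3600 = 6480 iterations
print(training_hours(6480, 1.8))
```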

GPU utilization and iterations per second during BRIA AI training:

  • GPU utilization: The average is over 98 percent, showing that the GPUs were kept busy for the whole training cycle and that the data loader streamed data efficiently at a high rate.
  • Iterations per second: The rate differs across the three resolution steps (256×256, 512×512, and 1024×1024) because each step uses a different batch size per resolution and gradient accumulation setting within the limits of the GPU, trading cost savings against model coverage.

Result examples


Prompts used for generating the images
Prompt 1, upper left image: A stylish man sitting casually on outdoor steps, wearing a green hoodie, matching green pants, black shoes, and sunglasses. He is smiling and has neatly groomed hair and a short beard. A brown leather bag is placed beside him. The background features a brick wall and a window with white frames.

Prompt 2, upper right image: A vibrant Indian wedding ceremony. The smiling bride in a magenta saree with gold embroidery and henna-adorned hands sits adorned in traditional gold jewelry. The groom, sitting in front of her, in a golden sherwani and white dhoti, pours water into a ceremonial vessel. They are surrounded by flowers, candles, and leaves in a colorful, festive atmosphere filled with traditional objects.

Prompt 3, lower left image: A wooden tray filled with a variety of delicious pastries. The tray includes a croissant dusted with powdered sugar, a chocolate-filled croissant, a partially eaten croissant, a Danish pastry and a muffin next to a small jar of chocolate sauce, and a bowl of coffee beans, all arranged on a beige cloth.

Prompt 4, lower right image: A panda pouring milk into a white cup on a table with coffee beans, flowers, and a coffee press. The background features a black-and-white picture and a decorative wall piece.

Conclusion

In this post, we saw how Amazon SageMaker enabled BRIA AI to train a diffusion model efficiently, without needing to manually provision and configure infrastructure. By using SageMaker training, BRIA AI was able to reduce costs, iterate faster, shorten training time with distributed training while maintaining 98 percent GPU utilization, and maximize value for cost. By taking on the undifferentiated heavy lifting, SageMaker empowered BRIA AI’s team to be more productive and deliver innovations faster. The ease of use and automation offered by SageMaker training jobs make it an attractive option for any team looking to efficiently train large, state-of-the-art models.

To learn more about how SageMaker can help you train large AI models efficiently and cost-effectively, explore the Amazon SageMaker page. You can also reach out to your AWS account team to discover how to unlock the full potential of your large-scale AI initiatives.


About the Authors

Bar Fingerman, Head of Engineering AI/ML at BRIA AI.

Doron Bleiberg, Senior Startup Solutions Architect.

Gili Nachum, Principal Gen AI/ML Specialist Solutions Architect.

Erez Zarum, Startup Solutions Architect.


Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio


Amazon SageMaker Studio provides a comprehensive suite of fully managed integrated development environments (IDEs) for machine learning (ML), including JupyterLab, Code Editor (based on Code-OSS), and RStudio. It supports all stages of ML development, from data preparation to deployment, and lets you launch a preconfigured JupyterLab IDE for efficient coding within seconds. Additionally, its flexible interface and artificial intelligence (AI) powered coding assistant simplify and enhance ML workflow configuration, debugging, and code testing.

Geospatial data such as satellite images, coordinate traces, or aerial maps that are enriched with characteristics or attributes of other business and environmental datasets is becoming increasingly available. This unlocks valuable use cases in fields such as environmental monitoring, urban planning, agriculture, disaster response, transportation, and public health.

To effectively utilize the wealth of information contained in such datasets for ML and analytics, access to the right tools for geospatial data handling is crucial. This is especially relevant given that geospatial data often comes in specialized file formats such as Cloud Optimized GeoTIFF (COG), Zarr files, GeoJSON, and GeoParquet that require dedicated software tools and libraries to work with.

To address these specific needs within SageMaker Studio, this post shows you how to extend Amazon SageMaker Distribution with additional dependencies to create a custom container image tailored for geospatial analysis. Although the example in this post focuses on geospatial data science, the methodology presented can be applied to any kind of custom image based on SageMaker Distribution.

SageMaker Distribution images are Docker images that come with preinstalled data science packages and are preconfigured with a JupyterLab IDE. You can use them both in the SageMaker Studio UI and for non-interactive workflows like processing or training. Because SageMaker Studio notebooks and asynchronous jobs can share the same runtime, you can move smoothly from local experimentation to batch execution while maintaining only a single Docker image.

In this post, we provide step-by-step guidance on how you can build and use custom container images in SageMaker Studio. Specifically, we demonstrate how you can customize SageMaker Distribution for geospatial workflows by extending it with open-source geospatial Python libraries. We explain how to build and deploy the image on AWS using continuous integration and delivery (CI/CD) tools and how to make the deployed image accessible in SageMaker Studio. All code used in this post, including the Dockerfile and infrastructure as code (IaC) templates for quick deployment, is available as a GitHub repository.

Solution overview

You can build a custom container image and use it in SageMaker Studio with the following steps:

  1. Create a Dockerfile that includes the additional Python libraries and tools.
  2. Build a custom container image from the Dockerfile.
  3. Push the custom container image to a private repository on Amazon Elastic Container Registry (Amazon ECR).
  4. Attach the image to your Amazon SageMaker Studio domain.
  5. Access the image from your JupyterLab space.

The following diagram illustrates the solution architecture.
Solution overview

The solution uses AWS CodeBuild, a fully managed service that compiles source code and produces deployable software artifacts, to build a new container image from a Dockerfile. CodeBuild supports a broad selection of source providers, including Git repositories on AWS CodeCommit, GitHub, and GitLab. For this post, we host our build files on Amazon Simple Storage Service (Amazon S3) and use that bucket as the source provider for the CodeBuild project. You can extend this solution to work with alternative CI/CD tooling, including GitLab, Jenkins, Harness, or other tools.

CodeBuild retrieves the build files from Amazon S3, runs a Docker build, and pushes the resulting container image to a private ECR repository. Amazon ECR is a managed container registry that facilitates the storage, management, and deployment of container images.

The custom image is then attached to a SageMaker Studio domain. Data scientists and data engineers can use it as an IDE or as a runtime for SageMaker processing or training jobs.

Prerequisites

This post covers the default approach for SageMaker Studio, which involves a managed network interface that allows internet communication. We also include steps to adapt this for use within a virtual private cloud (VPC).

Before you get started, verify that you have the following prerequisites:

If you intend to follow this post and deploy the CodeBuild project and the ECR repository using IaC, you also need to install the AWS Cloud Development Kit (AWS CDK) on your local machine. For instructions, see Getting started with the AWS CDK. If you’re using a cloud-based IDE like AWS Cloud9, the AWS CDK will usually come preinstalled.

If you want to securely deploy your custom container using your private VPC, you also need the following:

To set up a SageMaker Studio domain with a private VPC, see Connect Studio notebooks in a VPC to external resources.

Extend SageMaker Distribution

By default, SageMaker Studio provides a selection of curated pre-built Docker images as part of SageMaker Distribution. These images include popular frameworks for ML, data science, and visualization, including deep learning frameworks like PyTorch, TensorFlow, and Keras; popular Python packages like NumPy, scikit-learn, and pandas; and IDEs like JupyterLab and Code Editor. All installed libraries and packages are mutually compatible and are provided with their latest compatible versions. Each distribution version is available in two variants, CPU and GPU, and is hosted on the Amazon ECR Public Gallery. To work with geospatial data in SageMaker Studio, you need to extend SageMaker Distribution with the required geospatial libraries, such as gdal, geopandas, leafmap, or rioxarray, and make it accessible to users through SageMaker Studio.

Let’s first review how to extend SageMaker Distribution for geospatial analyses and ML. To do so, we largely follow the provided template for creating custom Dockerfiles in SageMaker, with a few subtle but important differences specific to the geospatial libraries we want to install. The full Dockerfile is as follows:

# set distribution type (cpu or gpu)
ARG DISTRIBUTION_TYPE

# get SageMaker Distribution base image
# use fixed version for reproducibility, use "latest" for most recent version
FROM public.ecr.aws/sagemaker/sagemaker-distribution:1.8.0-$DISTRIBUTION_TYPE

#set SageMaker specific parameters and arguments
#see here for supported values: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-image-specifications.html#studio-updated-jl-admin-guide-custom-images-user-and-filesystem
ARG NB_USER="sagemaker-user"
ARG NB_UID=1000
ARG NB_GID=100

ENV MAMBA_USER=$NB_USER

USER root

#set environment variables required for GDAL
ARG CPLUS_INCLUDE_PATH=/usr/include/gdal
ARG C_INCLUDE_PATH=/usr/include/gdal

#install GDAL and other required Linux packages
RUN apt-get --allow-releaseinfo-change update -y -qq \
   && apt-get update \
   && apt install -y software-properties-common \
   && add-apt-repository --yes ppa:ubuntugis/ppa \
   && apt-get update \
   && apt-get install -qq -y groff unzip libgdal-dev gdal-bin ffmpeg libsm6 libxext6 \
   && apt-get install -y --reinstall build-essential \
   && apt-get clean \
   && rm -fr /var/lib/apt/lists/*

# use micromamba package manager to install required geospatial python packages
USER $MAMBA_USER

RUN micromamba install gdal==3.6.4 --yes --channel conda-forge --name base \
   && micromamba install geopandas==0.13.2 rasterio==1.3.8 leafmap==0.31.3 rioxarray==0.15.1 --yes --channel conda-forge --name base \
   && micromamba clean -a

# set entrypoint and jupyter server args
ENTRYPOINT ["jupyter-lab"]
CMD ["--ServerApp.ip=0.0.0.0", "--ServerApp.port=8888", "--ServerApp.allow_origin=*", "--ServerApp.token=''", "--ServerApp.base_url=/jupyterlab/default"]

Let’s break down the key geospatial-specific modifications.

First, you install the Geospatial Data Abstraction Library (GDAL) on Linux. GDAL is an open source library that provides drivers for reading and writing raster and vector geospatial data formats. It is the backbone of many open source and proprietary GIS applications, including the libraries used in this post. This is implemented as follows (see Install GDAL for Python for more details):

#install GDAL and other required Linux packages
RUN apt-get --allow-releaseinfo-change update -y -qq \
   && apt-get update \
   && apt install -y software-properties-common \
   && add-apt-repository --yes ppa:ubuntugis/ppa \
   && apt-get update \
   && apt-get install -qq -y groff unzip libgdal-dev gdal-bin ffmpeg libsm6 libxext6 \
   && apt-get install -y --reinstall build-essential \
   && apt-get clean \
   && rm -fr /var/lib/apt/lists/*

You also need to set the following GDAL-specific environment variables:

ARG CPLUS_INCLUDE_PATH=/usr/include/gdal
ARG C_INCLUDE_PATH=/usr/include/gdal

With GDAL installed, you can now install the required geospatial Python libraries using the recommended micromamba package manager. This is implemented in the following code block:

# use micromamba package manager to install required geospatial python packages
USER $MAMBA_USER

RUN micromamba install gdal==3.6.4 --yes --channel conda-forge --name base \
   && micromamba install geopandas==0.13.2 rasterio==1.3.8 leafmap==0.31.3 rioxarray==0.15.1 --yes --channel conda-forge --name base \
   && micromamba clean -a

The versions defined here have been tested with the underlying SageMaker Distribution. You can freely add additional libraries that you may need. Identifying the right version may require some level of experimentation.
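Because finding compatible versions can take some experimentation, it helps to confirm what actually ended up in the built image. The following is a minimal sketch and is not part of the post's repository. You could run it in a notebook on the custom image to compare installed versions with the Dockerfile pins, using the standard library's importlib.metadata. It checks only the four Python packages by their PyPI distribution names. GDAL is left out because its distribution name in a conda environment may differ; from osgeo import gdal; print(gdal.__version__) is a simpler check for it. The names PINNED_VERSIONS and find_version_mismatches are illustrative.

```python
from importlib import metadata

# Pins copied from the micromamba install line in the Dockerfile above
PINNED_VERSIONS = {
    "geopandas": "0.13.2",
    "rasterio": "1.3.8",
    "leafmap": "0.31.3",
    "rioxarray": "0.15.1",
}


def find_version_mismatches(pins, lookup=metadata.version):
    """Return {package: (expected, found)} for every pin that is missing or differs.

    `found` is None when the package is not installed at all. `lookup` is
    injectable so the check can be exercised without the real packages.
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            found = lookup(package)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_version_mismatches(PINNED_VERSIONS)
    for package, (expected, found) in problems.items():
        print(f"{package}: expected {expected}, found {found}")
    print("all pins satisfied" if not problems else f"{len(problems)} mismatch(es)")
```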

Now that you have created your custom geospatial Dockerfile, you can build it and push the image to Amazon ECR.

Build a custom geospatial image

To build the Docker image, you need a build environment equipped with Docker and the AWS Command Line Interface (AWS CLI). This environment can be set up on your local machine, in a cloud-based IDE like AWS Cloud9, or as part of a continuous integration service like CodeBuild.

Before you build the Docker image, identify the ECR repository where you will push the image. Your image must be tagged in the following format: <your-aws-account-id>.dkr.ecr.<your-aws-region>.amazonaws.com/<your-repository-name>:<tag>. Without this tag, pushing it to an ECR repository is not possible. If you’re deploying the solution using the AWS CDK, an ECR repository is automatically created, and a CodeBuild project is configured to use this repository as the target for pushing the image. When you initiate the CodeBuild build, the image is built, tagged, and then pushed to the previously created ECR repository.
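The tag format above is easy to mistype. The following small sketch (not from the post's repository) assembles the URI and checks it against a simplified pattern before you run docker build or docker push. The pattern is an assumption: it covers only the standard AWS partition, a 12-digit account ID, and plain tags (no digests). The account ID and repository name in the example are placeholders.

```python
import re

# Simplified pattern: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[a-z0-9][a-z0-9._/-]*):(?P<tag>[A-Za-z0-9_][A-Za-z0-9._-]{0,127})$"
)


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str) -> str:
    """Build the fully qualified tag expected by `docker build -t` and `docker push`."""
    uri = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"
    if not ECR_URI_PATTERN.match(uri):
        raise ValueError(f"not a valid private ECR image URI: {uri}")
    return uri


def parse_ecr_image_uri(uri: str) -> dict:
    """Split an ECR image URI back into account, region, repo, and tag."""
    match = ECR_URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a valid private ECR image URI: {uri}")
    return match.groupdict()


print(ecr_image_uri("123456789012", "us-east-1", "custom-geospatial-sm-dist", "latest-cpu"))
```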

The following steps are applicable only if you choose to perform these actions manually.

To build the image manually, run the following command in the same directory as the Dockerfile:

docker build --build-arg DISTRIBUTION_TYPE=cpu -t ${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com/${ECR_REPO_NAME}:latest-cpu .

After building your image, you must log in to the ECR repository with this command before pushing the image:

aws ecr get-login-password --region ${ECR_REGION} | docker login --username AWS --password-stdin ${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com

Next, push your Docker image using the following command:

docker push ${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com/${ECR_REPO_NAME}:latest-cpu

Your image has now been pushed to the ECR repository and you can proceed to attach it to SageMaker.

Attach the custom geospatial image to SageMaker Studio

After your custom image has been successfully pushed to Amazon ECR, you need to attach it to a SageMaker domain to be able to use it within SageMaker Studio.

  1. On the SageMaker console, choose Domains under Admin configurations in the navigation pane.

If you don’t have a SageMaker domain set up yet, you can create one.

  2. From the list of available domains, choose the domain to which you want to attach the geospatial image.
  3. On the Domain details page, choose the Environment tab.
  4. In the Custom images for personal Studio apps section, choose Attach image.

Studio Attach Image

  5. Choose New image and enter the ECR image URI from the build pipeline output. It should have the following format: <your-aws-account-id>.dkr.ecr.<your-aws-region>.amazonaws.com/<your-repository-name>:<tag>
  6. Choose Next.
  7. For Image name, enter a custom image name (for this post, we use custom-geospatial-sm-dist).
  8. For Image display name, enter a custom display name (for this post, we use Geospatial SageMaker Distribution (CPU)).
  9. For Description, enter an image description.

Attach image 01

  10. Choose JupyterLab image as the application type and choose Submit.

Attach image 02

When returning to the Environment tab on the Domain details page, you should now see your image listed under Custom images for personal Studio apps.

Attach the custom geospatial image using the AWS CLI

You can also automate the process using the AWS CLI.

First, register the image in SageMaker and create an image version:

SAGEMAKER_IMAGE_NAME=sagemaker-dist-custom-geospatial # adapt with your image name
ECR_IMAGE_URL='<account_id>.dkr.ecr.<region>.amazonaws.com/<ecr-repo-name>:latest-cpu' # replace with your ECR repository url
ROLE_ARN='The ARN of an IAM role for the execution role you want to use' # replace with the desired execution role

aws sagemaker create-image \
    --image-name ${SAGEMAKER_IMAGE_NAME} \
    --role-arn ${ROLE_ARN}

aws sagemaker create-app-image-config \
    --app-image-config-name ${SAGEMAKER_IMAGE_NAME}-app-image-config \
    --jupyter-lab-app-image-config {}

aws sagemaker create-image-version \
    --image-name ${SAGEMAKER_IMAGE_NAME} \
    --base-image ${ECR_IMAGE_URL}

Next, create a file containing the following content. You can add multiple custom images by adding additional entries to the CustomImages list.

{
  "DefaultUserSettings": {
    "JupyterLabAppSettings": {
      "CustomImages": [
        {
          "ImageName": "sagemaker-dist-custom-geospatial",
          "ImageVersionNumber": 1,
          "AppImageConfigName": "sagemaker-dist-custom-geospatial-app-image-config"
        }
      ]
    }
  }
}
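Editing this JSON by hand can introduce invalid or mismatched names, such as stray whitespace in AppImageConfigName. As a sketch, you can generate the file content instead. This assumes every app image config follows the "<image-name>-app-image-config" naming used in the create-app-image-config command above. The helper name default_user_settings is illustrative. Redirect the printed output to default-user-settings.json.

```python
import json


def default_user_settings(images):
    """Build update-domain input for a list of (image_name, version_number) pairs.

    Assumes each image's app image config is named "<image_name>-app-image-config".
    """
    custom_images = [
        {
            "ImageName": name,
            "ImageVersionNumber": version,
            "AppImageConfigName": f"{name}-app-image-config",
        }
        for name, version in images
    ]
    return {"DefaultUserSettings": {"JupyterLabAppSettings": {"CustomImages": custom_images}}}


settings = default_user_settings([("sagemaker-dist-custom-geospatial", 1)])
print(json.dumps(settings, indent=2))
```

To attach several images, pass several pairs. Each one becomes its own entry in the CustomImages list.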

The next step assumes that you named the file from the previous step default-user-settings.json. The following command attaches the SageMaker image to the specified Studio domain:

DOMAIN_ID=d-####### # replace with your SageMaker Studio domain id
aws sagemaker update-domain --domain-id ${DOMAIN_ID} --cli-input-json file://default-user-settings.json
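If you script this from Python rather than the shell, the same four calls exist on the boto3 SageMaker client: create_image, create_app_image_config (with JupyterLabAppImageConfig), create_image_version, and update_domain. The following is a sketch, not the post's tooling. The function name is hypothetical, and the client is passed in so the logic can be exercised without AWS access. Two behaviors are assumptions. First, it waits for describe_image_version to report CREATED before attaching, which keeps the attach step from racing the asynchronous image build. Second, as with the CLI version, the CustomImages list sent to update_domain is likely written as a whole, so include any images that are already attached if you want to keep them.

```python
import time


def register_and_attach_image(sm, image_name, image_uri, role_arn, domain_id,
                              poll_seconds=10, sleep=time.sleep):
    """Register an ECR image with SageMaker and attach it to a Studio domain.

    `sm` is a boto3 SageMaker client, for example boto3.client("sagemaker").
    Returns the attached image version number.
    """
    config_name = f"{image_name}-app-image-config"

    sm.create_image(ImageName=image_name, RoleArn=role_arn)
    sm.create_app_image_config(AppImageConfigName=config_name,
                               JupyterLabAppImageConfig={})
    sm.create_image_version(ImageName=image_name, BaseImage=image_uri)

    # Without a Version argument, describe_image_version reports the latest version.
    while True:
        desc = sm.describe_image_version(ImageName=image_name)
        status = desc["ImageVersionStatus"]
        if status == "CREATED":
            break
        if status != "CREATING":
            reason = desc.get("FailureReason", "no reason given")
            raise RuntimeError(f"image version ended in status {status}: {reason}")
        sleep(poll_seconds)

    sm.update_domain(
        DomainId=domain_id,
        DefaultUserSettings={
            "JupyterLabAppSettings": {
                "CustomImages": [{
                    "ImageName": image_name,
                    "ImageVersionNumber": desc["Version"],
                    "AppImageConfigName": config_name,
                }]
            }
        },
    )
    return desc["Version"]
```

Usage would look like register_and_attach_image(boto3.client("sagemaker"), "sagemaker-dist-custom-geospatial", ECR_IMAGE_URL, ROLE_ARN, DOMAIN_ID), with the same values as in the shell variables above.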

Use the custom geospatial image in the JupyterLab app

In the previous section, we demonstrated how to attach the image to a SageMaker domain. When you create a new (or modify an existing) JupyterLab space inside this domain, the newly created custom image will now be available. You can choose it on the Image dropdown menu, where it now appears alongside the default AWS curated SageMaker Distribution image versions under Custom.

To run a space using the custom geospatial image, choose Geospatial SageMaker Distribution (CPU) as your image, then choose Run space.

Studio Run Space

After the space has been provisioned and is in Running state, choose Open JupyterLab. This will bring up the JupyterLab IDE in a new browser tab. Select a notebook with Python3 (ipykernel) to start up a new Jupyter notebook running on top of the custom geospatial image.

Run interactive geospatial data analyses and large-scale processing jobs in SageMaker

After you build the custom geospatial image and attach it to your SageMaker domain, you can use it in one of two main ways:

  • You can use the image as the base to run a JupyterLab notebook kernel to perform in-notebook interactive development and geospatial analytics.
  • You can use the image in a SageMaker processing job to run highly parallelized geospatial processing pipelines. Reusing the interactive kernel image for asynchronous batch processing can be advantageous because you only have to maintain a single image, and routines developed interactively in a notebook can be expected to work seamlessly in the processing job. If startup latency caused by longer image load times is a concern, you can build a dedicated, more lightweight image just for processing (see Build Your Own Processing Container for details).

For hands-on examples of both approaches, refer to the accompanying GitHub repository.

In-notebook interactive development using a custom image

After you choose the custom geospatial image as the base image for your JupyterLab space, SageMaker provides you with access to many geospatial libraries that can now be imported without the need for additional installs. For example, you can run the following code to initialize a geometry object and plot it on a map within the familiar environment of a notebook:

import shapely
import leafmap
import geopandas

coords = [[-102.00723310488662,40.596123257503024],[-102.00723310488662,40.58168585757733],[-101.9882214495914,40.58168585757733],[-101.9882214495914,40.596123257503024],[-102.00723310488662,40.596123257503024]]
polygon = shapely.Polygon(coords)
gdf = geopandas.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[polygon])
Map = leafmap.Map(center=[40.596123257503024, -102.00723310488662], zoom=13)
Map.add_basemap("USGS NAIP Imagery")
Map.add_gdf(gdf, layer_name="test", style={"color": "yellow", "fillOpacity": 0.3, "clickable": True,})
Map

Geospatial notebook
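The notebook example uses two coordinate orders. The polygon vertices are (longitude, latitude), as in GeoJSON and shapely, while leafmap.Map(center=...) takes [latitude, longitude]. The example also centers the map on a corner of the polygon. The following dependency-free sketch (the helper name is illustrative) computes the middle of the bounding box in the order leafmap expects, so the map opens centered on the shape. For this rectangle the bounding-box center equals the centroid. For irregular shapes, shapely's polygon.centroid (x is longitude, y is latitude) is the area-weighted alternative.

```python
# Vertices from the notebook example above, in GeoJSON order: [longitude, latitude]
coords = [
    [-102.00723310488662, 40.596123257503024],
    [-102.00723310488662, 40.58168585757733],
    [-101.9882214495914, 40.58168585757733],
    [-101.9882214495914, 40.596123257503024],
    [-102.00723310488662, 40.596123257503024],
]


def bbox_center_latlon(lonlat_coords):
    """Return (lat, lon) at the middle of the coordinates' bounding box.

    Input pairs are (lon, lat); output is (lat, lon), the order used by
    leafmap.Map(center=...).
    """
    lons = [lon for lon, _ in lonlat_coords]
    lats = [lat for _, lat in lonlat_coords]
    return ((min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2)


center = bbox_center_latlon(coords)
print(center)  # pass as leafmap.Map(center=list(center), zoom=13)
```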

Highly parallelized geospatial processing pipelines using a SageMaker processing job and a custom image

You can specify the custom image as the image to run a SageMaker processing job. This enables you to use specialist geospatial processing frameworks to run large-scale distributed data processing pipelines with just a few lines of code. The following code snippet initializes and then runs a SageMaker ScriptProcessor object that uses the custom geospatial image (specified using the geospatial_image_uri variable) to run a geospatial processing routine (specified in a processing script) on 20 ml.m5.2xlarge instances:

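import sagemaker
from sagemaker import get_execution_role
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

region = sagemaker.Session().boto_region_name
role = get_execution_role()

geospatial_image_uri = "<GEOSPATIAL-IMAGE-URI>" #<-- set to uri of the custom geospatial image
bucket_name = "<S3-BUCKET-NAME>" #<-- placeholder: bucket that holds the input metadata
bucket_prefix_aoi_meta = "<AOI-META-PREFIX>" #<-- placeholder: S3 prefix of the area-of-observation metadata
bucket_prefix_sentinel2_meta = "<SENTINEL2-META-PREFIX>" #<-- placeholder: S3 prefix of the Sentinel-2 scene metadata
execution_id = "<EXECUTION-ID>" #<-- placeholder: any identifier that keeps each run's output separate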

processor_geospatial_data_cube = ScriptProcessor(
    command=['python3'],
    image_uri=geospatial_image_uri,
    role=role,
    instance_count=20,
    instance_type='ml.m5.2xlarge',
    base_job_name='aoi-data-cube'
)

processor_geospatial_data_cube.run(
    code='scripts/generate_aoi_data_cube.py', #<-- processing script
    inputs=[
        ProcessingInput(
            source=f"s3://{bucket_name}/{bucket_prefix_aoi_meta}/",
            destination='/opt/ml/processing/input/aoi_meta/', #<-- meta data (incl. geography) of the area of observation
            s3_data_distribution_type="FullyReplicated" #<-- every node receives a full copy of this input
        ),        
        ProcessingInput(
            source=f"s3://{bucket_name}/{bucket_prefix_sentinel2_meta}/",
            destination='/opt/ml/processing/input/sentinel2_meta/', #<-- Sentinel-2 scene metadata (1 file per scene)
            s3_data_distribution_type="ShardedByS3Key" #<-- each node receives a subset of the scene files
        ),
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/output/',
            destination=f"s3://{bucket_name}/processing/geospatial-data-cube/{execution_id}/output/" #<-- output S3 path
        )
    ]
)
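The post doesn't show scripts/generate_aoi_data_cube.py itself; it lives in the accompanying repository. The following is a hypothetical skeleton showing how a processing script can consume the inputs configured above. Because of ShardedByS3Key, each node simply iterates over whatever scene files were copied to its local input directory, so no coordination between nodes is needed. It reads the node's host name from /opt/ml/config/resourceconfig.json, which SageMaker Processing documents as containing current_host. It then writes a per-host manifest, so nodes don't overwrite each other's files in the shared output prefix. The function names and the manifest are illustrative assumptions, and the actual raster work is only a placeholder comment.

```python
import json
from pathlib import Path

# Container paths matching the ProcessingInput/ProcessingOutput destinations above
SCENE_META_DIR = Path("/opt/ml/processing/input/sentinel2_meta")
OUTPUT_DIR = Path("/opt/ml/processing/output")
RESOURCE_CONFIG = Path("/opt/ml/config/resourceconfig.json")


def current_host(resource_config=RESOURCE_CONFIG):
    """This node's host name, or "local" when running outside a processing job."""
    if resource_config.exists():
        return json.loads(resource_config.read_text())["current_host"]
    return "local"


def process_shard(scene_dir, output_dir, host):
    """Handle every scene-metadata file this node received; write a per-host manifest."""
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for scene_file in sorted(p for p in scene_dir.glob("*") if p.is_file()):
        # Placeholder: the real routine loads the raster, clips it to the area of
        # observation, resamples bands, and masks clouds at this point.
        processed.append(scene_file.name)
    manifest = output_dir / f"manifest-{host}.json"
    manifest.write_text(json.dumps({"host": host, "scenes": processed}))
    return manifest


if __name__ == "__main__":
    if SCENE_META_DIR.is_dir():  # only inside a processing job with inputs mounted
        process_shard(SCENE_META_DIR, OUTPUT_DIR, current_host())
    else:
        print(f"{SCENE_META_DIR} not found; nothing to process")
```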

A typical processing routine across 134 Sentinel-2 scenes (110 x 110 km each) completes in under 15 minutes, as shown in the following Amazon CloudWatch dashboard. The routine includes loading raster files, clipping them to an area of observation, resampling specific bands, and masking clouds, among other steps.

CloudWatch Metrics
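A processing job finishes only when its busiest node does, so it helps to estimate how many of the 134 scene files each of the 20 instances receives. SageMaker does not document exactly how ShardedByS3Key assigns objects to instances. The following sketch assumes an even split, so treat its numbers as an approximation. Under that assumption, 14 nodes get 7 scenes and 6 nodes get 6. Adding instances therefore only shortens the job if it lowers that maximum.

```python
def shard_counts(num_files, num_instances):
    """Files per instance if num_files are spread as evenly as possible.

    Illustrative only: models an even split, not SageMaker's internal assignment.
    """
    base, extra = divmod(num_files, num_instances)
    return [base + 1 if i < extra else base for i in range(num_instances)]


counts = shard_counts(134, 20)
print(f"busiest node: {max(counts)} scenes, lightest node: {min(counts)} scenes")
```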

Clean up

After you’re done running the notebook, don’t forget to stop the SageMaker Studio JupyterLab application to avoid incurring unnecessary costs. If you deployed the additional infrastructure using the AWS CDK, you can delete the deployed stack by running the following command in your local code checkout:

cd <path to repository>
cd deployment && cdk destroy

Conclusion

This post has equipped you with the knowledge and tools to build and use custom container images tailored for geospatial analysis in SageMaker Studio. By extending SageMaker Distribution with specialized geospatial libraries, you can customize your environment for specialized use cases. This empowers you to unlock the vast potential of geospatial data for applications such as environmental monitoring, urban planning, and precision agriculture—all within the familiar and user-friendly environment of SageMaker Studio.

Although this post focused on geospatial workflows, the methodology presented is broadly applicable. You can utilize the same principles to tailor container images for any domain requiring specific libraries or tools beyond the scope of SageMaker Distribution. This empowers you to create a truly customized development experience within SageMaker Studio, catering to your unique project needs.

The provided resources, including sample code and IaC templates, offer a solid foundation for building your own custom images. Experiment and explore how this approach can streamline your ML workflows involving geospatial data or any other specialized domain. To get started, visit the accompanying GitHub repository.


About the Authors

Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions and building ML platforms on AWS. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in domains such as autonomous driving.

Dr. Karsten Schroer is a Senior Machine Learning (ML) Prototyping Architect at AWS, focused on helping customers leverage artificial intelligence (AI), ML, and generative AI technologies. With deep ML expertise, he collaborates with companies across industries to design and implement data- and AI-driven solutions that generate business value. Karsten holds a PhD in applied ML.

Anirudh Viswanathan is a Senior Product Manager, Technical, at AWS with the SageMaker team, where he focuses on Machine Learning. He holds a Master’s in Robotics from Carnegie Mellon University and an MBA from the Wharton School of Business. Anirudh is a named inventor on more than 50 AI/ML patents. He enjoys long-distance running, exploring art galleries, and attending Broadway shows.


Automating model customization in Amazon Bedrock with AWS Step Functions workflow


Large language models have become indispensable in generating intelligent and nuanced responses across a wide variety of business use cases. However, enterprises often have unique data and use cases that require customizing large language models beyond their out-of-the-box capabilities. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. To enable secure and scalable model customization, Amazon Web Services (AWS) announced support for customizing models in Amazon Bedrock at AWS re:Invent 2023. This allows customers to further pre-train selected models using their own proprietary data to tailor model responses to their business context. The quality of the custom model depends on multiple factors, including the quality of the training data and the hyperparameters used to customize the model. As a result, customers must perform multiple iterations to develop the best customized model for their requirements.

To address this challenge, AWS announced native integration between Amazon Bedrock and AWS Step Functions. This empowers customers to orchestrate repeatable and automated workflows for customizing Amazon Bedrock models.

In this post, we will demonstrate how Step Functions can help overcome key pain points in model customization. You will learn how to configure a sample workflow that orchestrates model training, evaluation, and monitoring. Automating these complex tasks through a repeatable framework reduces development timelines and unlocks the full value of Amazon Bedrock for your unique needs.

Architecture

Architecture Diagram

This demonstration uses a summarization use case with the Cohere Command Light model in Amazon Bedrock. You can use the same workflow for summarization with other models by passing the base model ID and the required hyperparameters and making minor model-specific changes to the workflow. See the Amazon Bedrock user guide for the full list of models that support customization. All the required infrastructure is deployed using the AWS Serverless Application Model (AWS SAM).

The following is a summary of the functionality of the architecture:

  • The user uploads the training data to an Amazon Simple Storage Service (Amazon S3) training data bucket, and the validation and reference inference data to the validation data bucket. All of this data must be in JSON Line format.
  • The Step Functions CustomizeBedrockModel state machine is started with input parameters such as the model to customize, hyperparameters, training data locations, and other parameters discussed later in this post.
    • The workflow invokes the Amazon Bedrock CreateModelCustomizationJob API synchronously to fine tune the base model with the training data from the S3 bucket and the passed-in hyperparameters.
    • After the custom model is created, the workflow invokes the Amazon Bedrock CreateProvisionedModelThroughput API to create a provisioned throughput with no commitment.
    • The parent state machine calls the child state machine to evaluate the performance of the custom model with respect to the base model.
    • The child state machine invokes the base model and the customized model provisioned throughput with the same validation data from the S3 validation bucket and stores the inference results into the inference bucket.
    • An AWS Lambda function is called to evaluate the quality of the summaries produced by the custom model and the base model, using the BERTScore metric. If the custom model performs worse than the base model, the provisioned throughput is deleted.
    • A notification email is sent with the outcome.
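The post doesn't show the Lambda function's code. The following hypothetical sketch covers only its final decision. It assumes per-sample BERTScore F1 values have already been computed for both models on the same validation prompts, for example with the bert-score package's score function. Comparing mean F1 is an assumption. So is treating a tie as "not better", which follows the wording in Step 4 later in this post. Only the keep/delete decision is modeled; deleting the provisioned throughput would be a separate Amazon Bedrock API call.

```python
from statistics import mean


def keep_custom_model(base_f1_scores, custom_f1_scores, min_gain=0.0):
    """Decide whether to keep the custom model's provisioned throughput.

    Both lists hold per-sample BERTScore F1 values for the same validation
    prompts. Returns (keep, base_mean, custom_mean); keep is False unless the
    custom model's mean F1 beats the base model's by more than min_gain.
    """
    if len(base_f1_scores) != len(custom_f1_scores) or not base_f1_scores:
        raise ValueError("need the same, non-zero number of scores for both models")
    base_mean = mean(base_f1_scores)
    custom_mean = mean(custom_f1_scores)
    return custom_mean > base_mean + min_gain, base_mean, custom_mean


keep, base_mean, custom_mean = keep_custom_model([0.80, 0.90], [0.88, 0.92])
print(f"keep={keep} base={base_mean:.3f} custom={custom_mean:.3f}")
```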

Prerequisites

  • Create an AWS account if you do not already have one.
  • Access to the AWS account through the AWS Management Console and the AWS Command Line Interface (AWS CLI). The AWS Identity and Access Management (IAM) user that you use must have permissions to make the necessary AWS service calls and manage AWS resources mentioned in this post. While providing permissions to the IAM user, follow the principle of least-privilege.
  • Git installed.
  • AWS Serverless Application Model (AWS SAM) installed.
  • Docker must be installed and running.
  • You must enable access to the Cohere Command Light model in the Amazon Bedrock console in the AWS Region where you’re going to run the AWS SAM template. This demonstration customizes that model; however, the workflow can be extended with minor model-specific changes to support customization of other supported models. See the Amazon Bedrock user guide for the full list of models that support customization. You must have no commitment model units reserved for the base model to run this demo.

Demo preparation

The resources in this demonstration will be provisioned in the US East (N. Virginia) AWS Region (us-east-1). We will walk through the following phases to implement our model customization workflow:

  1. Deploy the solution using the AWS SAM template
  2. Upload proprietary training data to the S3 bucket
  3. Run the Step Functions workflow and monitor
  4. View the outcome of training the base foundation model
  5. Clean up

Step 1: Deploy the solution using the AWS SAM template

Refer to the GitHub repository for the latest instructions. Run the following steps to deploy the Step Functions workflow using the AWS SAM template.

  1. Create a new directory, navigate to that directory in a terminal, and clone the GitHub repository:
git clone https://github.com/aws-samples/amazon-bedrock-model-customization.git
  2. Change to the solution directory:
cd amazon-bedrock-model-customization
  3. Run build.sh to create the container image:
bash build.sh
  4. When prompted, enter the following parameter values:
image_name=model-evaluation
repo_name=bedrock-model-customization
aws_account={your-AWS-account-id}
aws_region={your-region}
  5. From the command line, use AWS SAM to deploy the AWS resources for the pattern as specified in the template.yml file:
sam deploy --guided
  6. Provide the following inputs when prompted:
Enter a stack name.
Enter us-east-1 or the AWS Region where you enabled the Amazon Bedrock Cohere Command Light model.
Enter SenderEmailId: the email address that sends the notification when model customization is complete. You need access to this mailbox to verify ownership.
Enter RecipientEmailId: the email address that receives the notification.
Enter ContainerImageURI: this value is available from the output of the `bash build.sh` step.
Keep default values for the remaining fields.
  7. Note the outputs from the SAM deployment process. They contain the resource names and ARNs used in the subsequent steps.

Step 2: Upload proprietary training data to the S3 bucket

Our proprietary training data will be uploaded to the dedicated S3 bucket created in the previous step and used to fine-tune the Amazon Bedrock Cohere Command Light model. The training data needs to be in JSON Line format, with every line containing a valid JSON object with two attributes: prompt and completion.

I used this public dataset from Hugging Face and converted it to JSON Line format.
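The post doesn't include the conversion script. The following sketch shows one way to turn dataset rows into prompt/completion JSON Lines and check the result before uploading it. The column names dialogue and summary are hypothetical placeholders for whatever your source dataset uses. Requiring exactly the two keys follows the format description above and is otherwise an assumption.

```python
import json


def to_jsonl(records, prompt_key, completion_key):
    """Convert dataset rows into prompt/completion JSON Lines text."""
    lines = []
    for row in records:
        item = {"prompt": row[prompt_key], "completion": row[completion_key]}
        lines.append(json.dumps(item, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def validate_jsonl(text):
    """Return (line_number, problem) for each line that breaks the expected format."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((number, "empty line"))
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(item, dict) or set(item) != {"prompt", "completion"}:
            problems.append((number, "expected exactly the keys prompt and completion"))
    return problems


rows = [  # hypothetical column names; adapt to your dataset
    {"dialogue": "A: Can we move the meeting? B: Sure, to 3 pm.",
     "summary": "The meeting moves to 3 pm."},
]
jsonl_text = to_jsonl(rows, "dialogue", "summary")
print(validate_jsonl(jsonl_text))  # an empty list means every line passed
```

Writing jsonl_text to training-data.jsonl produces a file in the shape that the upload step below expects.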

  1. Upload the provided training data files to the S3 bucket using the command that follows. Replace TrainingDataBucket with the value from the sam deploy --guided output. Update your-region with the Region that you provided while running the SAM template.
aws s3 cp training-data.jsonl s3://{TrainingDataBucket}/training-data.jsonl --region {your-region}
  2. Upload the validation-data.json file to the S3 bucket using the command that follows. Replace ValidationDataBucket with the value from the sam deploy --guided output. Update your-region with the Region that you provided while running the SAM template:
aws s3 cp validation-data.json s3://{ValidationDataBucket}/validation-data.json --region {your-region}
  3. Upload the reference-inference.json file to the S3 bucket using the command that follows. Replace ValidationDataBucket with the value from the sam deploy --guided output. Update your-region with the Region that you provided while running the SAM template.
aws s3 cp reference-inference.json s3://{ValidationDataBucket}/reference-inference.json --region {your-region}
  4. You should have also received an email for verification of the sender email ID. Verify the email ID by following the instructions given in the email.

Email Address Verification Request

Step 3: Run the Step Functions workflow and monitor

We will now start the Step Functions state machine to fine tune the Cohere Command Light model in Amazon Bedrock based on the training data uploaded into the S3 bucket in the previous step. We will also pass the hyperparameters. Feel free to change them.

  1. Run the following AWS CLI command to start the Step Functions workflow. Replace StateMachineCustomizeBedrockModelArn and TrainingDataBucket with the values from the sam deploy --guided output. Replace UniqueModelName and UniqueJobName with unique values. Change the values of the hyperparameters based on the selected model. Update your-region with the Region that you provided while running the SAM template. The JSON input is wrapped in single quotes so that the shell passes its double quotes through unchanged.
aws stepfunctions start-execution --state-machine-arn "{StateMachineCustomizeBedrockModelArn}" --input '{"BaseModelIdentifier": "cohere.command-light-text-v14:7:4k", "CustomModelName": "{UniqueModelName}", "JobName": "{UniqueJobName}", "HyperParameters": {"evalPercentage": "20.0", "epochCount": "1", "batchSize": "8", "earlyStoppingPatience": "6", "earlyStoppingThreshold": "0.01", "learningRate": "0.00001"}, "TrainingDataFileName": "training-data.jsonl"}' --region {your-region}

Example output:

{
  "executionArn": "arn:aws:states:{your-region}:123456789012:execution:{stack-name}-wcq9oavUCuDH:2827xxxx-xxxx-xxxx-xxxx-xxxx6e369948",
  "startDate": "2024-01-28T08:00:26.030000+05:30"
}
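Editing the JSON inside the --input argument by hand is error-prone, especially the shell quoting. One alternative is the following Python sketch, in which the function names are hypothetical. It builds the same input document with json.dumps and quotes the whole command with shlex.join. The keys and default values are copied from the command above. The sketch assumes the state machine accepts hyperparameters as strings, as in that example, so it rejects non-string values. This prevents a value such as 0.00001 from being silently rendered as 1e-05.

```python
import json
import shlex

# Values copied from the CLI example above; adjust them for your model.
DEFAULT_HYPERPARAMETERS = {
    "evalPercentage": "20.0",
    "epochCount": "1",
    "batchSize": "8",
    "earlyStoppingPatience": "6",
    "earlyStoppingThreshold": "0.01",
    "learningRate": "0.00001",
}


def build_execution_input(custom_model_name: str, job_name: str,
                          hyperparameters: dict[str, str],
                          base_model_id: str = "cohere.command-light-text-v14:7:4k",
                          training_file: str = "training-data.jsonl") -> str:
    """Return the state machine input document as a JSON string."""
    for key, value in hyperparameters.items():
        # Require strings, as in the example; str(0.00001) would give "1e-05".
        if not isinstance(value, str):
            raise TypeError(f"hyperparameter {key!r} must be a string, "
                            f"got {type(value).__name__}")
    payload = {
        "BaseModelIdentifier": base_model_id,
        "CustomModelName": custom_model_name,
        "JobName": job_name,
        "HyperParameters": dict(hyperparameters),
        "TrainingDataFileName": training_file,
    }
    return json.dumps(payload)


def build_start_execution_command(state_machine_arn: str, execution_input: str,
                                  region: str) -> str:
    """Return a shell-quoted `aws stepfunctions start-execution` command."""
    return shlex.join([
        "aws", "stepfunctions", "start-execution",
        "--state-machine-arn", state_machine_arn,
        "--input", execution_input,
        "--region", region,
    ])
```

For example, pass your real state machine ARN, model name, job name, and Region to build_execution_input and build_start_execution_command, then print the result. This gives a single command line that you can paste into a shell without adjusting any quotes.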

The foundation model customization and evaluation might take 1 to 1.5 hours to complete. You will receive a notification email when the customization is done.

  1. Run the following AWS CLI command, or sign in to the AWS Step Functions console, to check the workflow status. Wait until the workflow completes successfully. Replace executionArn with the value from the previous step's output, and update your-region.
aws stepfunctions describe-execution --execution-arn {executionArn} --query status --region {your-region}
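Because the workflow can run for an hour or more, you might prefer a script that polls the status instead of rerunning describe-execution by hand. The following Python sketch shows one way to do this under some assumptions. wait_for_execution is a hypothetical helper. The describe argument is any function that returns a dict with a "status" key, which is the shape of the describe_execution response from the boto3 Step Functions client. The poll interval and the two-hour timeout are illustrative values, not recommendations from this post.

```python
import time
from typing import Callable


def wait_for_execution(describe: Callable[[str], dict], execution_arn: str,
                       poll_seconds: float = 60.0,
                       timeout_seconds: float = 2 * 60 * 60,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `describe(execution_arn)` until the status is no longer RUNNING.

    Returns the final status (for example SUCCEEDED or FAILED) and raises
    TimeoutError if the execution is still RUNNING after timeout_seconds.
    """
    waited = 0.0
    while True:
        status = describe(execution_arn)["status"]
        if status != "RUNNING":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{execution_arn} still RUNNING after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With boto3 installed, you could create sfn = boto3.client("stepfunctions", region_name="{your-region}") and pass lambda arn: sfn.describe_execution(executionArn=arn) as describe. Injecting the function, rather than creating the client inside the helper, means you can test the loop without AWS credentials.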

Step 4: View the outcome of training the base foundation model

After the Step Functions workflow completes successfully, you will receive an email that reports the quality of the customized model. If the customized model isn’t performing better than the base model, the provisioned throughput will be deleted. The following is a sample email:

Model Customization Complete

If the quality of the inference response is not satisfactory, you will need to retrain the base model with updated training data or hyperparameters.

See the ModelInferenceBucket for the inferences generated from both the base foundation model and custom model.

Step 5: Cleaning up

Properly decommissioning provisioned AWS resources is an important best practice to optimize costs and enhance security posture after concluding proofs of concept and demonstrations. The following steps will remove the infrastructure components deployed earlier in this post:

  1. Delete the Amazon Bedrock provisioned throughput of the custom model. Ensure that the correct ProvisionedModelArn is provided to avoid accidentally deleting the wrong resource. Also update your-region.
aws bedrock delete-provisioned-model-throughput --provisioned-model-id {ProvisionedModelArn} --region {your-region}
  1. Delete the Amazon Bedrock custom model. Ensure that the correct CustomModelName is provided to avoid accidentally deleting the wrong model. Also update your-region.
aws bedrock delete-custom-model --model-identifier {CustomModelName} --region {your-region}
  1. Delete the contents of the S3 buckets using the following commands. Ensure that the correct bucket names are provided to avoid accidental data loss, and update your-region:
aws s3 rm s3://{TrainingDataBucket} --recursive --region {your-region}
aws s3 rm s3://{CustomizationOutputBucket} --recursive --region {your-region}
aws s3 rm s3://{ValidationDataBucket} --recursive --region {your-region}
aws s3 rm s3://{ModelInferenceBucket} --recursive --region {your-region}
  1. To delete the resources deployed to your AWS account through AWS SAM, run the following command:
sam delete
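You can also generate the cleanup commands above from a single set of names, which makes it easier to review every target before anything is deleted. The following Python sketch is hypothetical: cleanup_commands and run_cleanup are not part of the SAM template. It builds the same aws bedrock and aws s3 rm commands in the order of the steps above and rejects empty or malformed bucket names. By default, it only prints the commands as a dry run. Running them with execute=True assumes the AWS CLI is installed and configured. Run sam delete separately afterward.

```python
import shlex
import subprocess


def cleanup_commands(provisioned_model_arn: str, custom_model_name: str,
                     buckets: list[str], region: str) -> list[list[str]]:
    """Return argument lists for the cleanup steps, in the order given above."""
    commands = [
        ["aws", "bedrock", "delete-provisioned-model-throughput",
         "--provisioned-model-id", provisioned_model_arn, "--region", region],
        ["aws", "bedrock", "delete-custom-model",
         "--model-identifier", custom_model_name, "--region", region],
    ]
    for bucket in buckets:
        # Reject an empty name or a pasted URI/prefix, because
        # `rm --recursive` on the wrong target deletes its objects.
        if not bucket or "/" in bucket:
            raise ValueError(f"invalid bucket name: {bucket!r}")
        commands.append(["aws", "s3", "rm", f"s3://{bucket}",
                         "--recursive", "--region", region])
    return commands


def run_cleanup(commands: list[list[str]], execute: bool = False) -> None:
    """Print each command; run it only when execute=True (dry run by default)."""
    for argv in commands:
        print(shlex.join(argv))
        if execute:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)
```

A typical use is to call run_cleanup(cleanup_commands(...)) first, review the printed commands, and only then call it again with execute=True.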

Conclusion

This post outlined an end-to-end workflow for customizing an Amazon Bedrock model using AWS Step Functions as the orchestration engine. The automated workflow trains the foundation model on custom data using the hyperparameters that you provide. It then evaluates the performance of the customized model against the base foundation model to determine the efficacy of the training. When the workflow completes, the user receives an email with the training results.

Customizing large language models requires specialized machine learning expertise and infrastructure. AWS services like Amazon Bedrock and Step Functions abstract these complexities so enterprises can focus on their unique data and use cases. By having an automated workflow for customization and evaluation, customers can customize models for their needs more quickly and with fewer operational challenges.

About the Author

Biswanath Mukherjee is a Senior Solutions Architect at Amazon Web Services. He works with large strategic customers of AWS by providing them technical guidance to migrate and modernize their applications on AWS Cloud. With his extensive experience in cloud architecture and migration, he partners with customers to develop innovative solutions that leverage the scalability, reliability, and agility of AWS to meet their business needs. His expertise spans diverse industries and use cases, enabling customers to unlock the full potential of the AWS Cloud.

‘Once Human,’ Twice the Thrills on GeForce NOW

Unlock new experiences every GFN Thursday. Whether post-apocalyptic survival adventures, narrative-driven games or vast, open worlds, GeForce NOW always has something fresh for members to explore.

This week, GeForce NOW brings the survival game Once Human from Starry Studio to the cloud, one of three new titles.

Survive the Stardust

Once Human on GeForce NOW
We’re all just made of stardust.

Step into a post-apocalyptic world where cosmic energy has transformed humanity in Once Human. As a Meta-Human, survive the contamination and use the powers of Stardust to navigate a new and bizarre open-world universe.

Experience survival, crafting and combat as the game challenges players to gather resources, build shelters and fend off human and monstrous threats. Uncover the rich lore through interactions with various characters and artifacts scattered throughout the world.

Delve into the truth of Stardust: discover where it came from and what it wants. Play alone or grab a squad to fight, build and explore together. Level up with an Ultimate or Priority membership to stream across devices at higher resolutions and frame rates than free members. Gaming sessions are up to six hours for Priority members and eight hours for Ultimate members, plenty of time to unravel the cosmic mysteries of Once Human.

Happy New Games

Anger Foot on GeForce NOW
Taking names and kicking butt.

Unleash the world’s deadliest feet on a colorful cast of anthropomorphic enemies in Anger Foot from Devolver Digital. Clear out slums, sewers and skyscrapers, grab new weapons, unlock new sneakers and upgrade powers in absurd and wonderful ways. Kick and shoot to get to the exit — and leave behind a smoldering trail of shattered doors, broken bones and crumpled energy drinks.

Check out the list of new games this week:

  • Cricket 24 (New release on Xbox and available on PC Game Pass, July 9)
  • Once Human (New release on Steam, July 9)
  • Anger Foot (New release on Steam, July 11)

What are you planning to play this weekend? Let us know on X or in the comments below.

Collaborators: Sustainable electronics with Jake Smith and Aniruddh Vashisth

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

Printed circuit boards (PCBs) are abundant—in the items we use daily and then in landfills when they’ve reached end of life. In this episode, Senior Researcher Jake Smith and Aniruddh Vashisth, assistant professor of mechanical engineering at the University of Washington, join host Gretchen Huizinga to talk about the development of vitrimer-based PCBs, or vPCBs, that perform comparably to traditional circuit boards but have less environmental impact. Smith and Vashisth explore machine learning’s role in accelerating the discovery of more sustainable materials and what the more healable vitrimer polymer could mean not only for e-waste but more broadly for aerospace, the automotive industry, and beyond.

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

ANIRUDDH VASHISTH: From the computation point of view, we always thought that if somebody gave us, like, a hundred different chemistries, we can do a bunch of simulations; tell you, like, 10 of these actually work. What we’ve been able to do specifically for vitrimers is that we’re able to look at the problem from the other side, and we are able to say that if you tell me a particular application, this particular chemistry would work best for you. In essence, what we were thinking of is that if aliens abducted all the chemists from the world, can we actually come up with a framework? [LAUGHTER]

JAKE SMITH: If all of this work is successful, in 10 years, maybe our materials design process looks completely different, where we’ve gone from this kind of brute-force screening to an approach where you start with the properties that you care about—they’re defined by the application that you have in mind—and we use this, like, “need space” to define the material that we would like, and we can use machine learning, artificial intelligence, in order to get us to the structure that we need to make in order to actually achieve this design space.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC FADES]


I’m thrilled to be in the booth today, IRL, with Dr. Jake Smith, a senior researcher at Microsoft Research and part of the Microsoft Climate Research Initiative, or MCRI. And with him is Dr. Aniruddh Vashisth. He’s an assistant professor of mechanical engineering at the University of Washington and director of the Vashisth Research Lab. Jake and Aniruddh are working on a project that uses machine learning to help scientists design sustainable polymers with a particularly exciting application in the field of the ubiquitous printed circuit board, or PCB. But before we get all sustainable, let’s meet our collaborators!

Jake, I’ll start with you. You’re a self-described “chemist with relatively broad interests across applications” and you’ve done some pretty cool things in your career. Tell us about those interests and where they’ve led you and how they’ve contributed to the work you’re doing now in MCRI, or the Microsoft Climate Research Initiative.

JAKE SMITH: Yes. Thank you very much for having me. So I started, like most chemists, poking things around in the lab and learning really fundamentally about how atoms interact with one another and how this affects what we do or what we see at our microscopic level. And so after I left grad school doing this super-basic research, I wanted to do something more applied, and so I did a couple of postdocs, first, looking at how we can more effectively modify proteins after we’ve synthesized them so they might have a property that we care about and then later doing similar work on small molecules in a more traditional drug-design sense. But after I finished that, I wound up here at Microsoft. We were very interested in one molecule in particular, one family of molecules, which is DNA, and we wanted to know, how do we make DNA at just gigantic scale so that we can take that DNA and we could store digital data in it? And because DNA has this nice property that it kind of lasts forever, …

HUIZINGA: Yeah.

SMITH: … at least on our, you know, human scale, it makes a very, you know, nice archival storage medium. So we worked on this project for a while, and at some point, we determined we can, kind of, watch it blossom and find the next challenge to go work on.

HUIZINGA: Interesting …

SMITH: And the challenge that we, you know, wound up at I’ll describe as the Microsoft Climate Research Initiative, the MCRI. We were a group of applied scientists from, like, natural scientist backgrounds within Microsoft, and we said, how can we make a difference for Microsoft? And the difference that we thought was Microsoft has climate goals.

HUIZINGA: Oh, yeah!

SMITH: Microsoft wants to be carbon negative, it wants to be water positive, and it wants to be zero waste. And in order to make this happen, we need novel materials, which really are a macroscopic view of, once again, atomic behavior. And we said, hey, we understand atomic behavior. We’re interested in this.

HUIZINGA: [LAUGHS] We can help! We’re from the government …

SMITH: Yeah, maybe this is something we could help on. Yeah. And so here we are. We wound up with Aniruddh, and we’ll go into that later, I’m sure.

HUIZINGA: Yeah, yeah. So just quickly back to the DNA thing. Was that another collaboration? I had Karin Strauss on the podcast a while ago, and she talked about that.

SMITH: Oh, absolutely. Yeah, this was with Karin, and we had great collaborators, also at the University of Washington in the Molecular Information Systems Lab, or MISL, who did a lot of work with us on the practicalities of working with DNA once it’s synthesized and how would you do things like retrieve information from a big pool of DNA.

HUIZINGA: Right. Right. They could … people could go back to that podcast because she does unpack that quite a bit. Well, Aniruddh, you describe yourself as a “trained mechanician who hangs out with chemists,” hence your friendship with Jake here, but for your day job, you’re a professor and you have your own lab that conducts interdisciplinary research at the intersection, as you say, of mechanics and material science. So what made you want to move to that neighborhood, and what goes on there?

ANIRUDDH VASHISTH: Yeah. Well, again, thank you so much for having me here. I’m super excited about this. Yeah, just a little bit of background about me. So I started off with my undergrad in civil and mechanics from IIT BHU, did a PhD in mechanics at Penn State, and moved to Texas …

HUIZINGA: Go back … go back to, what’s the first one?

VASHISTH: It’s Indian Institute of Technology, in India, so that’s …

HUIZINGA: IIT …

VASHISTH: … IIT. I did my undergrad there and then straight away came to the US to do my PhD in mechanics at Penn State and then ended up going to Texas, to Texas A&M University, and postdoc-ed in a chemical engineering lab, and that’s how I became, like, super familiar and fond of chemical engineers and chemists! [LAUGHTER] And we moved to Seattle, when I got the job at University of Washington in 2021, with my wife and my daughter. And what we do in our lab is we make and break things now! [LAUGHS] We try to see, like, you know, when we are making and breaking these things, we try to see them from an experimental and a simulation point of view and try to gain some understanding of the mechanics of these different types of materials. Especially, we are very interested in polymers. I always joke with my students and my class that go about one day without touching a polymer, and I’m always surprised by the smiles or the smirks that I get! But in general, like, we have been super, super excited and interested about sustainable polymers, making sustainable composites. Particularly, we are very excited and interested in vitrimer polymers. So let me just take, like, a step back. I’ll probably wear my professor hat straight away here.

HUIZINGA: Yeah. Let’s do! Let’s go. [LAUGHTER]

VASHISTH: And I’ll tell you, just, like, taking a step back, what are the different types of polymers. So in general, you can think of polymers as thermosets or thermoplastics. So to Jake’s point, let’s just go to the molecular scale there, and you can think of polymers as bunch of these pasta noodles which can slide over each other, right. Or these bunch of pasta noodles which are packed together. So thermoset, as the name suggests, it’s a set network. The pasta noodles are kind of, like, set in their place. Thermoplastics is when these pasta noodles can slide over each other. So you’ve probably put too much sauce in there! [LAUGHTER] Yeah, so a good analogy there would be a lot of the adhesives that we use are thermosets because they set after a while. Thermoplastic … we use plastics for 3D printing a lot, so those are thermoplastics. So they’re solid. You can heat them up, you can make them flow, print something, and they solidify. Vitrimers are very exciting because, just like thermoplastics, they have this flowability associated to them but more at a molecular scale. Like, if you think of a single pasta noodle, it can unclick and re-click back again. So it’s like, you know, it’s made up of these small LEGO blocks that can unclick and re-click back again …

HUIZINGA: LEGO pasta …

VASHISTH: LEGO pasta …

HUIZINGA: I like that! [LAUGHS]

VASHISTH: Exactly. So this unclicking and re-clicking can make them re-processable, reusable, recyclable. Gives them, like, much longer life because you can heal them. And then vitrimers basically become the vampires of the polymer universe!

HUIZINGA: Meaning they don’t die?

VASHISTH: Well …

HUIZINGA: Or …

VASHISTH: They have like much longer life! [LAUGHTER]

SMITH: They sleep every now and then to regenerate! Yes … [LAUGHS]

HUIZINGA: Aniruddh, sticking with you for a minute, before we get into the collaboration, let’s do a quick level set on what we might call “The Secret Life of Circuit Boards.” For this, I’d like you to channel David Attenborough and narrate this PCB documentary. Where do we find printed circuit boards in their natural habitat? How many species are there? What do they do during the day? How long do they live? And what happens when they die?

VASHISTH: OK, so do I have to speak like David … ?

HUIZINGA: Yes, I’d appreciate it if you’d try. [LAUGHTER] … No. Just be your voice.

VASHISTH: Yeah. Yeah. So PCBs are, if you think about it, they are everywhere. PCBs are in these laptops that we have in front of us. Probably there are PCBs in these mics. Automobiles. Medical devices. So PCBs are, they’re just, like, everywhere. And depending upon, like, what is their end applications, they have a composite part of it, where you have, like, some sort of a stiff inclusion in a polymeric matrix, which is holding this part together and has bunch of electronics on top of it. And depending on the end application, it might come in different flavors: something that can sustain much higher temperatures; something which is flexible. Things of that sort. And they live as long as we use the material for, like, you know, as long as we are using these laptops or as long as we end up using our cars. And unfortunately, there is a lot of e-waste which is created at the end.

HUIZINGA: Right …

VASHISTH: There’s been a lot of effort in recycling and reusing these materials, but I’m confident we can do more.

HUIZINGA: Right.

VASHISTH: I think there’s like close to 50 million metric tons of …

HUIZINGA: Wow!

VASHISTH: … of e-waste which is generated—more than that actually—every year, so …

HUIZINGA: OK.

VASHISTH: … a lot of scope for us to work there.

HUIZINGA: Um, so right now, are they sort of uniform? The printed circuit board? I know we’re going to talk about vitrimer-based ones, but I mean, other than that, are there already multiple materials used for these PCBs? Jake, you can even address that.

SMITH: Yeah. Of course. So there are, like, kind of, graded ranks of circuit board materials …

HUIZINGA: OK.

SMITH: … that as Aniruddh said, you know, might be for specialty applications where you need higher-temperature tolerance than normal or you need lower noise out of your circuit board.

HUIZINGA: Gotcha.

SMITH: But, kind of, the bog-standard circuit board, the green one that you think about if you’ve ever seen a circuit board, this is like anti-flammability coating on a material called FR-4. So FR-4—which is an industrial name for a class of polymers that are flame-retardant, thus FR, and 4 gives you the general class—this is the circuit board material …

HUIZINGA: OK …

SMITH: … that, you know, we really targeted with this effort.

HUIZINGA: Interesting. So, Jake, let’s zoom out for a minute and talk about the big picture and why this is interesting to Microsoft Research. I keep hearing two phrases: sustainable electronics and a circular economy. So talk about how the one feeds into the other and what an ultimate success story would look like here.

SMITH: Yeah, absolutely. So I’ll start with the latter. When we set out to start the Microsoft Climate Research Initiative, we started with this vision of a circular economy that would do things that avoid what we, you know, can avoid using. But there are many cases where you can’t avoid using something that is nonrenewable. And there, what we really want to do is we want to recapture what we can’t avoid. And this project, you know, falls in the latter. There’s a lot of things that fall in the latter case. So, you know, we were looking at this at a very carbon dioxide-centric viewpoint where CO2 is ultimately the thing that we’re thinking about in the circle, although you can draw a circular economy diagram with a lot of things in the circle. But from the CO2 viewpoint, you know, what led us to this project with Aniruddh is we thought, we need to capture CO2, but once you capture CO2, you know, what do you do with it? [LAUGHTER] You can pump some of it back into the ground, but this is, you know, an economically non-productive activity. And so it’s something we have to do. It’s not something we want to do.

HUIZINGA: Right.

SMITH: And so what could we want to do with the CO2 that we’ve captured? And the thought was we do something economically viable with it. We, you know, upcycle the CO2 into something interesting, and what we really want, and what we still really want, is to be able to take that CO2, convert it down into a useful chemical feedstock—and there are great laboratories …

HUIZINGA: Oh, interesting …

SMITH: … doing work on this—and then we could, you know, look at our plastic design problem and say, hey, we have all this FR-4 in the world. How could we replace the FR-4—the, you know, explicit atoms that are in the FR-4—with atoms that have come from CO2 that we pulled out of the air? And so this is, you know, the circular economy portion. We come down to, you know, the specific problem here. Aniruddh talked a lot about e-waste.

HUIZINGA: Yeah.

SMITH: And I have great colleagues who also collaborated with us on this project—Bichlien Nguyen, Kali Frost—who have been doing work with our product teams here at Microsoft on, you know, what can we do to reduce the amount of e-waste that they put out towards Microsoft’s climate goals?

HUIZINGA: Right.

SMITH: And Microsoft, as a producer of consumer electronics and a consumer of, you know, industrial electronics, has a big e-waste problem itself that we need to, you know, actually take research steps in order to ultimately address, and so what we thought was, you know, we have this end-of-life electronic. We can do things like desolder the components. We can recapture those ICs, which have a lot of embedded carbon in them in the silicon that’s actually there. We can take and we can etch out the copper that has been put over this to form the traces, and we can precipitate out that electrochemically to recapture the copper, but at the end of the day, we’re left with this big chunk of plastic, and it’s got some glass inside of it, too, for completeness’ sake, and the thought was, you know, how do we do this? You can’t recapture this with FR-4. FR-4, to go back to the spaghetti thing, …

HUIZINGA: Right … [LAUGHS]

SMITH: … spaghetti is glued to itself. It doesn’t come apart. It rips apart if you try and take it apart. And so we wanted to say, you know, what could we do and, you know, what could we do with Aniruddh and his lab in order to get at this problem and to get us at a FR-4 replacement that we could actually reach this complete circularity with.

HUIZINGA: Interesting! Well, Jake, that is an absolutely perfect segue into “how I met your mother,” which is, you know, how you all started working together. Who thought of who first, and so on. I’m always interested to hear both sides of the meet-up. So, Aniruddh, why don’t you take the baton from Jake right there and talk about, from your perspective, how you saw this coming together, who approached who, what happened—and then Jake can confirm or deny the story! [LAUGHTER]

VASHISTH: Yeah, yeah. So it actually started off, I have a fantastic colleague and a very good friend in CS department, Professor Vikram Iyer, and he actually introduced me to Bichlien Nguyen from Microsoft, and we got a coffee together and we were talking about vitrimers, like the work that we do in our lab, and I had this one schematic—I forget if it was on my phone or I was carrying around one paper in my pocket—and I showed them. I was like, you know, if we can actually do a bunch of simulations, guide an ML model, we can create, for lack of a better word, like a ChatGPT-type of model where instead of telling like, “This is the chemistry; tell me what the properties are,” we can go from the other side. You can ask the model, “Hey, I want a vitrimer chemistry which is recyclable, re-processable, that I can make airplanes out of or I can make glasses out of. Tell me what that chemistry would look like.” And I think, you know, Bichlien was excited about this idea, and she connected me with Jake, and I think I’ve been enjoying this collaboration for the last couple of years, …

HUIZINGA: Right …

VASHISTH: … working on that.

HUIZINGA: Was there a paper that started the talk, or was it just this napkin drawing? [LAUGHS]

VASHISTH: I think, to give myself a little bit of credit there, I think there was a paper with a nice drawing on it.

HUIZINGA: Right?

VASHISTH: Yeah. There was a white paper. Yeah.

HUIZINGA: That’s good. Well, Jake, what’s your side of this story?

SMITH: Ah, this is awesome! We got the first half that I didn’t know, so …

HUIZINGA: Oh—filling in gaps!

SMITH: This was the Bichlien-mediated half! [LAUGHTER] I was sharing an office with Bichlien, who apparently came up from this meeting, and, you know, I saw the mythical paper! She put this on my desk. And I’ll plug another MCRI project that we were working on there where—or at the time—where we were attempting to do reverse design, or inverse design, of metal organic frameworks, which are these really interesting molecules that have the possibility to actually serve as carbon capture absorbents, …

HUIZINGA: Oh, wow.

SMITH: … but the approach there was to use machine learning to help us, you know, sample this giant space of metal organic frameworks and find ones that had the property that we cared about. I mean, you draw this diagram that’s much like Aniruddh just described, where you’ve got this model that you train and out the other side comes what you want, and so this paper came down on my desk, and I looked at it and I said, “Hey, that’s what we’re doing!” [LAUGHTER] And it, kind of, you know, went from there. We had a chat. We determined, hey, we’re both interested in, you know, this general approach to getting to novel materials.

HUIZINGA: Right.

SMITH: And then, you know, we’ve already talked about the synergy between our interests and Microsoft’s interests and the, you know, great work or the great particular applications that are possible with the type of polymer work that Aniruddh does.

HUIZINGA: Yeah. So the University of Washington and Microsoft meet again. [LAUGHTER] Well, Jake, let’s do another zoom out question because I know there’s more than just the Microsoft Climate Research Initiative. This project is a perfect example of another broader initiative within Microsoft which has the potential to quote “accelerate and enhance current research,” and that’s AI for Science. So talk about the vision behind AI for Science, and then if you have any success stories—maybe including this one—tell us how it’s working out.

SMITH: Yeah, absolutely. We are—and by we, I mean myself and my immediate colleagues—are certainly not the only ones interested in applying AI to scientific discovery at Microsoft. And it turned out, a year or two after we started this collaboration, a bigger organization named AI for Science arose, and we became part of it. And it’s, you know, generally a group of people who—along with our kind of sister organization in research called Health Futures, who work more on the biology side—are interested in how AI can help us do science in (a) a faster way, but (b) maybe a smarter, better-use-of-resources way, or the ultimate goal, or the ultimate dream, is (c) a way that we just can’t think of doing right now. A way that, you know, it just is fundamentally incompatible with the way that research has historically been done in, you know, small groups of grad students directed by a professor who are themselves, you know, the actual engine behind the work that happens. And so the AI for Science vision, you know, it’s got a couple of parts that really map very well onto this project. The first part is we want to be able to simulate bigger systems. We want to be able to run simulations for longer, and we want to be able to do simulations at higher accuracy. When we get into the details of, you know, the particulars of the vitrimer project, you’ll see that one of the fundamental blocks here is the ability to run simulations, and Aniruddh’s excellent grad student Yiwen, you know, spent a ton of time trying to identify the appropriate simulation parameters in order to capture the behavior that we care about here. And so, the first AI for Science vision says we don’t need Yiwen to do that, you know, we’re going to have a drop-in solution or we’re going to have, you know, a set of drop-in solutions that can, you know, take this work away from you and make it much easier for you to go straight to running the simulations that you care about.

HUIZINGA: Yeah. A couple questions. Not on the list here, but you prompted them. No pun intended. Are these specialized models with the kinds of information … I mean, if I go to ChatGPT and ask it to do what you guys are doing, I’m not going to get the same return, am I?

SMITH: Absolutely.

HUIZINGA: Am I?

SMITH: Oh, no, no, no, no! [LAUGHTER] I was saying you were absolutely correct. [LAUGHS] You can ask ChatGPT, and it will tell you all sorts of things that are very interesting. It can tell you, probably, a vitrimer. It could give you Aniruddh’s spiel about the spaghetti, I’m sure, if you prompted it in the correct way. But what it can’t tell you is, you know, “Hey, I have this particular vitrimer composition, and I would like to know at what temperature it’s going to melt when I heat it up.”

HUIZINGA: Right. OK, so I have one more question. You talk about the simulations. Those take a lot of compute. Am I right? Am I right?

SMITH: You’re absolutely right.

VASHISTH: Yeah.

HUIZINGA: So is that something that Microsoft brings to the party in terms of … I mean, does the University of Washington have the same access to that compute, or what’s the deal?

VASHISTH: I think especially on the scale, we were super happy and excited that we were collaborating with Microsoft. I think one of these simulations took, like, close to a couple of weeks, and we ended up doing, I would say, like, close to more than 30,000 simulations. So that’s a lot of compute time if you think about it.

HUIZINGA: To put that in perspective, how long would it take a human to do those simulations? [LAUGHS]

SMITH: [LAUGHS] Oh, man, to try and actually, like, go do all this in the lab …

HUIZINGA: Right!

SMITH: First, you got to make these 30,000, like, starting materials. This in itself … let’s say you could buy those. Then to actually run the experiments, how long does it take to do one …

HUIZINGA: And how much money?

VASHISTH: That’s … that’s like you’re talking about like one PhD student there.

HUIZINGA: Right?

VASHISTH: That’s like, you know, it takes like a couple of years just to synthesize something properly and then characterize it, and it’s …

HUIZINGA: Yeah …

VASHISTH: Yeah, no, I think the virtual world does have some pluses to it.

HUIZINGA: So this is a really good argument for AI for Science, meaning the things that it can do, artificial intelligence can do, at a scale that’s much smaller than what it would take a human to do.

SMITH: Yeah, absolutely. And I’ll plug the other big benefit now, which is, hey, we can run simulations. This is fantastic. But the other thing that I think all of us really hope AI can do is it can help us determine which simulations to run …

HUIZINGA: Ooh …

SMITH: … so we need less compute overall, we need fewer experiments if we have to go do the experiments, and this is …

HUIZINGA: So it’s the winnowing process.

SMITH: Exactly.

HUIZINGA: OK. That’s actually really interesting.

SMITH: And this is, like, the second, or maybe even the largest, vector for acceleration that we could see.

HUIZINGA: Cool. Well, every show I ask, what could possibly go wrong if you got everything right? And, Aniruddh, I want to call this the “Defense Against the Dark Arts” question for you. You’re using generative AI to propose what you call novel chemistries, which can sound really cool or really scary, depending on how you look at it. But you can’t just take advice from a chatbot and apply it directly to aerospace. You have to kind of go through some processes before. So what role do people, particularly experts in other disciplines, play here, and what other things do you need to be mindful of to ensure the outputs you get from this research are valid?

VASHISTH: Yeah, yeah. That’s a fantastic question. And I’ll actually piggyback on what Jake just said here, about Yiwen Zheng, who’s a fantastic graduate student that we have in our lab. He figured out how to run these simulations in the first place. It was, like, six months of … a really long ordeal: how to make sure that in the virtual world, we are synthesizing these polymers correctly and we are testing them correctly. So that human touch is essential, I feel like, at every step of this research, not just in doing virtual characterization or virtual synthesis of these materials and training the models, but also eventually, when you train the models and the model tells you that, well, these are, like, the 10 best polymers that would work out. There you need people like Jake who are chemists, you know. They come in [LAUGHTER] and they’re like, hey, you know what? Like, out of these 10 chemistries, this one you can actually synthesize. It’s a one-step reaction or things of that sort. So we have a chemist in our lab also, Dr. Agni Biswal, who’s a postdoc. So we actually show him all these chemistries, apart from Jake and Bichlien. We show the chemistries to all the chemists and say, like, OK, what do you think about this? What do these look like? Are they totally insane, or can we actually make them? [LAUGHTER]

SMITH: Yeah, we still need that, like, human evaluation step at the end, at this point.

HUIZINGA: Yeah …

VASHISTH: Exactly.

HUIZINGA: Ask a chemist! Well, and I would imagine it would be further than just, “This would be the best one,” or something like, “You better not do that one.” Are there ever like crazy responses or replies from the model?

SMITH: [LAUGHS] It’s fascinating. Models are very good—and particularly we’ll talk about models that generate small organic structures—at generating things that look reasonable. They follow all the rules. But there’s this next step beyond that. And you see this when you talk to people who’ve worked in med chem for, you know, 30 years of their life. Well, they’ll look at a structure and they’ll, like, get this gut feeling like, you know, a storm is coming in and their knee hurts, and they really don’t like that molecule. [LAUGHTER] And if you push them a little bit, you know, sometimes they can figure out why. They’ll be like, oh, I worked on, you know, a molecule that looked like that 20 years ago, and it, you know, turned out to have this toxicity, and so I don’t want to touch that again. But oftentimes, people can’t even tell you. They’ve just got this instinct …

HUIZINGA: Really?

SMITH: … that they’ve built up, and trying to, you know, capture that intuition is a really interesting next frontier for this sort of research.

HUIZINGA: Wow. You know, you guys are just making my brain fry because it’s like so many other questions I want to ask, but we’re actually getting there to some of them, and I’m hoping we’ll address those questions with the other things I have. So, Jake, I want to come … Well, first of all, Aniruddh, have you finished your defense against the dark arts? [LAUGHS]

VASHISTH: I think I can point out one more thing very quickly there, and as Jake said, like, we are learning a lot, particularly about these materials, like, the vitrimer materials. These are new chemistries, and we are still learning about, like, the mechanical, thermorheological properties; how to handle these materials. So I think there’s a lot that we don’t know right now. So it’s like a bunch of, like, unknown unknowns that are there. So …

HUIZINGA: Well, and that’s research, right? The unknown unknowns. Jake, I want to come back to the vision of the climate research initiative for a minute. One goal is to develop technologies that reduce the raw tonnage of e-waste, obviously. But if we’re honest, advances in technology have almost encouraged us to throw stuff away. It’s like before it even wears out. And I think we talked earlier about, you know, this will last as long as my car lasts or whatever, but I don’t like my car in five years. I want a different one, right? So I wonder if you’ve given any thought to what things, in addition to the work on reusable and recyclable components, we might do to reverse engineer the larger throwaway culture?

SMITH: This was interesting. I feel like this gets into real questions about social psychology and our own behaviors …

HUIZINGA: Yeah …

SMITH: … with individual things. Why do I have this can of carbonated water here when I could have a glass of carbonated water? But I want to, kind of, completely sidestep that because …

HUIZINGA: Yeah … Well, we know why! Because it’s convenient, and you can take it in your car and not spill.

SMITH: Agreed. Yes. All right. [LAUGHTER] I also have this cup, and it could not spill, as well.

HUIZINGA: True! Recyclable—reusable.

SMITH: Ahhh … no, no … this is like a—it’s an ingrained consumer behavior that I’ve developed that might … I’ll slip into “Jake’s Personal Perspectives” here, which is that it should not be on the individual consumer behavior changes to ultimately drive a shift towards reusable and recyclable things. And so one of the fundamental, like, hypotheses that we had with the, you know, design of the projects we put together with the MCRI was that if we put appropriate economic incentives in place, then we can naturally guide behavior at a much bigger scale than the individual consumer. And maybe we’ll see that trickle down to the consumer. Or maybe this means that the actual actors, the large-scale actors, then have the economic incentive to follow it themselves.

HUIZINGA: Right.

SMITH: And so with the e-waste question in particular, we talked a lot about FR-4 and, you know, it’s the part of the circuit board that you’re left over with at the end that there’s just nothing to do with …

HUIZINGA: Right.

SMITH: … and so you toss it into landfill, you burn it, you do something like this. But, you know, with a project like this, where our goal was to take that material and now make it reusable, we can add this actual economic value to the waste there.

HUIZINGA: Yeah. I realized even as I asked that question, that I had the answer embedded in the question because, in part, how we design technologies drives how people use things.

SMITH: Yeah, absolutely.

VASHISTH: Yeah.

HUIZINGA: And usually, the drivers are convenience and economics. So if upstream of consumer … consumption? [LAUGHTER] Upstream of that, the design drives environmental health and so on, that’s actually … that’s up to you guys! So let’s get out of this booth and get back to work! [LAUGHTER] Well, Jake, to that point, talk about the economics. We talk about a circular economy. And I know that recycling is expensive. Can you talk a little bit about how that could be impacted by work that you guys do?

SMITH: Recycling absolutely is expensive relative to landfilling or a similar alternative.

HUIZINGA: Right …

SMITH: One of the things that makes us target e-waste is that there are things of value in e-waste that are, like, innately valuable. When you go recollect that copper or the gold that you’ve put into this, when you recollect the integrated circuits, you know, they had value, and so a lot of the economic drive is already there to get you to the point where you have these circuit boards. And then, you know, the question was, how do we get that next bit of economic value so that you’ve taken steps this far, you have this pile of circuit boards, so you’ve already been incentivized to get to here and it will be easy to make this—even if it’s not a completely economically productive material—versus synthesizing a circuit board from virgin plastic, but it’s offset enough. We’ve taken enough of that penalty for reuse out that it can be justifiable to go do.

HUIZINGA: Right. OK. So talk—again, off script a little bit—but talk a little bit about how vitrimers help take it to the last mile.

VASHISTH: Yeah, I think the inherent property of the polymer to kind of unclick and re-click back again, the heal-ability of the polymer, that’s something that, kind of, drives this reusability and re-processability of the material. I’ll just, like, point out, like, you know, particularly to the PCB case, where we recently published a collaborative paper where we showed that we can actually make PCB boards using vitrimers. We can unassemble everything. We can take out the electronics, and even the composite, the glass fiber and the polymer composite, we can actually separate that, as well, which is, in my mind, like, a pretty big success.

HUIZINGA: Yeah.

VASHISTH: And then we can actually put everything back together and remake a PCB board, and, you know, keep on doing that. So …

HUIZINGA: OK, so you had talked to me before about “Ring Around the Rosie” and the hands and the feet. Can you … ?

SMITH: [LAUGHS] His favorite analogy!

HUIZINGA: Do that one just for our audience because it’s good.

VASHISTH: OK. So I’ll talk a little bit about thermoset/thermoplastic again, and then I’ll just give you a much broader perspective there.

HUIZINGA: Yeah.

VASHISTH: So the FR-4 PCBs that are made, they are usually made with thermosetting polymers. So if you think about thermosetting polymers, just think of kids playing “Ring of Roses,” right? Like their hands are fixed and their feet are fixed. Once the network is formed, there’s no way you can actually destroy that network. The nice thing about vitrimers is that when you provide an external stimulus, like, just think about these kids playing “Ring of Roses” again. Their feet can move and their handshakes can change, but the number of handshakes remains the same. So the polymer is kind of, like, unclicking and re-clicking back again.

HUIZINGA: OK.

VASHISTH: And if you can cleverly use this mechanism, you can actually recycle, reprocess the polymer itself. But what we showed, particularly for the PCB paper, was that you can actually separate all the other constituents that are associated with this composite, yeah.

HUIZINGA: OK. That’s … I love that. Well, sticking with you for a second, Aniruddh, talking about mechanical reality—not just chemical reality, but mechanical reality—even the best composites wear out, from wear and tear. Talk about the goal of this work on novel polymers from an engineering perspective. How do you think about designing for reality in this way?

VASHISTH: Yeah, yeah. That’s a fantastic question. So we were really motivated by what type of mechanical or thermal loadings materials see in day-to-day life. You know, I sit in my car, I drive it, it drives over the road, there is some fatigue loadings, there’s dynamic loading, and that dynamic loading actually leads to some mechanical flaws in the material, which damages it. And the thought was always that, can we restrict that flaw, or can we go a step further? Can we actually reverse that damage in these composites? And that’s where, you know, that unclicking/re-clicking behavior of vitrimer becomes, like, really powerful. So actually, the first work that we did on these type of materials was that we took a vitrimer composite and we applied fatigue loading on it, cyclic loading on it, mechanical loading. And then we saw that when there was enough damage accumulated in the system, we healed the system. And then we did this again. And we were able to do it again and again until I was like, I’ve spent too much money on this test frame! [LAUGHS] But it was really exciting because for a particular loading case that we were looking at, traditional composites were able to sustain that for 10,000 cycles, but for vitrimers, if we did periodic healing in the material, we were able to go up to a million cycles. So I think that’s really powerful.

HUIZINGA: Orders of magnitude.

VASHISTH: Yeah, exactly.

HUIZINGA: Wow. Jake, I want to broaden the conversation right now, beyond just you and Aniruddh, and talk about the larger teams you need to assemble to ensure success of projects like this. Do you have any stories you could share about how you go about building a team? You kind of alluded to it at the beginning. There’s sort of a pickup basketball metaphor there. Hey, he’s doing that. We’re doing this. But you have some intentionality about people you bring in. So what strengths do each institution bring, and how do you build a team?

SMITH: Yeah, absolutely. We’ve tried a bunch of these collaborations, and we’ve definitely got some learnings about which ones work better than others. This has been a super productive one. I think it’s because it has that right mix of skills and the right mix of things that each side are bringing. So what we want from a Microsoft side for a successful collaboration is we want a collaborator who is really a domain expert in, you know, something that we don’t necessarily understand but who can tell us, in great detail, these are the actual design criteria; these are, you know, where I run into trouble with my traditional research; this is the area that, you know, I’d like to do faster, but I don’t necessarily know how. And this was the critical part, I think, you know, from the get-go. They need to, themselves, be an extremely, you know, capable subject matter expert. Otherwise, we’re just kind of chatting. We don’t have anyone that really knows what the problem truly is and you make no progress or you … worse, you spend a whole lot of resources to make “progress”—I’m doing air quotes …

HUIZINGA: Yeah. I love air quotes on a podcast!

SMITH: [LAUGHS]—that is actually just completely tangential to what the field needs or what the actual device needs. So this was, you know, the fundamental ingredient. And then on top of that, we need to find a problem that’s of joint interest where, in particular, …

HUIZINGA: Right …

SMITH: … computation can help. You talked about the amount of computation that we have at our disposal as researchers at Microsoft, which is a tremendous strength. And so we want to be able to leverage that. And so for a collaboration like this, where running a large number of simulations was a fundamental ingredient to doing it, this was, you know, a really good fit, that we could come in and we could enable them to have more data to train the models that we build together.

HUIZINGA: Mm-hm. Well, as researchers, are you each kind of always scanning the horizon for who else is doing things in your field that—or tangential to your field but necessary? How does that work for recruiting, I would say?

VASHISTH: Yeah, that’s a good question. I think … I mean, that’s kind of like the job, right. For the machine learning work we did, we drew a lot of inspiration from biology, where people have been designing biomolecules. The challenges are different for us. Like, we are designing much larger chains. But we drew some inspiration from there. So always, like, looking out for, like, who is doing what is super helpful, and it leads to, like, really nice collaborations, as well. We’ve had, like, really fruitful collaborations with Professor Sid Kumar at TU Delft, and we always get his wisdom on some of these things, as well. But yeah, recruiting students also becomes, like, very interesting, finding, like, people who can help us achieve our idea …

HUIZINGA: Yeah. Jake, what’s your take on it from the other seat? I mean, do you look actively at universities around the world—and even in your backyard—to … like U Dub … ? [LAUGHTER]

SMITH: My perspective on, like, how collaborations come into being is they’re really serendipitous. You know, we talked about how this one came into being, and it was because we all happen to know Vikram, and Vikram happened to connect Bichlien with Aniruddh, and it kind of rolled from there. But you can have serendipitous, you know, meetings at a conference, where you happen to, you know, sit next to someone at a talk and you both share the same perspective on, you know, how a research problem should be tackled, and something could come out of that. Or in some cases, you go actually shopping for a collaborator.

HUIZINGA: Right. [LAUGHTER]

SMITH: You know, you need to talk to 10 people to find the one that has that same research perspective as you. I’ll second Aniruddh’s, you know, observation that you get a very different perspective if you go find someone who, they may have the same, like, perspective on how research should be tackled, but they have a different perspective on what the ultimate output of that research would be. But, you know, they can often point you in areas where your research could be helpful that you can’t necessarily see because you lack the domain knowledge or you lack that particular angle on it.

HUIZINGA: Which is another interesting thing in my mind is, you know, the role that papers, published papers, play—that’s a lot of p’s in a sentence [LAUGHTER] … alliteration—that you would be reading or hearing about either in a lightning talk or a presentation at a conference. Does that broaden your perspective, as well? And how do you … like, do you call people up? “I read your paper … ”?

SMITH: [LAUGHS] I have cold-emailed people. You know, this works sometimes! Sometimes this is just the introduction that you need. But the interesting thing in my mind is how much the computer science conferences and things like ChemRxiv and arXiv have really replaced, for me, the traditional chemistry literature or the traditional publishing literature where you can have a conversation with this person while they’re still actively doing the work because they put their initial draft up there and it still needs revision, and there’s opportunities even earlier on in the research process than we’ve had in the past.

HUIZINGA: Huh. And to your earlier point, I’m envisioning an Amazon shopping cart for research collaborators. [LAUGHTER] “Oh, he looks good. Into my cart.” Aniruddh, I always like to know where a project is on the spectrum from what I call lab to life, and I know there are different development stages when it comes to technology finding its way into production and then into broader use. So to use another analogy I like, pretend this is a relay race and research is the first leg. Who else has to run, and who brings it across the line?

VASHISTH: Yeah, yeah. So I think the initial work that we have done, I think it’s been super fruitful, and to Jake’s point, like, converging to, like, a nice output. It took a bunch of chemists, mechanical engineers, simulation folks, machine learning scientists to get where we are. And, as Jake mentioned, we’ve actually put some of our publications on arXiv, and it’s getting traction now. So we’ve had some excitement from startups and companies which make polymers asking us, “Oh, can you actually … can we get a slice of this framework that you’re developing for designing vitrimers?” Which is very promising. So we have done very fundamental work, but now, like, what’s called “the valley of death” in research, [LAUGHTER] like taking it from lab to like production scale, …

HUIZINGA: Yeah.

VASHISTH: … it’s usually a very tightly knit collaboration between industry, labs, and sometimes national labs, too. So we’re excited that, actually, a couple of national labs have been interested in the work that we have been doing, so super optimistic about it.

HUIZINGA: So would you say that the vitrimer-based printed circuit board is a proof of concept right now? Or have you made prototypes? Where is that now?

SMITH: Yeah, absolutely. We’ve mentioned our other collaborator, Vikram Iyer, a couple of times. And in collaboration with his lab, we did actually make a prototype circuit board. We showed that it works as you expect. We showed that it can be disassembled. It can be put back together, and it still works as expected …

HUIZINGA: The “break stuff/make stuff back” thing …

VASHISTH: Yeah, exactly.

SMITH: But, you know, I think to the spirit of the question, it’s still individual kind of one-off experiments being run in a lab, and Aniruddh is right. There’s a long way to go from, like, Technology Readiness Level 3, where we’re doing it ourselves on bench scale, up to, you know, the 7, 8, 9, where it’s actually commercially viable and someone has been able to reproduce this at scale.

HUIZINGA: Right. … So that’s when you bring in investors, or labs that can make stuff and scale it up.

VASHISTH: Yeah. Yeah, I think once you’re, like, close to 7, I think that’s where you’re pretty much ready for the big show.

HUIZINGA: So where are you now? 2? 3?

VASHISTH: I would say, like, 2 or 3 …

SMITH: 2, 3, somewhere in that range.

VASHISTH: Yeah.

HUIZINGA: OK.

SMITH: The scales, kind of, differ depending on which agencies you see put it out.

HUIZINGA: So, Jake, before we close, I want to talk briefly about other applications of recyclable vitrimer-based polymers, in light of their importance to the climate research initiative and AI for Science. So what other industries have polymer components that have nowhere to go after they die but the landfill, and will this research transfer across to those industries?

SMITH: An excellent question. So my personal view on this is that there’s a couple of classes of polymers. There’s these very high-value application uses of polymers where we’re talking about the printed circuit boards; we’re talking about aerospace composites; we’re talking about the panels on your car; we’re talking about things like wind turbines …

HUIZINGA: Oh, yeah.

SMITH: … where there’s a long life cycle. You have this device that’s going to be in use for five years, 50 years, and at the end of that, the polymer itself is still probably pretty good. You could still use it and regenerate it. And so Aniruddh’s lab has done great work showing that you can take things like the side panel of a plane and actually disassemble this thing, heal it, keep it in use longer, and use it at the end of its lifetime. There’s this other class of polymers, which I think are the ones that most people think about—your Coke bottle—and vitrimers seem like a much harder sell there. I think this is more the domain of, you know, biodegradable polymers in the long run to really tackle the issues there. But I’m very excited in this, you know, high-value polymer, this long-lifetime polymer, this, like, permanent install polymer, however you want to think about it, for work like this to have an impact.

HUIZINGA: Yeah. From your lab’s perspective, Aniruddh, where do you see other applications with great promise?

VASHISTH: Yeah. So as Jake said, places where we need high-performance polymers is where we can go. So PCBs is one, aerospace and automotive industry is one, and maybe medical industry is, …

HUIZINGA: Oh, interesting…

VASHISTH: … yeah, is another one where we can actually … if you can make prosthetics out of vitrimers … prosthetics actually lose a little bit of their stiffness, you know, as you use them, and that’s because of localized damage. It’s the fatigue cycle, right. So what if you can actually heal your prosthetics and reuse them? So, yeah, I feel like, you know, there’s so many different applications, so many different routes that we can go down.

HUIZINGA: Yeah. Well, I like to end our Collaborators shows with a little vision casting, and I feel like this whole podcast is that. I should also say, you know, back in the ’50s, there was the big push to make plastics! Your word is vitrimers! So let’s do a little vision casting for vitrimer-based polymers. Assuming your research is wildly successful and becomes a truly game-changing technology, what does the future look like—I mean, specified future, not general future—and how has your work disrupted this field and made the world a better place? I’ll let you each have the last word. Who’d like to go first?

VASHISTH: Sure, I can go first. I’ll try to make sure that I break it up into computation and experiments …

HUIZINGA: Good.

VASHISTH: … so that once I go back, like, my lab does not, like, pounce on me. [LAUGHS] Yeah, so I think from the computation point of view, we always thought that if somebody gave us, like, a hundred different chemistries, we could actually boil it down: we can do a bunch of simulations and tell you, like, 10 of these actually work. What we’ve been able to do specifically for vitrimers is look at the problem from the other side, and we are able to say that if you tell me a particular application, this particular chemistry would work best for you. In essence, what we were thinking of is: if aliens abducted all the chemists from the world, could we actually come up with a framework? [LAUGHS] So I think it’ll be difficult to get there because, as I said earlier, you know, you need that human touch. But I think we are happy that we are getting there. And I think what remains to be seen now is, like, you know, now that we have this type of a framework, what are the next challenges? Like, we are going from the lab to the large scale; what challenges are associated there? And I think similarly for the experimental side of things also, we know a lot (we have developed frameworks), but there’s a lot of work that still needs to be done in understanding and translating these technologies to real-life applications.

HUIZINGA: I like that you’re kind of hedging your bets there, saying, I’m not going to paint a picture of the perfect world because my lab is going to be responsible for delivering it. [LAUGHTER] Jake, assuming you haven’t been abducted by aliens, what’s your take on this?

SMITH: I view, kind of, the goal of this work and the ideal impact of this work as an acceleration of getting us to these polymers being deployed in all these other applications that we’ve talked about, and we can go broader than this.

HUIZINGA: Yeah …

SMITH: I think that there’s a lot of work, both within the MCRI, within Microsoft, and outside of Microsoft in the bigger field, focused on acceleration towards a specific goal. And if all of this work is successful, in 10 years, maybe our materials design process looks completely different, where we’ve gone from this kind of brute-force screening that Aniruddh has talked about to an approach where you start with the properties that you care about; they’re defined by the application that you have in mind. You want to make your vitrimer PCB, it needs to have, you know, a specific temperature where it becomes gummy; it needs to have a specific resistance to burning; it needs to be able to effectively serve as the dielectric for your bigger circuits. And we use this, like, “need space” to define the material that we would like, and we can use machine learning, artificial intelligence, in order to get us to the structure that we need to make in order to actually achieve this design space. And so, this was, you know, our big bet within AI for Science. This is the big bet of this project. And with this project, you know, we take one step towards showing that you can do this in one case. And the future casting would be we can do this in every materials design case that you can think about.

HUIZINGA: Hmmm. You know, I’m thinking of lanes—track analogy again—but, you know, you’ve got mechanical engineering, you’ve got chemistry, and you’ve got artificial intelligence, and each of those sciences is advancing, and they’re using each other to, sort of, help advance in various ways, so this is an exciting, exciting project and collaboration.

[MUSIC]

Jake, Aniruddh, thanks for joining us today on Collaborators. This has been really fun for me. [LAUGHTER] So thanks for coming in and sharing your stories today.

VASHISTH: Thank you so much.

SMITH: Yeah. Of course. Thank you.

[MUSIC FADES]

The post Collaborators: Sustainable electronics with Jake Smith and Aniruddh Vashisth appeared first on Microsoft Research.


Japan Enhances AI Sovereignty With Advanced ABCI 3.0 Supercomputer


Enhancing Japan’s AI sovereignty and strengthening its research and development capabilities, Japan’s National Institute of Advanced Industrial Science and Technology (AIST) will integrate thousands of NVIDIA H200 Tensor Core GPUs into its AI Bridging Cloud Infrastructure 3.0 supercomputer (ABCI 3.0). The HPE Cray XD system will feature NVIDIA Quantum-2 InfiniBand networking for superior performance and scalability.

ABCI 3.0 is the latest iteration of Japan’s large-scale Open AI Computing Infrastructure designed to advance AI R&D. This collaboration underlines Japan’s commitment to advancing its AI capabilities and fortifying its technological independence.

“In August 2018, we launched ABCI, the world’s first large-scale open AI computing infrastructure,” said AIST Executive Officer Yoshio Tanaka. “Building on our experience over the past several years managing ABCI, we’re now upgrading to ABCI 3.0. In collaboration with NVIDIA, we aim to develop ABCI 3.0 into a computing infrastructure that will advance further research and development capabilities for generative AI in Japan.”

“As generative AI prepares to catalyze global change, it’s crucial to rapidly cultivate research and development capabilities within Japan,” said AIST Solutions Co. Producer and Head of ABCI Operations Hirotaka Ogawa. “I’m confident that this major upgrade of ABCI in our collaboration with NVIDIA and HPE will enhance ABCI’s leadership in domestic industry and academia, propelling Japan towards global competitiveness in AI development and serving as the bedrock for future innovation.”

The ABCI 3.0 supercomputer will be housed in Kashiwa at a facility run by Japan’s National Institute of Advanced Industrial Science and Technology. Credit: Courtesy of National Institute of Advanced Industrial Science and Technology.

ABCI 3.0: A New Era for Japanese AI Research and Development

ABCI 3.0 is constructed and operated by AIST, its business subsidiary, AIST Solutions, and its system integrator, Hewlett Packard Enterprise (HPE).

The ABCI 3.0 project follows support from Japan’s Ministry of Economy, Trade and Industry, known as METI, for strengthening its computing resources through the Economic Security Fund and is part of a broader $1 billion initiative by METI that includes both ABCI efforts and investments in cloud AI computing.

NVIDIA is closely collaborating with METI on research and education following a visit last year by company founder and CEO, Jensen Huang, who met with political and business leaders, including Japanese Prime Minister Fumio Kishida, to discuss the future of AI.

NVIDIA’s Commitment to Japan’s Future

Huang pledged to collaborate on research, particularly in generative AI, robotics and quantum computing, to invest in AI startups and provide product support, training and education on AI.

During his visit, Huang emphasized that “AI factories” — next-generation data centers designed to handle the most computationally intensive AI tasks — are crucial for turning vast amounts of data into intelligence.

“The AI factory will become the bedrock of modern economies across the world,” Huang said during a meeting with the Japanese press in December.

With its ultra-high-density data center and energy-efficient design, ABCI provides a robust infrastructure for developing AI and big data applications.

The system is expected to come online by the end of this year and offer state-of-the-art AI research and development resources. It will be housed in Kashiwa, near Tokyo.

Unmatched Computing Performance and Efficiency

The facility will offer:

  • 6 AI exaflops of computing capacity, a measure of AI-specific performance without sparsity
  • 410 double-precision petaflops, a measure of general computing capacity
  • Each node is connected via the Quantum-2 InfiniBand platform at 200GB/s of bisectional bandwidth.

NVIDIA technology forms the backbone of this initiative, with hundreds of nodes each equipped with 8 NVLink-connected H200 GPUs providing unprecedented computational performance and efficiency.

NVIDIA H200 is the first GPU to offer over 140 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s). The H200’s larger and faster memory accelerates generative AI and LLMs, while advancing scientific computing for HPC workloads with better energy efficiency and lower total cost of ownership.

NVIDIA H200 GPUs are 15X more energy-efficient than ABCI’s previous-generation architecture for AI workloads such as LLM token generation.

The integration of advanced NVIDIA Quantum-2 InfiniBand with In-Network computing — where networking devices perform computations on data, offloading the work from the CPU — ensures efficient, high-speed, low-latency communication, crucial for handling intensive AI workloads and vast datasets.

ABCI boasts world-class computing and data processing power, serving as a platform to accelerate joint AI R&D with industries, academia and governments.

METI’s substantial investment is a testament to Japan’s strategic vision to enhance AI development capabilities and accelerate the use of generative AI.

By subsidizing AI supercomputer development, Japan aims to reduce the time and costs of developing next-generation AI technologies, positioning itself as a leader in the global AI landscape.

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large language models and long-context applications. FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is now used by most libraries to accelerate Transformer training and inference. This has contributed to a massive increase in LLM context length in the last two years, from 2-4K (GPT-3, OPT) to 128K (GPT-4), or even 1M (Llama 3). However, despite its success, FlashAttention has yet to take advantage of new capabilities in modern hardware, with FlashAttention-2 achieving only 35% utilization of theoretical max FLOPs on the H100 GPU. In this blogpost, we describe three main techniques to speed up attention on Hopper GPUs. The first two exploit the asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp specialization and (2) interleave block-wise matmul and softmax operations. The third is (3) incoherent processing that leverages hardware support for FP8 low precision.

We’re excited to release FlashAttention-3 that incorporates these techniques. It’s 1.5-2.0x faster than FlashAttention-2 with FP16, up to 740 TFLOPS, i.e., 75% utilization of H100 theoretical max FLOPS. With FP8, FlashAttention-3 reaches close to 1.2 PFLOPS, with 2.6x smaller error than baseline FP8 attention.

FlashAttention-3 is available at: https://github.com/Dao-AILab/flash-attention

FlashAttention Recap

FlashAttention is an algorithm that reorders the attention computation and leverages tiling and recomputation to significantly speed it up and reduce memory usage from quadratic to linear in sequence length. We use tiling to load blocks of inputs from HBM (GPU memory) to SRAM (fast cache), perform attention with respect to that block, and update the output in HBM. By not writing the large intermediate attention matrices to HBM, we reduce the amount of memory reads/writes, which brings 2-4x wallclock time speedup.

Here we show a diagram of FlashAttention forward pass: with tiling and softmax rescaling, we operate by blocks and avoid having to read/write from HBM, while obtaining the correct output with no approximation.
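The tiling and rescaling idea can be illustrated outside a GPU kernel. The following minimal NumPy sketch shows a FlashAttention-style forward pass for a single attention head. It is illustrative only and is not the actual CUDA implementation. It walks over K and V in blocks and keeps a running row-wise maximum and softmax denominator. Whenever the maximum grows, it rescales the partial output, so the full N x N score matrix is never formed. The function names and the block size of 64 are our own choices, and there is no causal masking or dropout.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference softmax(Q K^T / sqrt(d)) V that materializes the full score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=64):
    """Attention computed one K/V block at a time with online softmax rescaling."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, V.shape[-1]))      # unnormalized running output
    row_max = np.full((n, 1), -np.inf)    # running max score per query row
    row_sum = np.zeros((n, 1))            # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale                                 # scores for this block only
        new_max = np.maximum(row_max, S.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max)                 # rescale earlier partial sums
        P = np.exp(S - new_max)
        row_sum = row_sum * correction + P.sum(axis=-1, keepdims=True)
        out = out * correction + P @ Vb
        row_max = new_max
    return out / row_sum                                       # normalize once at the end

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
print(np.abs(tiled_attention(Q, K, V) - naive_attention(Q, K, V)).max())
```

The correction factor exp(old_max - new_max) is applied to both the running output and the running denominator. As a result, the final division yields exactly the softmax-weighted sum. This is why the tiled computation needs no approximation.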

New hardware features on Hopper GPUs – WGMMA, TMA, FP8

While FlashAttention-2 can achieve up to 70% theoretical max FLOPS on Ampere (A100) GPUs, it does not yet take advantage of new features on Hopper GPUs to maximize performance. We describe some of the new Hopper-specific features here, and why they are important.

1. WGMMA (Warpgroup Matrix Multiply-Accumulate). This new feature makes use of the new Tensor Cores on Hopper, with much higher throughput (see Note 1) than the older mma.sync instruction in Ampere (image from the H100 white paper).

2. TMA (Tensor Memory Accelerator). This is a special hardware unit that accelerates the transfer of data between global memory and shared memory, taking care of all index calculation and out-of-bound predication. This frees up registers, which is a valuable resource to increase tile size and efficiency.

3. Low-precision with FP8. This doubles the Tensor Core throughput (e.g. 989 TFLOPS with FP16 and 1978 TFLOPS with FP8), but trades off accuracy by using fewer bits to represent floating point numbers.

FlashAttention-3 makes use of all of these new features of Hopper, using powerful abstractions from NVIDIA’s CUTLASS library.

By rewriting FlashAttention to use these new features, we can already significantly speed it up (e.g., from 350 TFLOPS in FlashAttention-2 FP16 forward pass to around 540-570 TFLOPS). However, the asynchronous nature of the new instructions on Hopper (WGMMA and TMA) opens up additional algorithmic opportunities to overlap operations and thereby extract even greater performance. For this blogpost, we’ll explain two such techniques specific to attention. The generic technique of warp specialization, with separate producer and consumer warps doing TMA and WGMMA, is well-covered elsewhere in the context of GEMM and works the same here.

Asynchrony: Overlapping GEMM and Softmax

Why overlap?

Attention has GEMMs (those matmuls between Q and K and between attention probability P and V) and softmax as its two main operations. Why do we need to overlap them? Isn’t most of the FLOPS in the GEMMs anyway? As long as the GEMMs are fast (e.g., computed using WGMMA instructions), shouldn’t the GPU be going brrrr?

The problem is that non-matmul operations are much slower than matmul operations on modern accelerators. Special functions such as exponential (for the softmax) have even lower throughput than floating point multiply-add. They are evaluated by the multi-function unit, a unit separate from floating point multiply-add or matrix multiply-add. As an example, the H100 GPU SXM5 has 989 TFLOPS of FP16 matrix multiply, but only 3.9 TFLOPS (256x less throughput) for special functions (see Note 2)! For head dimension 128, there are 512x more matmul FLOPS than exponential FLOPS. This means the exponentials can take about 50% as long as the matmuls. The situation is even worse for FP8, where the matmul FLOPS are twice as fast yet exponential FLOPS stay the same speed. Ideally we want matmul and softmax to operate in parallel. While the Tensor Cores are busy with matmul, the multi-function units should be calculating exponential!
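The throughput figures above can be checked with a few lines of arithmetic. The sketch below reproduces the 3.9 TFLOPS special-function estimate from Note 2. It also reproduces the roughly 50% exponential-to-matmul time ratio for head dimension 128. It assumes forward-pass accounting: one exponential per attention score, against 2d FLOPs for QK^T plus 2d FLOPs for PV (4d = 512 for d = 128). It ignores memory traffic and all other work, so treat it as a back-of-the-envelope model.

```python
# Back-of-the-envelope check of the H100 SXM5 numbers quoted above.
SMS = 132                 # streaming multiprocessors
CLOCK_HZ = 1830e6         # clock used for the 989 TFLOPS FP16 figure
SFU_OPS_PER_SM_CLK = 16   # special-function ops per SM per clock (CUDA guide)

def exp_time_vs_matmul_time(head_dim, matmul_flops_per_s):
    """Time spent on exponentials divided by time spent on matmuls, per score."""
    special_per_s = SFU_OPS_PER_SM_CLK * SMS * CLOCK_HZ
    # One exp per score vs. 2*d FLOPs for Q K^T plus 2*d FLOPs for P V.
    matmul_flops_per_exp = 4 * head_dim
    return (1 / special_per_s) / (matmul_flops_per_exp / matmul_flops_per_s)

special_tflops = SFU_OPS_PER_SM_CLK * SMS * CLOCK_HZ / 1e12
fp16_ratio = exp_time_vs_matmul_time(128, 989e12)
fp8_ratio = exp_time_vs_matmul_time(128, 1978e12)
print(f"special functions: {special_tflops:.2f} TFLOPS")    # 3.86
print(f"FP16: exp time = {fp16_ratio:.0%} of matmul time")   # 50%
print(f"FP8:  exp time = {fp8_ratio:.0%} of matmul time")    # 100%
```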

Inter-warpgroup overlapping with pingpong scheduling

The first and easiest way to overlap GEMM and softmax is to do nothing at all! The warp schedulers already try to schedule warps so that if some warps are blocked (e.g., waiting for GEMM results), other warps can run. That is, the warp schedulers do some of this overlapping for us, for free.

However, we can improve on this by doing some of the scheduling manually. As an example, if we have 2 warpgroups (labeled 1 and 2 – each warpgroup is a group of 4 warps), we can use synchronization barriers (bar.sync) so that warpgroup 1 first does its GEMMs (e.g., GEMM1 of one iteration and GEMM0 of the next iteration), and then warpgroup 2 does its GEMMs while warpgroup 1 does its softmax, and so on. This “pingpong” schedule is illustrated in the figure below, where the same color denotes the same iteration.

This would allow us to perform the softmax in the shadow of the GEMMs of the other warpgroup. Of course, this figure is just a caricature; in practice the scheduling is not really this clean. Nevertheless, pingpong scheduling can improve FP16 attention forward pass from around 570 TFLOPS to 620 TFLOPS (head dim 128, seqlen 8K).
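To see why pingpong scheduling hides the softmax, here is a deliberately simplified timeline model in Python. Its assumptions are:

- Two warpgroups share a single Tensor Core pipeline and a single multi-function unit.
- GEMM and softmax durations are fixed, in arbitrary units.
- Barriers force the warpgroups to take turns on the Tensor Cores.

The baseline runs everything back to back with no overlap at all. That is more pessimistic than real warp scheduling. None of this is taken from the FlashAttention-3 kernel.

```python
def schedule(n_iters, gemm_t, softmax_t, pingpong):
    """Toy finish time for two warpgroups, each doing n_iters (GEMM -> softmax) steps."""
    if not pingpong:
        # No overlap: every GEMM and every softmax runs back to back.
        return 2 * n_iters * (gemm_t + softmax_t)
    tc_free = sfu_free = 0.0   # when the shared Tensor Cores / multi-function unit go idle
    wg_ready = [0.0, 0.0]      # when each warpgroup may issue its next GEMM
    for _ in range(n_iters):
        for wg in (0, 1):      # barriers make the warpgroups alternate on the Tensor Cores
            gemm_end = max(tc_free, wg_ready[wg]) + gemm_t
            tc_free = gemm_end
            softmax_end = max(gemm_end, sfu_free) + softmax_t
            sfu_free = softmax_end
            wg_ready[wg] = softmax_end
    return max(wg_ready)

print(schedule(4, gemm_t=1.0, softmax_t=0.5, pingpong=False))  # 12.0
print(schedule(4, gemm_t=1.0, softmax_t=0.5, pingpong=True))   # 8.5
```

In the pingpong case, every softmax except the last runs while the other warpgroup's GEMM occupies the Tensor Cores. When the softmax is shorter than the GEMM, the total is therefore 2 * n_iters * gemm_t + softmax_t.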

Intra-warpgroup overlapping of GEMM and Softmax

Even within one warpgroup, we can have some part of softmax running while the GEMMs of that warpgroup are running. This is illustrated in this figure, where the same color denotes the same iteration.

This pipelining increases throughput from around 620 TFLOPS to around 640-660 TFLOPS for FP16 attention forward, at the cost of higher register pressure. We need more registers to hold both accumulators of the GEMMs, and the input/output of softmax. Overall, we find this technique to offer a favorable tradeoff.

Low-precision: reduce quantization error with incoherent processing

LLM activations can have outliers with much larger magnitude than the rest of the features. These outliers make quantization difficult, producing much larger quantization errors. We leverage incoherent processing, a technique used in the quantization literature (e.g. from QuIP) that multiplies the query and key with a random orthogonal matrix to "spread out" the outliers and reduce quantization error. In particular, we use the Hadamard transform (with random signs), which can be done per attention head in O(d log d) instead of O(d^2) time, where d is the head dimension. Since the Hadamard transform is memory-bandwidth bound, it can be fused "for free" with previous operations that are also memory-bandwidth bound, such as rotary embedding.
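The effect of incoherent processing can be reproduced on synthetic data. This NumPy sketch follows the experiment described here: Q and K are standard normal, with about 0.1% of entries scaled up by 50 (our own choice) to act as outliers. Both are multiplied by the same random-sign Hadamard matrix. That matrix is orthogonal, so it leaves QK^T unchanged. The sketch then compares the error of a quantized QK^T with and without the rotation.

Two assumptions matter here:

- The per-tensor 8-bit integer rounding in `fake_quant` is a crude stand-in for FP8. It exaggerates the effect of outliers, so the ratio you see will not match the 2.6x reported for FP8.
- The Hadamard matrix is built explicitly for clarity. A real kernel would apply the O(d log d) fast transform instead.

```python
import numpy as np

def hadamard(d):
    """Normalized Sylvester Hadamard matrix (orthogonal); d must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(d)

def fake_quant(x, bits=8):
    """Per-tensor symmetric absmax rounding to `bits` bits: a crude stand-in for FP8."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
n, d = 1024, 128
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
for X in (Q, K):
    outliers = rng.random(X.shape) < 1e-3   # roughly 0.1% of entries
    X[outliers] *= 50.0                     # outlier scale is an assumption

# Random-sign Hadamard M = diag(signs) @ H is orthogonal, so (Q M)(K M)^T == Q K^T.
signs = rng.choice([-1.0, 1.0], size=d)
M = signs[:, None] * hadamard(d)
Qr, Kr = Q @ M, K @ M

exact = Q @ K.T
err_plain = np.abs(fake_quant(Q) @ fake_quant(K).T - exact).mean()
err_rot = np.abs(fake_quant(Qr) @ fake_quant(Kr).T - exact).mean()
print(f"mean |error| plain: {err_plain:.4f}  rotated: {err_rot:.4f}")
```

The rotation spreads each outlier across all d coordinates of its row. This shrinks the per-tensor maximum and therefore the quantization step used for every other entry.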

In our experiment where Q, K, V are generated from a standard normal distribution but 0.1% of the entries have large magnitudes (to simulate outliers), we found that incoherent processing can reduce the quantization error by 2.6x. We show numerical error comparison in the table below. Please see the paper for details.

Attention benchmark

We show some results with FlashAttention-3, and compare it to FlashAttention-2, as well as the implementation in Triton and cuDNN (both of which already use new hardware features of Hopper GPUs).

For FP16, we see about 1.6x-1.8x speedup over FlashAttention-2.

For FP8, we can reach close to 1.2 PFLOPS!

Discussion

This blogpost highlights some of the optimizations for FlashAttention available on Hopper GPUs. Other optimizations (e.g., variable length sequences, persistent kernel, and in-kernel transpose for FP8) are covered in the paper.

We have seen that designing algorithms that take advantage of the hardware they run on can bring significant efficiency gains and unlock new model capabilities such as long context. We look forward to future work on optimization for LLM inference, as well as generalizing our techniques to other hardware architectures.

We also look forward to FlashAttention-3 being integrated in a future release of PyTorch.

Notes

  1. Without the wgmma instruction, the older mma.sync instruction can only reach about ⅔ the peak throughput of Hopper Tensor Cores: https://arxiv.org/abs/2402.13499v1 

  2. The CUDA programming guide specifies that the throughput for special functions is 16 operations per streaming multiprocessor (SM) per clock cycle. We multiply 16 by 132 SMs and 1830 MHz (the clock speed used to calculate 989 TFLOPS of FP16 matmul) to get 3.9 TFLOPS.

Knowledge Bases for Amazon Bedrock now supports advanced parsing, chunking, and query reformulation giving greater control of accuracy in RAG based applications

Knowledge Bases for Amazon Bedrock is a fully managed service that helps you implement the entire Retrieval Augmented Generation (RAG) workflow from ingestion to retrieval and prompt augmentation without having to build custom integrations to data sources and manage data flows, pushing the boundaries for what you can do in your RAG workflows.

However, it’s important to note that querying the indexes in RAG-based applications might yield subpar results when dealing with large or complex input text documents, such as PDFs or .txt files. For example, a document might have complex semantic relationships in its sections or tables that require more advanced chunking techniques to represent them accurately. Otherwise, the retrieved chunks might not address the user query. Several factors can be controlled to address these performance issues. In this blog post, we discuss new features in Knowledge Bases for Amazon Bedrock that can improve the accuracy of responses in applications that use RAG. These include advanced data chunking options, query decomposition, and CSV and PDF parsing improvements. These features empower you to further improve the accuracy of your RAG workflows with greater control and precision. In the next section, let’s go over each of the features, including their benefits.

Features for improving accuracy of RAG based applications

In this section, we go through the new features provided by Knowledge Bases for Amazon Bedrock to improve the accuracy of generated responses to user queries.

Advanced parsing

Advanced parsing is the process of analyzing and extracting meaningful information from unstructured or semi-structured documents. It involves breaking down the document into its constituent parts, such as text, tables, images, and metadata, and identifying the relationships between these elements.

Parsing documents is important for RAG applications because it enables the system to understand the structure and context of the information contained within the documents.

There are several techniques to parse or extract data from different document formats, one of which is using foundation models (FMs) to parse the data within the documents. It’s most helpful when you have complex data within documents such as nested tables, text within images, graphical representations of text and so on, which hold important information.

Using the advanced parsing option offers several benefits:

  • Improved accuracy: FMs can better understand the context and meaning of the text, leading to more accurate information extraction and generation.
  • Adaptability: Prompts for these parsers can be optimized on domain-specific data, enabling them to adapt to different industries or use cases.
  • Extracting entities: It can be customized to extract entities based on your domain and use case.
  • Complex document elements: It can understand and extract information represented in graphical or tabular format.

Parsing documents using FMs is particularly useful in scenarios where the documents to be parsed are complex, unstructured, or contain domain-specific terminology. FMs can handle ambiguities, interpret implicit information, and extract relevant details using their ability to understand semantic relationships, which is essential for generating accurate and relevant responses in RAG applications. These parsers might incur additional fees, so check the pricing details before selecting this parser.

In Knowledge Bases for Amazon Bedrock, we provide our customers the option to use FMs for parsing complex documents such as .pdf files with nested tables or text within images.

From the AWS Management Console for Amazon Bedrock, you can start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under Chunking & parsing configurations, as shown in the following image. You can select one of the two models (Anthropic Claude 3 Sonnet or Haiku) currently available for parsing the documents.

If you want to customize the way the FM will parse your documents, you can optionally provide instructions based on your document structure, domain, or use case.
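If you create the data source programmatically rather than in the console, the same choice is expressed in the `vectorIngestionConfiguration` argument of the bedrock-agent `CreateDataSource` API. The helper below builds that fragment. The field names reflect our reading of the API reference at the time of writing, so verify them against the current boto3 documentation. The model ARN, region, knowledge base ID and bucket in the example are placeholders, and the parsing instruction text is only an illustration.

```python
def fm_parsing_config(model_arn, instructions=None):
    """Build a vectorIngestionConfiguration that parses documents with an FM."""
    fm_config = {"modelArn": model_arn}
    if instructions:
        # Optional: describe your document structure, domain, or use case to the parser.
        fm_config["parsingPrompt"] = {"parsingPromptText": instructions}
    return {
        "parsingConfiguration": {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": fm_config,
        }
    }

# Placeholder ARN for Claude 3 Haiku in us-east-1.
HAIKU_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
config = fm_parsing_config(HAIKU_ARN, "Transcribe each table as Markdown, keeping row order.")

# With AWS credentials configured (IDs and bucket are placeholders):
# import boto3
# boto3.client("bedrock-agent").create_data_source(
#     knowledgeBaseId="KB_ID",
#     name="complex-pdfs",
#     dataSourceConfiguration={"type": "S3",
#                              "s3Configuration": {"bucketArn": "arn:aws:s3:::my-bucket"}},
#     vectorIngestionConfiguration=config,
# )
```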

Based on your configuration, the ingestion process will parse and chunk documents, enhancing the overall response accuracy. We will now explore advanced data chunking options, namely semantic and hierarchical chunking. Both split documents into smaller units, then organize and store the chunks in a vector store. This can improve the quality of the chunks returned during retrieval.

Advanced data chunking options

The objective shouldn’t be to chunk data merely for the sake of chunking, but rather to transform it into a format that facilitates anticipated tasks and enables efficient retrieval for future value extraction. Instead of inquiring, “How should I chunk my data?”, the more pertinent question should be, “What is the most optimal approach to use to transform the data into a form the FM can use to accomplish the designated task?”[1]

To achieve this goal, we introduced two new data chunking options within Knowledge Bases for Amazon Bedrock in addition to the fixed chunking, no chunking, and default chunking options:

  • Semantic chunking: Segments your data based on its semantic meaning, helping to ensure that the related information stays together in logical chunks. By preserving contextual relationships, your RAG model can retrieve more relevant and coherent results.
  • Hierarchical chunking: Organizes your data into a hierarchical structure, allowing for more granular and efficient retrieval based on the inherent relationships within your data.

Let’s do a deeper dive on each of these techniques.

Semantic chunking

Semantic chunking analyzes the relationships within a text and divides it into meaningful and complete chunks, which are derived based on the semantic similarity calculated by the embedding model. This approach preserves the information’s integrity during retrieval, helping to ensure accurate and contextually appropriate results.

By focusing on the text’s meaning and context, semantic chunking significantly improves the quality of retrieval. It should be used in scenarios where maintaining the semantic integrity of the text is crucial.

From the console, you can start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under the Chunking & parsing configurations and then select Semantic chunking from the Chunking strategy drop-down list, as shown in the following image.

The following are the parameters that you need to configure:

  • Max buffer size for grouping surrounding sentences: The number of sentences to group together when evaluating semantic similarity. If you select a buffer size of 1, each group includes the previous sentence, the target sentence, and the next sentence. The recommended value of this parameter is 1.
  • Max token size for a chunk: The maximum number of tokens that a chunk of text can contain. It can range from 20 to 8,192, depending on the context length of the embeddings model. For example, if you’re using the Cohere Embeddings model, the maximum size of a chunk can be 512. The recommended value of this parameter is 300.
  • Breakpoint threshold for similarity between sentence groups: Specify (by a percentage threshold) how similar the groups of sentences should be when semantically compared to each other. It should be a value between 50 and 99. The recommended value of this parameter is 95.
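Programmatically, these three parameters map to the `semanticChunkingConfiguration` of the `CreateDataSource` API: `maxTokens`, `bufferSize` and `breakpointPercentileThreshold`. That mapping is our reading of the API reference, so check it against the current documentation. The sketch below builds the fragment with the recommended values and enforces the ranges quoted above. The returned dictionary is meant to be passed as the `vectorIngestionConfiguration` argument.

```python
def semantic_chunking_config(max_tokens=300, buffer_size=1, breakpoint_percentile=95):
    """chunkingConfiguration for semantic chunking, defaulting to the recommended values."""
    if not 20 <= max_tokens <= 8192:
        raise ValueError("max_tokens must be between 20 and 8,192")
    if not 50 <= breakpoint_percentile <= 99:
        raise ValueError("breakpoint threshold must be between 50 and 99")
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "maxTokens": max_tokens,          # keep within the embeddings model's context
                "bufferSize": buffer_size,        # neighbors grouped on each side
                "breakpointPercentileThreshold": breakpoint_percentile,
            },
        }
    }

print(semantic_chunking_config())
```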

Knowledge Bases for Amazon Bedrock first divides documents into chunks based on the specified token size. Embeddings are created for each chunk, and similar chunks in the embedding space are combined based on the similarity threshold and buffer size, forming new chunks. Consequently, the chunk size can vary across chunks.

Although this method is more computationally intensive than fixed-size chunking, it can be beneficial for chunking documents where contextual boundaries aren’t clear—for example, legal documents or technical manuals.[2]

Example:

Consider a legal document discussing various clauses and sub-clauses. The contextual boundaries between these sections might not be obvious, making it challenging to determine appropriate chunk sizes. In such cases, the dynamic chunking approach can be advantageous, because it can automatically identify and group related content into coherent chunks based on the semantic similarity among neighboring sentences.
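Amazon Bedrock does not publish the exact algorithm behind semantic chunking. The following is therefore only a sketch of a common approach to the same idea, not the service's implementation:

1. Embed each sentence together with its neighbors.
2. Measure the cosine distance between consecutive groups.
3. Start a new chunk wherever the distance exceeds a chosen percentile.

The `toy_embed` function is a hypothetical bag-of-words stand-in for a real embeddings model, and the max-token limit is not enforced.

```python
import re
from collections import Counter

import numpy as np

def toy_embed(text, vocab):
    """Hypothetical stand-in for an embeddings model: unit-length bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    v = np.array([counts[w] for w in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def semantic_chunks(sentences, buffer_size=1, percentile=95):
    """Split sentences wherever neighboring sentence groups are semantically far apart."""
    if len(sentences) < 2:
        return [" ".join(sentences)]
    vocab = sorted({w for s in sentences for w in re.findall(r"[a-z]+", s.lower())})
    # Each sentence is embedded together with `buffer_size` neighbors on each side.
    groups = [" ".join(sentences[max(0, i - buffer_size): i + buffer_size + 1])
              for i in range(len(sentences))]
    emb = np.array([toy_embed(g, vocab) for g in groups])
    # Cosine distance between consecutive groups (the vectors are unit length).
    dist = 1.0 - np.sum(emb[:-1] * emb[1:], axis=1)
    threshold = np.percentile(dist, percentile)
    chunks, current = [], [sentences[0]]
    for sentence, gap in zip(sentences[1:], dist):
        if gap > threshold:                 # large semantic jump: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "The lease term is five years.",
    "The lease term may be renewed once.",
    "Rent is due on the first day of each month.",
    "Late rent incurs a fee of five percent.",
]
# With only four sentences, buffer_size=0 keeps the toy example readable.
for chunk in semantic_chunks(sentences, buffer_size=0):
    print(chunk)
```

Here the biggest jump falls between the second and third sentences. The output is therefore two chunks, one about the lease term and one about rent.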

Now that you understand the concept of semantic chunking, including when to use it, let’s do a deeper dive into hierarchical chunking.

Hierarchical chunking

With hierarchical chunking, you can organize your data into a hierarchical structure, allowing for more granular and efficient retrieval based on the inherent relationships within your data. Organizing your data into a hierarchical structure enables your RAG workflow to efficiently navigate and retrieve information from complex, nested datasets.

From the console, start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under the Chunking & parsing configurations and then select Hierarchical chunking from the Chunking strategy drop-down list, as shown in the following image.

The following are some parameters that you need to configure.

  • Max parent token size: This is the maximum number of tokens that a parent chunk can contain. The value can range from 1 to 8,192 and is independent of the context length of the embeddings model because the parent chunk isn’t embedded. The recommended value of this parameter is 1,500.
  • Max child token size: This is the maximum number of tokens that a child chunk can contain. The value can range from 1 to 8,192 based on the context length of the embeddings model. The recommended value of this parameter is 300.
  • Overlap tokens between chunks: This is the percentage overlap between child chunks. Parent chunk overlap depends on the child token size and child percentage overlap that you specify. The recommended value for this parameter is 20 percent of the max child token size value.
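The equivalent API fragment uses `hierarchicalChunkingConfiguration`. As we read the `CreateDataSource` reference, it takes exactly two `levelConfigurations`, parent first, and an overlap expressed in tokens rather than as a percentage. The helper below converts the recommended 20 percent of the child size into tokens. Treat the field names as something to verify against the current documentation.

```python
def hierarchical_chunking_config(parent_tokens=1500, child_tokens=300, overlap_pct=20):
    """chunkingConfiguration for hierarchical chunking, defaulting to the recommended values."""
    # Overlap is given in tokens: 20% of a 300-token child chunk is 60 tokens.
    overlap_tokens = child_tokens * overlap_pct // 100
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                # Parent level first, then child level.
                "levelConfigurations": [
                    {"maxTokens": parent_tokens},
                    {"maxTokens": child_tokens},
                ],
                "overlapTokens": overlap_tokens,
            },
        }
    }

print(hierarchical_chunking_config())
```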

After the documents are parsed, the first step is to chunk the documents based on the parent and child chunk sizes. The chunks are then organized into a hierarchical structure. Parent chunks (higher level) represent larger units, for example documents or sections. Child chunks (lower level) represent smaller units, for example paragraphs or sentences. The relationship between the parent and child chunks is maintained. This hierarchical structure allows for efficient retrieval and navigation of the corpus.

Some of the benefits include:

  • Efficient retrieval: The hierarchical structure allows faster and more targeted retrieval of relevant information. Semantic search is first performed on the child chunks, and then the parent chunks are returned during retrieval. By replacing the child chunks with their parent chunk, we provide larger and more comprehensive context to the FM.
  • Context preservation: Organizing the corpus in a hierarchical manner helps preserve the contextual relationships between chunks, which can be beneficial for generating coherent and contextually relevant text.

Note: In hierarchical chunking, semantic search is performed on child chunks but parent chunks are returned. Because one parent can have multiple children, you might see fewer search results returned.
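The note about fewer results follows directly from this retrieval pattern. This hypothetical sketch ranks child chunks by similarity score, maps each hit to its parent and drops parents that were already returned. The `Child` class, the `parents_for` function and the sample data are illustrative and are not part of any AWS SDK.

```python
from dataclasses import dataclass

@dataclass
class Child:
    text: str
    parent_id: str
    score: float   # similarity to the query, e.g. from a vector search

def parents_for(children, parents, top_k=4):
    """Search is done over child chunks; the caller receives their deduplicated parents."""
    seen, results = set(), []
    for child in sorted(children, key=lambda c: c.score, reverse=True)[:top_k]:
        if child.parent_id not in seen:   # several children can share one parent
            seen.add(child.parent_id)
            results.append(parents[child.parent_id])
    return results

parents = {"A": "Section A: termination clauses", "B": "Section B: payment terms"}
hits = [Child("c1", "A", 0.91), Child("c2", "A", 0.85),
        Child("c3", "B", 0.80), Child("c4", "A", 0.72)]
print(parents_for(hits, parents, top_k=4))  # four child hits, but only two parents
```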

Hierarchical chunking is best suited for complex documents that have a nested or hierarchical structure, such as technical manuals, legal documents, or academic papers with complex formatting and nested tables. You can combine the FM parsing discussed previously with hierarchical chunking to improve the accuracy of generated responses.

By organizing the document into a hierarchical structure during the chunking process, the model can better understand the relationships between different parts of the content, enabling it to provide more contextually relevant and coherent responses.

Semantic and hierarchical chunking cover many use cases. If you need more flexibility, you can use a Lambda function to add custom processing logic to chunks, such as metadata processing or your own chunking logic. In the next section, we discuss custom processing using Lambda functions in Knowledge Bases for Amazon Bedrock.

Custom processing using Lambda functions

For those seeking more control and flexibility, Knowledge Bases for Amazon Bedrock now offers the ability to define custom processing logic using AWS Lambda functions. Using Lambda functions, you can customize the chunking process to align with the unique requirements of your RAG application. Furthermore, you can extend it beyond chunking, because Lambda can also be used to streamline metadata processing, which can help unlock additional avenues for efficiency and precision.

You can begin by writing a Lambda function with your custom chunking logic, or by using any of the chunking methodologies provided by your favorite open source framework, such as LangChain or LlamaIndex. Make sure to create the Lambda layer for the specific open source framework. After writing and testing the Lambda function, you can start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under the Chunking & parsing configurations and then select the corresponding Lambda function from the Select Lambda function drop-down list, as shown in the following image:

From the drop-down list, you can select any Lambda function created in the same AWS Region, including the verified version of the Lambda function. Next, you provide the Amazon Simple Storage Service (Amazon S3) path that stores the input documents your Lambda function runs on, as well as the function's output.
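Through the API, the Lambda function is attached with a `customTransformationConfiguration`. That configuration names the function and the S3 location for intermediate files. Two details in the sketch below come from our understanding of the current `CreateDataSource` API rather than from this post: `stepToApply` is set to `POST_CHUNKING`, and the chunking strategy defaults to `NONE` so that the Lambda's own chunks are used. The Lambda ARN and bucket in the usage comment are placeholders.

```python
def lambda_chunking_config(lambda_arn, intermediate_s3_uri, chunking_strategy="NONE"):
    """vectorIngestionConfiguration that runs a custom Lambda on the ingested chunks."""
    return {
        # Assumption: NONE hands chunking over to the Lambda. Other strategies would
        # need their own configuration block, which is omitted here.
        "chunkingConfiguration": {"chunkingStrategy": chunking_strategy},
        "customTransformationConfiguration": {
            # S3 location for the Lambda's input files and its output.
            "intermediateStorage": {"s3Location": {"uri": intermediate_s3_uri}},
            "transformations": [
                {
                    "stepToApply": "POST_CHUNKING",
                    "transformationFunction": {
                        "transformationLambdaConfiguration": {"lambdaArn": lambda_arn}
                    },
                }
            ],
        },
    }

# Placeholder values:
# lambda_chunking_config("arn:aws:lambda:us-east-1:123456789012:function:my-chunker",
#                        "s3://my-bucket/intermediate/")
```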

So far, we have discussed advanced parsing using FMs and advanced data chunking options to improve the quality of your search results and accuracy of the generated responses. In the next section, we will discuss some optimizations that have been added to Knowledge Bases for Amazon Bedrock to improve the accuracy of parsing .csv files.

Metadata customization for .csv files

Knowledge Bases for Amazon Bedrock now offers an enhanced .csv file processing feature that separates content and metadata. This update streamlines the ingestion process by allowing you to designate specific columns as content fields and others as metadata fields. Consequently, it reduces the number of required files and enables more efficient data management, especially for large .csv file datasets. Moreover, the metadata customization feature introduces a dynamic approach to storing additional metadata alongside data chunks from .csv files. This contrasts with the current static process of maintaining metadata.

This customization capability unlocks new possibilities for data cleaning, normalization, and enrichment processes, enabling augmentation of your data. To use the metadata customization feature, provide a metadata file alongside each source .csv file. The metadata file has the same name as the source file plus a .metadata.json suffix (<filename>.csv.metadata.json). This metadata file specifies the content and metadata fields of the source .csv file. Here’s an example of the metadata file content:

{
    "metadataAttributes": {
        "docSpecificMetadata1": "docSpecificMetadataVal1",
        "docSpecificMetadata2": "docSpecificMetadataVal2"
    },
    "documentStructureConfiguration": {
        "type": "RECORD_BASED_STRUCTURE_METADATA",
        "recordBasedStructureMetadata": {
            "contentFields": [
                {
                    "fieldName": "String"
                }
            ],
            "metadataFieldsSpecification": {
                "fieldsToInclude": [
                    {
                        "fieldName": "String"
                    }
                ],
                "fieldsToExclude": [
                    {
                        "fieldName": "String"
                    }
                ]
            }
        }
    }
}

Use the following steps to experiment with the .csv file improvement feature:

  1. Upload the .csv file and corresponding <filename>.csv.metadata.json file in the same Amazon S3 prefix.
  2. Create a knowledge base using either the console or the Amazon Bedrock SDK.
  3. Start ingestion using either the console or the SDK.
  4. Query the structured .csv file data with the Retrieve API or the RetrieveAndGenerate API, using either the console or the SDK.
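Step 1 is easy to script. The sketch below writes the <filename>.csv.metadata.json sidecar next to a local .csv file, following the structure of the example above. The function name and the column names in the usage comment are hypothetical. The commented-out upload uses the standard boto3 S3 `upload_file` call with a placeholder bucket.

```python
import json
from pathlib import Path

def write_csv_metadata(csv_path, content_fields, metadata_fields, doc_attributes=None):
    """Write <filename>.csv.metadata.json next to csv_path and return the sidecar path."""
    csv_path = Path(csv_path)
    sidecar = csv_path.with_name(csv_path.name + ".metadata.json")
    body = {
        "documentStructureConfiguration": {
            "type": "RECORD_BASED_STRUCTURE_METADATA",
            "recordBasedStructureMetadata": {
                # Columns treated as the content of each record.
                "contentFields": [{"fieldName": f} for f in content_fields],
                # Columns kept as metadata alongside each record's chunk.
                "metadataFieldsSpecification": {
                    "fieldsToInclude": [{"fieldName": f} for f in metadata_fields]
                },
            },
        }
    }
    if doc_attributes:
        # Document-level metadata shared by every record in the file.
        body["metadataAttributes"] = dict(doc_attributes)
    sidecar.write_text(json.dumps(body, indent=4))
    return sidecar

# Example with hypothetical column names and a placeholder bucket:
# path = write_csv_metadata("products.csv", ["description"], ["category", "price"])
# import boto3
# boto3.client("s3").upload_file(str(path), "my-bucket", "kb-data/" + path.name)
```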

Query reformulation

Often, input queries can be complex, containing many questions and complex relationships. With such complex prompts, the resulting query embeddings might suffer from semantic dilution. The retrieved chunks might then not address every facet of the query. The result is reduced accuracy and a less than desirable response from your RAG application.

Now with query reformulation supported by Knowledge Bases for Amazon Bedrock, we can take a complex input query and break it into multiple sub-queries. These sub-queries will then separately go through their own retrieval steps to find relevant chunks. In this process, the subqueries having less semantic complexity might find more targeted chunks. These chunks will then be pooled and ranked together before passing them to the FM to generate a response.
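The pooling step can be pictured with a small sketch. This post does not say how Knowledge Bases for Amazon Bedrock merges and ranks the chunks found for each sub-query. The rule used here is therefore an assumption: keep each chunk's best score across sub-queries, then sort. The `retrieve` callable, chunk IDs and scores are hypothetical stand-ins for a vector search.

```python
def decomposed_retrieve(sub_queries, retrieve, top_k=5):
    """Retrieve per sub-query, pool the hits, and rank them by their best score."""
    best = {}
    for query in sub_queries:
        for chunk_id, score in retrieve(query):
            # A chunk found by several sub-queries keeps its highest score (assumption).
            best[chunk_id] = max(score, best.get(chunk_id, float("-inf")))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Hypothetical retrieval results for the Octank sub-queries.
fake_hits = {
    "Where is the Octank Waterfront building located?": [("address", 0.91), ("hq", 0.55)],
    "What is the whistleblower scandal involving Octank?": [("scandal", 0.88), ("hq", 0.40)],
    "How did the whistleblower scandal affect Octank's reputation and public image?":
        [("reputation", 0.84), ("scandal", 0.80)],
}
print(decomposed_retrieve(fake_hits, fake_hits.get, top_k=4))
# [('address', 0.91), ('scandal', 0.88), ('reputation', 0.84), ('hq', 0.55)]
```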

Example: Consider the following complex query to a financial document for the fictitious company Octank asking about multiple unrelated topics:

“Where is the Octank company waterfront building located and how does the whistleblower scandal hurt the company and its image?”

We can decompose the query into multiple subqueries:

  1. Where is the Octank Waterfront building located?
  2. What is the whistleblower scandal involving Octank?
  3. How did the whistleblower scandal affect Octank’s reputation and public image?

Now, we have more targeted questions that might help retrieve chunks from the knowledge base from more semantically relevant sections of the documents without some of the semantic dilution that can occur from embedding multiple asks in a single complex query.

Query reformulation can be enabled in the console after creating a knowledge base by going to Test Knowledge Base Configurations and turning on Break down queries under Query modifications.

Query reformulation can also be enabled at runtime using the RetrieveAndGenerate API by adding an additional element to the KnowledgeBaseConfiguration as follows:

"orchestrationConfiguration": {
    "queryTransformationConfiguration": {
        "type": "QUERY_DECOMPOSITION"
    }
}
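In boto3, that element sits inside the `knowledgeBaseConfiguration` passed to the bedrock-agent-runtime `retrieve_and_generate` call. The helper below assembles the full request. The knowledge base ID and model ARN are placeholders. The rest of the request shape reflects our reading of the RetrieveAndGenerate API reference, so double-check it before relying on it.

```python
def retrieve_and_generate_request(question, kb_id, model_arn):
    """Keyword arguments for retrieve_and_generate with query decomposition enabled."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                # Ask the service to break the question into sub-queries before retrieval.
                "orchestrationConfiguration": {
                    "queryTransformationConfiguration": {"type": "QUERY_DECOMPOSITION"}
                },
            },
        },
    }

# Placeholder ARN for Claude 3 Sonnet in us-east-1.
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

# With AWS credentials configured:
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve_and_generate(**retrieve_and_generate_request(
#     "Where is the Octank company waterfront building located and how does the "
#     "whistleblower scandal hurt the company and its image?", "KB_ID", MODEL_ARN))
# print(response["output"]["text"])
```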

Query reformulation is another tool that might help increase accuracy for complex queries that you might encounter in production, giving you another way to optimize for the unique interactions your users might have with your application.

Conclusion

With the introduction of these advanced features, Knowledge Bases for Amazon Bedrock solidifies its position as a powerful and versatile solution for implementing RAG workflows. Whether you’re dealing with complex queries, unstructured data formats, or intricate data organizations, Knowledge Bases for Amazon Bedrock empowers you with the tools and capabilities to unlock the full potential of your knowledge base.

By using advanced data chunking options, query decomposition, and .csv file processing, you have greater control over the accuracy and customization of your retrieval processes. These features not only help improve the quality of your knowledge base, but also can facilitate more efficient and effective decision-making, enabling your organization to stay ahead in the ever-evolving world of data-driven insights.

Embrace the power of Knowledge Bases for Amazon Bedrock and unlock new possibilities in your retrieval and knowledge management endeavors. Stay tuned for more exciting updates and features from the Amazon Bedrock team as they continue to push the boundaries of what’s possible in the realm of knowledge bases and information retrieval.

For more detailed information, code samples, and implementation guides, see the Amazon Bedrock documentation and AWS blog posts.


References:

[1] LlamaIndex: Chunking Strategies for Large Language Models. Part — 1
[2] How to Choose the Right Chunking Strategy for Your LLM Application


About the authors

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in Generative AI, Artificial Intelligence, Machine Learning, and System Design. He is passionate about developing state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.


Streamline generative AI development in Amazon Bedrock with Prompt Management and Prompt Flows (preview)


Today, we’re excited to introduce two powerful new features for Amazon Bedrock: Prompt Management and Prompt Flows, in public preview. These features are designed to accelerate the development, testing, and deployment of generative artificial intelligence (AI) applications, enabling developers and business users to create more efficient and effective solutions that are easier to maintain. You can use the Prompt Management and Flows features graphically on the Amazon Bedrock console or Amazon Bedrock Studio, or programmatically through the Amazon Bedrock SDK APIs.

As the adoption of generative AI continues to grow, many organizations face challenges in efficiently developing and managing prompts. Modern applications also often require chaining or routing logic that adds complexity to development. With the Prompt Management and Flows features, Amazon Bedrock addresses these pain points by providing intuitive tools for designing and storing prompts, creating complex workflows, and improving collaboration among team members.

Before introducing the details of the new capabilities, let’s review how prompts are typically developed, managed, and used in a generative AI application.

The prompt lifecycle

Developing effective prompts for generative AI applications is an iterative process that requires careful design, testing, and refinement. Understanding this lifecycle is crucial for creating high-quality, reliable AI-powered solutions. Let’s explore the key stages of a typical prompting lifecycle:

  • Prompt design – This initial stage involves crafting prompts that effectively communicate the desired task or query to the foundation model. Prompts are often built as prompt templates that contain variables, dynamic context, or other content to be provided at inference time. Good prompt design considers factors such as clarity, specificity, and context to elicit the most relevant and accurate responses.
  • Testing and evaluation – After they’re designed, prompts or prompt templates are tested with various inputs to assess their performance and robustness. This stage often involves comparing multiple variations to identify the most effective formulations.
  • Refinement – Based on the testing results, prompts are iteratively refined to improve their effectiveness. This often involves adjusting wording, adding or removing context, or modifying the structure of the prompt.
  • Versioning and cataloging – As prompts are developed and refined, it’s crucial to maintain versions and organize them in a prompt catalog. This allows teams to track changes, compare performance across versions, and access proven prompts for reuse.
  • Deployment – After prompts have been optimized, they can be deployed as part of a generative AI application. This involves integrating the prompt into a larger system or workflow.
  • Monitoring and iteration – After deployment, teams continually monitor the performance of prompts in live applications and iterate to maintain or improve their effectiveness.
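The design, versioning, and cataloging stages can be sketched in a few lines of Python. This is a hypothetical in-memory catalog and does not use the Prompt Management API. `PromptCatalog`, `add_version`, and `render` are names we made up. We also assume double-brace `{{variable}}` placeholders for template variables that are filled in at inference time.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Matches {{variable}} placeholders (assumed template syntax for this sketch).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


@dataclass
class PromptCatalog:
    """Hypothetical in-memory catalog: prompt name -> list of immutable versions."""
    versions: Dict[str, List[str]] = field(default_factory=dict)

    def add_version(self, name: str, template: str) -> int:
        """Store a new version of a template and return its 1-based version number."""
        self.versions.setdefault(name, []).append(template)
        return len(self.versions[name])

    def render(self, name: str, version: int, variables: Dict[str, str]) -> str:
        """Fill the placeholders of a stored version with values known at inference time."""
        history = self.versions[name]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        template = history[version - 1]
        missing = set(PLACEHOLDER.findall(template)) - set(variables)
        if missing:
            raise KeyError(f"missing variables: {sorted(missing)}")
        return PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)


if __name__ == "__main__":
    catalog = PromptCatalog()
    v1 = catalog.add_version("support-reply", "Answer the question about {{product}}: {{question}}")
    print(catalog.render("support-reply", v1,
                         {"product": "headphones", "question": "How do I pair them?"}))
    # prints: Answer the question about headphones: How do I pair them?
```

Because versions are append-only, a deployed application can pin a specific version while refinement continues on newer ones. This is the same idea that the versioning and cataloging stage describes.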

Throughout this lifecycle, the prompt design and prompt catalog play critical roles. A well-designed prompt significantly enhances the quality and relevance of AI-generated responses. A comprehensive prompt catalog is a valuable resource for developers, enabling them to use proven prompts and best practices across projects, saving both time and money.

For more complex generative AI applications, developers often employ patterns such as prompt chaining or prompt routing. These approaches allow for the definition of more sophisticated logic and dynamic workflows, often called prompt flows.

Prompt chaining uses the output of one prompt as input for another, creating a sequence of interactions with the foundation model (FM) to accomplish more complex tasks. For example, a customer service chatbot could initially use an FM to extract key information about a customer and their issue, then pass the details as input for calling a function to open a support ticket. The following diagram illustrates this workflow.
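The chaining pattern in the customer service example can be sketched as two steps. First, the FM extracts structured details from the message. Second, its output becomes the arguments of a function call. This is a minimal sketch under stated assumptions. The model is passed in as a plain callable, so a stub can replace a real FM call. `support_chain`, `EXTRACT_PROMPT`, and `open_ticket` are hypothetical names, and `open_ticket` stands in for your real ticketing system.

```python
import json
from typing import Callable, Dict

# Hypothetical model interface: takes a prompt, returns the model's text output.
ModelFn = Callable[[str], str]

# Doubled braces are literal braces under str.format.
EXTRACT_PROMPT = (
    "Extract the customer's name and issue from the message below. "
    'Reply only with JSON of the form {{"name": ..., "issue": ...}}.\n\n'
    "Message: {message}"
)


def open_ticket(name: str, issue: str) -> Dict[str, str]:
    """Hypothetical ticketing function; a real one would call your support system."""
    return {"ticket_id": "T-0001", "customer": name, "summary": issue}


def support_chain(message: str, model: ModelFn) -> Dict[str, str]:
    """Step 1: the FM extracts details. Step 2: its output feeds a function call."""
    extracted = json.loads(model(EXTRACT_PROMPT.format(message=message)))
    return open_ticket(extracted["name"], extracted["issue"])


if __name__ == "__main__":
    # Stub standing in for a real FM call so the sketch runs offline.
    def stub_model(prompt: str) -> str:
        return json.dumps({"name": "Ana", "issue": "Order arrived damaged"})

    print(support_chain("Hi, I'm Ana and my order arrived damaged.", stub_model))
```

Asking for JSON in the first prompt gives the second step a parseable contract. In practice you would also handle malformed model output.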

Prompt routing refers to the process of dynamically selecting and applying different prompts based on certain conditions or the nature of the input, allowing for more flexible and context-aware AI applications. For example, a banking assistant could dynamically decide whether a user request is best answered with Retrieval Augmented Generation (RAG), such as a question about available credit card details, or by calling a function that runs a query, such as when the user asks about their account balance. The following diagram illustrates this workflow.
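The banking example can be sketched as a classification prompt followed by a branch. This is a minimal, hypothetical sketch. The labels `PRODUCT_INFO` and `ACCOUNT_QUERY`, the `route` function, and the two downstream callables are our own names. The model is again a plain callable, so a stub can stand in for a real FM.

```python
from typing import Callable

# Hypothetical model interface: takes a prompt, returns the model's text output.
ModelFn = Callable[[str], str]

ROUTER_PROMPT = (
    "Classify the banking question below. Reply with exactly one word: "
    "PRODUCT_INFO for questions about products such as credit cards, or "
    "ACCOUNT_QUERY for questions about the user's own account.\n\n"
    "Question: {question}"
)


def route(question: str, model: ModelFn,
          answer_with_rag: Callable[[str], str],
          run_account_query: Callable[[str], str]) -> str:
    """Pick a downstream path from the model's classification of the input."""
    label = model(ROUTER_PROMPT.format(question=question)).strip().upper()
    if label == "PRODUCT_INFO":
        return answer_with_rag(question)
    if label == "ACCOUNT_QUERY":
        return run_account_query(question)
    # Unexpected label: ask for clarification instead of guessing a path.
    return "Sorry, could you rephrase your question?"


if __name__ == "__main__":
    # Keyword-based stub standing in for a real FM classifier.
    def stub_model(prompt: str) -> str:
        return "ACCOUNT_QUERY" if "balance" in prompt.lower() else "PRODUCT_INFO"

    print(route("What is my account balance?", stub_model,
                answer_with_rag=lambda q: "RAG answer about cards",
                run_account_query=lambda q: "Balance: $1,234.56"))
```

Normalizing the label and keeping an explicit fallback branch makes the router robust to small formatting variations in the model's reply.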

Combining these two patterns is common in modern generative AI application development. By understanding and optimizing each stage of the prompting lifecycle and using techniques like chaining and routing, you can create more powerful, efficient, and effective generative AI solutions.

Let’s dive into the new features in Amazon Bedrock and explore how they can help you transform your generative AI development process.

Prompt management: Optimize your AI interactions

The Prompt Management feature streamlines the creation, evaluation, deployment, and sharing of prompts. This feature helps developers and business users obtain the best responses from FMs for their specific use cases.

Key benefits of Prompt Management include the following:

  • Rapid prompt creation and iteration – Create your prompts and variations with the built-in prompt builder on the Amazon Bedrock console or with the CreatePrompt API. Incorporate dynamic information using inputs to build your prompt templates.
  • Seamless testing and deployment – Quickly test individual prompts, and set variables and their test values. Create prompt versions stored in the built-in prompt library for cataloging and management using the Amazon Bedrock console or the GetPrompt, ListPrompts, and CreatePromptVersion APIs.
  • Collaborative prompt development – Use your prompts and prompt templates in flows or Amazon Bedrock Studio. Prompt management enables team members to collaborate on prompt creation, evaluation, and deployment, improving efficiencies in the development process.
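The following sketch shows what a programmatic CreatePrompt request could look like. It assumes the boto3 `bedrock-agent` client's `create_prompt` and `create_prompt_version` methods and the request fields shown (`variants`, `templateConfiguration`, `inputVariables`, `inferenceConfiguration`). Verify these against the current API reference. The helper derives the input variables from assumed `{{variable}}` placeholders in the template. The helper name, variant name, model ID placeholder, and inference values are hypothetical.

```python
import re
from typing import Any, Dict

# Matches {{variable}} placeholders (assumed template syntax).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def build_create_prompt_request(name: str, template: str, model_id: str) -> Dict[str, Any]:
    """Build keyword arguments for the bedrock-agent create_prompt call.

    Input variables come from the template's placeholders, in first-occurrence
    order and without duplicates.
    """
    variables = list(dict.fromkeys(PLACEHOLDER.findall(template)))
    variant = {
        "name": "variant-1",  # hypothetical variant name
        "templateType": "TEXT",
        "templateConfiguration": {
            "text": {
                "text": template,
                "inputVariables": [{"name": v} for v in variables],
            }
        },
        "modelId": model_id,
        # Illustrative inference settings; tune them for your use case.
        "inferenceConfiguration": {"text": {"temperature": 0.2, "maxTokens": 512}},
    }
    return {"name": name, "defaultVariant": variant["name"], "variants": [variant]}


if __name__ == "__main__":
    request = build_create_prompt_request(
        name="octank-support-reply",  # hypothetical prompt name
        template="You are a support agent for {{category}}. Answer: {{question}}",
        model_id="MODEL_ID_PLACEHOLDER",  # replace with a model available in your Region
    )
    print(request["variants"][0]["templateConfiguration"]["text"]["inputVariables"])
    # To create the prompt and a version (requires AWS credentials):
    # import boto3
    # agent = boto3.client("bedrock-agent")
    # prompt = agent.create_prompt(**request)
    # agent.create_prompt_version(promptIdentifier=prompt["id"])
```

Deriving `inputVariables` from the template keeps the declared variables in sync with the placeholders when the prompt text changes.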

There are no prerequisites for using the Prompt Management feature beyond access to the Amazon Bedrock console. For information on AWS Regions and models supported, refer to Prompt management in Amazon Bedrock. If you don’t currently have access to the Amazon Bedrock console, refer to Set up Amazon Bedrock.

To get started with the Prompt Management feature on the Amazon Bedrock console, complete the following steps:

  1. On the Amazon Bedrock console, under Builder tools in the navigation pane, choose Prompt management.
  2. Create a new prompt or select an existing one from the prompt library.
  3. Use the prompt builder to select a model, set parameters, and write the prompt content.
  4. Configure variables to create prompt templates, and test your prompts dynamically.
  5. Create and manage prompt versions to use in your generative AI flows.

Prompt flows: Visualize and accelerate your AI workflows

The Amazon Bedrock Flows feature introduces a visual builder that simplifies the creation of complex generative AI workflows. This feature allows you to link multiple FMs, prompts, and other AWS services, reducing development time and effort.

Key benefits of prompt flows include:

  • Intuitive visual builder – Drag and drop components to create a flow, linking prompts with other prompts, AI services, knowledge bases, and business logic. This visual approach helps eliminate the need for extensive coding and provides a comprehensive overview of your application’s structure. Alternatively, you can use the CreateFlow API to create flows programmatically, which helps you automate processes and development pipelines.
  • Rapid testing and deployment – Test your flows directly on the Amazon Bedrock console for faster iteration, or programmatically with the InvokeFlow API. At any time, you can snapshot the flow for integration into your generative AI application. The flow is surfaced through an Agents for Amazon Bedrock runtime endpoint. You can create flow versions on the Amazon Bedrock console or with the CreateFlowVersion API. Creating an alias on the Amazon Bedrock console or with the CreateFlowAlias API enables straightforward rollbacks and A/B testing between different versions of the flow without impacting your service or development pipelines.
  • Manage and templatize – Accelerate your development with flow templates for repeated common use cases. You can manage your flows on the Amazon Bedrock console or with the GetFlow and ListFlows APIs.
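Aliases give your application stable identifiers for flow versions, and this post doesn't describe a built-in traffic split between them. The following sketch therefore splits traffic on the client side, which is one possible approach. A hash of the user ID maps each user to a bucket, so that user always sees the same flow version. The function name, alias IDs, and 10% default share are hypothetical.

```python
import hashlib


def choose_alias(user_id: str, alias_a: str, alias_b: str, share_b: float = 0.1) -> str:
    """Deterministically send about share_b of users to alias_b, the rest to alias_a.

    Hashing the user ID keeps each user on the same flow version across requests.
    """
    if not 0.0 <= share_b <= 1.0:
        raise ValueError("share_b must be between 0 and 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # 48 bits give a float in [0, 1) that never rounds up to exactly 1.0.
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return alias_b if bucket < share_b else alias_a


if __name__ == "__main__":
    picks = [choose_alias(f"user-{i}", "ALIAS_A", "ALIAS_B", 0.1) for i in range(1000)]
    print("ALIAS_B share:", picks.count("ALIAS_B") / len(picks))
```

To roll back, point all traffic at the stable alias by setting `share_b` to 0. No flow definition needs to change.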

Before you get started in your account, refer to How Flows for Amazon Bedrock works for details on the permissions required and quotas. When you’re ready, complete the following steps to get started with flows on the Amazon Bedrock console:

  1. On the Amazon Bedrock console, under Builder tools in the navigation pane, choose Flows.
  2. Create a flow by providing a name, description, and AWS Identity and Access Management (IAM) role.
  3. Access the visual builder in the working draft of your flow.
  4. Drag and drop individual components or nodes, including prompt templates from your prompt catalog, and link them together. You can edit the properties of each node and use other elements available in Amazon Bedrock.
  5. Use the available nodes to implement conditions, code hooks with AWS Lambda functions, or integrations with AI services such as Amazon Lex, among many other options to be added soon. You can chain or route steps to define your own logic and processing outputs.
  6. Test your prompt flows dynamically and set up your outputs for deploying your generative AI applications.

In our example, we create a flow for dynamically routing the user question to either query a knowledge base in Amazon Bedrock or respond directly from the LLM. We can now invoke this flow from our application frontend.
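The frontend or its backend reaches the flow through the InvokeFlow API. This sketch assumes the boto3 `bedrock-agent-runtime` client's `invoke_flow` method. It also assumes the console's default input node name `FlowInputNode` with output `document`, and a response stream whose `flowOutputEvent` entries carry the result under `content.document`. Verify these names against your flow definition and the API reference. The flow and alias IDs are placeholders, and the helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def build_invoke_flow_request(flow_id: str, alias_id: str, question: str) -> Dict[str, Any]:
    """Keyword arguments for bedrock-agent-runtime invoke_flow.

    "FlowInputNode" and "document" are assumed default node and output names;
    change them if your flow uses different names.
    """
    return {
        "flowIdentifier": flow_id,
        "flowAliasIdentifier": alias_id,
        "inputs": [{
            "nodeName": "FlowInputNode",
            "nodeOutputName": "document",
            "content": {"document": question},
        }],
    }


def collect_outputs(events: Iterable[Dict[str, Any]]) -> List[Any]:
    """Gather the documents from flowOutputEvent entries in the response stream."""
    outputs = []
    for event in events:
        if "flowOutputEvent" in event:
            outputs.append(event["flowOutputEvent"]["content"]["document"])
    return outputs


if __name__ == "__main__":
    request = build_invoke_flow_request("FLOW_ID", "ALIAS_ID",  # placeholders
                                        "Which headphones support noise cancelling?")
    print(request["inputs"][0])
    # To call the flow (requires AWS credentials and a deployed alias):
    # import boto3
    # runtime = boto3.client("bedrock-agent-runtime")
    # response = runtime.invoke_flow(**request)
    # print(collect_outputs(response["responseStream"]))
```

Separating request construction from stream parsing lets you test both pieces offline with fake events.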

Example use case: Optimizing ecommerce customer service chatbots

To illustrate the power of these new features, let’s consider Octank, a fictional large ecommerce company that struggled to efficiently create, test, and deploy AI-powered customer service chatbots for different product categories, resulting in inconsistent performance and slow iteration cycles.

In the following notebook, we provide a guided example that you can follow to get started with Prompt Management and Prompt Flows programmatically.

Using prompt management and flows in Amazon Bedrock, Octank’s development and prompt engineering teams can now accomplish the following:

  • Create visual and programmatic workflows for each product category chatbot, incorporating different FMs and AI services as needed
  • Rapidly prototype and test prompt variations for each chatbot, optimizing for accuracy and relevance
  • Collaborate across teams to refine prompts and share best practices
  • Deploy and A/B test different chatbot versions to identify the most effective configurations

As a result, Octank has significantly reduced their development time, improved chatbot response quality, and achieved more consistent performance across product lines with increased reuse of artifacts.

Conclusion

The new Prompt Management and Flows features in Amazon Bedrock represent a significant leap forward in generative AI development. By streamlining workflow creation, prompt management, and team collaboration, these tools enable faster time-to-market and higher-quality AI-powered solutions.

We invite you to explore these new features in preview and experience firsthand how they can improve your generative AI development process. To get started, open the Amazon Bedrock console or discover the new APIs in the Amazon Bedrock SDK, and begin creating your prompts and flows today.

We’re excited to see the innovative applications you’ll build with these new capabilities. As always, we welcome your feedback through AWS re:Post for Amazon Bedrock or your usual AWS contacts. Join the generative AI builder community at community.aws to share your experiences and learn from others.

Stay tuned for more updates as we continue to enhance Amazon Bedrock and empower you to build the next generation of AI-powered applications!

To learn more, refer to the documentation on prompt management and prompt flows for Amazon Bedrock.


About the Authors

Antonio Rodriguez is a Sr. Generative AI Specialist Solutions Architect at AWS. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock. Apart from work, he loves to spend time with his family and play sports with his friends.

Jared Dean is a Principal AI/ML Solutions Architect at AWS. Jared works with customers across industries to develop machine learning applications that improve efficiency. He is interested in all things AI, technology, and BBQ.
