Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism traverses physical environments, smaller-scale reasoning about object shapes and layouts, and cognitive infrastructure such as spatial attention and memory. For many tasks, we instantiate parallel presentations via text and images, allowing us to benchmark both large language models and large multimodal models. Results suggest that… (Apple Machine Learning Research)
SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions
In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for virtual Assistant interactions that integrates audio and text as inputs to a Large Language Model (LLM). SELMA is designed to handle three primary and two auxiliary tasks related to interactions with virtual assistants simultaneously within a single end-to-end model. We employ low-rank adaptation modules for parameter-efficient training of both the audio encoder and the LLM. Additionally, we implement a feature pooling strategy enabling the system to recognize global patterns and improve accuracy on tasks less… (Apple Machine Learning Research)
Advancing healthcare and scientific discovery with AI
Google Research AI is making accurate health information more accessible and personalized while also improving health outcomes globally.
4 creative ways I’m using Pixel Studio
Learn more about Pixel Studio, including the just-launched human image generation feature.
Accelerate AWS Well-Architected reviews with Generative AI
Building cloud infrastructure based on proven best practices promotes security, reliability and cost efficiency. To achieve these goals, the AWS Well-Architected Framework provides comprehensive guidance for building and improving cloud architectures. As systems scale, conducting thorough AWS Well-Architected Framework Reviews (WAFRs) becomes even more crucial, offering deeper insights and strategic value to help organizations optimize their growing cloud environments.
In this post, we explore a generative AI solution leveraging Amazon Bedrock to streamline the WAFR process. We demonstrate how to harness the power of LLMs to build an intelligent, scalable system that analyzes architecture documents and generates insightful recommendations based on AWS Well-Architected best practices. This solution automates portions of the WAFR report creation, helping solutions architects improve the efficiency and thoroughness of architectural assessments while supporting their decision-making process.
Scaling Well-Architected reviews using a generative AI-powered solution
As organizations expand their cloud footprint, they face several challenges in adhering to the Well-Architected Framework:
- Time-consuming and resource-intensive manual reviews
- Inconsistent application of Well-Architected principles across different teams
- Difficulty in keeping pace with the latest best practices
- Challenges in scaling reviews for large or numerous architectures
To address these challenges, we have built a WAFR Accelerator solution that uses generative AI to help streamline and expedite the WAFR process. By automating the initial assessment and documentation process, this solution significantly reduces time spent on evaluations while providing consistent architecture assessments against AWS Well-Architected principles. This allows teams to focus more on implementing improvements and optimizing AWS infrastructure. The solution incorporates the following key features:
- Using a Retrieval Augmented Generation (RAG) architecture, the system generates a context-aware detailed assessment. The assessment includes a solution summary, an evaluation against Well-Architected pillars, an analysis of adherence to best practices, actionable improvement recommendations, and a risk assessment.
- An interactive chat interface allows deeper exploration of both the original document and generated content.
- Integration with the AWS Well-Architected Tool pre-populates workload information and initial assessment responses.
This solution offers the following key benefits:
- Rapid analysis and resource optimization – What previously took days of manual review can now be accomplished in minutes, allowing for faster iteration and improvement of architectures. This time efficiency translates to significant cost savings and optimized resource allocation in the review process.
- Consistency and enhanced accuracy – The approach provides a consistent application of AWS Well-Architected principles across reviews, reducing human bias and oversight. This systematic approach leads to more reliable and standardized evaluations.
- Depth of insight – Advanced analysis can identify subtle patterns and potential issues that might be missed in manual reviews, providing deeper insights into architectural strengths and weaknesses.
- Scalability – The solution can handle multiple reviews simultaneously, making it suitable for organizations of all sizes, from startups to enterprises. This scalability allows for more frequent and comprehensive reviews.
- Interactive exploration – The generative AI-driven chat interface allows users to dive deeper into the assessment, asking follow-up questions and gaining a better understanding of the recommendations. This interactivity enhances engagement and promotes more thorough comprehension of the results.
Solution overview
The WAFR Accelerator is designed to streamline and enhance the architecture review process by using the capabilities of generative AI through Amazon Bedrock and other AWS services. This solution automates the analysis of complex architecture documents, evaluating them against the AWS Well-Architected Framework’s pillars and providing detailed assessments and recommendations.
The solution consists of the following capabilities:
- Generative AI-powered analysis – Uses Amazon Bedrock to rapidly analyze architecture documents against AWS Well-Architected best practices, generating detailed assessments and recommendations.
- Knowledge base integration – Incorporates up-to-date WAFR documentation and cloud best practices using Amazon Bedrock Knowledge Bases, providing accurate and context-aware evaluations.
- Customizable – Uses prompt engineering, which enables customization and iterative refinement of the prompts used to drive the large language model (LLM), allowing for refining and continuous enhancement of the assessment process.
- Integration with the AWS Well-Architected Tool – Creates a Well-Architected workload milestone for the assessment and prepopulates answers for WAFR questions based on generative AI-based assessment.
- Generative AI-assisted chat – Offers an AI-driven chat interface for in-depth exploration of assessment results, supporting multi-turn conversations with context management.
- Scalable architecture – Uses AWS services like AWS Lambda and Amazon Simple Queue Service (Amazon SQS) for efficient processing of multiple reviews.
- Data privacy and network security – With Amazon Bedrock, you are in control of your data, and all your inputs and customizations remain private to your AWS account. Your data, such as prompts, completions, custom models, and data used for fine-tuning or continued pre-training, is not used for service improvement and is never shared with third-party model providers. Your data remains in the AWS Region where the API call is processed. All data is encrypted in transit and at rest. You can use AWS PrivateLink to create a private connection between your VPC and Amazon Bedrock.
A human-in-the-loop review is still crucial to validate the generative AI findings, checking for accuracy and alignment with organizational requirements.
The following diagram illustrates the solution’s technical architecture.
The workflow consists of the following steps:
- WAFR guidance documents are uploaded to a bucket in Amazon Simple Storage Service (Amazon S3). These documents form the foundation of the RAG architecture. Using Amazon Bedrock Knowledge Bases, the sample solution ingests these documents and generates embeddings, which are then stored and indexed in Amazon OpenSearch Serverless. This creates a vector database that enables retrieval of relevant WAFR guidance during the review process.
- Users access the WAFR Accelerator Streamlit application through Amazon CloudFront, which provides secure and scalable content delivery. User authentication is handled by Amazon Cognito, making sure only authenticated users have access.
- Users upload their solution architecture document in PDF format using the Streamlit application running on an Amazon Elastic Compute Cloud (Amazon EC2) instance, which stores it in an S3 bucket. On submission, the WAFR review process is invoked by Amazon SQS, which queues the review request.
- The WAFR reviewer, based on Lambda and AWS Step Functions, is activated by Amazon SQS. It orchestrates the review process, including document content extraction, prompt generation, solution summary, knowledge embedding retrieval, and generation.
- Amazon Textract extracts the content from the uploaded documents, making it machine-readable for further processing.
- The WAFR reviewer uses Amazon Bedrock Knowledge Bases’ fully managed RAG workflow to query the vector database in OpenSearch Serverless, retrieving relevant WAFR guidance based on the selected WAFR pillar and questions. Metadata filtering is used to improve retrieval accuracy (see the sketch after this list).
- Using the extracted document content and retrieved embeddings, the WAFR reviewer generates an assessment using Amazon Bedrock. A workload is created in the AWS Well-Architected Tool with answers populated with the assessment results. This allows users to download an initial version of the AWS Well-Architected report from the AWS Well-Architected Tool console on completion of the assessment.
- The assessment is also stored in an Amazon DynamoDB table for quick retrieval and future reference.
- The WAFR Accelerator application retrieves the review status from the DynamoDB table to keep the user informed.
- Users can chat with the content using Amazon Bedrock, allowing for deeper exploration of the document, assessment, and recommendations.
- Once the assessment is complete, human reviewers can review it in the AWS Well-Architected Tool.
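As referenced in the metadata filtering step above, the following is a minimal sketch of how a reviewer component could query the knowledge base through the Amazon Bedrock Retrieve API with boto3. The knowledge base ID and the pillar metadata attribute are assumptions for illustration; the actual Lambda code in the repository may differ.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

def get_wafr_guidance(question: str, pillar: str, knowledge_base_id: str) -> list[str]:
    # Retrieve WAFR guidance chunks scoped to a single pillar via metadata filtering.
    # The "pillar" metadata attribute is assumed to be set on the ingested documents.
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": 5,
                "filter": {"equals": {"key": "pillar", "value": pillar}},
            }
        },
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]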
Deploy the solution
To implement the solution in your own environment, we’ve provided resources in the following GitHub repo to guide you through the process. The setup is streamlined using the AWS Cloud Development Kit (AWS CDK), which allows for infrastructure as code (IaC) deployment. For step-by-step instructions, we’ve prepared a detailed README file that walks you through the entire setup process.
To get started, complete the following steps:
- Clone the provided repository containing the AWS CDK code and README file.
- Review the README file for prerequisites and environment setup instructions.
- Follow the AWS CDK deployment steps outlined in the documentation.
- Configure necessary environment-specific parameters as described.
Deploying and running this solution in your AWS environment will incur costs for the AWS services used, including but not limited to Amazon Bedrock, Amazon EC2, Amazon S3, and DynamoDB. It is highly recommended that you use a separate AWS account and set up AWS Budgets to monitor the costs.
DISCLAIMER: This is sample code for non-production usage. You should work with your security and legal teams to adhere to your organizational security, regulatory, and compliance requirements before deployment.
Test the solution
The following diagram illustrates the workflow for using the application.
To demonstrate how generative AI can accelerate AWS Well-Architected reviews, we have developed a Streamlit-based demo web application that serves as the front-end interface for initiating and managing the WAFR review process.
Complete the following steps to test the demo application:
- Open a new browser window and enter the CloudFront URL provided during the setup.
- Add a new user to the Amazon Cognito user pool deployed by the AWS CDK during the setup. Log in to the application using this user’s credentials.
- Choose New WAFR Review in the navigation pane.
- For Analysis type, choose one of the following options:
- Quick – You can generate a quick analysis without creating a workload in the AWS Well-Architected Tool. This option is faster because it groups the questions for an individual pillar into a single prompt. It’s suitable for an initial assessment.
- Deep with Well-Architected Tool – You can generate a comprehensive and detailed analysis that automatically creates a workload in the AWS Well-Architected Tool. This thorough review process requires more time to complete as it evaluates each question individually rather than grouping them together. The deep review typically takes approximately 20 minutes, though the actual duration may vary depending on the document size and the number of Well-Architected pillars selected for evaluation.
- Enter the analysis name and description.
- Choose the AWS Well-Architected lens and desired pillars.
- Upload your solution architecture or technical design document.
- Choose Create WAFR Analysis.
- Choose Existing WAFR Reviews in the navigation pane.
- Choose your newly submitted analysis.
After the status changes to Completed, you can view the WAFR analysis at the bottom of the page. For multiple reviews, choose the relevant analysis from the dropdown menu.
You can chat with the uploaded document as well as the other generated content by using the WAFR Chat section on the Existing WAFR Reviews page.
Improving assessment quality
The solution uses prompt engineering to optimize the textual input to the foundation model (FM) to obtain the desired assessment responses. The quality of the prompt (the system prompt, in this case) has a significant impact on the model output. The solution provides a sample system prompt that is used to drive the assessment. You could enhance this prompt further to align with specific organizational needs. This becomes more crucial when defining and ingesting your own custom lenses.
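For illustration only, a system prompt along the following lines could serve as a starting point; this is not the prompt shipped with the solution, which you should review and adapt to your organization's standards.

# Hypothetical starting point for a WAFR assessment system prompt; adapt to your needs.
WAFR_SYSTEM_PROMPT = """You are an AWS Well-Architected reviewer.
Evaluate the provided architecture document against the {pillar} pillar.
For each best practice, state whether it is met, cite supporting evidence from
the document, describe the associated risk, and give an actionable recommendation.
If the document does not contain enough information to assess a practice,
say so explicitly rather than guessing."""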
Another important factor is the quality of the document that is uploaded for assessment. Detailed and architecture-rich documents can result in better inferences and therefore finer assessments. Prompts are defined in such a way that if there is inadequate information for assessment, then it’s highlighted in the output. This minimizes hallucination by the FM and provides a potential opportunity to enrich your design templates in alignment with AWS Well-Architected content.
You could further enhance this solution by using Amazon Bedrock Guardrails to further reduce hallucinations and ground responses in your own source information.
At the time of writing, only the AWS Well-Architected Framework, Financial Services Industry, and Analytics lenses have been provisioned. However, other lenses, including custom lenses, could be added with a few refinements to the UI application and underlying data store.
Clean up
After you’ve finished exploring or using the solution and no longer require these resources, be sure to clean them up to avoid ongoing charges. Follow these steps to remove all associated resources:
- Navigate to the directory containing your AWS CDK code.
- Run the following command:
cdk destroy
- Confirm the deletion when prompted.
- Manually check for and delete any resources that might not have been automatically removed, such as S3 buckets with content or custom IAM roles.
- Verify that all related resources have been successfully deleted.
Conclusion
In this post, we showed how generative AI and Amazon Bedrock can play a crucial role in expediting and scaling the AWS Well-Architected Framework reviews within an organization. By automating document analysis and using a WAFR-aware knowledge base, the solution offers rapid and in-depth assessments, helping organizations build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads.
To learn more, refer to the following:
- Amazon Bedrock Documentation
- AWS Well-Architected
- Amazon Bedrock Knowledge Bases now supports metadata filtering to improve retrieval accuracy
About the Authors
Shoeb Bustani is a Senior Enterprise Solutions Architect at AWS, based in the United Kingdom. As a senior enterprise architect, innovator, and public speaker, he provides strategic architectural partnership and guidance to help customers achieve their business outcomes by leveraging AWS services and best practices.
Brijesh Pati is an Enterprise Solutions Architect at AWS, helping enterprise customers adopt cloud technologies. With a background in application development and enterprise architecture, he has worked with customers across sports, finance, energy, and professional services sectors. Brijesh specializes in AI/ML solutions and has experience with serverless architectures.
Rohan Ghosh is an Enterprise Solutions Architect at Amazon Web Services (AWS), specializing in the Advertising and Marketing sector. With extensive experience in Cloud Solutions Engineering, Application Development, and Enterprise Support, he helps organizations architect and implement cutting-edge cloud solutions. His current focus areas include Data Analytics and Generative AI, where he guides customers in leveraging AWS technologies to drive innovation and business transformation.
Dynamic metadata filtering for Amazon Bedrock Knowledge Bases with LangChain
Amazon Bedrock Knowledge Bases offers a fully managed Retrieval Augmented Generation (RAG) feature that connects large language models (LLMs) to internal data sources. It’s a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts. It also provides developers with greater control over the LLM’s outputs, including the ability to include citations and manage sensitive information.
Amazon Bedrock Knowledge Bases has a metadata filtering capability that allows you to refine search results based on specific attributes of the documents, improving retrieval accuracy and the relevance of responses. These metadata filters can be used in combination with the typical semantic (or hybrid) similarity search. Improving document retrieval results helps personalize the responses generated for each user. Dynamic metadata filters allow you to instantly create custom queries based on the varying user profiles or user-inputted responses so the documents retrieved only contain information relevant to the user's needs.
In this post, we discuss using metadata filters with Amazon Bedrock Knowledge Bases.
Solution overview
The following code is an example metadata filter for Amazon Bedrock Knowledge Bases. Logical operators (such as AND or OR) can be nested to combine other logical operators and filter conditions. For more information, refer to the Retrieve API.
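The original code listing is not reproduced here; a representative filter for the travel use case (with field names mirroring the metadata shown in the results later in this post) might look like the following. Operators such as orAll, in, and greaterThan are also supported.

# Representative metadata filter: retrieve only reviews for the desired
# destination that were written by travelers with pets. "andAll"/"orAll"
# combine nested conditions and can themselves be nested.
one_group_filter = {
    "andAll": [
        {"equals": {"key": "desired_destination", "value": "Paris, France"}},
        {"equals": {"key": "travelling_with_pets", "value": "yes"}},
    ]
}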
For our use case, we use an example of a travel website where the user answers a few questions about their travel preferences (including desired destination, preferred activities, and traveling companions) and then the system retrieves relevant documents.
We exclusively focus on the retrieval portion of RAG in this post. We provide the upstream components, including document ingestion and query formatting, as static data instead of code. The downstream generation component is out of scope for this post.
Prerequisites
To follow along with this post, you should understand basic retrieval techniques such as similarity search.
Additionally, you need an Amazon Bedrock knowledge base populated with documents and metadata. For instructions, see Create an Amazon Bedrock knowledge base. We have provided example documents and metadata in the accompanying GitHub repo for you to upload.
The associated notebook contains the required library imports and environment variables. Make sure you run the notebook using an AWS Identity and Access Management (IAM) role with the correct permissions for Amazon Simple Storage Service (Amazon S3) and Amazon Bedrock (AmazonS3FullAccess and AmazonBedrockFullAccess, respectively). We recommend running the notebook locally or in Amazon SageMaker. Then you can run the following code to test your AWS and knowledge base connection:
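The notebook's exact connection test isn't reproduced here; a minimal check along these lines, assuming the langchain-aws package and a placeholder knowledge base ID, verifies both your credentials and knowledge base access:

import boto3
from langchain_aws.retrievers import AmazonKnowledgeBasesRetriever

knowledge_base_id = "XXXXXXXXXX"  # replace with your knowledge base ID

# Confirm the AWS credentials resolve to the expected account
print(boto3.client("sts").get_caller_identity()["Account"])

# Confirm the knowledge base is reachable with a simple, unfiltered query
retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=knowledge_base_id,
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 2}},
)
print(retriever.invoke("Paris travel reviews"))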
Create a dynamic filter
The "value" field within the filter needs to be updated at request time. This means overwriting the retrieval_config object, as shown in the following figure. The placeholder values in the filter get overwritten with the user data at runtime.
Because the retrieval_config object is a nested hierarchy of logical conditions (a tree), you can implement a breadth-first search to identify and replace all the "value" field values (where "value" is the key and "<UNKNOWN>" is the placeholder value) with the corresponding value from the user data. See the following code:
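A sketch of that traversal might look like the following; the placeholder token and the shape of the condition leaves are assumptions based on the description above.

import copy
from collections import deque

def populate_filter_values(retrieval_config: dict, user_data: dict) -> dict:
    # Breadth-first traversal of the nested filter tree: every condition whose
    # "value" is the "<UNKNOWN>" placeholder is filled in from user_data,
    # keyed on that condition's "key" field.
    config = copy.deepcopy(retrieval_config)
    queue = deque([config])
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            if node.get("value") == "<UNKNOWN>" and "key" in node:
                node["value"] = user_data[node["key"]]
            else:
                queue.extend(node.values())
        elif isinstance(node, list):
            queue.extend(node)
    return config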
Option 1: Create a retriever each time
To define the retrieval_config parameter dynamically, you can instantiate AmazonKnowledgeBasesRetriever each time. This integrates into a larger LangChain-centric code base. See the following code:
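A minimal version of this pattern, assuming the knowledge_base_id and the dynamic filter built above, could look like this:

from langchain_aws.retrievers import AmazonKnowledgeBasesRetriever

def retrieve_with_new_retriever(query: str, metadata_filter: dict = None):
    # Build a fresh retriever per request so retrieval_config carries the
    # user-specific filter; omit the filter entirely when none is supplied.
    search_config = {"numberOfResults": 4}
    if metadata_filter:
        search_config["filter"] = metadata_filter
    retriever = AmazonKnowledgeBasesRetriever(
        knowledge_base_id=knowledge_base_id,  # assumed to be defined earlier in the notebook
        retrieval_config={"vectorSearchConfiguration": search_config},
    )
    return retriever.invoke(query)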
Option 2: Access the underlying Boto3 API
The Boto3 API is able to directly retrieve with a dynamic retrieval_config. You can take advantage of this by accessing the object that AmazonKnowledgeBasesRetriever wraps. This is slightly faster but is less Pythonic because it relies on LangChain implementation details, which may change without notice. This requires additional code to adapt the output to the proper format for a LangChain retriever. See the following code:
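A sketch of this approach, calling a boto3 bedrock-agent-runtime client directly and adapting the response into LangChain Document objects, might look like the following:

import boto3
from langchain_core.documents import Document

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

def retrieve_with_boto3(query: str, metadata_filter: dict = None):
    search_config = {"numberOfResults": 4}
    if metadata_filter:
        search_config["filter"] = metadata_filter
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=knowledge_base_id,  # assumed to be defined earlier in the notebook
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": search_config},
    )
    # Adapt the raw API results to the Document format a LangChain retriever returns
    return [
        Document(
            page_content=result["content"]["text"],
            metadata={
                "location": result.get("location"),
                "score": result.get("score"),
                "source_metadata": result.get("metadata", {}),
            },
        )
        for result in response["retrievalResults"]
    ]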
Results
Begin by reading in the user data. This example data contains user answers to an online questionnaire about travel preferences. The user_data fields must match the metadata fields.
Here is a preview of the user_data.json file from which certain fields will be extracted as values for filters.
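The file itself isn't reproduced here; its entries look roughly like the following. The exact field names and the trip_id key are illustrative, mirroring the knowledge base metadata shown in the results below.

import json

with open("user_data.json") as f:
    user_data = json.load(f)

# Illustrative shape of one entry (field names mirror the document metadata):
# {
#     "trip_id": 2,
#     "desired_destination": "Paris, France",
#     "activities_interest": ["museums", "strolling"],
#     "companion": "pets",
#     "travelling_with_children": "no",
#     "travelling_with_pets": "yes"
# }
print(json.dumps(user_data[0], indent=2))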
Test the code with filters turned on and off. Only use a few filtering criteria because restrictive filters might return zero documents.
Finally, run both retrieval chains through both sets of filters for each user:
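Tying the sketches above together, a driver along these lines produces a comparison table similar to the excerpt below; the query field and the filter template name are assumptions for illustration.

import pandas as pd

rows = []
for trip in user_data:  # one questionnaire response per trip
    # one_group_filter_template is a hypothetical filter with "<UNKNOWN>" placeholders;
    # fill it with this user's answers using the BFS helper sketched earlier
    dynamic_filter = populate_filter_values(one_group_filter_template, trip)
    retrievers = [
        ("Option_0", retrieve_with_new_retriever),
        ("Option_1", retrieve_with_boto3),
    ]
    for option_name, retrieve_fn in retrievers:
        for use_filter in (True, False):
            # "query" is an assumed field holding the user's free-text request
            docs = retrieve_fn(trip["query"], dynamic_filter if use_filter else None)
            for doc in docs:
                rows.append(
                    {
                        "Retrieval Approach": option_name,
                        "Filter": use_filter,
                        "Trip ID": trip.get("trip_id"),
                        "Destination": trip.get("desired_destination"),
                        "Page Content": doc.page_content[:200],
                        "Metadata": doc.metadata,
                    }
                )

results_df = pd.DataFrame(rows)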
When analyzing the results, you can see that the first half of the documents are identical to the second half. In addition, when metadata filters aren’t used, the documents retrieved are occasionally for the wrong location. For example, trip ID 2 is to Paris, but the retriever pulls documents about London.
Excerpt of output table for reference:
Retrieval Approach | Filter | Trip ID | Destination | Page Content | Metadata |
--- | --- | --- | --- | --- | --- |
Option_0 | TRUE | 2 | Paris, France | As a 70-year-old retiree, I recently had the pleasure of visiting Paris for the first time. It was a trip I had been looking forward to for years, and I was not disappointed. Here are some of my favorite attractions and activities that I would recommend to other seniors visiting the city. First on my list is the Eiffel Tower… | {‘location’: {‘s3Location’: {‘uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_6.txt‘}, ‘type’: ‘S3’}, ‘score’: 0.48863396, ‘source_metadata’: {‘x-amz-bedrock-kb-source-uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_6.txt‘, ‘travelling_with_children’: ‘no’, ‘activities_interest’: [‘museums’, ‘palaces’, ‘strolling’, ‘boat tours’, ‘neighborhood tours’], ‘companion’: ‘unknown’, ‘x-amz-bedrock-kb-data-source-id’: {YOUR_KNOWLEDGE_BASE_ID}, ‘stay_duration’: ‘unknown’, ‘preferred_month’: [‘unknown’], ‘travelling_with_pets’: ‘unknown’, ‘age’: [’71’, ’80’], ‘x-amz-bedrock-kb-chunk-id’: ‘1%3A0%3AiNKlapMBdxcT3sYpRK-d’, ‘desired_destination’: ‘Paris, France’}} |
Option_0 | TRUE | 2 | Paris, France | As a 35-year-old traveling with my two dogs, I found Paris to be a pet-friendly city with plenty of attractions and activities for pet owners. Here are some of my top recommendations for traveling with pets in Paris: The Jardin des Tuileries is a beautiful park located between the Louvre Museum and the Place de la Concorde… | {‘location’: {‘s3Location’: {‘uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_9.txt‘}, ‘type’: ‘S3’}, ‘score’: 0.474106, ‘source_metadata’: {‘x-amz-bedrock-kb-source-uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_9.txt‘, ‘travelling_with_children’: ‘no’, ‘activities_interest’: [‘parks’, ‘museums’, ‘river cruises’, ‘neighborhood exploration’], ‘companion’: ‘pets’, ‘x-amz-bedrock-kb-data-source-id’: {YOUR_KNOWLEDGE_BASE_ID}, ‘stay_duration’: ‘unknown’, ‘preferred_month’: [‘unknown’], ‘travelling_with_pets’: ‘yes’, ‘age’: [’30’, ’31’, ’32’, ’33’, ’34’, ’35’, ’36’, ’37’, ’38’, ’39’, ’40’], ‘x-amz-bedrock-kb-chunk-id’: ‘1%3A0%3Aj52lapMBuHB13c7-hl-4’, ‘desired_destination’: ‘Paris, France’}} |
Option_0 | TRUE | 2 | Paris, France | If you are looking for something a little more active, I would suggest visiting the Bois de Boulogne. This large park is located on the western edge of Paris and is a great place to go for a walk or a bike ride with your pet. The park has several lakes and ponds, as well as several gardens and playgrounds… | {‘location’: {‘s3Location’: {‘uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_5.txt‘}, ‘type’: ‘S3’}, ‘score’: 0.45283788, ‘source_metadata’: {‘x-amz-bedrock-kb-source-uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_5.txt‘, ‘travelling_with_children’: ‘no’, ‘activities_interest’: [‘strolling’, ‘picnic’, ‘walk or bike ride’, ‘cafes and restaurants’, ‘art galleries and shops’], ‘companion’: ‘pet’, ‘x-amz-bedrock-kb-data-source-id’: ‘{YOUR_KNOWLEDGE_BASE_ID}, ‘stay_duration’: ‘unknown’, ‘preferred_month’: [‘unknown’], ‘travelling_with_pets’: ‘yes’, ‘age’: [’40’, ’41’, ’42’, ’43’, ’44’, ’45’, ’46’, ’47’, ’48’, ’49’, ’50’], ‘x-amz-bedrock-kb-chunk-id’: ‘1%3A0%3AmtKlapMBdxcT3sYpSK_N’, ‘desired_destination’: ‘Paris, France’}} |
Option_0 | FALSE | 2 | Paris, France | { “metadataAttributes”: { “age”: [ “30” ], “desired_destination”: “London, United Kingdom”, “stay_duration”: “unknown”, “preferred_month”: [ “unknown” ], “activities_interest”: [ “strolling”, “sightseeing”, “boating”, “eating out” ], “companion”: “pets”, “travelling_with_children”: “no”, “travelling_with_pets”: “yes” } } | {‘location’: {‘s3Location’: {‘uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/London_2.txt.metadata (1).json’}, ‘type’: ‘S3’}, ‘score’: 0.49567315, ‘source_metadata’: {‘x-amz-bedrock-kb-source-uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/London_2.txt.metadata (1).json’, ‘x-amz-bedrock-kb-chunk-id’: ‘1%3A0%3A5tKlapMBdxcT3sYpYq_r’, ‘x-amz-bedrock-kb-data-source-id’: {YOUR_KNOWLEDGE_BASE_ID}}} |
Option_0 | FALSE | 2 | Paris, France | As a 35-year-old traveling with my two dogs, I found Paris to be a pet-friendly city with plenty of attractions and activities for pet owners. Here are some of my top recommendations for traveling with pets in Paris: The Jardin des Tuileries is a beautiful park located between the Louvre Museum and the Place de la Concorde… | {‘location’: {‘s3Location’: {‘uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_9.txt‘}, ‘type’: ‘S3’}, ‘score’: 0.4741059, ‘source_metadata’: {‘x-amz-bedrock-kb-source-uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_9.txt‘, ‘travelling_with_children’: ‘no’, ‘activities_interest’: [‘parks’, ‘museums’, ‘river cruises’, ‘neighborhood exploration’], ‘companion’: ‘pets’, ‘x-amz-bedrock-kb-data-source-id’: {YOUR_KNOWLEDGE_BASE_ID}, ‘stay_duration’: ‘unknown’, ‘preferred_month’: [‘unknown’], ‘travelling_with_pets’: ‘yes’, ‘age’: [’30’, ’31’, ’32’, ’33’, ’34’, ’35’, ’36’, ’37’, ’38’, ’39’, ’40’], ‘x-amz-bedrock-kb-chunk-id’: ‘1%3A0%3Aj52lapMBuHB13c7-hl-4’, ‘desired_destination’: ‘Paris, France’}} |
Option_0 | FALSE | 2 | Paris, France | If you are looking for something a little more active, I would suggest visiting the Bois de Boulogne. This large park is located on the western edge of Paris and is a great place to go for a walk or a bike ride with your pet. The park has several lakes and ponds, as well as several gardens and playgrounds… | {‘location’: {‘s3Location’: {‘uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_5.txt‘}, ‘type’: ‘S3’}, ‘score’: 0.45283788, ‘source_metadata’: {‘x-amz-bedrock-kb-source-uri’: ‘s3://{YOUR_S3_BUCKET}/travel_reviews_titan/Paris_5.txt‘, ‘travelling_with_children’: ‘no’, ‘activities_interest’: [‘strolling’, ‘picnic’, ‘walk or bike ride’, ‘cafes and restaurants’, ‘art galleries and shops’], ‘companion’: ‘pet’, ‘x-amz-bedrock-kb-data-source-id’: {YOUR_KNOWLEDGE_BASE_ID}, ‘stay_duration’: ‘unknown’, ‘preferred_month’: [‘unknown’], ‘travelling_with_pets’: ‘yes’, ‘age’: [’40’, ’41’, ’42’, ’43’, ’44’, ’45’, ’46’, ’47’, ’48’, ’49’, ’50’], ‘x-amz-bedrock-kb-chunk-id’: ‘1%3A0%3AmtKlapMBdxcT3sYpSK_N’, ‘desired_destination’: ‘Paris, France’}} |
Clean up
To avoid incurring additional charges, be sure to delete your knowledge base, the OpenSearch Serverless vector store, and the underlying S3 bucket.
Conclusion
Enabling dynamic filtering through Amazon Bedrock Knowledge Bases' metadata filtering enhances document retrieval in RAG systems by tailoring outputs to user-specific needs, significantly improving the relevance and accuracy of LLM-generated responses. In the travel website example, filters make sure that retrieved documents closely match user preferences.
This approach can be applied to other use cases, such as customer support, personalized recommendations, and content curation, where context-sensitive information retrieval is essential. Properly configured filters are crucial for maintaining accuracy across different applications, making this feature a powerful tool for refining LLM outputs in diverse scenarios.
Be sure to take advantage of this powerful and flexible solution in your application. For more information on metadata in Amazon Bedrock Knowledge Bases, see Amazon Bedrock Knowledge Bases now supports metadata filtering to improve retrieval accuracy. Also, Amazon Bedrock Knowledge Bases now provides autogenerated query filters.
Security Best Practices
For AWS IAM Policies:
- Apply least-privilege permissions by being explicit with IAM actions and listing only required permissions rather than using wildcards
- Use temporary credentials with IAM roles for workloads
- Avoid using wildcards (*) in the Action element as this grants access to all actions for specific AWS services
- Remove wildcards from the Resource element and explicitly list the specific resources that IAM entities should access
- Review AWS managed policies carefully before using them and consider using customer managed policies if AWS managed policies grant more permissions than needed
For more detailed security best practices for AWS IAM, see Security best practices in IAM.
For Amazon S3:
- Block Public Access unless explicitly required, make sure S3 buckets are not publicly accessible by using the S3 Block Public Access feature and implementing appropriate bucket policies
- Enable encryption for data at rest (all S3 buckets have default encryption) and enforce encryption for data in transit using HTTPS/TLS
- Grant only the minimum permissions required using IAM policies, bucket policies, and disable ACLs (Access Control Lists) which are no longer recommended for most modern use cases
- Enable server access logging, AWS CloudTrail, and use AWS security services like GuardDuty, Macie, and IAM Access Analyzer to monitor and detect potential security issues
For more detailed security best practices for Amazon S3, see Security best practices for Amazon S3.
For Amazon Bedrock:
- Use IAM roles and policies to control access to Bedrock resources and APIs.
- Implement VPC endpoints to access Bedrock securely from within your VPC.
- Encrypt data at rest and in transit when working with Bedrock to protect sensitive information.
- Monitor Bedrock usage and access patterns using AWS CloudTrail for auditing purposes.
For more information on security in Amazon Bedrock, see Security in Amazon Bedrock.
For Amazon SageMaker:
- Use IAM roles to control access to SageMaker resources and limit permissions based on job functions.
- Encrypt SageMaker notebooks, training jobs, and endpoints using AWS KMS keys for data protection.
- Implement VPC configurations for SageMaker resources to restrict network access and enhance security.
- Use SageMaker private endpoints to access APIs without traversing the public internet.
About the Authors
Haley Tien is a Deep Learning Architect at AWS Generative AI Innovation Center. She has a Master’s degree in Data Science and assists customers in building generative AI solutions on AWS to optimize their workloads and achieve desired outcomes.
Adam Weinberger is an Applied Scientist II at AWS Generative AI Innovation Center. He has 10 years of experience in data science and machine learning. He holds a Master's of Information and Data Science from the University of California, Berkeley.
Dan Ford is an Applied Scientist II at AWS Generative AI Innovation Center, where he helps public sector customers build state-of-the-art GenAI solutions.
📣 Submit to Speak at PyTorch Conference + Save on Registration
Step into the Future of AI at PyTorch Conference 2025.
The Call for Proposals for PyTorch Conference 2025 is officially open!
Join us in San Francisco from October 22–23, 2025, to showcase your expertise and innovations with PyTorch—the industry-leading, open-source machine learning framework powering innovations from bare-metal infrastructure to sophisticated application and agent layers. This is your opportunity to share insights, breakthroughs, and case studies with a global audience of AI and Generative AI practitioners, researchers, and developers.
Submit your proposals and prepare to engage, learn, and network alongside some of the brightest minds in the AI/ML community. We’re seeking sessions, Birds of a Feather discussions, lightning talks, and poster sessions on the following topics:
- Core PyTorch Framework
- PyTorch on Accelerator Hardware
- PyTorch Ecosystem and Tools
- AI Applications and Use Cases
- AI in Research and Academia
- AI in Industry and Enterprise Applications
- AI Infrastructure and Scalability
- Ethical AI, Governance, and Regulation
- Training, Fine-Tuning, and Alignment
- Inference, Deployment, and Serving
- Performance Measurement and Benchmarking
- Data Engineering and Management for AI
- Generative AI and Large Language Models (LLMs)
- Model Optimization and Efficiency
- Open Source Collaboration, Education and Community Building
- Edge AI and On-Device
- DL Compilers and Kernel Authoring
Learn more and submit your talk by Sunday, June 1, at 11:59 PDT!
Save up to USD$500 with Super Early Bird Pricing!
- Reserve your pass by 11:59 PM PDT on March 21 and score Super Early Bird pricing for just USD$499. That’s a savings of up to USD$500!
- Student or faculty? Learn more about our discounted academic rate.
- Need help covering travel costs? We offer discretionary travel funding for those community members who would otherwise not be able to attend. Learn more.
Become a Sponsor at PyTorch Conference 2025!
Seize your opportunity to influence the future of Generative AI and Machine Learning by sponsoring PyTorch Conference 2025. PyTorch is at the forefront of innovation—empowering rapid experimentation, flexible model development, and efficient deployment into production environments with its powerful, versatile ecosystem of tools and thriving community of dedicated users.
As a sponsor, you’ll gain more than visibility; you’ll strategically position your organization at the heart of a vibrant, global AI/ML ecosystem. Connect directly with 3,000+ expert attendees, researchers, engineers, and decision-makers, and actively shape the conversations driving the next generation of AI advancements.
For more details on CFP submissions, registration, and sponsorship, visit the PyTorch Conference Website.
Customize DeepSeek-R1 distilled models using Amazon SageMaker HyperPod recipes – Part 1
Increasingly, organizations across industries are turning to generative AI foundation models (FMs) to enhance their applications. To achieve optimal performance for specific use cases, customers are adopting and adapting these FMs to their unique domain requirements. This need for customization has become even more pronounced with the emergence of new models, such as those released by DeepSeek.
However, customizing DeepSeek models effectively while managing computational resources remains a significant challenge. Tuning model architecture requires technical expertise, training and fine-tuning parameters, and managing distributed training infrastructure, among others. This often forces companies to choose between model performance and practical implementation constraints, creating a critical need for more accessible and streamlined model customization solutions.
In this two-part series, we discuss how you can reduce the DeepSeek model customization complexity by using the pre-built fine-tuning workflows (also called "recipes") for both the DeepSeek-R1 model and its distilled variations, released as part of Amazon SageMaker HyperPod recipes.
In this first post, we will build a solution architecture for fine-tuning DeepSeek-R1 distilled models and demonstrate the approach by providing a step-by-step example of customizing the DeepSeek-R1 Distill Qwen 7B model using recipes, achieving an average improvement of 25% across all ROUGE scores (with a maximum of 49% on the ROUGE-2 score) with both SageMaker HyperPod and SageMaker training jobs. The second part of the series will focus on fine-tuning the DeepSeek-R1 671B model itself.
At the time of this writing, the DeepSeek-R1 model and its distilled variations for Llama and Qwen were the latest released recipes. Check out sagemaker-hyperpod-recipes on GitHub for the latest released recipes, including support for fine-tuning the DeepSeek-R1 671B parameter model.
Amazon SageMaker HyperPod recipes
At re:Invent 2024, we announced the general availability of Amazon SageMaker HyperPod recipes. SageMaker HyperPod recipes help data scientists and developers of all skill sets to get started training and fine-tuning popular publicly available generative AI models in minutes with state-of-the-art training performance. These recipes include a training stack validated by Amazon Web Services (AWS), which removes the tedious work of experimenting with different model configurations, minimizing the time it takes for iterative evaluation and testing. They automate several critical steps, such as loading training datasets, applying distributed training techniques, automating checkpoints for faster recovery from faults, and managing the end-to-end training loop.
Recipes, paired with the resilient infrastructure of AWS (Amazon SageMaker HyperPod and Amazon SageMaker Model Training), provide a resilient training environment for fine-tuning FMs such as DeepSeek-R1 with out-of-the-box customization.
To help customers quickly use DeepSeek's powerful and cost-efficient models to accelerate generative AI innovation, we released new recipes to fine-tune six DeepSeek models, including DeepSeek-R1 distilled Llama and Qwen models, using supervised fine-tuning (SFT), Quantized Low-Rank Adaptation (QLoRA), and Low-Rank Adaptation (LoRA) techniques. In this post, we introduce these new recipes and walk you through a solution to fine-tune a DeepSeek Qwen 7B model for an advanced medical reasoning use case.
Solution overview
At its core, as depicted in the following diagram, the recipe architecture implements a hierarchical workflow that begins with a recipe specification that covers a comprehensive configuration defining the training parameters, model architecture, and distributed training strategies. These recipes are processed through the HyperPod recipe launcher, which serves as the orchestration layer responsible for launching a job on the corresponding architecture. The launcher interfaces with underlying cluster management systems such as SageMaker HyperPod (Slurm or Kubernetes) or training jobs, which handle resource allocation and scheduling. It’s a familiar NeMo-style launcher with which you can choose a recipe and run it on your infrastructure of choice (SageMaker HyperPod or training).
For example, after choosing your recipe, you can pre-train or fine-tune a model by running python3 main.py recipes=recipe-name. Alternatively, you can use a launcher script, which is a bash script that is preconfigured to run the chosen training or fine-tuning job on your cluster. You can check out main.py (NeMo style launcher) and launcher scripts for DeepSeek on the GitHub repository hosting SageMaker HyperPod recipes.
A key component of this architecture is the HyperPod training adapter for NeMo, which is built on the NVIDIA NeMo framework and Neuronx Distributed training package. It loads data, creates models, and facilitates efficient data parallelism, model parallelism, and hybrid parallelism strategies, enabling optimal utilization of computational resources across the distributed infrastructure. The architecture's modular design allows for scalability and flexibility, making it particularly effective for training LLMs that require distributed computing capabilities.
You can run these recipes using SageMaker HyperPod or as SageMaker training jobs. For organizations that require granular control over training infrastructure and extensive customization options, SageMaker HyperPod is the ideal choice. SageMaker training jobs, on the other hand, is tailored for organizations that want a fully managed experience for their training workflows. To learn more details about these service features, refer to Generative AI foundation model training on Amazon SageMaker.
In the next sections, we go over the solution architecture for these services before presenting a step-by-step implementation example for each.
SageMaker HyperPod
To submit jobs using SageMaker HyperPod, you can use the HyperPod recipes launcher, which provides a straightforward mechanism to run recipes on both Slurm and Kubernetes. After you choose your orchestrator, you can choose your recipe's launcher and have it run on your HyperPod cluster. The launcher will interface with your cluster with Slurm or Kubernetes native constructs. For this post, we use the HyperPod recipes launcher mechanism to run the training on a Slurm cluster. The following image shows the solution architecture for SageMaker HyperPod.
SageMaker training jobs
The workflow for SageMaker training jobs begins with an API request that interfaces with the SageMaker control plane, which manages the orchestration of training resources. The system uses the training jobs launcher to efficiently run workloads on a managed cluster.
The architecture uses Amazon Elastic Container Registry (Amazon ECR) for container image management. Training jobs are executed across a distributed cluster, with seamless integration to multiple storage solutions, including Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), and Amazon FSx for Lustre. All of this runs under the SageMaker managed environment, providing optimal resource utilization and security.
This design simplifies the complexity of distributed training while maintaining the flexibility needed for diverse machine learning (ML) workloads, making it an ideal solution for enterprise AI development. The following image shows the solution architecture for SageMaker training jobs.
Solution walkthrough
For this solution, consider a use case for a healthcare industry startup that aims to create an accurate, medically verified chat assistant application that bridges complex medical information with patient-friendly explanations. By fine-tuning DeepSeek-R1 Distill Qwen 7b using the FreedomIntelligence/medical-o1-reasoning-SFT dataset, you can use its medical reasoning capabilities to produce content that maintains clinical accuracy.
Prerequisites
You need to complete the following prerequisites before you can run the DeepSeek-R1 Distill Qwen 7B model fine-tuning notebook.
- Make the following quota increase requests for SageMaker. You need to request a minimum of one p4d.24xlarge instance (with 8 x NVIDIA A100 GPUs), ranging to a maximum of two p4d.24xlarge instances (depending on time-to-train and cost-to-train trade-offs for your use case). On the Service Quotas console, request the following SageMaker quotas:
  - P4 instances (p4d.24xlarge) for training job usage: 1–2
  - P4 instances (p4d.24xlarge) for HyperPod clusters (ml.p4d.24xlarge for cluster usage): 1–2
- If you choose to use HyperPod clusters to run your training, set up a HyperPod Slurm cluster following the documentation at Tutorial for getting started with SageMaker HyperPod. Alternatively, you can use the AWS CloudFormation template provided in the AWS Workshop Studio at Amazon SageMaker HyperPod Own Account and follow the instructions to set up a cluster and a development environment to access and submit jobs to the cluster.
- (Optional) If you choose to use SageMaker training jobs, you can create an Amazon SageMaker Studio domain (refer to Use quick setup for Amazon SageMaker AI) to access Jupyter notebooks with the preceding role. (You can use JupyterLab in your local setup, too.)
- Create an AWS Identity and Access Management (IAM) role with the managed policies AmazonSageMakerFullAccess and AmazonS3FullAccess to give SageMaker the required access to run the examples.
- Clone the GitHub repository with the assets for this deployment. This repository consists of a notebook that references training assets:
Next, we run the model_trainer_deepseek_r1_recipe_lora.ipynb notebook to fine-tune the DeepSeek-R1 model using QLoRA on SageMaker.
Prepare the dataset
To prepare the dataset, you need to load the FreedomIntelligence/medical-o1-reasoning-SFT dataset, tokenize and chunk the dataset, and configure the data channels for SageMaker training on Amazon S3. Complete the following steps:
- Format the dataset by applying the prompt format for DeepSeek-R1 Distill Qwen 7B:
def generate_prompt(data_point):
full_prompt = f"""
Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{data_point["Question"]}
### Response:
{data_point["Complex_CoT"]}
"""
return {"prompt": full_prompt.strip()}
- Load the FreedomIntelligence/medical-o1-reasoning-SFT dataset and split it into training and validation datasets:
# Load dataset from the hub
train_set = load_dataset(dataset_name, 'en', split="train[5%:]")
test_set = load_dataset(dataset_name, 'en', split="train[:5%]")
...
train_dataset = train_set.map(
generate_and_tokenize_prompt,
remove_columns=columns_to_remove,
batched=False
)
test_dataset = test_set.map(
generate_and_tokenize_prompt,
remove_columns=columns_to_remove,
batched=False
)
- Load the DeepSeek-R1 Distill Qwen 7B tokenizer from the Hugging Face Transformers library and generate tokens for the train and validation datasets:
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
max_seq_length=1024
# Initialize a tokenizer by loading a pre-trained tokenizer configuration, using the fast tokenizer implementation if available.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
...
train_dataset = train_dataset.map(tokenize, remove_columns=["prompt"])
test_dataset = test_dataset.map(tokenize, remove_columns=["prompt"])
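The tokenize helper used above is elided in the excerpt; a minimal sketch, which the notebook's actual implementation may differ from, could look like the following:

def tokenize(example):
    # Hypothetical helper: tokenize the formatted prompt, pad/truncate to the
    # configured sequence length, and reuse the input IDs as labels for
    # causal language modeling.
    tokens = tokenizer(
        example["prompt"],
        truncation=True,
        max_length=max_seq_length,
        padding="max_length",
    )
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens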
- Prepare the training and validation datasets for SageMaker training by saving them as arrow files, which is required by SageMaker HyperPod recipes, and constructing the S3 paths where these files will be uploaded:
train_dataset_s3_path = f"s3://{bucket_name}/{input_path}/train"
val_dataset_s3_path = f"s3://{bucket_name}/{input_path}/test"
train_dataset.save_to_disk(train_dataset_s3_path)
val_dataset.save_to_disk(val_dataset_s3_path)
The dataset above will be used in the examples for both SageMaker training jobs and SageMaker HyperPod.
Option A: Fine-tune using SageMaker training jobs
To fine-tune the model using SageMaker training jobs with recipes, this example uses the ModelTrainer class.
The ModelTrainer class is a newer and more intuitive approach to model training that significantly enhances user experience and supports distributed training, Build Your Own Container (BYOC), and recipes. For additional information about ModelTrainer, refer to Accelerate your ML lifecycle using the new and improved Amazon SageMaker Python SDK – Part 1: ModelTrainer.
To set up the fine-tuning workload, complete the following steps:
- Select the instance type, the container image for the training job, and define the checkpoint path where the model will be stored:
instance_type = "ml.p4d.24xlarge"
image_uri = (
f"658645717510.dkr.ecr.{sagemaker_session.boto_session.region_name}.amazonaws.com/smdistributed-modelparallel:2.4.1-gpu-py311-cu121"
)
checkpoint_s3_path = f"s3://{bucket_name}/deepseek-r1-distilled-qwen-7b-recipe-lora/checkpoints"
- Create the ModelTrainer function to encapsulate the training setup from a selected recipe:
from sagemaker.modules.configs import CheckpointConfig, Compute, InputData, SourceCode, StoppingCondition
from sagemaker.modules.distributed import Torchrun
from sagemaker.modules.train import ModelTrainer
instance_count = 1
# Working override for custom dataset
recipe_overrides = {
...
"trainer": {
"num_nodes": instance_count,
...
},
...
"use_smp_model": False, # Required for PEFT
"model": {
"hf_model_name_or_path": model_id,
"data": {
"train_dir": "/opt/ml/input/data/train",
"val_dir": "/opt/ml/input/data/test",
},
},
}
# Define the compute
compute_configs = Compute(
instance_type=instance_type,
instance_count=instance_count,
keep_alive_period_in_seconds=0
)
model_trainer = ModelTrainer.from_recipe(
training_image=image_uri,
training_recipe="fine-tuning/deepseek/hf_deepseek_r1_distilled_qwen_7b_seq8k_gpu_lora",
recipe_overrides=recipe_overrides,
requirements="./requirements.txt",
compute=compute_configs,
...
checkpoint_config=CheckpointConfig(
s3_uri=f"{checkpoint_s3_path}/{job_prefix}"
),
)
You can point to the specific recipe with the training_recipe argument and override the recipe arguments by providing a dictionary as the argument of recipe_overrides. In the previous example:
- num_nodes – Indicates the number of instances that will be used for the fine-tuning execution
- checkpoint_dir – Location in the container where the job will save model checkpoints
The ModelTrainer class simplifies the experience by encapsulating code and training setup directly from the selected recipe. In this example:
- training_recipe – hf_deepseek_r1_distilled_qwen_7b_seq8k_gpu_lora defines the fine-tuning setup for the LoRA technique
- Set up the input channels for ModelTrainer by creating InputData objects from the provided S3 bucket paths for the training and validation datasets.
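A sketch of this step, assuming channel names that match the train_dir and val_dir recipe overrides above (the actual notebook may use different names), could be:

from sagemaker.modules.configs import InputData

# Channel names map to /opt/ml/input/data/<channel_name> inside the container,
# matching the train_dir and val_dir recipe overrides above.
train_input = InputData(channel_name="train", data_source=train_dataset_s3_path)
test_input = InputData(channel_name="test", data_source=val_dataset_s3_path)
data = [train_input, test_input]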
- Submit the training job:
# starting the train job with our uploaded datasets as input
model_trainer.train(input_data_config=data, wait=True)
Option B: Fine-tune using SageMaker HyperPod with Slurm
To fine-tune the model using HyperPod, make sure your cluster is up and ready by following the prerequisites. To access the login or head node of the HyperPod Slurm cluster from your development environment, follow the login instructions at Log in to your cluster in the Amazon SageMaker HyperPod workshop.
Alternatively, you can also use AWS Systems Manager and run a command like the following to start the session. You can find the cluster ID, instance group name, and instance ID on the Amazon SageMaker console.
aws ssm start-session --target sagemaker-cluster:[cluster-id]_[instance-group-name]-[instance-id] --region region_name
- In the cluster's login or head node, run the following commands to set up the environment. Run sudo su - ubuntu to run the remaining commands as the ubuntu user, unless you have a specific user ID to access the cluster and your POSIX user is created through a lifecycle script on the cluster. Refer to the multi-user setup for more details.
# create a virtual environment
python3 -m venv ${PWD}/venv
source venv/bin/activate
# clone the recipes repository and set up the environment
git clone --recursive https://github.com/aws/sagemaker-hyperpod-recipes.git
cd sagemaker-hyperpod-recipes
pip3 install -r requirements.txt
- Create a squash file using Enroot to run the job on the cluster. Enroot runtime offers GPU acceleration, rootless container support, and seamless integration with high performance computing (HPC) environments, making it ideal for running our workflows securely.
# create a squash file using Enroot
REGION=<region>
IMAGE="658645717510.dkr.ecr.${REGION}.amazonaws.com/smdistributed-modelparallel:2.4.1-gpu-py311-cu121"
aws ecr get-login-password --region "${REGION}" | docker login --username AWS --password-stdin 658645717510.dkr.ecr.${REGION}.amazonaws.com
enroot import -o $PWD/smdistributed-modelparallel.sqsh dockerd://${IMAGE}
- After you've created the squash file, update the recipes_collection/config.yaml file with the absolute path to the squash file (created in the preceding step), and update the instance_type if needed. The final config file should have the following parameters:
...
cluster_type: slurm
...
instance_type: p4d.24xlarge
...
container: /fsx/<path-to-smdistributed-modelparallel>.sqsh
...
- Download the prepared dataset that you uploaded to S3 into the FSx for Lustre volume attached to the cluster. Run the following commands to download the files from Amazon S3:
aws s3 cp s3://{bucket_name}/{input_path}/train /fsx/ubuntu/deepseek/data/train --recursive
aws s3 cp s3://{bucket_name}/{input_path}/test /fsx/ubuntu/deepseek/data/test --recursive
- Update the launcher script for fine-tuning the DeepSeek-R1 Distill Qwen 7B model. The launcher scripts serve as convenient wrappers for executing the training script (main.py file), which streamlines the process of fine-tuning and parameter adjustment. For fine-tuning the DeepSeek-R1 Qwen 7B model, you can find the specific script at:
launcher_scripts/deepseek/run_hf_deepseek_r1_qwen_7b_seq16k_gpu_fine_tuning.sh
- Before running the script, you need to modify the location of the training and validation files and update the HuggingFace model ID and optionally the access token for private models and datasets. The script should look like the following (update recipes.trainer.num_nodes if you're using a multi-node cluster):
SAGEMAKER_TRAINING_LAUNCHER_DIR=${SAGEMAKER_TRAINING_LAUNCHER_DIR:-"$(pwd)"}
HF_MODEL_NAME_OR_PATH="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" # HuggingFace pretrained model name or path
HF_ACCESS_TOKEN="hf_xxxx" # Optional HuggingFace access token
TRAIN_DIR="/fsx/ubuntu/deepseek/data/train" # Location of training dataset
VAL_DIR="/fsx/ubuntu/deepseek/data/test" # Location of validation dataset
EXP_DIR="/fsx/ubuntu/deepseek/results" # Location to save experiment info including logging, checkpoints, etc
HYDRA_FULL_ERROR=1 python3 "${SAGEMAKER_TRAINING_LAUNCHER_DIR}/main.py" \
recipes=fine-tuning/deepseek/hf_deepseek_r1_distilled_qwen_7b_seq16k_gpu_fine_tuning \
base_results_dir="${SAGEMAKER_TRAINING_LAUNCHER_DIR}/results" \
recipes.run.name="hf-deepseek-r1-distilled-qwen-7b-fine-tuning" \
recipes.exp_manager.exp_dir="$EXP_DIR" \
recipes.trainer.num_nodes=1 \
recipes.model.data.train_dir="$TRAIN_DIR" \
recipes.model.data.val_dir="$VAL_DIR" \
recipes.model.hf_model_name_or_path="$HF_MODEL_NAME_OR_PATH" \
recipes.model.hf_access_token="$HF_ACCESS_TOKEN"
You can view the recipe for this fine-tuning task at the following location, overriding any additional parameters as needed:
recipes_collection/recipes/fine-tuning/deepseek/hf_deepseek_r1_distilled_qwen_7b_seq16k_gpu_fine_tuning.yaml
- Submit the job by running the launcher script:
bash launcher_scripts/deepseek/run_hf_deepseek_r1_qwen_7b_seq16k_gpu_fine_tuning.sh
You can monitor the job using Slurm commands such as squeue and scontrol show to view the status of the job and the corresponding logs. After the job is complete, the trained model will also be available in the results folder, as shown in the following code:
cd results
ls -R
.:
checkpoints experiment
./checkpoints:
full
./checkpoints/full:
steps_50
./checkpoints/full/steps_50:
config.json pytorch_model.bin
./experiment:
...
- Upload the fine-tuned model checkpoint to Amazon S3 for evaluating the model using the validation data:
aws s3 cp /fsx/<path_to_checkpoint> s3://{bucket_name}/{model_prefix}/qwen7b --recursive
Evaluate the fine-tuned model
To objectively evaluate your fine-tuned model, you can run an evaluation job on the validation portion of the dataset.
You can run a SageMaker training job and use ROUGE metrics (ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-L-Sum), which measure the similarity between machine-generated text and human-written reference text. The SageMaker training job will compute ROUGE metrics for both the base DeepSeek-R1 Distill Qwen 7B model and the fine-tuned one. You can access the code sample for ROUGE evaluation in the sagemaker-distributed-training-workshop on GitHub. Refer to this notebook for details.
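For reference, the following minimal sketch shows what a ROUGE comparison involves, using the Hugging Face evaluate library; the actual evaluation script in the workshop repository may differ, and the prediction and reference strings here are purely illustrative.
# Minimal ROUGE sketch using the Hugging Face evaluate library (pip install evaluate rouge_score)
import evaluate

rouge = evaluate.load("rouge")

# Illustrative model outputs and reference completions from the validation split
predictions = ["The patient likely has stable angina and should undergo a stress test."]
references = ["The presentation is most consistent with stable angina; a stress test is recommended."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}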
Complete the following steps:
- Define the S3 path where the fine-tuned checkpoints are stored, the instance_type, and the image URI to use in the training job:
import sagemaker

# Session used to resolve the current AWS Region for the image URI lookup
sagemaker_session = sagemaker.Session()

trained_model = <S3_PATH>
instance_type = "ml.p4d.24xlarge"

image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=sagemaker_session.boto_session.region_name,
    version="2.4",
    instance_type=instance_type,
    image_scope="training",
)
# Resolves to: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.4-gpu-py311 (in us-east-1)
- Create the ModelTrainer to encapsulate the evaluation script and define the input data (an illustrative compute configuration appears after this list of steps):
from sagemaker.modules.configs import Compute, InputData, OutputDataConfig, SourceCode, StoppingCondition
from sagemaker.modules.distributed import Torchrun
from sagemaker.modules.train import ModelTrainer

# Define the script to be run
source_code = SourceCode(
    source_dir="./scripts",
    requirements="requirements.txt",
    entry_script="evaluate_recipe.py",
)

# Define the compute
...

# Define the ModelTrainer
model_trainer = ModelTrainer(
    training_image=image_uri,
    source_code=source_code,
    compute=compute_configs,
    ...
    hyperparameters={
        "model_id": model_id,  # Hugging Face model id
        "dataset_name": dataset_name
    }
)

# Pass the input data
train_input = InputData(
    channel_name="adapterdir",
    data_source=trained_model,
)

test_input = InputData(
    channel_name="testdata",
    data_source=test_dataset_s3_path,  # S3 path where the test data is stored
)

# Configure the input channels
data = [train_input, test_input]
- Submit the training job:
# starting the train job with our uploaded datasets as input
model_trainer.train(input_data_config=data, wait=True)
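The compute definition is elided in the ModelTrainer step above; as an illustration only, assuming the Compute class imported from sagemaker.modules.configs accepts an instance type and instance count, it might look like the following.
# Illustrative compute configuration for the evaluation job (adjust to your account limits)
from sagemaker.modules.configs import Compute

compute_configs = Compute(
    instance_type=instance_type,  # "ml.p4d.24xlarge", defined in the first step
    instance_count=1,             # a single node is sufficient for evaluation
)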
The following table shows the task output for the fine-tuned model and the base model.
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-L-Sum |
|---|---|---|---|---|
| Base | 0.36362 | 0.08739 | 0.16345 | 0.3204 |
| Fine-tuned | 0.44232 | 0.13022 | 0.17769 | 0.38989 |
| % Difference | 21.64207 | 49.01703 | 8.7121 | 21.68871 |
Our fine-tuned model demonstrates remarkable efficiency, achieving about a 22% overall improvement on the reasoning task after only one training epoch. The most significant gain appears in the ROUGE-2 score, which measures bigram overlap, with an increase of about 49%, indicating better alignment between generated and reference summaries.
Notably, preliminary experiments suggest these results could be further enhanced by extending the training duration. Increasing the number of epochs shows promising potential for additional performance gains while maintaining computational efficiency.
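The percentage differences in the table are relative improvements of the fine-tuned model over the base model, which you can verify directly from the reported scores:
# Relative improvement of the fine-tuned model over the base model, per ROUGE metric
base = {"rouge1": 0.36362, "rouge2": 0.08739, "rougeL": 0.16345, "rougeLsum": 0.3204}
tuned = {"rouge1": 0.44232, "rouge2": 0.13022, "rougeL": 0.17769, "rougeLsum": 0.38989}

for metric in base:
    pct = (tuned[metric] - base[metric]) / base[metric] * 100
    print(f"{metric}: +{pct:.2f}%")
# Approximately: rouge1 +21.64%, rouge2 +49.01%, rougeL +8.71%, rougeLsum +21.69%, matching the table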
Clean up
To clean up your resources and avoid incurring additional charges, follow these steps:
- Delete any unused SageMaker Studio resources
- (Optional) Delete the SageMaker Studio domain
- Verify that your training job isn’t running anymore. To do so, on your SageMaker console, choose Training and check Training jobs.
- If you created a HyperPod cluster, delete the cluster to stop incurring costs. If you created the networking stack from the HyperPod workshop, delete the stack as well to clean up the virtual private cloud (VPC) resources and the FSx for Lustre volume.
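For the HyperPod cluster deletion in the last step, you can use the SageMaker console or the AWS CLI/SDK. The following is a minimal sketch using boto3; the cluster name shown is a placeholder for whatever name you used when creating the cluster.
# Sketch: delete a SageMaker HyperPod cluster with boto3 (cluster name is hypothetical)
import boto3

sagemaker_client = boto3.client("sagemaker")
sagemaker_client.delete_cluster(ClusterName="ml-cluster")  # replace with your cluster name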
Conclusion
In the first post of this two-part DeepSeek-R1 series, we discussed how SageMaker HyperPod recipes provide a powerful yet accessible solution for organizations to scale their AI model training capabilities with large language models (LLMs) including DeepSeek. The architecture streamlines complex distributed training workflows through its intuitive recipe-based approach, reducing setup time from weeks to minutes.
We recommend starting your LLM customization journey by exploring our sample recipes in the Amazon SageMaker HyperPod documentation. The AWS AI/ML community offers extensive resources, including workshops and technical guidance, to support your implementation journey.
To begin using the SageMaker HyperPod recipes, visit the sagemaker-hyperpod-recipes repo on GitHub for comprehensive documentation and example implementations. Our team continues to expand the recipe ecosystem based on customer feedback and emerging ML trends, making sure that you have the tools needed for successful AI model training.
In our second post, we discuss how these recipes can further be used to fine-tune the DeepSeek-R1 671B model. Stay tuned!
About the Authors
Kanwaljit Khurmi is a Principal Worldwide Generative AI Solutions Architect at AWS. He collaborates with AWS product teams, engineering departments, and customers to provide guidance and technical assistance, helping them enhance the value of their hybrid machine learning solutions on AWS. Kanwaljit specializes in assisting customers with containerized applications and high-performance computing solutions.
Bruno Pistone is a Senior World Wide Generative AI/ML Specialist Solutions Architect at AWS based in Milan, Italy. He works with AWS product teams and large customers to help them fully understand their technical needs and design AI and Machine Learning solutions that take full advantage of the AWS cloud and Amazon Machine Learning stack. His expertise includes end-to-end machine learning, model customization, and generative AI. He enjoys spending time with friends, exploring new places, and traveling to new destinations.
Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker team. He specializes in large language model training workloads, helping customers build LLM workloads using SageMaker HyperPod, SageMaker training jobs, and SageMaker distributed training. Outside of work, he enjoys running, hiking, and cooking.
Durga Sury is a Senior Solutions Architect on the Amazon SageMaker team. Over the past 5 years, she has worked with multiple enterprise customers to set up a secure, scalable AI/ML platform built on SageMaker.
Aman Shanbhag is an Associate Specialist Solutions Architect on the ML Frameworks team at Amazon Web Services, where he helps customers and partners with deploying ML training and inference solutions at scale. Before joining AWS, Aman graduated from Rice University with degrees in computer science, mathematics, and entrepreneurship.
Anirudh Viswanathan is a Sr Product Manager, Technical – External Services with the SageMaker AI Training team. He holds a Masters in Robotics from Carnegie Mellon University, an MBA from the Wharton School of Business, and is named inventor on over 40 patents. He enjoys long-distance running, visiting art galleries, and Broadway shows.
The latest AI news we announced in February
Here are Google’s latest AI updates from February 2025. Read More
Survey Shows How AI Is Reshaping Healthcare and Life Sciences, From Lab to Bedside
From research and discovery to patient care and administrative tasks, AI is showing transformative potential across nearly every part of healthcare and life sciences.
For example, generative AI can be used to help automate repetitive, time-consuming tasks such as summarizing and creating documents and extracting and analyzing data from reports. It can also aid in drug discovery by finding new protein structures and offer assistance to patients through chatbots and AI assistants, easing the burden on clinical and administrative staff.
This wide range of applications was among key insights of NVIDIA’s inaugural “State of AI in Healthcare and Life Sciences” survey.
The survey — which polled more than 600 professionals across the globe from fields spanning digital healthcare, medical tools and technologies, pharmaceutical and biotech, and payers and practitioners — revealed robust AI adoption in the industry, with about two-thirds of respondents saying their companies are actively using the technology.
AI is also having a tangible impact on the industry’s bottom line, with 81% of respondents saying AI has helped increase revenue, and 45% realizing these benefits in less than a year after implementation.
Here are some of the key insights and use cases from the survey:
- 83% of overall respondents agreed with the statement that “AI will revolutionize healthcare and life sciences in the next three to five years”
- 73% said AI is helping to reduce operational costs
- 58% cited data analytics as the top AI workload, with generative AI second at 54%, and large language models third at 53%
- 59% of respondents from pharmaceutical and biotech companies cited drug discovery and development among their top AI use cases
Business Impact of AI in Healthcare and Life Sciences
The healthcare and life sciences industry is seeing how AI can help increase annual revenue and reduce operational costs. Forty-one percent of respondents indicated that the acceleration of research and development has had a positive impact. Thirty-six percent of respondents said AI has helped create a competitive advantage. And 35% each said it has helped reduce project cycles, deliver better clinical or research insights, and enhance precision and accuracy.
Given the positive results across a broad range of AI use cases, it comes as no surprise that 78% of respondents said they intend to increase their budget for AI infrastructure this year. In addition, more than a third of respondents noted their investments in AI will increase by more than 10%.
The survey also revealed the top three spending priorities: identifying additional AI use cases (47%), optimizing workflow and production cycles (34%) and hiring more AI experts (26%).
AI Applied Across Healthcare
Each industry segment in the survey had differing priorities in AI implementation. For instance, in the payers and providers industry segment, which includes health insurance companies, hospitals, clinical services and home healthcare, 48% of respondents said their top AI use case was administrative tasks and workflow optimization.
For the medical tools and technologies field, 71% of respondents said their top AI use case was medical imaging and diagnostics, such as using AI to analyze MRI or CAT scans. And for digital healthcare, 54% of respondents said their top use case was clinical decision support, while 54% from the pharmaceutical and biotech fields prioritized drug discovery and development.
AI use cases expected to have the most significant impact in healthcare and life sciences in the next five years include advanced medical imaging and diagnostics (51%), virtual healthcare assistants (34%), and precision medicine, or treatment tailored to individual patient characteristics (29%).
A Growing Dose of Generative AI
Overall, 54% of survey respondents said they’re using generative AI. Of these users, 63% said they’re actively using it, with another 36% assessing the technology through pilots or trials.
Digital healthcare was the leader in generative AI use, according to 71% of respondents from the field. Second was pharmaceutical and biotech at 69%, then medical technologies at 60%, and payers and providers at 44%.
Among all generative AI use cases, coding and document summarization (specific to clinical notes) was the top use case, at 55%. Medical chatbots and AI agents were second, at 53%, and literature analysis was third, at 45%. One notable exception was the pharmaceutical and biotech industry segment, in which respondents stated that drug discovery was the top generative AI use case, at 62%.
Download the “State of AI in Healthcare and Life Sciences: 2025 Trends” report for in-depth results and insights.
Explore NVIDIA’s AI technologies and platforms for healthcare, and sign up for NVIDIA’s healthcare newsletter to stay up to date.