Safeguard a generative AI travel agent with prompt engineering and Guardrails for Amazon Bedrock

In the rapidly evolving digital landscape, travel companies are exploring innovative approaches to enhance customer experiences. One promising solution is the integration of generative artificial intelligence (AI) to create virtual travel agents. These AI-powered assistants use large language models (LLMs) to engage in natural language conversations, providing personalized recommendations, answering queries, and guiding customers through the booking process. By harnessing the capabilities of LLMs, travel companies can offer a seamless and intuitive experience tailored to diverse customer needs and preferences. The advantages of using generative AI for virtual travel agents include improved customer satisfaction, increased efficiency, and the ability to handle a high volume of inquiries simultaneously.

However, the deployment of generative AI in customer-facing applications raises concerns around responsible AI. To mitigate risks such as harmful or biased outputs, exposure of sensitive information, or misuse for malicious purposes, it’s crucial to implement robust safeguards and validation mechanisms. This includes carefully engineering prompts, validating LLM outputs, using the built-in guardrails provided by LLM providers, and employing external LLM-based guardrails for additional protection. Guardrails for Amazon Bedrock is a set of tools and services provided by AWS to help developers implement these safeguards and responsible AI practices when building applications with generative AI models such as LLMs. It offers industry-leading safety protection on top of the native capabilities of foundation models (FMs), helping customers block as much as 85% more harmful content than the protection natively provided by some FMs on Amazon Bedrock today. Guardrails for Amazon Bedrock is the only responsible AI capability offered by a major cloud provider that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution, and it works with all LLMs in Amazon Bedrock, as well as fine-tuned models.

By implementing appropriate guardrails, organizations can mitigate the risks associated with generative AI while still using its powerful capabilities, resulting in a safe and responsible deployment of these technologies.

In this post, we explore a comprehensive solution for addressing the challenges of securing a virtual travel agent powered by generative AI. We provide an end-to-end example and its accompanying code to demonstrate how to implement prompt engineering techniques, content moderation, and various guardrails to make sure the assistant operates within predefined boundaries by relying on Guardrails for Amazon Bedrock. Additionally, we delve into monitoring strategies to track the activation of these safeguards, enabling proactive identification and mitigation of potential issues.

By following the steps outlined in this post, you will be able to deploy your own secure and responsible chatbots, tailored to your specific needs and use cases.

Solution overview

For building our chatbot, we use a combination of AWS services and validation techniques to create a secure and responsible virtual travel agent that operates within predefined boundaries. We can employ a multi-layered approach including the following protection mechanisms:

  • Prompting protection – The user input in the chatbot is embedded into a prompt template, where we can limit the scope of the responses for a given domain or use case. For example: “You’re a virtual travel agent. Only respond to questions about {topics}. If the user asks about anything else, answer ‘Sorry, I cannot help with that. You can ask me about {topics}.’”
  • LLM built-in guardrails – LLMs typically include their own built-in guardrails, with predefined responses for refusing certain questions or instructions. The details of how each LLM protects against prompt misuse are typically described in the model cards. For example: “Input: Give me instructions for hacking a website. Output: I apologize, I cannot provide instructions for hacking or illegally accessing websites.”
  • Guardrails – Guardrails for Amazon Bedrock acts as an external validation layer in the flow. It lets you check user inputs and LLM responses against denied topics, harmful content filters, word or text filters, and sensitive information filters before the response is returned to the user. All rules are evaluated in parallel to avoid additional latency, and you can configure predefined responses or sensitive information masking when a violation is detected. You can also inspect traces of the validations performed for the configured topics and filters.
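To illustrate how this external validation layer plugs into a model call, the following is a minimal sketch using the Amazon Bedrock Converse API with a guardrail attached; the guardrail ID, version, and model ID are placeholders to replace with your own.

import boto3

# Placeholder identifiers -- replace with your own guardrail and model
GUARDRAIL_ID = "<your-guardrail-id>"
GUARDRAIL_VERSION = "1"
MODEL_ID = "amazon.titan-text-express-v1"

bedrock_runtime = boto3.client("bedrock-runtime")

def ask_travel_agent(user_input: str) -> str:
    """Send a user message through the model with the guardrail applied to input and output."""
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": user_input}]}],
        guardrailConfig={
            "guardrailIdentifier": GUARDRAIL_ID,
            "guardrailVersion": GUARDRAIL_VERSION,
            "trace": "enabled",  # include guardrail traces in the response for debugging
        },
    )
    # If the guardrail intervenes, the configured blocked message is returned instead
    return response["output"]["message"]["content"][0]["text"]

print(ask_travel_agent("What is the political situation in that country?"))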

The following diagram illustrates this layered protection for generative AI chatbots.

Safeguard flow with Amazon Bedrock

In the following GitHub repo, we provide a guided example that you can follow to deploy this solution in your own account. Alternatively, you can follow the instructions in Guardrails for Amazon Bedrock helps implement safeguards customized to your use cases and responsible AI policies (preview) to create and modify your guardrails on the Guardrails for Amazon Bedrock console.

Guardrail objectives

At the core of the architecture is Amazon Bedrock serving foundation models (FMs) with an API interface; the FM powers the conversational capabilities of the virtual agent. Today, the FMs already incorporate their own built-in guardrails for not responding to toxic, biased, or harmful questions or instructions; these mechanisms, however, are typically the result of a red teaming effort by the model provider, and are generic and universal to any user and use case. In our travel agent use case, we have additional specific needs for protecting our application:

  • Constrain the conversations to the travel domain – We want to make sure the application remains focused on its core purpose and provides relevant information to users.
  • Provide factual and accurate responses – Providing reliable and trustworthy information is crucial in the travel industry, because customers rely on our recommendations and advice when planning their trips. Inaccurate or fabricated information could lead to dissatisfied customers, damage our reputation, and potentially result in legal liabilities.
  • Block information related to finances or politics – This helps us maintain neutrality and avoid potential controversies that could damage the brand’s reputation.
  • Avoid responding to misconduct or violence requests – We want to uphold ethical standards and promote responsible use of the application.
  • Avoid any toxicity or bias in the responses – We want to create a safe and inclusive environment for all users, regardless of their background or characteristics.
  • Prevent any jailbreak and injection attacks – This helps us maintain the integrity and security of the application, protecting both customers’ data and the company’s assets.
  • Avoid any references to competitors – We want to maintain a professional and unbiased stance, and avoid potential legal issues or conflicts of interest.
  • Anonymize personal information – We need to protect users’ privacy and comply with data protection regulations.

Prompt engineering and guardrails

For our first two objectives, we rely on prompt engineering to craft a prompt that constrains the agent’s responses to travel-related topics, and avoids making up any content that is not factual. This is implemented with a prompt template in our code:

prompt = f"""You are a virtual travel agent for OctankTravel, a travel website.

<rules>
- You only provide information, answer questions, 
and provide recommendations about travel destinations.
- If the user asks about any non-travel related or relevant topic, 
just say 'Sorry, I can not respond to this. I can recommend you travel destinations 
and answer your questions about these'.
- If you have the information, it's also OK to respond to questions about hotels and airlines.
- Do not make up or create answers that are not based on facts. 
It’s OK to say that you don’t know an answer.
</rules>

Always follow the rules in the <rules> tags for responding to the user's question below.

{user_input}"""

Because of the nature of LLMs and how they generate text, even with a prompt template that constrains the conversation to the travel recommendations domain, some interactions can still fall outside this scope. For this reason, we must restrict specific topics (such as politics and finance in our example) that could be controversial, misaligned with our use case, or damaging to our brand. For this and the rest of the objectives in the preceding list, we integrate Guardrails for Amazon Bedrock, a content validation and filtering feature, to apply external guardrails to both user inputs and LLM responses.

Guardrails for Amazon Bedrock allows us to define the following:

  • Denied topics – Defining a set of topics that are undesirable in the context of your application. These topics will be blocked if detected in user queries or model responses. In our example, we configure denied topics for finance and politics.
  • Content filters – Adjusting pre-defined filter strengths to block input prompts or model responses containing harmful or undesired content. In our example, we rely on predefined content filters for sex, violence, hate, insults, misconduct, and prompt attacks such as jailbreak or injection.
  • Word filters – Configuring filters to block undesirable words, phrases, and profanity. In our example, we configure word filters for controlling references to competitors.
  • Sensitive information filters – Blocking or masking sensitive information, such as predefined personally identifiable information (PII) fields or custom regex-defined fields, in user inputs and model responses. In our example, we configure filters for masking the email address and age of our customers.

With this, our guardrail configuration is as follows:

  • Example topic 1: Finance
    • Definition: Statements or questions about finances, transactions, or monetary advice
    • Example phrases:
      • “What are the cheapest rates?”
      • “Where can I invest to get rich?”
      • “I want a refund!”
  • Example topic 2: Politics
    • Definition: Statements or questions about politics or politicians
    • Example phrases:
      • “What is the political situation in that country?”
      • “Give me a list of destinations governed by the greens”
  • Content filters enabled:
    • For prompts: Hate: High, Insults: High, Sexual: High, Violence: High, Misconduct: High, Prompt attack: High
    • For responses: Hate: High, Insults: High, Sexual: High, Violence: High, Misconduct: High, Prompt attack: High
  • Word filters:
    • Custom words: “SeaScanner,” “Megatravel Deals”
    • Managed words: Profanity
  • Sensitive information:
    • Built-in PII entities: Anonymize AGE
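The same configuration can also be created programmatically. The following is a minimal boto3 sketch; the guardrail name, descriptions, and blocked messaging are illustrative placeholders.

import boto3

bedrock = boto3.client("bedrock")  # control-plane client for guardrail management

blocked_message = ("Sorry, I can not respond to this. I can recommend you travel destinations "
                   "and answer your questions about these.")

response = bedrock.create_guardrail(
    name="travel-agent-guardrail",
    description="Guardrail for the OctankTravel virtual travel agent",
    topicPolicyConfig={"topicsConfig": [
        {"name": "Finance", "type": "DENY",
         "definition": "Statements or questions about finances, transactions, or monetary advice",
         "examples": ["What are the cheapest rates?", "Where can I invest to get rich?", "I want a refund!"]},
        {"name": "Politics", "type": "DENY",
         "definition": "Statements or questions about politics or politicians",
         "examples": ["What is the political situation in that country?"]},
    ]},
    contentPolicyConfig={"filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "MISCONDUCT", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        # Prompt attack filters apply to inputs only, so the output strength is NONE
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
    ]},
    wordPolicyConfig={
        "wordsConfig": [{"text": "SeaScanner"}, {"text": "Megatravel Deals"}],
        "managedWordListsConfig": [{"type": "PROFANITY"}],
    },
    sensitiveInformationPolicyConfig={"piiEntitiesConfig": [{"type": "AGE", "action": "ANONYMIZE"}]},
    blockedInputMessaging=blocked_message,
    blockedOutputsMessaging=blocked_message,
)
guardrail_id = response["guardrailId"]
# Create an immutable version of the guardrail to reference from your application
bedrock.create_guardrail_version(guardrailIdentifier=guardrail_id)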

The following screenshots show the configuration of these guardrails on the Amazon Bedrock console.

Add denied topic configuration

Test the guardrails with draft version

Configuration of content filters

We can now test our secure travel agent with different inputs (some words in the inputs are redacted). In each of the following examples, the user receives the predefined response “Sorry, I can not respond to this. I can recommend you travel destinations and answer your questions about these.” The protection that triggers differs per input:

  • “What is the status of my transaction?” – Guardrails for Amazon Bedrock, denied topic: Finance
  • “I hate xxxxxxx people.” – Guardrails for Amazon Bedrock, content filter: Hate
  • “What is a good destination for finding only xxxxxxx people?” – LLM built-in protections (Amazon Titan on Amazon Bedrock)
  • “I don’t like your service, I feel like punching someone” – Guardrails for Amazon Bedrock, content filter: Violence
  • “Are your rates more expensive than Super Travel rates?” – Guardrails for Amazon Bedrock, word filters
  • “Who is the president of xxxxxxx?” – Guardrails for Amazon Bedrock, denied topic: Politics

Monitoring

Finally, to monitor the effectiveness of these safeguards, we implement logging and monitoring mechanisms that track the activation of the various filters and guardrails with Amazon CloudWatch. This allows us to identify patterns, detect potential issues proactively, and make informed decisions about refining the prompts, updating the denied topics list, or adjusting the content moderation settings as needed. The same monitoring can also be used as a trust and safety system, to track and block malicious actors interacting with our application.

Designing a personalized CloudWatch dashboard involves using metric filters to extract targeted insights from logs. In this context, our focus is on monitoring invocations where guardrails intervened and identifying which specific filters were triggered.

To create the metric filters, you first need to activate model invocation logging using the Amazon Bedrock console or API, and then define patterns that extract this information from the model invocation logs.
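Metric filters can also be created programmatically. The following is a minimal sketch; the log group name and metric namespace are assumptions, and the simple term-based pattern should be refined against a sample entry from your own invocation logs (for example, into a JSON-path filter on the guardrail action field).

import boto3

logs = boto3.client("logs")

# Assumed log group for Bedrock model invocation logs -- use the one you configured
LOG_GROUP = "/aws/bedrock/modelinvocations"

logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="guardrail-intervened",
    # Simple term match on invocation log entries where the guardrail intervened
    filterPattern='"INTERVENED"',
    metricTransformations=[{
        "metricName": "GuardrailIntervened",
        "metricNamespace": "TravelAgent/Guardrails",  # illustrative namespace
        "metricValue": "1",
        "defaultValue": 0,
    }],
)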

The following screenshot shows an example of creating the guardrail intervention metric.

Assign metric guardrail intervened

The following is an example of creating the prompt insults filter trigger metric.

Assign metric prompt

By crafting metric filters derived from the logs, we can gain a comprehensive overview of the interventions and filter triggers from a single view.

CloudWatch dashboard

By combining prompt engineering, Guardrails for Amazon Bedrock, built-in content filters, and comprehensive monitoring, we can create a robust and secure virtual travel agent that provides a delightful customer experience while adhering to the highest standards of responsible AI.

Cost

We can consider the following items for estimating the cost of the solution implemented:

  • Amazon Bedrock
    • LLM: Amazon Titan Express on Amazon Bedrock
      • Input (on-demand) – Price per 1,000 input tokens: $0.0002
      • Output (on-demand) – Price per 1,000 output tokens: $0.0006
    • Guardrails for Amazon Bedrock
      • Denied topics – Price per 1,000 text units: $1
      • Content filters – Price per 1,000 text units: $0.75
      • Sensitive information filter (PII) – Price per 1,000 text units: $0.10
      • Sensitive information filter (regular expression) – Free
      • Word filters – Free
  • AWS Lambda – $0.20 per 1 million requests
  • Amazon CloudWatch – CloudWatch metrics costs = $0.30 per metric per month

Prices are based on public pricing for June 10th, 2024, in the US East (N. Virginia) AWS Region.

For our example, assuming we have 1,000 interactions from our users with our virtual travel agent per month, we could estimate a total cost of around $20 per month.

Clean up

To clean up the resources created in this example, complete the following steps:

  1. Delete the guardrail you created:
    1. On the Amazon Bedrock console, under Safeguards in the navigation pane, choose Guardrails.
    2. Select the guardrail you created and choose Delete.
  2. Delete the CloudWatch dashboard:
    1. On the CloudWatch console, choose Dashboards in the navigation pane.
    2. Select the dashboard you created and choose Delete.
  3. Delete the CloudWatch metric filters:
    1. On the CloudWatch console, under Logs in the navigation pane, choose Log groups.
    2. Choose your Amazon Bedrock log group.
    3. On the Metric filters tab, select all the metric filters you created and choose Delete.

Responsible AI considerations

Although the solution outlined in this post provides a robust framework for securing a virtual travel agent, it’s important to recognize that responsible AI practices extend beyond technical safeguards. The following are some additional considerations to keep in mind:

  • Human oversight and governance – Even with advanced guardrails and content moderation mechanisms in place, it’s crucial to maintain human oversight and governance over the AI system. This makes sure ethical principles and values are consistently upheld, and that any potential issues or edge cases are promptly identified and addressed.
  • Continuous monitoring and improvement – AI systems, particularly those involving language models, can exhibit unexpected behaviors or biases over time. It’s essential to continuously monitor the performance and outputs of the virtual agent, and to have processes in place for refining and improving the system as needed.
  • Transparency and explainability – Strive for transparency in communicating the capabilities, limitations, and potential biases of the virtual agent to users. Additionally, consider implementing explainability techniques that can provide insights into the reasoning behind the agent’s responses, fostering trust and accountability.
  • Privacy and data protection – Make sure the virtual agent adheres to relevant privacy regulations and data protection laws, particularly when handling personal or sensitive information. Implement robust data governance practices and obtain appropriate user consent when necessary.
  • Inclusive and diverse perspectives – Involve diverse stakeholders, including representatives from different backgrounds, cultures, and perspectives, in the development and evaluation of the virtual agent. This can help identify and mitigate potential biases or blind spots in the system.
  • Ethical training and education – Provide ongoing training and education for the development team, as well as customer-facing personnel, on ethical AI principles, responsible AI practices, and the potential societal impacts of AI systems.
  • Collaboration and knowledge sharing – Engage with the broader AI community, industry groups, and academic institutions to stay informed about the latest developments, best practices, and emerging challenges in the field of responsible AI.

Conclusion

In this post, we explored a comprehensive solution for securing a virtual travel agent powered by generative AI. By using prompt engineering, Guardrails for Amazon Bedrock, built-in content filters, and comprehensive monitoring, we demonstrated how to create a robust and secure virtual assistant that adheres to the highest standards of responsible AI.

The key benefits of implementing this solution include:

  • Enhanced user experience – By making sure the virtual agent operates within predefined boundaries and provides appropriate responses, users can enjoy a seamless and delightful experience without encountering harmful, biased, or inappropriate content
  • Mitigated risks – The multi-layered approach mitigates the risks associated with generative AI, such as the generation of harmful or biased outputs, exposure of sensitive information, or misuse for malicious purposes
  • Responsible AI alignment – The solution aligns with ethical AI principles and responsible AI practices, fostering trust and accountability in the deployment of AI systems
  • Proactive issue identification – The monitoring mechanisms enable proactive identification of potential issues, allowing for timely adjustments and refinements to the system
  • Scalability and adaptability – The modular nature of the solution allows for effortless scaling and adaptation to different use cases or domains, providing long-term viability and relevance

By following the steps outlined in this post, organizations can confidently take advantage of the power of generative AI while prioritizing responsible AI practices, ultimately delivering a secure and trustworthy virtual travel agent that exceeds customer expectations.

To learn more, visit Guardrails for Amazon Bedrock.


About the Authors

Antonio RodriguezAntonio Rodriguez is a Sr. Generative AI Specialist Solutions Architect in Amazon Web Services. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock.

Dani MitchellDani Mitchell is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is focused on computer vision use cases and helping customers across EMEA accelerate their ML journey.

Anubhav MishraAnubhav Mishra is a Principal Product Manager for Amazon Bedrock with AWS. He spends his time understanding customers and designing product experiences to address their business challenges.

Read More

Streamline financial workflows with generative AI for email automation

Streamline financial workflows with generative AI for email automation

Many companies across all industries still rely on laborious, error-prone, manual procedures to handle documents, especially those that are sent to them by email. Despite the availability of technology that can digitize and automate document workflows through intelligent automation, businesses still mostly rely on labor-intensive manual document processing. This represents a major opportunity for businesses to optimize this workflow, save time and money, and improve accuracy by modernizing antiquated manual document handling with intelligent document processing (IDP) on AWS. To extract key information from high volumes of documents from emails and various sources, companies need comprehensive automation capable of ingesting emails, file uploads, and system integrations for seamless processing and analysis. Intelligent automation presents a chance to revolutionize document workflows across sectors through digitization and process optimization.

This post explains a generative artificial intelligence (AI) technique to extract insights from business emails and attachments. It examines how AI can optimize financial workflow processes by automatically summarizing documents, extracting data, and categorizing information from email attachments. This enables companies to serve more clients, direct employees to higher-value tasks, speed up processes, lower expenses, enhance data accuracy, and increase efficiency.

Challenges with manual data extraction

The majority of business sectors are currently having difficulties with manual document processing, and are reading emails and their attachments without the use of an automated system. These procedures cost money, take a long time, and are prone to mistakes. Manual procedures struggle to keep up with the number of documents. Finding relevant information that is necessary for business decisions is difficult. Therefore, there is a demand for shorter decision cycles and speedier document processing. The aim of this post is to help companies that process documents manually to speed up the delivery of data derived from those documents for use in business operations. By reducing the time and ongoing expenses associated with manual workflows, organizations can enhance productivity, responsiveness, and innovation through data analytics.

In the past, optical character recognition (OCR) worked well for flawless documents, but the performance of those old systems frequently did not meet customer needs when document quality was imperfect. Because mistakes are unavoidable in manual processes and double-checking every task can be expensive and time-consuming, variability is introduced into workflows. Companies with seasonal fluctuations in customer demand face challenges in staffing document processing to maintain quick customer service. The key is efficiently extracting the most vital data from extensive paperwork to enable prompt decisions. For example, a mortgage application may be over a thousand pages, but only a dozen or so data points critically impact the credit decision. The trick is pinpointing those key details among the flood of information in order to make timely loan approvals while still providing excellent service to applicants.

This post explores how generative AI can make working with business documents and email attachments more straightforward. Sample business considerations include financial industries that have seen an uptick in their user base. They need a back-office automation solution to extract details from emails and attachments, summarize the content to send downstream, classify the documents and content, and assign documents to human reviewers if required. At the same time, the solution must provide data security, such as PII and SOC compliance.

Solution overview

The accompanying code for this solution is available in the GitHub repo. The solution covers two steps to deploy generative AI for email automation:

  • Data extraction from email attachments and classification using various stages of intelligent document processing (IDP). IDP is an industry term used for describing the mechanism for processing and extracting information out of structured, semi-structured, and unstructured documents using AI and machine learning (ML).
  • Data summarization using large language models (LLMs).

The following figure provides a high-level overview of the pipeline steps you might go through while you develop your IDP solution.

The data capture stage is where documents are extracted from emails, compiled, and securely stored as input documents. There may occasionally be different sorts of documents and no automatic method for identifying and categorizing them; the classification stage handles this. If your documents are already identifiable, you can bypass the classification process and go directly to the next stage, which is accurately extracting information from your documents. In the enrichment stage, you can take the data and language from the documents and apply it in significant ways to enhance that data. A human-in-the-loop review is the last stage of the process, which enables you to request a human evaluation of data that has been extracted with a low degree of accuracy. Customers in highly regulated areas like financial services and healthcare are adding human evaluations to their pipelines in order to review the data points.

This solution offers the following key benefits:

  • Elasticity – You have the flexibility to scale up or down with the needs of the business
  • Innovation – You can automate document data extraction coming through email channels
  • Cost savings – You can optimize costs related to manual effort and associated operational cost

Data extraction workflow

The following figure shows a high-level representation of the possible stages of streamlining financial workflows to build our solution.

In the initial phase, the focus is to securely gather and compile data from documents, including email attachments. However, if you already have identifiable documents, you can bypass the classification process and proceed directly to the next phase. In the second step, you extract information accurately from your documents. In the third step, you can use extracted text and data to construct meaningful enhancements for these documents. The fourth and final step involves using foundation models (FMs) to standardize keys and values. This stage focuses on refining form data, including elements like first name, phone number formatting, and so on, into the specific formats required by individual customers. The transformed data is then tailored to match the formats required by their downstream databases. In cases where the confidence score is low or in industries subject to stringent regulations, the form data may be sent to a human-in-the-loop review. These automated stages can be used together or separately, resulting in significant cost reductions, elimination of manual effort, and enhancement of the outcomes of document processing for your business.

AWS architecture

The following figure illustrates the extended architecture of the sample system and explains how you can use AWS services to integrate the end-to-end process.

After the inbound email attachments are received and input documents are stored securely, AWS document processing services and FMs assist with the extraction and summarization in the desired format:

  • Amazon Simple Storage Service (Amazon S3) stores documents in various format files, originated from physical or digital mailrooms, email attachments, or user uploads from web or mobile apps, allowing for efficient processing and scalability.
  • Amazon Textract uses the power of NLP and other ML advancements cultivated over the years, enabling capabilities beyond conventional OCR technologies. Amazon Textract automatically extracts printed text, handwriting, layout elements, and other data such as key-value pairs and tabular information from any document or image.
  • Amazon Comprehend can automatically classify and extract insights from text, which also provides NLP capabilities. It has pre-trained models that identify entities such as places, people, brands, or events; determine the language of the text; extract key phrases; understand how positive or negative the sentiment of text is; and automatically organize a collection of text files by topic.
  • Amazon Bedrock is a fully managed AWS service that provides a straightforward way to build and scale generative AI applications with FMs. It provides the tools and infrastructure to deploy, monitor, scale, and govern AI/ML models cost-effectively. You can then have natural conversations with the LLMs available in Amazon Bedrock to get insights from the extracted data.
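As a minimal illustration of how the extraction and classification services listed above can be called, the following sketch uses plain boto3 calls to Amazon Textract and Amazon Comprehend; the bucket and object names are placeholders.

import boto3

textract = boto3.client("textract")
comprehend = boto3.client("comprehend")

# Placeholder location of an email attachment stored in Amazon S3
doc = {"S3Object": {"Bucket": "my-idp-bucket", "Name": "attachments/invoice.png"}}

# Extract printed and handwritten text from the document image
textract_response = textract.detect_document_text(Document=doc)
text = " ".join(
    block["Text"] for block in textract_response["Blocks"] if block["BlockType"] == "LINE"
)

# Detect entities (organizations, dates, quantities, and so on) in the extracted text
entities = comprehend.detect_entities(Text=text[:4500], LanguageCode="en")
for entity in entities["Entities"]:
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))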

Our GitHub repo demonstrates how to combine Amazon Textract and LangChain to extract data from documents and use generative AI within different stages of IDP. These samples demonstrate using various LLMs.

Prerequisites

Before you start developing the document workflow, you must complete a few prerequisite steps. Refer to the GitHub repo for details on how you can integrate Amazon Textract with LangChain as a document loader to extract data from documents and use generative AI capabilities within the various IDP phases. The following imports are specific to document extraction from email:

!pip install unstructured
!pip install anthropic
import boto3
from langchain.llms.bedrock import Bedrock
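The summarization chain in the next section expects an llm object backed by Amazon Bedrock. The following is a minimal sketch of creating one with the Bedrock class imported above; the Claude model ID and generation parameters are assumptions and may differ from what the repo uses.

bedrock_runtime = boto3.client("bedrock-runtime")

# Assumed model and parameters -- adjust to a model available in your account
llm = Bedrock(
    model_id="anthropic.claude-v2",
    client=bedrock_runtime,
    model_kwargs={"max_tokens_to_sample": 512, "temperature": 0.2},
)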

Read emails and attachments

The configuration of UnstructuredEmailLoader is explained in the following code, which also summarizes the email content:

from langchain.document_loaders import UnstructuredEmailLoader
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Load the email body and any inline content from the .eml file
loader = UnstructuredEmailLoader("SampleDocument.eml")
document = loader.load()

template = """
summarize the email by associating tasks to different agents and as a next step
<document>{doc_text}</document>
<summary>
"""
prompt = PromptTemplate(template=template, input_variables=["doc_text"])

# llm is the Amazon Bedrock LLM handle created earlier
llm_chain = LLMChain(prompt=prompt, llm=llm)
summary = llm_chain.run(document[0].page_content)
print(summary.replace("</summary>", "").strip())

Clean up

Follow the cleanup steps specified in the GitHub repo to clean up your resources.

Conclusion

In this post, we explained how to streamline financial workflows with generative AI for email automation, including extracting data from email attachments, classifying documents, and summarizing and processing documents with IDP to derive insights. By examining the various stages of the IDP pipeline, you can enhance your own IDP pipeline with LLM workflows.

To expand this solution, consider the following:

  • Use Retrieval Augmented Generation (RAG) correlation of personalized data in your LLM
  • Keep summarized data private and accept existing data sources as augmented inputs to your desired decision outcome

To learn more, refer to the following resources:


About the Author

Hariharan Nammalvar is a Solutions Architect at AWS and a technology professional with 20+ years of experience. He has a proven track record of designing and implementing innovative solutions that solve complex business challenges. He has worked with customers across a wide range of industries and domains, helping them use machine learning and AI to streamline operations, improve efficiency, and enhance customer experiences.

Raghavarao Sodabathina is a Principal Solutions Architect at AWS, focusing on Data Analytics, AI/ML, and Serverless Platform. He engages with customers to create innovative solutions that address customer business problems and to accelerate the adoption of AWS services. In his spare time, Raghavarao enjoys spending time with his family, reading books, and watching movies.

Read More

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

This post is co-written with Shamik Ray, Srivyshnav K S, Jagmohan Dhiman and Soumya Kundu from Twilio.

Today’s leading companies trust Twilio’s Customer Engagement Platform (CEP) to build direct, personalized relationships with their customers everywhere in the world. Twilio enables companies to use communications and data to add intelligence and security to every step of the customer journey, from sales and marketing to growth and customer service, and many more engagement use cases in a flexible, programmatic way. Across 180 countries, millions of developers and hundreds of thousands of businesses use Twilio to create magical experiences for their customers. Being one of the largest AWS customers, Twilio uses AWS data and artificial intelligence and machine learning (AI/ML) services to run its daily workloads. This post outlines the steps AWS and Twilio took to migrate Twilio’s existing machine learning operations (MLOps), including model training and batch inference, to Amazon SageMaker.

ML models don’t operate in isolation. They must integrate into existing production systems and infrastructure to deliver value. This necessitates considering the entire ML lifecycle during design and development. With the right processes and tools, MLOps enables organizations to reliably and efficiently adopt ML across their teams for their specific use cases. SageMaker includes a suite of features for MLOps that includes Amazon SageMaker Pipelines and Amazon SageMaker Model Registry. Pipelines allow for straightforward creation and management of ML workflows while also offering storage and reuse capabilities for workflow steps. The model registry simplifies model deployment by centralizing model tracking.

This post focuses on how to achieve flexibility in using your data source of choice and integrate it seamlessly with Amazon SageMaker Processing jobs. With SageMaker Processing jobs, you can use a simplified, managed experience to run data preprocessing or postprocessing and model evaluation workloads on the SageMaker platform.

Twilio needed to implement an MLOps pipeline that queried data from PrestoDB. PrestoDB is an open source SQL query engine that is designed for fast analytic queries against data of any size from multiple sources.

In this post, we show you a step-by-step implementation to achieve the following:

  • Connect SageMaker Processing jobs to data queried from a PrestoDB instance
  • Build a training pipeline that trains, tunes, evaluates, and registers an ML model
  • Build a batch transform pipeline that runs batch inference with the latest approved model
  • Deploy the latest approved model as a SageMaker endpoint for real-time inference

Use case overview

Twilio trained a binary classification ML model using scikit-learn’s RandomForestClassifier to integrate into their MLOps pipeline. This model is used as part of a batch process that runs periodically for their daily workloads, making training and inference workflows repeatable to accelerate model development. The training data used for this pipeline is made available through PrestoDB and read into Pandas through the PrestoDB Python client.

The end goal was to convert the existing steps into two pipelines: a training pipeline and a batch transform pipeline that connected the data queried from PrestoDB to a SageMaker Processing job, and finally deploy the trained model to a SageMaker endpoint for real-time inference.

In this post, we use an open source dataset available through the TPCH connector that is packaged with PrestoDB to illustrate the end-to-end workflow that Twilio used. Twilio was able to use this solution to migrate their existing MLOps pipeline to SageMaker. All the code for this solution is available in the GitHub repo.

Solution overview

This solution is divided into three main steps:

  • Model training pipeline – In this step, we connect a SageMaker Processing job to fetch data from a PrestoDB instance, train and tune the ML model, evaluate it, and register it with the SageMaker model registry.
  • Batch transform pipeline – In this step, we run a preprocessing data step that reads data from a PrestoDB instance and runs batch inference on the registered ML model (from the model registry) that we approve as a part of this pipeline. This model is approved either programmatically or manually through the model registry.
  • Real-time inference – In this step, we deploy the latest approved model as a SageMaker endpoint for real-time inference.

All pipeline parameters used in this solution exist in a single config.yml file. This file includes the necessary AWS and PrestoDB credentials to connect to the PrestoDB instance, information on the training hyperparameters and SQL queries that are run at training, and inference steps to read data from PrestoDB. This solution is highly customizable for industry-specific use cases so that it can be used with minimal code changes through simple updates in the config file.

The following code shows an example of how a query is configured within the config.yml file. This query is used at the data processing step of the training pipeline to fetch data from the PrestoDB instance. Here, we predict whether an order is a high_value_order or a low_value_order based on the orderpriority as given from the TPC-H data. For more information on the TPC-H data, its database entities, relationships, and characteristics, refer to TPC Benchmark H. You can change the query for your use case within the config file and run the solution with no code changes.

SELECT
    o.orderkey,
    COUNT(l.linenumber) AS lineitem_count,
    SUM(l.quantity) AS total_quantity,
    AVG(l.discount) AS avg_discount,
    SUM(l.extendedprice) AS total_extended_price,
    SUM(l.tax) AS total_payable_tax,
    o.orderdate,
    o.orderpriority,
    CASE
        WHEN (o.orderpriority = '2-HIGH') THEN 1 
        ELSE 0
    END AS high_value_order
FROM
    orders o
JOIN
    lineitem l ON o.orderkey = l.orderkey
GROUP BY
    o.orderkey,
    o.orderdate,
    o.orderpriority
ORDER BY 
    RANDOM() 
LIMIT 5000

The main steps of this solution are described in detail in the following sections.

Data preparation and training

The data preparation and training pipeline includes the following steps:

  1. The training data is read from a PrestoDB instance, and any feature engineering needed is done as part of the SQL queries run in PrestoDB at retrieval time. The queries that are used to fetch data at training and batch inference steps are configured in the config file.
  2. We use the FrameworkProcessor with SageMaker Processing jobs to read data from PrestoDB using the Python PrestoDB client.
  3. For the training and tuning step, we use the SKLearn estimator from the SageMaker SDK and the RandomForestClassifier from scikit-learn to train the ML model. The HyperparameterTuner class is used for running automatic model tuning, which finds the best version of the model by running many training jobs on the dataset using the algorithm and the ranges of hyperparameters.
  4. The model evaluation step checks that the trained and tuned model has an accuracy level above a user-defined threshold, and only then registers the model with the model registry. If the model accuracy doesn’t meet the threshold, the pipeline fails and the model is not registered.
  5. The model training pipeline is then run with pipeline.start, which invokes and instantiates all the preceding steps.
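The following is a minimal sketch of how the steps described above are assembled into a pipeline and started with pipeline.start; the pipeline name is illustrative, and the step and parameter objects correspond to the ones defined in the notebook.

from sagemaker.workflow.pipeline import Pipeline

# Assemble the steps described above into a single pipeline definition
training_pipeline = Pipeline(
    name="mlops-prestodb-training-pipeline",  # illustrative name
    parameters=[host_parameter, port_parameter, presto_parameter, region_parameter],
    steps=[step_preprocess_data, step_tuning, step_evaluate_model, step_cond],
    sagemaker_session=pipeline_session,
)

# Create or update the pipeline definition in SageMaker, then start an execution
training_pipeline.upsert(role_arn=role)
execution = training_pipeline.start()
execution.wait()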

Batch transform

The batch transform pipeline consists of the following steps:

  1. The pipeline implements a data preparation step that retrieves data from a PrestoDB instance (using a data preprocessing script) and stores the batch data in Amazon Simple Storage Service (Amazon S3).
  2. The latest model registered in the model registry from the training pipeline is approved.
  3. A Transformer instance is used to run a batch transform job that gets inferences on the entire dataset stored in Amazon S3 from the data preparation step and stores the output in Amazon S3.

SageMaker real-time inference

The SageMaker endpoint pipeline consists of the following steps:

  1. The latest approved model is retrieved from the model registry using the describe_model_package function from the SageMaker SDK.
  2. The latest approved model is deployed as a real-time SageMaker endpoint.
  3. The model is deployed on a ml.c5.xlarge instance with a minimum instance count of 1 and a maximum instance count of 3 (configurable by the user) with the automatic scaling policy set to ENABLED. This removes unnecessary instances so you don’t pay for provisioned instances that you aren’t using.

Prerequisites

To implement the solution provided in this post, you should have an AWS account, a SageMaker domain to access Amazon SageMaker Studio, and familiarity with SageMaker, Amazon S3, and PrestoDB.

The following prerequisites also need to be in place before running this code:

  • PrestoDB – We use the built-in datasets available in PrestoDB through the TPCH connector for this solution. Follow the instructions in the GitHub README.md to set up PrestoDB on an Amazon Elastic Compute Cloud (Amazon EC2) instance in your account. If you already have access to a PrestoDB instance, you can skip this step but note its connection details (see the presto section in the config file). When you have your PrestoDB credentials, fill out the presto section in the config file as follows (enter your host public IP, port, credentials, catalog and schema):
presto:
  host: <0.0.0.0>
  parameter: "0000"
  presto_credentials: <presto_credentials>
  catalog: <catalog>
  schema: <schema>
  • VPC network configurations – We also define the encryption, network isolation, and VPC configurations of the ML model and operations in the config file. For more information on network configurations and preferences, refer to Connect to SageMaker Within your VPC. If you are using the default VPC and security groups, you can leave these configuration parameters empty; see the example in this configuration file. If not, in the aws section, specify the enable_network_isolation status, security_group_ids, and subnets based on your network isolation preferences:
network_config:
    enable_network_isolation: false
    security_group_ids: 
    - <security_group_id>
    subnets:
    - <subnet-1>
    - <subnet-2>
    - <subnet-3>
  • IAM role – Set up an AWS Identity and Access Management (IAM) role with appropriate permissions to allow SageMaker to access AWS Secrets Manager, Amazon S3, and other services within your AWS account. Until an AWS CloudFormation template is provided that creates the role with the requisite IAM permissions, use a SageMaker role that allows the AmazonSageMakerFullAccess AWS managed policy for your role.
  • Secrets Manager secret – Set up a secret in Secrets Manager for the PrestoDB user name and password. Call the secret prestodb-credentials and add a username field and password field to it. For instructions, refer to Create and manage secrets with AWS Secrets Manager.
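The prestodb-credentials secret described in the last prerequisite can also be created with a short boto3 call. A minimal sketch follows, with placeholder values for the user name and password.

import json
import boto3

secrets = boto3.client("secretsmanager")

secrets.create_secret(
    Name="prestodb-credentials",
    Description="PrestoDB user name and password for the MLOps pipeline",
    SecretString=json.dumps({
        "username": "<presto-username>",  # placeholder
        "password": "<presto-password>",  # placeholder
    }),
)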

Deploy the solution

Complete the following steps to deploy the solution:

  1. Clone the GitHub repository in SageMaker Studio. For instructions, see Clone a Git Repository in SageMaker Studio Classic.
  2. Edit the config.yml file as follows:
    1. Edit the parameter values in the presto section. These parameters define the connectivity to PrestoDB.
    2. Edit the parameter values in the aws section. These parameters define the network connectivity, IAM role, bucket name, AWS Region, and other AWS Cloud-related parameters.
    3. Edit the parameter values in the sections corresponding to the pipeline steps (training_step, tuning_step, transform_step, and so on).
    4. Review all the parameters in these sections carefully and edit them as appropriate for your use case.

When the prerequisites are complete and the config.yml file is set up correctly, you’re ready to run the mlops-pipeline-prestodb solution. The following architecture diagram provides a visual representation of the steps that you implement.

The diagram shows the following three steps:

  • Part 1: Training – This pipeline includes the data preprocessing step, the training and tuning step, the model evaluation step, the condition step, and the register model step. The train, test, and validation datasets and evaluation report that are generated in this pipeline are sent to an S3 bucket.
  • Part 2: Batch transform – This pipeline includes the batch data preprocessing step, approving the latest model from the model registry, creating the model instance, and performing batch transformation on data that is stored and retrieved from an S3 bucket.
  • The PrestoDB server is hosted on an EC2 instance, with credentials stored in Secrets Manager.
  • Part 3: SageMaker real-time inference – Finally, the latest approved model from the SageMaker model registry is deployed as a SageMaker real-time endpoint for inference.

Test the solution

In this section, we walk through the steps of running the solution.

Training pipeline

Complete the following steps to run the training pipeline (0_model_training_pipeline.ipynb):

  1. On the SageMaker Studio console, choose 0_model_training_pipeline.ipynb in the navigation pane.
  2. When the notebook is open, on the Run menu, choose Run All Cells to run the code in this notebook.

This notebook demonstrates how you can use SageMaker Pipelines to string together a sequence of data processing, model training, tuning, and evaluation steps to train a binary classification ML model using scikit-learn.

At the end of this run, navigate to pipelines in the navigation pane. Your pipeline structure on SageMaker Pipelines should look like the following figure.

The training pipeline consists of the following steps that are implemented through the notebook run:

  • Preprocess the data – In this step, we create a processing job for data preprocessing. For more information on processing jobs, see Process data. We use a preprocessing script to connect and query data from a PrestoDB instance using the user-specified SQL query in the config file. This step splits and sends data retrieved from PrestoDB as train, test, and validation files to an S3 bucket. The ML model is trained using the data in these files.
  • The sklearn_processor is used in the ProcessingStep to run the scikit-learn script that preprocesses data. The step is defined as follows:
# declare the sk_learn processer
step_args = sklearn_processor.run(
        ## code refers to the data preprocessing script that is responsible for querying data from the PrestoDB instance
        code=config['scripts']['preprocess_data'],
        source_dir=config['scripts']['source_dir'], 
        outputs=outputs_preprocessor,
        arguments=[
            "--host", host_parameter,
            "--port", port_parameter,
            "--presto_credentials_key", presto_parameter,
            "--region", region_parameter,
            "--presto_catalog", presto_catalog_parameter,
            "--presto_schema", presto_schema_parameter,
            "--train_split", train_split.to_string(), 
            "--test_split", test_split.to_string(),
        ],
    )

    step_preprocess_data = ProcessingStep(
        name=config['data_processing_step']['step_name'],
        step_args=step_args,
    )

Here, we use config['scripts']['source_dir'], which points to the data preprocessing script that connects to the PrestoDB instance. Parameters used as arguments in step_args are configurable and fetched from the config file.

  • Train the model – In this step, we create a training job to train a model. For more information on training jobs, see Train a Model with Amazon SageMaker. Here, we use the Scikit Learn Estimator from the SageMaker SDK to handle the end-to-end training and deployment of custom Scikit-learn code. The RandomForestClassifier is used to train the ML model for our binary classification use case. The HyperparameterTuner class is used for running automatic model tuning to determine the set of hyperparameters that provide the best performance based on a user-defined metric threshold (for example, maximizing the AUC metric).

In the following code, the sklearn_estimator object is used with parameters that are configured in the config file and uses a training script to train the ML model. This step accesses the train, test, and validation files that were created as a part of the previous data preprocessing step.

# declare a tuning step to use the train and test data to tune the ML model using the `HyperparameterTuner` declared above
step_tuning = TuningStep(
    name=config['tuning_step']['step_name'],
    tuner=rf_tuner,
    inputs={
        "train": TrainingInput(
            s3_data=step_preprocess_data.properties.ProcessingOutputConfig.Outputs[
                "train" ## refer to this
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "test": TrainingInput(
        s3_data=step_preprocess_data.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
        content_type="text/csv",
        ),
    },
)
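For reference, the rf_tuner referenced in the tuning step wraps a SKLearn estimator and a set of hyperparameter ranges. The following is a minimal sketch in which the entry point, framework version, ranges, and metric regex are assumptions for illustration; the actual values come from the config file.

from sagemaker.sklearn.estimator import SKLearn
from sagemaker.tuner import HyperparameterTuner, IntegerParameter

# Scikit-learn estimator that runs the training script (entry point assumed)
sklearn_estimator = SKLearn(
    entry_point="train.py",
    source_dir=config['scripts']['source_dir'],
    framework_version="1.2-1",
    instance_type=config['training_step']['instance_type'],
    role=role,
    sagemaker_session=pipeline_session,
)

# Tuner that searches over RandomForestClassifier hyperparameters (ranges assumed)
rf_tuner = HyperparameterTuner(
    estimator=sklearn_estimator,
    objective_metric_name="auc",
    metric_definitions=[{"Name": "auc", "Regex": "auc: ([0-9\\.]+)"}],
    hyperparameter_ranges={
        "n_estimators": IntegerParameter(50, 300),
        "max_depth": IntegerParameter(3, 12),
    },
    objective_type="Maximize",
    max_jobs=10,
    max_parallel_jobs=2,
)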
  • Evaluate the model – This step checks if the trained and tuned model has an accuracy level above a user-defined threshold, and only then registers the model with the model registry. If the model accuracy doesn’t meet the user-defined threshold, the pipeline fails and the model is not registered with the model registry. We use the ScriptProcessor with an evaluation script that a user creates to evaluate the trained model based on a metric of choice.

The evaluation step uses the evaluation script as a code entry. This script prepares the features and target values, and calculates the prediction probabilities using model.predict. At the end of the run, an evaluation report is sent to Amazon S3 that contains information on precision, recall, and accuracy metrics.

step_evaluate_model = ProcessingStep(
    name=config['evaluation_step']['step_name'],
    processor=evaluate_model_processor,
    inputs=[
        ProcessingInput(
            source=step_tuning.get_top_model_s3_uri(top_k=0, s3_bucket=bucket),
            destination="/opt/ml/processing/model",
            input_name="model.tar.gz" 
        ),
        ProcessingInput(
            source=step_preprocess_data.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/test",
            input_name="test.csv" 
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="evaluation",
            source="/opt/ml/processing/evaluation",
            destination=Join(
                on="/",
                values=[
                    "s3://{}".format(bucket),
                    prefix,
                    ExecutionVariables.PIPELINE_EXECUTION_ID,
                    "evaluation",
                ]
            )
        )
    ],
    code = config['scripts']['evaluation'],
    property_files=[evaluation_report],
    job_arguments=[
        "--target", target_parameter,
        "--features", feature_parameter,
    ]
)

The following screenshot shows an example of an evaluation report.

  • Add conditions – After the model is evaluated, we can add conditions to the pipeline with a ConditionStep. This step registers the model only if the given user-defined metric threshold is met. In our solution, we only want to register the new model version with the model registry if the new model meets a specific accuracy condition of above 70%.
# Create a SageMaker Pipelines ConditionStep, using the condition above.
# Enter the steps to perform if the condition returns True / False.
step_cond = ConditionStep(
    name=config['condition_step']['step_name'],
    conditions=[cond_gte],
    if_steps=[step_register_model],
    else_steps=[step_fail], ## if this fails
)

If the accuracy condition is not met, a step_fail step is run that sends an error message to the user, and the pipeline fails. For instance, because the user-defined accuracy condition is set to 0.7 in the config file, and the accuracy calculated during the evaluation step exceeds it (73.8%), the outcome of this step is set to True and the model moves to the last step of the training pipeline.

  • Register the model – The RegisterModel step registers a sagemaker.model.Model or a sagemaker.pipeline.PipelineModel with the SageMaker model registry. When the trained model meets the model performance requirements, a new version of the model is registered with the SageMaker model registry.

The model is registered with the model registry with an approval status set to PendingManualApproval. This means the model can’t be deployed on a SageMaker endpoint unless its status in the registry is changed to Approved manually on the SageMaker console, programmatically, or through an AWS Lambda function.

Now that the model is registered, you can get access to the registered model manually on the SageMaker Studio model registry console or programmatically in the next notebook, approve it, and run the batch transform pipeline.

Batch transform pipeline

Complete the following steps to run the batch transform pipeline (1_batch_transform_pipeline.ipynb):

  1. On the SageMaker Studio console, choose 1_batch_transform_pipeline.ipynb in the navigation pane.
  2. When the notebook is open, on the Run menu, choose Run All Cells to run the code in this notebook.

This notebook will run a batch transform pipeline using the model trained in the previous notebook.

At the end of the batch transform pipeline, your pipeline structure on SageMaker Pipelines should look like the following figure.

The batch transform pipeline consists of the following steps that are implemented through the notebook run:

  • Extract the latest approved model from the SageMaker model registry – In this step, we extract the latest model from the model registry and set the ModelApprovalStatus to Approved:
## updating the latest model package to approved status to use it for batch inference
model_package_update_response = sm.update_model_package(
    ModelPackageArn=latest_model_package_arn,
    ModelApprovalStatus="Approved",
)

Now we have extracted the latest model from the SageMaker model registry and programmatically approved it. You can also approve the model manually on the SageMaker model registry page in SageMaker Studio as shown in the following screenshot.

  • Read raw data for inference from PrestoDB and store it in an S3 bucket – After the latest model is approved, batch data is fetched from the PrestoDB instance and used for the batch transform step. In this step, we use a batch preprocessing script that queries data from PrestoDB and saves it in a batch directory within an S3 bucket. The query that is used to fetch batch data is configured by the user within the config file in the transform_step section:
# declare the batch step that is called later in pipeline execution
batch_data_prep = ProcessingStep(
    name=config['data_processing_step']['step_name'],
    step_args=step_args,
)

After the batch data is extracted into the S3 bucket, we create a model instance and point to the inference.py script, which contains code that runs as part of getting inference from the trained model:

# create the model image based on the model data and refer to the inference script as an entry point for batch inference
model = Model(
    image_uri=image_uri,
    entry_point=config['scripts']['batch_inference'],
    model_data=model_data_url,
    sagemaker_session=pipeline_session,
    role=role,
)
  • Create a batch transform step to perform inference on the batch data stored in Amazon S3 – Now that a model instance is created, create a Transformer instance with the appropriate model type, compute instance type, and desired output S3 URI. Specifically, pass in the ModelName from the CreateModelStep step_create_model properties. The CreateModelStep properties attribute matches the object model of the DescribeModel response object. Use a transform step for batch transformation to run inference on an entire dataset. For more information about batch transform, see Run Batch Transforms with Inference Pipelines.
A transform step requires a transformer and the data on which to run batch inference:
transformer = Transformer(
    model_name=step_create_model.properties.ModelName,
    instance_type=config['transform_step']['instance_type'],
    instance_count=config['transform_step']['instance_count'],
    strategy="MultiRecord",
    accept="text/csv",
    assemble_with="Line",
    output_path=f"s3://{bucket}",
    tags=config['transform_step']['tags'],
    env={
        'START_TIME_UTC': st.strftime('%Y-%m-%d %H:%M:%S'),
        'END_TIME_UTC': et.strftime('%Y-%m-%d %H:%M:%S'),
    },
)

Now that the transformer object is created, pass the transformer input (which contains the batch data from the batch preprocess step) into the TransformStep declaration. Store the output of this pipeline in an S3 bucket.

step_transform = TransformStep(
    name=config['transform_step']['step_name'],
    transformer=transformer,
    inputs=transform_input,
)
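
The transform_input referenced in the TransformStep is not shown in the preceding snippets; it would typically be declared along the lines of the following sketch, where the S3 prefix is an assumption that should match the output location of the batch preprocessing step:

from sagemaker.inputs import TransformInput

# point the transform step at the batch data written to Amazon S3 by the preprocessing step
transform_input = TransformInput(
    data=f"s3://{bucket}/batch-data/",  # assumed prefix; align with your preprocessing output
    content_type="text/csv",
    split_type="Line",
)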

SageMaker real-time inference

Complete the following steps to run the real-time inference pipeline (2_realtime_inference.ipynb):

  1. On the SageMaker Studio console, choose 2_realtime_inference.ipynb in the navigation pane.
  2. When the notebook is open, on the Run menu, choose Run All Cells to run the code in this notebook.

This notebook extracts the latest approved model from the model registry and deploys it as a SageMaker endpoint for real-time inference. It does so by completing the following steps:

  • Extract the latest approved model from the SageMaker model registry – To deploy a real-time SageMaker endpoint, first fetch the image URI of your choice and extract the latest approved model from the model registry. After the latest approved model is extracted, we use a container list with the specified inference.py as the script for the deployed model to use at inference (a sketch of this step appears after the endpoint configuration code that follows). This model creation and endpoint deployment are specific to the scikit-learn model configuration.
  • In the following code, we use the inference.py file specific to the scikit-learn model. We then create our endpoint configuration, setting our ManagedInstanceScaling to ENABLED with our desired MaxInstanceCount and MinInstanceCount for automatic scaling:
create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': instance_type,
        # start with the minimum instance count; managed instance scaling handles scaling up
        'InitialInstanceCount': min_instances,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic',
        # change your managed instance scaling configuration here
        "ManagedInstanceScaling": {
            "MaxInstanceCount": max_instances,
            "MinInstanceCount": min_instances,
            "Status": "ENABLED",
        },
    }],
)
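
For reference, the extraction and model creation described in the first step might look like the following sketch, which would run before the endpoint configuration call above; the model package group name and the SAGEMAKER_PROGRAM environment variable are assumptions:

# fetch the newest approved model package (assumed group name)
approved_package = sm.list_model_packages(
    ModelPackageGroupName="your-model-package-group",
    ModelApprovalStatus="Approved",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)["ModelPackageSummaryList"][0]

# read the model artifact location from the package details
model_data_url = sm.describe_model_package(
    ModelPackageName=approved_package["ModelPackageArn"]
)["InferenceSpecification"]["Containers"][0]["ModelDataUrl"]

# create the model with a container list that points at the scikit-learn inference.py script
create_model_response = sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    Containers=[{
        "Image": image_uri,
        "ModelDataUrl": model_data_url,
        "Environment": {"SAGEMAKER_PROGRAM": "inference.py"},
    }],
)
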
  • Run inference on the deployed real-time endpoint – After you have extracted the latest approved model, created the model from the desired image URI, and configured the endpoint configuration, you can deploy it as a real-time SageMaker endpoint:
import time

create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

# wait for the endpoint to reach a terminal state (InService) using describe_endpoint
describe_endpoint_response = sm.describe_endpoint(EndpointName=endpoint_name)
while describe_endpoint_response["EndpointStatus"] == "Creating":
    time.sleep(30)  # avoid polling in a tight loop
    describe_endpoint_response = sm.describe_endpoint(EndpointName=endpoint_name)

Upon deployment, you can view the endpoint in service on the SageMaker Endpoints page.

Now you can run inference against the data extracted from PrestoDB:

body_str = "total_extended_price,avg_discount,total_quantity\n1,2,3\n66.77,12,2"

response = smr.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=body_str.encode('utf-8'),
    ContentType='text/csv',
)

response_str = response["Body"].read().decode()
response_str

Results

Here is an example of an inference request and response from the real-time endpoint using the preceding implementation:

Inference request (adapt this example to your own use case):

import json

body_str = """total_extended_price,avg_discount,total_quantity
32,40,334
"""
 
response = smr.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=body_str.encode('utf-8'),
    ContentType='text/csv',
)

response_str = response["Body"].read().decode()
data = json.loads(response_str)
print(json.dumps(data, indent=4))

Response from the real-time endpoint:

[
    {
        "total_extended_price": 32,
        "avg_discount": 40,
        "total_quantity": 334,
        "prediction": 0
    }
]

Clean up

To clean up the endpoint used in this solution to avoid extra charges, complete the following steps:

  1. On the SageMaker console, choose Endpoints in the navigation pane.
  2. Select the endpoint to delete.
  3. On the Actions menu, choose Delete.
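
Alternatively, you can remove the same resources programmatically; the following is a minimal sketch using the endpoint, endpoint configuration, and model names created earlier in the notebooks:

import boto3

sm = boto3.client("sagemaker")

# delete the endpoint first, then its configuration and the model, to stop all charges
sm.delete_endpoint(EndpointName=endpoint_name)
sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm.delete_model(ModelName=model_name)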

Conclusion

In this post, we demonstrated an end-to-end MLOps solution on SageMaker. The process involved fetching data by connecting a SageMaker Processing job to a PrestoDB instance, followed by training, evaluating, and registering the model. We approved the latest registered model from the training pipeline and ran batch inference against it using batch data queried from PrestoDB and stored in Amazon S3. Lastly, we deployed the latest approved model as a real-time SageMaker endpoint to run inferences.

The rise of generative AI increases the demand for training, deploying, and running ML models, and consequently, the use of data. By integrating SageMaker Processing jobs with PrestoDB, you can seamlessly migrate your workloads to SageMaker pipelines without additional data preparation, storage, or accessibility burdens. You can build, train, evaluate, run batch inferences, and deploy models as real-time endpoints while using your existing data engineering pipelines with minimal or no code changes.

Explore SageMaker Pipelines and open source data querying engines like PrestoDB, and build a solution using the sample implementation provided.

Get started today by referring to the GitHub repository.

For more information and tutorials on SageMaker Pipelines, refer to the SageMaker Pipelines documentation.


About the Authors

Madhur Prashant is an AI and ML Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Antara Raisa is an AI and ML Solutions Architect at Amazon Web Services supporting strategic customers based out of Dallas, Texas. She also has experience working with large enterprise partners at AWS, where she worked as a Partner Success Solutions Architect for digital-centered customers.

Johnny Chivers is a Senior Solutions Architect working within the Strategic Accounts team at AWS. With over 10 years of experience helping customers adopt new technologies, he guides them through architecting end-to-end solutions spanning infrastructure, big data, and AI.

Shamik Ray is a Senior Engineering Manager at Twilio, leading the Data Science and ML team. With 12 years of experience in software engineering and data science, he excels in overseeing complex machine learning projects and ensuring successful end-to-end execution and delivery.

Srivyshnav K S is a Senior Machine Learning Engineer at Twilio with over 5 years of experience. His expertise lies in leveraging statistical and machine learning techniques to develop advanced models for detecting patterns and anomalies. He is adept at building projects end-to-end.

Jagmohan Dhiman is a Senior Data Scientist with 7 years of experience in machine learning solutions. He has extensive expertise in building end-to-end solutions, encompassing data analysis, ML-based application development, architecture design, and MLOps pipelines for managing the model lifecycle.

Soumya Kundu is a Senior Data Engineer with almost 10 years of experience in Cloud and Big Data technologies. He specializes in AI/ML-based large-scale data processing systems and is an avid IoT enthusiast in his spare time.

Read More

Accelerate deep learning training and simplify orchestration with AWS Trainium and AWS Batch

Accelerate deep learning training and simplify orchestration with AWS Trainium and AWS Batch

In large language model (LLM) training, effective orchestration and compute resource management poses a significant challenge. Automation of resource provisioning, scaling, and workflow management is vital for optimizing resource usage and streamlining complex workflows, thereby achieving efficient deep learning training processes. Simplified orchestration enables researchers and practitioners to focus more on model experimentation, hyperparameter tuning, and data analysis, rather than dealing with cumbersome infrastructure management tasks. Straightforward orchestration also accelerates innovation, shortens time-to-market for new models and applications, and ultimately enhances the overall efficiency and effectiveness of LLM research and development endeavors.

This post explores the seamless integration of AWS Trainium with AWS Batch, showcasing how the powerful machine learning (ML) acceleration capabilities of Trainium can be harnessed alongside the efficient orchestration functionalities offered by AWS Batch. Trainium provides massive scalability, enables effortless scaling of training jobs from small models to LLMs, and offers cost-effective access to computational power, making training LLMs affordable and accessible. AWS Batch is a managed service facilitating batch computing workloads on the AWS Cloud, handling tasks like infrastructure management and job scheduling, while enabling you to focus on application development and result analysis. AWS Batch provides comprehensive features, including managed batch computing, containerized workloads, custom compute environments, and prioritized job queues, along with seamless integration with other AWS services.

Solution overview

The following diagram illustrates the solution architecture.

The training process proceeds as follows:

  1. The user creates a Docker image configured to suit the demands of the underlying training task.
  2. The image is pushed to Amazon Elastic Container Registry (Amazon ECR) to make it ready for deployment.
  3. The user submits the training job to AWS Batch with the Docker image.

Let’s deep dive into this solution to see how you can integrate Trainium with AWS Batch. The following example demonstrates how to train the Llama 2-7B model using AWS Batch with Trainium.

Prerequisites

It is advised not to run the following scripts on your local machine. Instead, clone the GitHub repository and run the provided scripts on an x86_64-based instance, preferably a c5.xlarge instance type with the Linux/Ubuntu operating system. For this post, we run the example on an Amazon Linux 2023 instance.

Before getting started with the training on AWS Batch, make sure the following tools are installed on the instance:

sudo yum install -y docker 
sudo yum install -y jq

Clone the repo

Clone the GitHub repo and navigate to the required directory:

git clone https://github.com/aws-neuron/aws-neuron-samples.git 
cd aws-neuron-samples/torch-neuronx/training/aws-batch/llama2

Update the configuration

First, update the config.txt file to specify values for the following variables:

REGION                          # your aws region 
SUBNET                          # your subnet in which the Trainium instances would be launched 
SG                              # your security group you want to associate with your instances 
ECR_REPO                        # your ECR repo where the docker container image will be pushed to 
INSTANCE_ROLE                   # Instance profile ARN for your IAM Instance Role 
DO_PRE_COMPILATION              # boolean value (true|false) indicating if you want to do neuron pre-compilation for your training job 
TOKENIZED_DATASET_URI           # s3 uri to store the tokenized dataset 
NEURON_COMPILE_CACHE_URI        # s3 uri to store the neuron compile caches 
CHECKPOINT_SAVE_URI             # s3 uri to store the checkpoints

After you provide these values, your config.txt file should look something like the following code:

REGION=us-east-1
SUBNET=subnet-012345abcd5689
SG=sg-012345abcd5689
ECR_REPO=1010101010.dkr.ecr.us-east-1.amazonaws.com/your-docker-repo
INSTANCE_ROLE=arn:aws:iam::1010101010:instance-profile/your-instance-role
DO_PRE_COMPILATION=true
TOKENIZED_DATASET_URI=s3://your/s3/location/to/store/tokenized/dataset/
NEURON_COMPILE_CACHE_URI=s3://your/s3/location/to/store/neuron-compile-cache/
CHECKPOINT_SAVE_URI=s3://your/s3/location/to/store/checkpoints/

Get the Llama tokenizer

To tokenize the dataset, you would need to get the tokenizer from Hugging Face. Follow the instructions to access the Llama tokenizer. (You need to acknowledge and accept the license terms.) After you’re granted access, you can download the tokenizer from Hugging Face. After a successful download, place the tokenizer.model file in the root directory (llama2).

Set up Llama training

Run the setup.sh script, which streamlines the prerequisite steps for initiating the AWS Batch training. This script downloads the necessary Python files for training the Llama 2-7B model. Additionally, it performs environment variable substitution within the provided templates and scripts designed to establish AWS Batch resources. When it runs, it makes sure your directory structure conforms to the following setup:

.
├── build
│ ├── compute_env.json
│ ├── job_def.json
│ ├── job_queue.json
│ └── launch_template.json
├── build_and_push_docker_image.sh
├── cleanup.sh
├── config.txt
├── create_resources.sh
├── data
│ ├── get_dataset.py
│ ├── config.json
│ └── tokenizer.model
├── docker
│ ├── Dockerfile
│ ├── llama2
│ │ ├── adamw_fp32_optim_params.py
│ │ ├── config.json
│ │ ├── llama_batch_training.sh
│ │ ├── modeling_llama_nxd.py
│ │ ├── requirements.txt
│ │ └── tp_zero1_llama2_7b_hf_pretrain.py
│ └── llama_batch_training.sh
├── download_and_tokenize_data.sh
├── images
│ └── aws-batch.png
├── README.md
├── scripts
│ ├── build_and_push_docker_image.sh
│ ├── cleanup.sh
│ ├── create_resources.sh
│ ├── download_and_tokenize_data.sh
│ └── submit_batch_job.sh
├── setup.sh
├── submit_batch_job.sh
└── templates
  ├── compute_env.json
  ├── job_def.json
  ├── job_queue.json
  └── launch_template.json

Tokenize the dataset

Next, run the download_and_tokenize_data.sh script to complete the data preprocessing steps for Llama 2-7B training. In this instance, we use the wikicorpus dataset sourced from Hugging Face. After the dataset retrieval, the script performs tokenization and uploads the tokenized dataset to the predefined S3 location specified within the config.txt configuration file. The following screenshots show the preprocessing results.

Provision resources

Next, run the create_resources.sh script, which orchestrates the provisioning of the required resources for the training task. This includes creation of a placement group, launch template, compute environment, job queue, and job definition. The following screenshots illustrate this process.

Build and push the Docker image

Now you can run the script build_and_push_docker_image.sh, which constructs a Docker container image customized for your specific training task. This script uses a Deep Learning Container Image published by the Neuron team, which contains the required software stack, and adds instructions for running the Llama 2-7B training on top of it. The training script uses the neuronx_distributed library with tensor parallelism along with the ZeRO-1 Optimizer. Subsequently, the newly generated Docker container image is uploaded to your designated ECR repository as specified by the variable ECR_REPO in the configuration file config.txt.

If you want to modify any of the Llama training hyperparameters, make the required changes in ./docker/llama_batch_training.sh before running build_and_push_docker_image.sh.

The following screenshots illustrate the process for building and pushing the Docker image.

Submit the training job

Run the submit_batch_job.sh script to initiate the AWS Batch job and start the Llama2 model training, as shown in the following screenshots.

Upon batch job submission, an Amazon Elastic Container Service (Amazon ECS) cluster is dynamically provisioned. When it’s operational, you can navigate to the cluster to monitor all tasks actively running on the trn1.32xlarge instances launched through this job. By default, this example is configured to use 4 trn1.32xlarge instances. To customize this setting, you can modify the numNodes parameter in the submit_batch_job.sh script.
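
For reference, the job submission performed by submit_batch_job.sh roughly corresponds to the following boto3 sketch; the job name, queue, and definition values are placeholders for the resources created by create_resources.sh:

import boto3

batch = boto3.client("batch")

# placeholder names; use the job queue and job definition created by create_resources.sh
response = batch.submit_job(
    jobName="llama2-7b-trainium-training",
    jobQueue="your-trainium-job-queue",
    jobDefinition="your-llama2-job-definition",
    nodeOverrides={"numNodes": 4},  # mirrors the numNodes setting in submit_batch_job.sh
)
print(response["jobId"])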

Logs and monitoring

After the job submission, you can use Amazon CloudWatch Logs for comprehensive monitoring, storage, and viewing of all logs generated by AWS Batch. Complete the following steps to access the logs:

  1. On the CloudWatch console, choose Log groups under Logs in the navigation pane.
  2. Choose /aws/batch/job to view the batch job logs.
  3. Look for log groups that match your AWS Batch job names or job definitions.
  4. Choose the job to view its details.

The following screenshot shows an example.
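
You can also pull the same logs programmatically; the following is a minimal boto3 sketch that reads recent events from the /aws/batch/job log group (the event limit is arbitrary):

import boto3

logs = boto3.client("logs")

# fetch the most recent events emitted by the AWS Batch job
events = logs.filter_log_events(
    logGroupName="/aws/batch/job",
    limit=50,
)
for event in events["events"]:
    print(event["message"])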

Checkpoints

Checkpoints generated during training will be stored in the predefined S3 location specified as CHECKPOINT_SAVE_URI in the config.txt file. By default, the checkpoint is saved when training is complete. However, you can adjust this behavior by opting to save the checkpoint after every N steps within the training loop. For detailed instructions on this customization, refer to Checkpointing.

Clean up

When you’re done, run the cleanup.sh script to remove the resources created for this post. This script removes various components, such as the launch template, placement group, job definition, job queue, and compute environment. AWS Batch automatically handles the cleanup of the ECS stack and Trainium instances, so there’s no need to manually remove or stop them.

Conclusion

The seamless integration of Trainium with AWS Batch represents a significant advancement in the realm of ML training. By combining the unparalleled capabilities of Trainium with the powerful orchestration functionalities of AWS Batch, you stand to benefit in numerous ways. Firstly, you gain access to massive scalability, with the ability to effortlessly scale training jobs from small models to LLMs. With up to 16 Trainium chips per instance and the potential for distributed training across tens of thousands of accelerators, you can tackle even the most demanding training tasks with ease by virtue of Trainium instances. Additionally, it offers a cost-effective solution, helping you harness the power you need at an appealing price point. With the fully managed service offered by AWS Batch for computing workloads, you can offload operational complexities such as infrastructure provisioning and job scheduling, allowing you to focus your efforts on building applications and analyzing results. Ultimately, the integration of Trainium with AWS Batch empowers you to accelerate innovation, shorten time-to-market for new models and applications, and enhance the overall efficiency and effectiveness of your ML endeavors.

Now that you have learned about orchestrating Trainium using AWS Batch, we encourage you to try it out for your next deep learning training job. You can explore more tutorials that will help you gain hands-on experience with AWS Batch and Trainium, and enable you to manage your deep learning training workloads and resources for better performance and cost-efficiency. So why wait? Start exploring these tutorials today and take your deep learning training to the next level with Trainium and AWS Batch!


About the authors

Scott Perry is a Solutions Architect on the Annapurna ML accelerator team at AWS. Based in Canada, he helps customers deploy and optimize deep learning training and inference workloads using AWS Inferentia and AWS Trainium. His interests include large language models, deep reinforcement learning, IoT, and genomics.

Sadaf Rasool is a Machine Learning Engineer with the Annapurna ML Accelerator team at AWS. As an enthusiastic and optimistic AI/ML professional, he holds firm to the belief that the ethical and responsible application of AI has the potential to enhance society in the years to come, fostering both economic growth and social well-being.

Read More

Build a custom UI for Amazon Q Business

Build a custom UI for Amazon Q Business

Amazon Q is a new generative artificial intelligence (AI)-powered assistant designed for work that can be tailored to your business. Amazon Q can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take actions using the data and expertise found in your company’s information repositories and enterprise systems. When you chat with Amazon Q, it provides immediate, relevant information and advice to help streamline tasks, speed up decision-making, and spark creativity and innovation at work. For more information, see Amazon Q Business, now generally available, helps boost workforce productivity with generative AI.

This post demonstrates how to build a custom UI for Amazon Q Business. The customized UI allows you to implement special features like handling feedback, using company brand colors and templates, and using a custom login. It also enables conversing with Amazon Q through an interface personalized to your use case.

Solution overview

In this solution, we deploy a custom web experience for Amazon Q to deliver quick, accurate, and relevant answers to your business questions on top of an enterprise knowledge base. The following diagram illustrates the solution architecture.

The workflow includes the following steps:

  1. The user accesses the chatbot application, which is hosted behind an Application Load Balancer.
  2. To log in, the user is redirected to the Amazon Cognito login page for authentication.
    • This solution uses an Amazon Cognito user pool as an OAuth-compatible identity provider (IdP), which is required in order to exchange a token with AWS IAM Identity Center and later on interact with the Amazon Q Business APIs. For more information about trusted token issuers and how token exchanges are performed, see Using applications with a trusted token issuer. If you already have an OAuth-compatible IdP, you can use it instead of setting an Amazon Cognito user pool.
    • Provisioning local users in the user pool and reconciling them with IAM Identity Center can be error-prone. You can streamline the integration of IAM Identity Center users into the user pool by using a federated IdP and creating a second custom application (SAML) in IAM Identity Center. For instructions, refer to How do I integrate IAM Identity Center with an Amazon Cognito user pool and the associated demo video.
  3. The UI application, deployed on an Amazon Elastic Compute Cloud (Amazon EC2) instance, authenticates the user with Amazon Cognito and obtains an authentication token. It then exchanges this Amazon Cognito identity token for an IAM Identity Center token that grants the application permissions to access Amazon Q.
  4. The UI application assumes an AWS Identity and Access Management (IAM) role and retrieves an AWS session token from the AWS Security Token Service (AWS STS). This session token is augmented with the IAM Identity Center token, enabling the application to interact with Amazon Q. For more information about the token exchange flow between IAM Identity Center and the IdP, refer to How to develop a user-facing data application with IAM Identity Center and S3 Access Grants (Part 1) and Part 2.
  5. Amazon Q uses the chat_sync API to carry out the conversation (a minimal sketch of this call follows this list).
    1. The request uses the following mandatory parameters:
      1. applicationId – The identifier of the Amazon Q application linked to the Amazon Q conversation.
      2. userMessage – An end-user message in a conversation.
    2. Amazon Q returns the response as a JSON object (detailed in the Amazon Q documentation). The following are a few core attributes from the response payload:
      1. systemMessage – An AI-generated message in a conversation.
      2. sourceAttributions – The source documents used to generate the conversation response. In Retrieval Augmented Generation (RAG), this always refers to one or more documents from enterprise knowledge bases that are indexed in Amazon Q.
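
The following is a minimal sketch of that chat_sync call, assuming the caller's credentials already carry the IAM Identity Center context established in the earlier steps; the application ID and question are placeholders:

import boto3

# the session is assumed to use the identity-aware credentials obtained in steps 3 and 4
qbusiness = boto3.client("qbusiness")

response = qbusiness.chat_sync(
    applicationId="your-amazon-q-application-id",  # placeholder
    userMessage="What is our travel reimbursement policy?",
)

print(response["systemMessage"])
for source in response.get("sourceAttributions", []):
    print(source.get("title"), source.get("url"))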

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account set up.
  • A VPC where you will deploy the solution.
  • An IAM role in the account with sufficient permissions to create the necessary resources. If you have administrator access to the account, no additional action is required.
  • An existing, working Amazon Q application, integrated with IAM Identity Center. If you haven’t set one up yet, see Creating an Amazon Q application.
  • Access to IAM Identity Center to create a customer managed application.
  • An SSL certificate created and imported into AWS Certificate Manager (ACM). For more details, refer to Importing a certificate. If you don’t have a public SSL certificate, follow the steps in the next section to generate a private certificate.

Generate a private certificate

If you already have an SSL certificate, you can skip this section.

You will receive a warning from your browser when accessing the UI if you didn’t provide a custom SSL certificate when launching the AWS CloudFormation stack. The instructions in this section show you how to create a self-signed certificate. This is not recommended for production use cases. You should obtain an SSL certificate that has been validated by a certificate authority, import it into ACM, and reference this when launching the CloudFormation stack. If you want to continue with the self-signed certificate (for development purposes), you should be able to proceed past the browser warning page. With Chrome, you will see the Your connection is not private error message (NET::ERR_CERT_AUTHORITY_INVALID), but by choosing Advanced, you should see a link to proceed.

The following command generates a sample self-signed certificate (for development purposes) and uploads the certificate to ACM. You can also find the script on the GitHub repo.

openssl req \
  -x509 -nodes -days 365 -sha256 \
  -subj '/C=US/ST=Oregon/L=Portland/CN=sampleexample.com' \
  -newkey rsa:2048 -keyout key.pem -out cert.pem

aws acm import-certificate --certificate fileb://cert.pem --private-key fileb://key.pem

Note down the CertificateARN to use later while provisioning the CloudFormation template.

Provision resources with the CloudFormation template

The full source of the solution is in the GitHub repository and is deployed with AWS CloudFormation.

Choose Launch Stack to launch a CloudFormation stack in your account and deploy the template:

This template creates separate IAM roles for the Application Load Balancer, Amazon Cognito, and the EC2 instance. Additionally, it creates and configures those services to run the end-to-end demonstration.

Provide the following parameters for the stack:

  • Stack name – The name of the CloudFormation stack (for example, AmazonQ-UI-Demo).
  • AuthName – A globally unique name to assign to the Amazon Cognito user pool. Make sure your domain name doesn’t include any reserved words, such as cognito, aws, or amazon.
  • CertificateARN – The CertificateARN generated from the previous step.
  • IdcApplicationArn – This is the Amazon Resource Name (ARN) for the IAM Identity Center customer application. Leave it blank on the first run, because you need to create the Amazon Cognito user pool as part of this stack. This will create an IAM Identity Center application with an Amazon Cognito user pool as the trusted token issuer.
  • LatestAMIId – The ID of the AMI to use for the EC2 instance. We suggest keeping the default value.
  • PublicSubnetIds – The ID of the public subnet that can be used to deploy the EC2 instance and the Application Load Balancer.
  • QApplicationId – The existing application ID of Amazon Q.
  • VPCId – The ID of the existing VPC that can be used to deploy the demo.

After the CloudFormation stack deploys successfully, copy the following values on the stack’s Outputs tab:

  • Audience – Audience to set up the customer application in IAM Identity Center
  • RoleArn – ARN of the IAM role required to set up the token exchange in IAM Identity Center
  • TrustedIssuerUrl – Endpoint of the trusted issuer to set up IAM Identity Center
  • URL – The load balancer URL to access the UI application
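
You can also read the same outputs programmatically, as in the following sketch; it assumes the stack name from the example above and that the output keys match the names listed here:

import boto3

cf = boto3.client("cloudformation")

stack = cf.describe_stacks(StackName="AmazonQ-UI-Demo")["Stacks"][0]
outputs = {o["OutputKey"]: o["OutputValue"] for o in stack["Outputs"]}

print(outputs["TrustedIssuerUrl"], outputs["Audience"], outputs["RoleArn"], outputs["URL"])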

Create an IAM Identity Center application

The actions described in this section are one-time actions. The goal is to configure an application in IAM Identity Center to represent the application you are building. Specifically, in this step, you configure IAM Identity Center to be able to trust the identity tokens by which your application will represent its authenticated users. Complete the following steps:

  1. On the IAM Identity Center console, add a new customer managed application.
  2. For Application type, select OAuth 2.0, then choose Next.
  3. Enter an application name and description.
  4. Set Application visibility as Not visible, then choose Next.
  5. On the Trusted token issuers tab, choose Create trusted token issuer.
  6. For Issuer URL, provide the TrustedIssuerUrl you copied from the CloudFormation stack output.
  7. Enter an issuer name and keep the map attributes as Email.
  8. In the IAM Identity Center application authentication settings, select the trusted token issuer created in the previous step and add the Aud claim, providing the audience you copied from the CloudFormation stack output, then choose Next.
  9. On the Specify application credentials tab, choose Enter one or more IAM roles and provide the value for RoleArn you copied from the CloudFormation stack output.
  10. Review all the steps and create the application.
  11. After the application is created, go to the application, choose Assign users and groups, and add the users who will have access to the UI application.
  12. On the Select setup type page, choose All applications for service with same access, choose Amazon Q from the Services list, and choose Trust applications.
  13. After the IAM Identity Center application is created, copy the application ARN.
  14. On the AWS CloudFormation console, update the stack and provide the IAM Identity Center application ARN for the parameter IdcApplicationArn, then run the stack.
  15. When the update process is complete, go to the CloudFormation stack’s Outputs tab and copy the URL provided there.

Custom UI

The CloudFormation stack deploys and starts the Streamlit application on an EC2 instance on port 8080. To view the health of the application running behind the Application Load Balancer, open the Amazon EC2 console and choose Target Groups under Load Balancing in the navigation pane. For debugging purposes, you can also connect to Amazon EC2 through Session Manager, a capability of AWS Systems Manager.

To access the custom UI, use the URL that you copied from the CloudFormation stack output. Choose Sign up and use the same email address for the users that were registered in IAM Identity Center.

After successful authentication, you’re redirected to the custom UI. You can enhance it by implementing custom features like handling feedback, using your company’s brand colors and templates, and personalizing it to your specific use case.

Clean up

To avoid future charges in your account, delete the resources you created in this walkthrough. The EC2 instance with the custom UI will incur charges as long as the instance is active, so stop it when you’re done.

  1. On the CloudFormation console, in the navigation pane, choose Stacks.
  2. Select the stack you launched (AmazonQ-UI-Demo), then choose Delete.

Conclusion

In this post, you learned how to integrate a custom UI with Amazon Q Business. Using a custom UI tailored to your specific needs and requirements makes Amazon Q more efficient and straightforward to use for your business. You can include your company branding and design, and have control and ownership over the user experience. For example, you could introduce custom feedback handling features.

The sample custom UI for Amazon Q discussed in this post is provided as open source—you can use it as a starting point for your own solution, and help improve it by contributing bug fixes and new features using GitHub pull requests. Explore the code, choose Watch in the GitHub repo to receive notifications about new releases, and check back for the latest updates. We welcome your suggestions for improvements and new features.

For more information on Amazon Q business, refer to the Amazon Q Business Developer Guide.


About the Authors

Ennio Emanuele Pastore is a Senior Architect on the AWS GenAI Labs team. He is an enthusiast of everything related to new technologies that have a positive impact on businesses and general livelihood. He helps organizations in achieving specific business outcomes by using data and AI, and accelerating their AWS Cloud adoption journey.

Deba is a Senior Architect on the AWS GenAI Labs team. He has extensive experience across big data, data science, and IoT, in both consulting and industrial settings. He is an advocate of cloud-centered data and ML platforms and the value they can drive for customers across industries.

Joseph de Clerck is a senior Cloud Infrastructure Architect at AWS. He leverages his expertise to help enterprises solve their business challenges by effectively utilizing AWS services. His broad understanding of cloud technologies enables him to devise tailored solutions on topics such as analytics, security, infrastructure, and automation.

Read More

Scalable intelligent document processing using Amazon Bedrock

Scalable intelligent document processing using Amazon Bedrock

In today’s data-driven business landscape, the ability to efficiently extract and process information from a wide range of documents is crucial for informed decision-making and maintaining a competitive edge. However, traditional document processing workflows often involve complex and time-consuming manual tasks, hindering productivity and scalability.

In this post, we discuss an approach that uses the Anthropic Claude 3 Haiku model on Amazon Bedrock to enhance document processing capabilities. Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading artificial intelligence (AI) startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the AWS tools without having to manage any infrastructure.

At the heart of this solution lies the Anthropic Claude 3 Haiku model, the fastest and most affordable model in its intelligence class. With state-of-the-art vision capabilities and strong performance on industry benchmarks, Anthropic Claude 3 Haiku is a versatile solution for a wide range of enterprise applications. By using the advanced natural language processing (NLP) capabilities of Anthropic Claude 3 Haiku, our intelligent document processing (IDP) solution can extract valuable data directly from images, eliminating the need for complex postprocessing.

Scalable and efficient data extraction

Our solution overcomes the traditional limitations of document processing by addressing the following key challenges:

  • Simple prompt-based extraction – This solution allows you to define the specific data you need to extract from the documents through intuitive prompts. The Anthropic Claude 3 Haiku model then processes the documents and returns the desired information, streamlining the entire workflow.
  • Handling larger file sizes and multipage documents – To provide scalability and flexibility, this solution integrates additional AWS services to handle file sizes beyond the 5 MB limit of Anthropic Claude 3 Haiku. The solution can process both PDFs and image files, including multipage documents, providing comprehensive processing for unparalleled efficiency.

With the advanced NLP capabilities of the Anthropic Claude 3 Haiku model, our solution can directly extract the specific data you need without requiring complex postprocessing or parsing the output. This approach simplifies the workflow and enables more targeted and efficient document processing than traditional OCR-based solutions.

Confidence scores and human review

Maintaining data accuracy and quality is paramount in any document processing solution. This solution incorporates customizable rules, allowing you to define the criteria for invoking a human review. This provides a seamless collaboration between the automated extraction and human expertise, delivering high-quality results that meet your specific requirements.

In this post, we show how you can use Amazon Bedrock and Amazon Augmented AI (Amazon A2I) to build a workflow that enables multipage PDF document processing with a human reviewer loop.

Solution overview

The following architecture shows how you can have a serverless architecture to process multipage PDF documents or images with a human review. To implement this architecture, we take advantage of AWS Step Functions to build the overall workflow. As the workflow starts, it extracts individual pages from the multipage PDF document. It then uses the Map state to process multiple pages concurrently using the Amazon Bedrock API. After the data is extracted from the document, it validates against the business rules and sends the document to Amazon A2I for a human to review if any business rules fail. Reviewers use the Amazon A2I UI (a customizable website) to verify the extraction result. When the human review is complete, the callback task token is used to resume the state machine and store the output in an Amazon DynamoDB table.
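
Inside the Map state, the per-page extraction call to Amazon Bedrock might look like the following sketch; the page image file, the prompt, and the fields requested are placeholders for your own document and extraction schema:

import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# placeholder: one page of the PDF rendered as a PNG by the workflow
with open("page_1.png", "rb") as f:
    page_image = base64.b64encode(f.read()).decode("utf-8")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": page_image}},
            {"type": "text", "text": "Extract the applicant name, date of birth, and address as JSON."},
        ],
    }],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(body),
)
extracted_text = json.loads(response["body"].read())["content"][0]["text"]
print(extracted_text)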

You can deploy this solution following the steps in this post.

Prerequisites

For this walkthrough, you need the following:

Create an AWS Cloud9 IDE

We use an AWS Cloud9 integrated development environment (IDE) to deploy the solution. It provides a convenient way to access a full development and build environment. Complete the following steps:

  1. Sign in to the AWS Management Console through your AWS account.
  2. Select the AWS Region in which you want to deploy the solution.
  3. On the AWS Cloud9 console, choose Create environment.
  4. Name your environment mycloud9.
  5. Choose the t3.small instance type on the Amazon Linux 2 platform.
  6. Choose Create.

AWS Cloud9 automatically creates and sets up a new Amazon Elastic Compute Cloud (Amazon EC2) instance in your account.

  1. When the environment is ready, select it and choose Open.

The AWS Cloud9 instance opens in a new terminal tab, as shown in the following screenshot.

Clone the source code to deploy the solution

Now that your AWS Cloud9 IDE is set up, you can proceed with the following steps to deploy the solution.

Confirm the Node.js version

AWS Cloud9 preinstalls Node.js. You can confirm the installed version by running the following command:

node --version

You should see output like the following:

v20.13.1

If you’re on v20.x or higher, you can skip to the steps in the “Install the AWS CDK” section. If you’re on a different version of Node.js, complete the following steps:

  1. In an AWS Cloud9 terminal, run the following command to confirm you have the latest version of Node Version Manager (nvm):
    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash

  2. Install Node.js 20:
    nvm install 20

  3. Confirm the current Node.js version by running the following command:
    node --version

Install the AWS CDK

Confirm whether you already have the AWS Cloud Development Kit (AWS CDK) installed. To do this, with the terminal session still open in the IDE, run the following command:

cdk --version

If the AWS CDK is installed, the output contains the AWS CDK version and build numbers. In this case, you can skip to the steps in the “Download the source code” section. Otherwise, complete the following steps:

  1. Install the AWS CDK by running the npm command along with the install action, the name of the AWS CDK package to install, and the -g option to install the package globally in the environment:
    npm install -g aws-cdk

  2. To confirm that the AWS CDK is installed and correctly referenced, run the cdk command with the --version option:
    cdk --version

If successful, the AWS CDK version and build numbers are displayed.

Download the source code from the GitHub repo

Complete the following steps to download the source code:

  1. In an AWS Cloud9 terminal, clone the GitHub repo:
    git clone https://github.com/aws-samples/aws-generative-ai-document-processing-solution

  2. Run the following commands to create the Sharp npm package and copy the package to the source code:
    mkdir -p ~/environment/sharplayer/nodejs && cd ~/environment/sharplayer/nodejs
    npm init -y && npm install --arch=x64 --platform=linux sharp
    cd .. && zip -r sharplayer.zip .
    cp sharplayer.zip ~/environment/aws-generative-ai-document-processing-solution/deploy_code/multipagepdfa2i_imageresize/
    cd .. && rm -r sharplayer

  3. Change to the repository directory:
    cd aws-generative-ai-document-processing-solution
    

  4. Run the following command:
    pip install -r requirements.txt

The first time you deploy an AWS CDK app into an environment for a specific AWS account and Region combination, you must install a bootstrap stack. This stack includes various resources that the AWS CDK needs to complete its operations. For example, this stack includes an Amazon Simple Storage Service (Amazon S3) bucket that the AWS CDK uses to store templates and assets during its deployment processes.

  1. To install the bootstrap stack, run the following command:
    cdk bootstrap

  2. From the project’s root directory, run the following command to deploy the stack:
    cdk deploy

If successful, the output displays that the stack deployed without errors.

The last step is to update the cross-origin resource sharing (CORS) for the S3 bucket.

  1. On the Amazon S3 console, choose Buckets in the navigation pane.
  2. Choose the name of the bucket that was created in the AWS CDK deployment step. It should have a name format like multipagepdfa2i-multipagepdf-xxxxxxxxx.
  3. Choose Permissions.
  4. In the Cross-origin resource sharing (CORS) section, choose Edit.
  5. In the CORS configuration editor text box, enter the following CORS configuration:
    [
         {
             "AllowedHeaders": [
                 "Authorization"
             ],
             "AllowedMethods": [
                 "GET",
                 "HEAD"
             ],
             "AllowedOrigins": [
                 "*"
             ],
             "ExposeHeaders": [
                 "Access-Control-Allow-Origin"
             ]
         }
     ]
    

  6. Choose Save changes.

Create a private work team

A work team is a group of people you select to review your documents. You can create a work team from a workforce, which is made up of Amazon Mechanical Turk workers, vendor-managed workers, or your own private workers that you invite to work on your tasks. Whichever workforce type you choose, Amazon A2I takes care of sending tasks to workers. For this solution, you create a work team using a private workforce and add yourself to the team to preview the Amazon A2I workflow.

To create and manage your private workforce, you can use the Amazon SageMaker console. You can create a private workforce by entering worker emails or importing a preexisting workforce from an Amazon Cognito user pool.

To create your private work team, complete the following steps:

  1. On the SageMaker console, choose Labeling workforces under Ground Truth in the navigation pane.
  2. On the Private tab, choose Create private team.
  3. Choose Invite new workers by email.
  4. In the Email addresses box, enter the email addresses for your work team (for this post, enter your email address).

You can enter a list of up to 50 email addresses, separated by commas.

  1. Enter an organization name and contact email.
  2. Choose Create private team.

After you create the private team, you get an email invitation. The following screenshot shows an example email.

After you choose the link and change your password, you will be registered as a verified worker for this team. The following screenshot shows the updated information on the Private tab.

Your one-person team is now ready, and you can create a human review workflow.

Create a human review workflow

You define the business conditions under which the Amazon Bedrock extracted content should go to a human for review. These business conditions are set in Parameter Store, a capability of AWS Systems Manager. For example, you can look for specific keys in the document. When the extraction is complete, in the AWS Lambda function, check for those keys and their values. If the key is not present or the value is blank, the form will go for human review.
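
As an illustration, the business rule check inside the Lambda function could follow a pattern like this sketch; the parameter name matches the one used later in this post, while the required keys are hypothetical:

import boto3

ssm = boto3.client("ssm")

# parameter used later in this post; the required keys below are hypothetical examples
validation_required = ssm.get_parameter(
    Name="/business_rules/validationrequied"
)["Parameter"]["Value"]

REQUIRED_KEYS = ["applicant_name", "date_of_birth"]

def needs_human_review(extracted: dict) -> bool:
    # send the form to Amazon A2I when validation is enabled and any required value is missing or blank
    if validation_required.lower() != "yes":
        return False
    return any(not extracted.get(key) for key in REQUIRED_KEYS)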

Complete the following steps to create a worker task template for your document review task:

  1. On the SageMaker console, choose Worker task templates under Augmented AI in the navigation pane.
  2. Choose Create template.
  3. In the template properties section, enter a unique template name for Template name and select Custom for Template type.
  4. Copy the contents from the Custom template file you downloaded from the GitHub repo and replace the content in the Template editor section.
  5. Choose Create to create the template.

Next, you create instructions to help workers complete your document review task.

  1. Choose Human review workflows under Augmented AI in the navigation pane.
  2. Choose Create human review workflow.
  3. In the Workflow settings section, for Name, enter a unique workflow name.
  4. For S3 bucket, enter the S3 bucket that was created in the AWS CDK deployment step. It should have a name format like multipagepdfa2i-multipagepdf-xxxxxxxxx.

This bucket is where Amazon A2I will store the human review results.

  1. For IAM role, choose Create a new role for Amazon A2I to create a role automatically for you.
    • For S3 buckets you specify, select Specific S3 buckets.
    • Enter the S3 bucket you specified earlier; for example, multipagepdfa2i-multipagepdf-xxxxxxxxxx.
    • Choose Create.

You see a confirmation when role creation is complete, and your role is now pre-populated on the IAM role dropdown menu.

  1. For Task type, select Custom.
  2. In the worker task template section, choose the template that you previously created.
  3. For Task Description, enter “Review the extracted content from the document and make changes as needed”.
  4. For Worker types, select Private.
  5. For Private teams, choose the work team you created earlier.
  6. Choose Create.

You’re redirected to the Human review workflows page, where you will see a confirmation message.

In a few seconds, the status of the workflow will be changed to active. Record your new human review workflow ARN, which you use to configure your human loop in a later step.
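
When a business rule fails, the workflow starts a human loop against this workflow ARN. A minimal sketch of that call follows; the loop name, input payload, and ARN value are illustrative:

import json

import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

# illustrative value; use the human review workflow ARN you recorded above
human_review_workflow_arn = "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/your-workflow"

response = a2i.start_human_loop(
    HumanLoopName="vital-birth-review-0001",  # must be unique per review
    FlowDefinitionArn=human_review_workflow_arn,
    HumanLoopInput={
        "InputContent": json.dumps({
            "image_s3_uri": "s3://your-bucket/uploads/page_1.png",
            "extracted_data": {"applicant_name": "", "date_of_birth": "2001-01-01"},
        })
    },
)
print(response["HumanLoopArn"])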

Update the solution with the human review workflow

You’re now ready to add your human review workflow Amazon Resource Name (ARN):

  1. Within the code you downloaded from GitHub repo, open the file
    /aws-generative-ai-document-processing-solution/multipagepdfa2i/multipagepdfa2i_stack.py. 

  2. Update line 23 with the ARN that you copied earlier:
    SAGEMAKER_WORKFLOW_AUGMENTED_AI_ARN_EV = "arn:aws:sagemaker:us-east-1:XXXXXXXXXXX:....

  3. Save the changes you made.
  4. Deploy by entering the following command:
    cdk deploy

Test the solution without business rules validation

To test the solution without using a human review, create a folder called uploads in the S3 bucket multipagepdfa2i-multipagepdf-xxxxxxxxx and upload the sample PDF document provided. For example, uploads/Vital-records-birth-application.pdf.

The content will be extracted, and you will see the data in the DynamoDB table multipagepdfa2i-ddbtableVitalBirthDataXXXXX.

Test the solution with business rules validation

Complete the following steps to test the solution with a human review:

  1. On the Systems Manager console, choose Parameter Store in the navigation pane.
  2. Select the parameter /business_rules/validationrequied and update the value to yes.
  3. Upload the sample PDF document provided to the uploads folder that you created earlier in the S3 bucket multipagepdfa2i-multipagepdf-xxxxxxxxx.
  4. On the SageMaker console, choose Labeling workforces under Ground Truth in the navigation pane.
  5. On the Private tab, choose the link under Labeling portal sign-in URL.
  6. Sign in with the account you configured with Amazon Cognito.
  7. Select the job you want to complete and choose Start working.

In the reviewer UI, you will see instructions and the document to work on. You can use the toolbox to zoom in and out, fit image, and reposition the document.

This UI is specifically designed for document-processing tasks. On the right side of the preceding screenshot, the extracted data is automatically prefilled with the Amazon Bedrock response. As a worker, you can quickly refer to this sidebar to make sure the extracted information is identified correctly.

When you complete the human review, you will see the data in the DynamoDB table multipagepdfa2i-ddbtableVitalBirthDataXXXXX.

Conclusion

In this post, we showed you how to use the Anthropic Claude 3 Haiku model on Amazon Bedrock and Amazon A2I to automatically extract data from multipage PDF documents and images. We also demonstrated how to conduct a human review of the pages for given business criteria. By eliminating the need for complex postprocessing, handling larger file sizes, and integrating a flexible human review process, this solution can help your business unlock the true value of your documents, drive informed decision-making, and gain a competitive edge in the market.

Overall, this post provides a roadmap for building a scalable document processing workflow using Anthropic Claude models on Amazon Bedrock.

As next steps, check out What is Amazon Bedrock to start using the service. Follow Amazon Bedrock on the AWS Machine Learning Blog to keep up to date with new capabilities and use cases for Amazon Bedrock.


About the Authors

Venkata Kampana is a Senior Solutions Architect in the AWS Health and Human Services team and is based in Sacramento, CA. In that role, he helps public sector customers achieve their mission objectives with well-architected solutions on AWS.

Jim Daniel is the Public Health lead at Amazon Web Services. Previously, he held positions with the United States Department of Health and Human Services for nearly a decade, including Director of Public Health Innovation and Public Health Coordinator. Before his government service, Jim served as the Chief Information Officer for the Massachusetts Department of Public Health.

Read More

Use weather data to improve forecasts with Amazon SageMaker Canvas

Use weather data to improve forecasts with Amazon SageMaker Canvas

Photo by Zbynek Burival on Unsplash

Time series forecasting is a specific machine learning (ML) discipline that enables organizations to make informed planning decisions. The main idea is to supply historic data to an ML algorithm that can identify patterns from the past and then use those patterns to estimate likely values about unseen periods in the future.

Amazon has a long heritage of using time series forecasting, dating back to the early days of having to meet mail-order book demand. Fast forward more than a quarter century and advanced forecasting using modern ML algorithms is offered to customers through Amazon SageMaker Canvas, a no-code workspace for all phases of ML. SageMaker Canvas enables you to prepare data using natural language, build and train highly accurate models, generate predictions, and deploy models to production—all without writing a single line of code.

In this post, we describe how to use weather data to build and implement a forecasting cycle that you can use to elevate your business’ planning capabilities.

Business use cases for time series forecasting

Today, companies of every size and industry who invest in forecasting capabilities can improve outcomes—whether measured financially or in customer satisfaction—compared to using intuition-based estimation. Regardless of industry, every customer desires highly accurate models that can maximize their outcome. Here, accuracy means that future estimates produced by the ML model end up being as close as possible to the actual future. If the ML model estimates either too high or too low, it can reduce the effectiveness the business was hoping to achieve.

To maximize accuracy, ML models benefit from rich, quality data that reflects demand patterns, including cycles of highs and lows, and periods of stability. The shape of these historic patterns may be driven by several factors. Examples include seasonality, marketing promotions, pricing, and in-stock availability for retail sales, or temperature, length of daylight, or special events for utility demand. Local, regional, and world factors such as commodity prices, financial markets, and events such as COVID-19 can also change demand trajectory.

Weather is a key factor that can influence forecasts in many domains, and comes in long-term and short-term varieties. The following are just a few examples of how weather can affect time series estimates:

  • Energy companies use temperature forecasts to predict energy demand and manage supply accordingly. Warmer weather and sunny days can drive up demand for air conditioning.
  • Agribusinesses forecast crop yields using weather data like rainfall, temperature, humidity, and more. This helps optimize planting, harvesting, and pricing decisions.
  • Outdoor events might be influenced by short-term weather forecasts such as rain, heat, or storms that could change attendance, fresh prepared food needs, staffing, and more.
  • Airlines use weather forecasts to schedule staff and equipment efficiently. Bad weather can cause flight delays and cancellations.

If weather has an influence on your business planning, it’s important to use weather signals from both the past and the future to help inform your planning. The remaining portion of this post discusses how you can source, prepare, and use weather data to help improve and inform your journey.

Find a weather data provider

First, if you have not already done so, you will need to find a weather data provider. There are many providers that offer a wide variety of capabilities. The following are just a few things to consider as you select a provider:

  • Price – Some providers offer free weather data, some offer subscriptions, and some offer meter-based packages.
  • Information capture method – Some providers allow you to download data in bulk, whereas others enable you to fetch data in real time through programmatic API calls.
  • Time resolution – Depending on your business, you might need weather at the hourly level, daily level, or another interval. Make sure the provider you choose provides data at the right level of control to manage your business decisions.
  • Time coverage – It’s important to select a provider based on their ability to provide historic and future forecasts aligned with your data. If you have 3 years of your own history, then find a provider that has that amount of history too. If you’re an outdoor stadium manager who needs to know weather for several days ahead, select a provider that has a weather forecast out as far as you need to plan. If you’re a farmer, you might need a long-term seasonal forecast, so your data provider should have future-dated data in line with your forecast horizon.
  • Geography – Different providers have data coverage for different parts of the world, including both land and sea coverage. Providers may offer information at the GPS coordinate level, ZIP code level, or other granularities. Energy companies might seek to have weather by GPS coordinates, enabling them to personalize weather forecasts to their meter locations.
  • Weather features – There are many weather-related features available, including not only the temperature, but other key data points such as precipitation, solar index, pressure, lightning, air quality, and pollen, to name a few.

In making the provider choice, be sure to conduct your own independent search and perform due diligence. Selecting the right provider is crucial and can be a long-term decision. Ultimately, you will decide on one or more providers that are a best fit for your unique needs.

Build a weather ingestion process

After you have identified a weather data provider, you need to develop a process to harvest their data, which will be blended with your historic data. In addition to building a time series model, SageMaker Canvas can also help you build your weather data processing pipeline. In general, the automated process might include the following steps, though your use case might vary:

  1. Identify your locations – In your data, you will need to identify all the unique locations through time, whether by postal code, address, or GPS coordinates. In some cases, you may need to geocode your data, for example, converting a mailing address to GPS coordinates. You can use Amazon Location Service to assist with this conversion as needed. Ideally, if you do geocode, you should only need to do this one time, and retain the GPS coordinates for your postal code or address.
  2. Acquire weather data – For each of your locations, you should acquire historic data and persist this information so you only need to retrieve it one time.
  3. Store weather data – For each of your locations, develop a process to harvest future-dated weather predictions as part of your pipeline to build an ML model, and persist the results. AWS has many databases to help store your data, including cost-effective data lakes on Amazon Simple Storage Service (Amazon S3).
  4. Normalize weather data – Prior to moving to the next step, it’s important to make all weather data relative to location and set on the same scale. Barometric pressure can have values in the 1000+ range; temperature exists on another scale. Pollen, ultraviolet light, and other weather measures also have independent scales. Within a geography, any measure is relative to that location’s own normal. In this post, we demonstrate how to normalize weather features for each location to help make sure no feature has bias over another, and to help maximize the effectiveness of weather data on a global basis.
  5. Combine internal business data with external weather data – As part of your time series pipeline, you will need to harvest historical business data to train a model. First, you will extract data, such as weekly sales data by product sold and by retail store for the last 4 years.

Don’t be surprised if your company needs several forecasts that are independent and concurrent. Each forecast offers a different perspective to help navigate planning. For example, you may have a short-term forecast to make sure weather-volatile products are stocked. In addition, a medium-term forecast can help make replenishment decisions. Finally, you can use a long-term forecast to estimate company growth or make seasonal buying decisions that require long lead times.

At this point, you will combine weather and business data by joining (or merging) them on time and location. An example follows in the next section.

Example weather ingestion process

The following screenshot and code snippet show an example of using SageMaker Canvas to geocode location data using Amazon Location Service.

This process submits a location to Amazon Location Service and receives a response in the form of latitude and longitude. The example provides a city as input, but your use case should provide postal codes or specific street addresses, depending on your need for location precision. As guidance, take care to persist the responses in a data store so you aren’t repeatedly geocoding the same locations each forecasting cycle. Instead, determine which locations you have not geocoded yet and only perform those. The latitude and longitude are important and are used in a later step to request weather data from your selected provider.

import boto3
from pyspark.sql.functions import col, udf
import pyspark.sql.types as types

def obtain_lat_long(place_search):
    # Geocode a free-text location using an Amazon Location Service place index
    location = boto3.client('location')
    response = location.search_place_index_for_text(IndexName='myplaceindex', Text=str(place_search))
    # The Point is returned as [longitude, latitude]
    return response['Results'][0]['Place']['Geometry']['Point']

# The struct fields are ordered to match the [longitude, latitude] order of the returned Point
UDF = udf(lambda z: obtain_lat_long(z),
          types.StructType([types.StructField('longitude', types.DoubleType()),
                            types.StructField('latitude', types.DoubleType())]))

# Use the UDF to create a struct column with latitude and longitude
df = df.withColumn('lat_long', UDF(col('Location')))
# Extract the latitude and longitude from the struct column
df = df.withColumn("latitude", col("lat_long.latitude"))
df = df.withColumn("longitude", col("lat_long.longitude"))
df = df.drop('lat_long')
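
The preceding snippet geocodes every row each time it runs. In practice, you might persist the geocoding results and only call Amazon Location Service for locations you haven’t seen before, as the guidance above suggests. The following is a minimal sketch of that caching pattern; the S3 path and data frame names are hypothetical placeholders.

from pyspark.sql.functions import col

# Minimal sketch: persist geocoding results and only geocode locations not seen before.
# The S3 path is a hypothetical placeholder; `df`, `UDF`, and `spark` come from the
# snippet above and your Spark session.
geocode_cache_path = "s3://my-bucket/geocode-cache/"

try:
    cached = spark.read.parquet(geocode_cache_path)
except Exception:
    # No cache yet; start with an empty lookup table
    cached = spark.createDataFrame([], "Location string, latitude double, longitude double")

# Locations in the current batch that have not been geocoded yet
new_locations = (df.select("Location").distinct()
                   .join(cached.select("Location"), on="Location", how="left_anti"))

# Geocode only the new locations and append them to the cache
newly_geocoded = (new_locations
                  .withColumn("lat_long", UDF(col("Location")))
                  .withColumn("latitude", col("lat_long.latitude"))
                  .withColumn("longitude", col("lat_long.longitude"))
                  .drop("lat_long"))
newly_geocoded.write.mode("append").parquet(geocode_cache_path)

# Join the full lookup back to the business data so every row carries coordinates
df_with_coords = df.drop("latitude", "longitude").join(
    cached.unionByName(newly_geocoded), on="Location", how="left")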

In the following screenshots, we show an example of calling a weather provider using the latitude and longitude. Each provider will have differing capabilities, which is why selecting a provider is an important consideration. The example we show in this post could be used for historical weather capture as well as future-dated weather forecast capture.

The following screenshot shows an example of using SageMaker Canvas to connect to a weather provider and retrieve weather data.

The following code snippet illustrates how you might provide a latitude and longitude pair to a weather provider, along with parameters such as the specific weather features, time periods, and time resolution you need. In this example, a request for temperature and barometric pressure is made, at the hourly level for the next day ahead. Your use case will vary; consider this an example.

import requests
from pyspark.sql.functions import col, udf

# Endpoint for your chosen weather provider (placeholder; set this to your provider's API URL)
# weather_provider_url = 'https://...'

def get_weather_data(latitude, longitude):
    # Request hourly temperature and surface pressure for the next day ahead
    params = {
        "latitude": str(latitude),
        "longitude": str(longitude),
        "hourly": "temperature_2m,surface_pressure",
        "forecast_days": 1
    }
    response = requests.get(url=weather_provider_url, params=params)
    return response.content.decode('utf-8')

UDF = udf(lambda latitude, longitude: get_weather_data(latitude, longitude))
df = df.withColumn('weather_response', UDF(col('latitude'), col('longitude')))
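
For reference, the following is an illustrative example (an assumption, not a real provider payload) of the response shape that the parsing step in the next section expects: hourly arrays of timestamps and measurements under an hourly object.

import json

# Illustrative only: the assumed shape of the provider response parsed in the next step.
# Actual payloads and field names vary by provider.
sample_response = json.dumps({
    "hourly": {
        "time": ["2024-05-01T00:00", "2024-05-01T01:00"],
        "temperature_2m": [11.4, 10.9],
        "surface_pressure": [1013.2, 1012.8],
    }
})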

After you retrieve the weather data, the next step is to convert structured weather provider data into a tabular set of data. As you can see in the following screenshot, temperature and pressure data are available at the hourly level by location. This will enable you to join the weather data alongside your historic demand data. It’s important you use future-dated weather data to train your model. Without future-dated data, there is no basis to use weather to help inform what might lie ahead.

The following code snippet is from the preceding screenshot. This code converts the weather provider’s nested JSON response into tabular features:

from pyspark.sql.functions import from_json, col, regexp_replace
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, ArrayType
from pyspark.sql.functions import explode, arrays_zip

# Schema matching the provider's nested JSON: hourly arrays of timestamps and measurements
json_schema = StructType([
    StructField("hourly", StructType([
        StructField("time", ArrayType(StringType()), True),
        StructField("temperature_2m", ArrayType(DoubleType()), True),
        StructField("surface_pressure", ArrayType(DoubleType()), True)
    ]), True)
])

# Parse the JSON string into a structured column
df = df.withColumn("weather_response", from_json(col("weather_response"), json_schema))

# Extract the feature arrays
df = df.withColumn("time", col("weather_response.hourly.time"))
df = df.withColumn("temperature_2m", col("weather_response.hourly.temperature_2m"))
df = df.withColumn("surface_pressure", col("weather_response.hourly.surface_pressure"))

# Explode all arrays together so each row is one location and one hourly timestamp
df = (df.withColumn("zipped", arrays_zip("surface_pressure", "temperature_2m", "time"))
        .withColumn("exploded", explode("zipped"))
        .select("Location", "exploded.time", "exploded.surface_pressure", "exploded.temperature_2m"))

# Clean up the format of the timestamp
df = df.withColumn("time", regexp_replace(col("time"), "T", " "))

In this next step, we demonstrate how to set all weather features on the same scale—a scale that is also sensitive to each location’s range of values. In the preceding screenshot, observe how pressure and temperature in Seattle are on different scales: temperature in Celsius is in the single or double digits, and pressure exceeds 1,000. Seattle may also have different ranges than any other city as a result of its unique climate, topography, and geographic position. In this normalization step, the goal is to bring all weather features onto the same scale, so pressure doesn’t outweigh temperature. We also want to place Seattle on its own scale, Mumbai on its own scale, and so forth. In the following screenshot, the minimum and maximum values per location are obtained. These are important intermediate computations for scaling, where weather values are set based on their position in the observed range by geography.

With the extreme values computed per location, the data frame of row-level values can be joined to the data frame of per-location minimum and maximum values, matching on location. The result is scaled data, according to the normalization formula that follows, with example code.

First, this code example computes the minimum and maximum weather values per location. Next, the range is computed. Finally, a data frame is created with the location, range, and minimum per weather feature. Maximum is not needed because the range can be used as part of the normalization formula. See the following code:

from pyspark.sql.functions import min, max

# Compute the minimum and maximum of each weather feature per location
df = df.groupBy("Location") \
    .agg(min("surface_pressure").alias("min_surface_pressure"),
         max("surface_pressure").alias("max_surface_pressure"),
         min("temperature_2m").alias("min_temperature_2m"),
         max("temperature_2m").alias("max_temperature_2m"))

# Compute the observed range per feature
df = df.withColumn("range_surface_pressure",
                   df.max_surface_pressure - df.min_surface_pressure)
df = df.withColumn("range_temperature_2m",
                   df.max_temperature_2m - df.min_temperature_2m)

# Keep only the location, range, and minimum per feature (the maximum is implied by minimum + range)
df = df.select("Location",
               "range_surface_pressure", "min_surface_pressure",
               "range_temperature_2m", "min_temperature_2m")
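
The join itself is straightforward. Because the preceding snippets reuse the name df for different intermediate results, the following minimal sketch uses hypothetical names: weather_df for the row-level hourly data and stats_df for the per-location statistics computed above. A visual join tool may instead emit suffixed key columns (for example, Location_0 and Location_1), which the next snippet renames and drops.

# Minimal sketch: attach each location's minimum and range values to every hourly row.
# `weather_df` (row-level hourly data) and `stats_df` (the per-location statistics computed
# above) are hypothetical names, because the preceding snippets reuse the name `df`.
scaled_input_df = weather_df.join(stats_df, on="Location", how="left")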

In the next code snippet, the scaled value is computed according to the normalization formula. The minimum value is subtracted from the actual value at each time interval, and the difference is then divided by the range. In the previous screenshot, you can see that values fall on a 0–1 scale: 0 is the lowest observed value for the location and 1 is the highest observed value for the location, across all the time periods where data exists.

Here, we compute the scaled value x', where x' = (x − min) / (max − min) = (x − min) / range:

# Rename the join key produced by the preceding join of row-level data and per-location statistics
df = df.withColumnRenamed('Location_0', 'Location')

# Apply min-max scaling per location: (value - location minimum) / location range
df = df.withColumn('scaled_temperature_2m',
                     (df.temperature_2m - df.min_temperature_2m) /
                         df.range_temperature_2m)

df = df.withColumn('scaled_surface_pressure',
                     (df.surface_pressure - df.min_surface_pressure) /
                         df.range_surface_pressure)

# Drop the duplicate join key and the intermediate statistics columns
df = df.drop('Location_1', 'min_surface_pressure', 'range_surface_pressure',
             'min_temperature_2m', 'range_temperature_2m')

Build a forecasting workflow with SageMaker Canvas

With your historic business data and prepared weather data now available, the next step is to bring them together to build your time series model. The following high-level steps are required:

  1. Combine weather data with your historic data on a point-in-time and location basis. Your actual data will end, but the weather data should extend out to the end of your horizon.

This is a crucial point—weather data can only help your forecast if it’s included in your future forecast horizon. The following screenshot illustrates weather data alongside business demand data. For each item and location, known historic unit demand and weather features are provided. The red boxes added to the screenshot highlight the concept of future data, where weather data is provided, yet future demand is not provided because it remains unknown.

  2. After your data is prepared, you can use SageMaker Canvas to build a time series model with a few clicks—no coding required.

As you get started, you should build a time series model in Canvas with and without weather data. This will let you quickly quantify how much of an impact weather data has on your forecast. You may find that some items are more impacted by weather than others.

  3. After you add the weather data, use SageMaker Canvas feature importance scores to quantify which weather features are important, and retain those going forward. For example, if the pollen feature provides no lift in accuracy but barometric pressure does, you can eliminate the pollen feature to keep your process as simple as possible.

As an alternative to using the visual interface, we have also created a sample notebook on GitHub that demonstrates how to use SageMaker Canvas AutoML capabilities as an API. This method can be useful when your business prefers to orchestrate forecasting through programmatic APIs.
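
The notebook is the authoritative reference for the programmatic route. As a rough, hedged sketch of what such a call can look like, the boto3 AutoML V2 API (create_auto_ml_job_v2) accepts a time series forecasting configuration. The bucket paths, role ARN, column names, and parameter values below are placeholders, and you should verify the request structure against the current boto3 documentation before relying on it.

import boto3

# Hedged sketch only: starts a time series AutoML job programmatically.
# Bucket names, the role ARN, column names, and most values are placeholders,
# and the request structure should be verified against the current boto3
# documentation for create_auto_ml_job_v2 before use.
sm = boto3.client("sagemaker")

sm.create_auto_ml_job_v2(
    AutoMLJobName="demand-with-weather",  # placeholder job name
    AutoMLJobInputDataConfig=[{
        "ChannelType": "training",
        "ContentType": "text/csv;header=present",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/forecast/train/",  # placeholder path
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/forecast/output/"},  # placeholder path
    AutoMLProblemTypeConfig={"TimeSeriesForecastingJobConfig": {
        "ForecastFrequency": "1W",   # weekly data in this example
        "ForecastHorizon": 12,       # how far ahead to forecast, in weekly periods
        "TimeSeriesConfig": {
            "TargetAttributeName": "unit_demand",   # placeholder column names
            "TimestampAttributeName": "time",
            "ItemIdentifierAttributeName": "item_id",
        },
    }},
    RoleArn="arn:aws:iam::111122223333:role/MySageMakerExecutionRole",  # placeholder role
)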

Clean up

Choose Log out in the left pane to log out of the Amazon SageMaker Canvas application and stop the consumption of SageMaker Canvas workspace instance hours. This releases all resources used by the workspace instance.

Conclusion

In this post, we discussed the importance of time series forecasting to business, and focused on how you can use weather data to build a more accurate forecasting model in certain cases. This post described key factors you should consider when finding a weather data provider and how to build a pipeline that sources and stages the external data, so that it can be combined with your existing data, on a time-and-place basis. Next, we discussed how to use SageMaker Canvas to combine these datasets and train a time series ML model with no coding required. Finally, we suggested that you compare a model with and without weather data so you can quantify the impact and also learn which weather features drive your business decisions.

If you’re ready to start this journey, or improve on an existing forecast method, reach out to your AWS account team and ask for an Amazon SageMaker Canvas Immersion Day. You can gain hands-on experience and learn how to apply ML to improve forecasting outcomes in your business.


About the Author

Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS). Charles holds an MS in Supply Chain Management and a PhD in Data Science. Charles works in the Amazon SageMaker service team where he brings research and voice of the customer to inform the service roadmap. In his work, he collaborates daily with diverse AWS customers to help transform their businesses with cutting-edge AWS technologies and thought leadership.


Reimagining software development with the Amazon Q Developer Agent


Amazon Q Developer is an AI-powered assistant for software development that reimagines the experience across the entire software development lifecycle, making it faster to build, secure, manage, and optimize applications on or off of AWS. The Amazon Q Developer Agent includes an agent for feature development that automatically implements multi-file features, bug fixes, and unit tests in your integrated development environment (IDE) workspace using natural language input. After you enter your query, the software development agent analyzes your code base and formulates a plan to fulfill the request. You can accept the plan or ask the agent to iterate on it. After the plan is validated, the agent generates the code changes needed to implement the feature you requested. You can then review and accept the code changes or request a revision.

Amazon Q Developer uses generative artificial intelligence (AI) to deliver state-of-the-art accuracy for all developers, taking first place on the leaderboard for SWE-bench, a dataset that tests a system’s ability to automatically resolve GitHub issues. This post describes how to get started with the Amazon Q Developer Agent for feature development, gives an overview of the underlying mechanisms that make it a state-of-the-art software development agent, and discusses its performance on public benchmarks.

Getting started

To get started, you need to have an AWS Builder ID or be part of an organization with an AWS IAM Identity Center instance set up that allows you to use Amazon Q. To use Amazon Q Developer Agent for feature development in Visual Studio Code, start by installing the Amazon Q extension. The extension is also available for JetBrains, Visual Studio (in preview), and in the Command Line on macOS. Find the latest version on the Amazon Q Developer page.

Amazon Q App card in VS Code

After authenticating, you can invoke the feature development agent by entering /dev in the chat field.

Invoking /dev in Amazon Q

The feature development agent is now ready for your requests. Let’s use the repository of Amazon’s Chronos forecasting model to demonstrate how the agent works. The code for Chronos is already of high quality, but unit test coverage could be improved in places. Let’s ask the software development agent to improve the unit test coverage of the file chronos.py. Stating your request as clearly and precisely as you can will help the agent deliver the best possible solution.

/dev initial prompt

The agent returns a detailed plan to add missing tests in the existing test suite test/test_chronos.py. To generate the plan (and later the code change), the agent has explored your code base to understand how to satisfy your request. The agent will work best if the names of files and functions are descriptive of their intent.

Plan generated by the agent

You are asked to review the plan. If the plan looks good and you want to proceed, choose Generate code. If you find that it can be improved in places, you can provide feedback and request an improved plan.

The agent asking for plan validation

After the code is generated, the software development agent will list the files for which it has created a diff (for this post, test/test_chronos.py). You can review the code changes and decide to either insert them in your code base or provide feedback on possible improvements and regenerate the code.

List of files changed by the agent.

Choosing a modified file opens a diff view in the IDE showing which lines have been added or modified. The agent has added multiple unit tests for parts of chronos.py that were not previously covered.

the diff generated by the agent.

After you review the code changes, you can decide to insert them, provide feedback to generate the code again, or discard them altogether. That’s it; there is nothing else for you to do. If you want to request another feature, invoke /dev again in Amazon Q Developer.

System overview

Now that we have shown you how to use Amazon Q Developer Agent for software development, let’s explore how it works. This is an overview of the system as of May 2024. The agent is continuously being improved. The logic described in this section will evolve and change.

When you submit a query, the agent generates a structured representation of the repository’s file system in XML. The following is an example output, truncated for brevity:

<tree>
  <directory name="requests">
    <file name="README.rst"/>
    <directory name="requests">
      <file name="adapters.py"/>
      <file name="api.py"/>
      <file name="models.py"/>
      <directory name="packages">
        <directory name="chardet">
          <file name="charsetprober.py"/>
          <file name="codingstatemachine.py"/>
        </directory>
        <file name="__init__.py"/>
        <file name="README.rst"/>
        <directory name="urllib3">
          <file name="connectionpool.py"/>
          <file name="connection.py"/>
          <file name="exceptions.py"/>
          <file name="fields.py"/>
          <file name="filepost.py"/>
          <file name="__init__.py"/>
        </directory>
      </directory>
    </directory>
    <file name="setup.cfg"/>
    <file name="setup.py"/>
  </directory>
</tree>
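
The agent’s internal traversal logic isn’t published. Purely as an illustration of how such a representation could be produced, the following sketch walks a local repository with Python’s standard library and emits a similar XML tree; the directory name is a placeholder.

import os
import xml.etree.ElementTree as ET

def build_tree(path):
    # Recursively describe a directory's files and subdirectories as XML elements
    node = ET.Element("directory", name=os.path.basename(path.rstrip("/")) or path)
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            node.append(build_tree(full))
        else:
            node.append(ET.Element("file", name=entry))
    return node

root = ET.Element("tree")
root.append(build_tree("requests"))  # placeholder: path to your repository root
print(ET.tostring(root, encoding="unicode"))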

An LLM then uses this representation with your query to determine which files are relevant and should be retrieved. We use automated systems to check that the files identified by the LLM are all valid. The agent uses the retrieved files with your query to generate a plan for how it will resolve the task you have assigned to it. This plan is returned to you for validation or iteration. After you validate the plan, the agent moves to the next step, which ultimately ends with a proposed code change to resolve the issue.

The content of each retrieved code file is parsed with a syntax parser to obtain an XML syntax tree representation of the code, which the LLM can consume more efficiently than the raw source code while using far fewer tokens. The following is an example of that representation. Non-code files are encoded and chunked using logic commonly used in Retrieval Augmented Generation (RAG) systems to allow for the efficient retrieval of chunks of documentation.

The following screenshot shows a chunk of Python code.

A snippet of Python code

The following is its syntax tree representation.

A syntax tree representation of python code
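
The agent’s parser and exact representation are internal. As a simplified illustration of the idea of reducing code to a compact structural outline that uses far fewer tokens, the following sketch uses Python’s ast module; the file name is a placeholder.

import ast
import xml.etree.ElementTree as ET

def outline(source):
    # Reduce a Python module to a compact XML outline: classes and function signatures only
    module = ET.Element("module")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            ET.SubElement(module, "function", name=node.name, args=args, lineno=str(node.lineno))
        elif isinstance(node, ast.ClassDef):
            ET.SubElement(module, "class", name=node.name, lineno=str(node.lineno))
    return ET.tostring(module, encoding="unicode")

print(outline(open("chronos.py").read()))  # placeholder file name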

The LLM is prompted again with the problem statement, the plan, and the XML tree structure of each of the retrieved files to identify the line ranges that need updating in order to resolve the issue. This approach allows you to be more frugal with your usage of LLM bandwidth.

The software development agent is now ready to generate the code that will resolve your issue. The LLM directly rewrites sections of code rather than attempting to generate a patch; rewriting code is much closer to the tasks the LLM was optimized to perform than generating a patch directly. The agent then performs syntactic validation of the generated code and attempts to fix issues before moving to the final step. The original and rewritten code are passed to a diff library to generate a patch programmatically. This creates the final output that is then shared with you to review and accept.
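
The specific diff tooling the agent uses isn’t disclosed; as a minimal illustration of this final step, Python’s standard difflib can turn an original file and its rewritten version into a unified diff. The rewritten file path below is hypothetical.

import difflib

# Minimal sketch: turn an original file and its rewritten version into a unified diff
original = open("test/test_chronos.py").readlines()
rewritten = open("test/test_chronos_rewritten.py").readlines()  # hypothetical path for the rewritten file

patch = difflib.unified_diff(
    original, rewritten,
    fromfile="a/test/test_chronos.py",
    tofile="b/test/test_chronos.py",
)
print("".join(patch))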

System accuracy

In the press release announcing the launch of Amazon Q Developer Agent for feature development, we shared that the model scored 13.82% on SWE-bench and 20.33% on SWE-bench lite, putting it at the top of the SWE-bench leaderboard as of May 2024. SWE-bench is a public dataset of over 2,000 tasks from 12 popular Python open source repositories. The key metric reported on the SWE-bench leaderboard is the pass rate: how often all the unit tests associated with a specific issue pass after the AI-generated code changes are applied. This is an important metric because our customers want to use the agent to solve real-world problems, and we are proud to report a state-of-the-art pass rate.

A single metric never tells the whole story. We look at the performance of our agent as a point on the Pareto front over multiple metrics. The Amazon Q Developer Agent for software development is not specifically optimized for SWE-bench. Our approach focuses on optimizing for a range of metrics and datasets. For instance, we aim to strike a balance between accuracy and resource efficiency, such as the number of LLM calls and input/output tokens used, because this directly impacts runtime and cost. In this regard, we take pride in our solution’s ability to consistently deliver results within minutes.

Limitations of public benchmarks

Public benchmarks such as SWE-bench are an incredibly useful contribution to the AI code generation community and present an interesting scientific challenge. We are grateful to the team releasing and maintaining this benchmark. We are proud to be able to share our state-of-the-art results on this benchmark. Nonetheless, we would like to call out a few limitations, which are not exclusive to SWE-bench.

The success metric for SWE-bench is binary. Either a code change passes all tests or it does not. We believe that this doesn’t capture the full value feature development agents can generate for developers. Agents save developers a lot of time even when they don’t implement the entirety of a feature at once. Latency, cost, number of LLM calls, and number of tokens are all highly correlated metrics that represent the computational complexity of a solution. This dimension is as important as accuracy for our customers.

The test cases included in the SWE-bench benchmark are publicly available on GitHub. As such, it’s possible that these test cases may have been used in the training data of various large language models. Although LLMs have the capability to memorize portions of their training data, it’s challenging to quantify the extent to which this memorization occurs and whether the models are inadvertently leaking this information during testing.

To investigate this potential concern, we have conducted multiple experiments to evaluate the possibility of data leakage across different popular models. One approach to testing memorization involves asking the models to predict the next line of an issue description given a very short context. This is a task that they should theoretically struggle with in the absence of memorization. Our findings indicate that recent models exhibit signs of having been trained on the SWE-bench dataset.

The following figure shows the distribution of ROUGE-L scores when asking each model to complete the next sentence of an SWE-bench issue description given the preceding sentences.

ROUGE-L scores measuring information leakage of SWE-bench across different models.
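
As a hedged sketch of this kind of memorization probe (not the exact experimental setup used here), the open source rouge_score package can score a model’s next-sentence completion against the true next sentence; the example strings below are placeholders.

from rouge_score import rouge_scorer

# Hedged sketch of the memorization probe described above: score a model's completion of the
# next sentence of an issue description against the true next sentence. Both strings below are
# placeholders; `model_completion` would come from the LLM under test.
reference_next_sentence = "The resampler should raise a ValueError when the rule is negative."
model_completion = "The resampler should raise a ValueError if the rule is negative."

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
score = scorer.score(reference_next_sentence, model_completion)["rougeL"]
print(f"ROUGE-L F1: {score.fmeasure:.2f}")  # consistently high scores across many items would suggest memorization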

We have shared measurements of the performance of our software development agent on SWE-bench to offer a reference point. We recommend testing agents on private code repositories that haven’t been used in the training of any LLMs and comparing these results with those of publicly available baselines. We will continue benchmarking our system on SWE-bench while focusing our testing on private benchmarking datasets that haven’t been used to train models and that better represent the tasks submitted by our customers.

Conclusion

This post discussed how to get started with Amazon Q Developer Agent for software development. The agent automatically implements features that you describe with natural language in your IDE. We gave you an overview of how the agent works behind the scenes and discussed its state-of-the-art accuracy and position at the top of the SWE-bench leaderboard.

You are now ready to explore the capabilities of Amazon Q Developer Agent for software development and make it your personal AI coding assistant! Install the Amazon Q plugin in your IDE of choice and start using Amazon Q (including the software development agent) for free using your AWS Builder ID or subscribe to Amazon Q to unlock higher limits.


About the authors

Christian Bock is an applied scientist at Amazon Web Services working on AI for code.

Laurent Callot is a Principal Applied Scientist at Amazon Web Services leading teams creating AI solutions for developers.

Tim Esler is a Senior Applied Scientist at Amazon Web Services working on Generative AI and Coding Agents for building developer tools and foundational tooling for Amazon Q products.

Prabhu Teja is an Applied Scientist at Amazon Web Services. Prabhu works on LLM assisted code generation with a focus on natural language interaction.

Martin Wistuba is a senior applied scientist at Amazon Web Services. As part of Amazon Q Developer, he is helping developers to write more code in less time.

Giovanni Zappella is a Principal Applied Scientist working on the creation of intelligent agents for code generation. While at Amazon, he has also contributed to new algorithms for continual learning, AutoML, and recommender systems.
