The Weather Company enhances MLOps with Amazon SageMaker, AWS CloudFormation, and Amazon CloudWatch

This blog post is co-written with Qaish Kanchwala from The Weather Company.

As industries begin adopting processes dependent on machine learning (ML) technologies, it is critical to establish machine learning operations (MLOps) that scale to support growth and utilization of this technology. MLOps practitioners have many options for establishing an MLOps platform; one is a cloud-based integrated platform that scales with data science teams. AWS provides a full stack of services to establish an MLOps platform in the cloud that is customizable to your needs while reaping all the benefits of doing ML in the cloud.

In this post, we share the story of how The Weather Company (TWCo) enhanced its MLOps platform using services such as Amazon SageMaker, AWS CloudFormation, and Amazon CloudWatch. TWCo data scientists and ML engineers took advantage of automation, detailed experiment tracking, integrated training, and deployment pipelines to help scale MLOps effectively. TWCo reduced infrastructure management time by 90% while also reducing model deployment time by 20%.

The need for MLOps at TWCo

TWCo strives to help consumers and businesses make informed, more confident decisions based on weather. Although the organization has used ML in its weather forecasting process for decades to help translate billions of weather data points into actionable forecasts and insights, it continuously strives to innovate and incorporate leading-edge technology in other ways as well. TWCo’s data science team was looking to create predictive, privacy-friendly ML models that show how weather conditions affect certain health symptoms and create user segments for improved user experience.

TWCo was looking to scale its ML operations with more transparency and less complexity to allow for more manageable ML workflows as its data science team grew. There were noticeable challenges when running ML workflows in the cloud. TWCo’s existing cloud environment lacked transparency for ML jobs, monitoring, and a feature store, which made it hard for users to collaborate. Managers lacked the visibility needed for ongoing monitoring of ML workflows. To address these pain points, TWCo worked with the AWS Machine Learning Solutions Lab (MLSL) to migrate these ML workflows to Amazon SageMaker and the AWS Cloud. The MLSL team collaborated with TWCo to design an MLOps platform that meets the needs of its data science team, factoring in present and future growth.

Examples of business objectives set by TWCo for this collaboration are:

  • Achieve quicker reaction to the market and faster ML development cycles
  • Accelerate the migration of TWCo’s ML workloads to SageMaker
  • Improve the end user experience through adoption of managed services
  • Reduce time spent by engineers in maintenance and upkeep of the underlying ML infrastructure

Functional objectives were set to measure the impact for MLOps platform users, including:

  • Improve the data science team’s efficiency in model training tasks
  • Decrease the number of steps required to deploy new models
  • Reduce the end-to-end model pipeline runtime

Solution overview

The solution uses the following AWS services:

  • AWS CloudFormation – Infrastructure as code (IaC) service to provision most templates and assets.
  • AWS CloudTrail – Monitors and records account activity across AWS infrastructure.
  • Amazon CloudWatch – Collects and visualizes real-time logs that provide the basis for automation.
  • AWS CodeBuild – Fully managed continuous integration service that compiles source code, runs tests, and produces ready-to-deploy software. Used to deploy training and inference code.
  • AWS CodeCommit – Managed source control repository that stores MLOps infrastructure code and IaC code.
  • AWS CodePipeline – Fully managed continuous delivery service that helps automate release pipelines.
  • Amazon SageMaker – Fully managed ML platform used to run ML workflows, from exploring data to training and deploying models.
  • AWS Service Catalog – Centrally manages cloud resources such as IaC templates used for MLOps projects.
  • Amazon Simple Storage Service (Amazon S3) – Cloud object storage to store data for training and testing.

The following diagram illustrates the solution architecture.

MLOps architecture for customer

This architecture consists of two primary pipelines:

  • Training pipeline – The training pipeline is designed to work with features and labels stored as a CSV-formatted file on Amazon S3. It involves several components, including Preprocess, Train, and Evaluate. After training the model, its associated artifacts are registered with the Amazon SageMaker Model Registry through the Register Model component. The Data Quality Check part of the pipeline creates baseline statistics for the monitoring task in the inference pipeline.
  • Inference pipeline – The inference pipeline handles on-demand batch inference and monitoring tasks. Within this pipeline, SageMaker on-demand Data Quality Monitor steps are incorporated to detect any drift compared to the input data. The monitoring results are stored in Amazon S3 and published as a CloudWatch metric, which can be used to set up an alarm. The alarm can later invoke retraining, send automated emails, or trigger any other desired action, as shown in the sketch after this list.
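
The following minimal boto3 sketch illustrates that monitoring pattern: publish a drift score to a custom CloudWatch namespace and attach an alarm to it. The namespace, metric name, threshold, and SNS topic ARN are illustrative placeholders rather than TWCo’s actual configuration:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a drift score produced by the data quality monitoring step
# (namespace and metric name are placeholders).
cloudwatch.put_metric_data(
    Namespace="MLOps/InferencePipeline",
    MetricData=[{"MetricName": "FeatureDriftScore", "Value": 0.27, "Unit": "None"}],
)

# Alarm when drift exceeds a chosen threshold; the alarm action can invoke
# retraining, send an email, or trigger any other desired response.
cloudwatch.put_metric_alarm(
    AlarmName="feature-drift-detected",
    Namespace="MLOps/InferencePipeline",
    MetricName="FeatureDriftScore",
    Statistic="Maximum",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=0.2,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:retraining-topic"],  # placeholder ARN
)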

The proposed MLOps architecture includes flexibility to support different use cases, as well as collaboration between various team personas like data scientists and ML engineers. The architecture reduces the friction between cross-functional teams moving models to production.

ML model experimentation is one of the sub-components of the MLOps architecture. It improves data scientists’ productivity and the model development process. Model experimentation relies on MLOps-related SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Feature Store, and SageMaker Model Registry, accessed through the SageMaker SDK and the AWS Boto3 libraries.
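
As a concrete illustration of that workflow, the following sketch defines a minimal training pipeline with the SageMaker Python SDK and registers the resulting model with the SageMaker Model Registry. The container image URI, S3 paths, IAM role, and model package group name are placeholders, and the real pipelines contain additional preprocessing, evaluation, and data quality steps:

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Training container and job configuration (placeholder values).
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/training-image:latest",
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/model-artifacts/",
    sagemaker_session=session,
)

# Train on the CSV features and labels stored in Amazon S3.
train_step = TrainingStep(
    name="Train",
    estimator=estimator,
    inputs={"train": TrainingInput("s3://example-bucket/train.csv", content_type="text/csv")},
)

# Register the trained model artifacts with the SageMaker Model Registry.
register_step = RegisterModel(
    name="RegisterModel",
    estimator=estimator,
    model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="example-model-group",
)

pipeline = Pipeline(
    name="training-pipeline",
    steps=[train_step, register_step],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)
pipeline.start()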

Setting up the pipelines creates resources that are required throughout the pipeline’s lifecycle. Additionally, each pipeline run may generate its own resources.

The pipeline setup resources are:

  • Training pipeline:
    • SageMaker pipeline
    • SageMaker Model Registry model group
    • CloudWatch namespace
  • Inference pipeline:
    • SageMaker pipeline

The pipeline run resources are:

  • Training pipeline:
    • SageMaker model

You should delete these resources when the pipelines expire or are no longer needed.

SageMaker project template

In this section, we discuss the manual provisioning of pipelines through an example notebook and automatic provisioning of SageMaker pipelines through the use of a Service Catalog product and SageMaker project.

By using Amazon SageMaker Projects and its template-based approach, organizations can establish a standardized and scalable infrastructure for ML development, allowing teams to focus on building and iterating ML models and reducing time wasted on complex setup and management.

The following diagram shows the required components of a SageMaker project template. Use Service Catalog to register a SageMaker project CloudFormation template in your organization’s Service Catalog portfolio.

The following diagram illustrates the required components of a SageMaker project template

To start the ML workflow, the project template serves as the foundation by defining a continuous integration and delivery (CI/CD) pipeline. It begins by retrieving the ML seed code from a CodeCommit repository. Then the BuildProject component takes over and orchestrates the provisioning of SageMaker training and inference pipelines. This automation delivers a seamless and efficient run of the ML pipeline, reducing manual intervention and speeding up the deployment process.
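
Once the template is registered as a Service Catalog product, the project itself can be provisioned programmatically. The following boto3 sketch shows the general shape of that call; the project name and the product and provisioning artifact IDs are placeholders you would look up in your own Service Catalog portfolio:

import boto3

sagemaker_client = boto3.client("sagemaker")

# Provision a SageMaker project from a Service Catalog-backed project template.
# ProductId and ProvisioningArtifactId are placeholders for your own portfolio values.
response = sagemaker_client.create_project(
    ProjectName="mlops-demo-project",
    ProjectDescription="Training and inference pipelines provisioned from the project template",
    ServiceCatalogProvisioningDetails={
        "ProductId": "prod-examplexxxxxxx",
        "ProvisioningArtifactId": "pa-examplexxxxxxx",
    },
)
print(response["ProjectArn"])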

Dependencies

The solution has the following dependencies:

  • Amazon SageMaker SDK – The Amazon SageMaker Python SDK is an open source library for training and deploying ML models on SageMaker. For this proof of concept, pipelines were set up using this SDK.
  • Boto3 SDK – The AWS SDK for Python (Boto3) provides a Python API for AWS infrastructure services. We use the SDK for Python to create roles and provision SageMaker SDK resources.
  • SageMaker Projects – SageMaker Projects delivers standardized infrastructure and templates for MLOps for rapid iteration over multiple ML use cases.
  • Service Catalog – Service Catalog simplifies and speeds up the process of provisioning resources at scale. It offers a self-service portal, standardized service catalog, versioning and lifecycle management, and access control.

Conclusion

In this post, we showed how TWCo uses SageMaker, CloudWatch, CodePipeline, and CodeBuild for its MLOps platform. With these services, TWCo extended the capabilities of its data science team while also improving how data scientists manage ML workflows. These ML models ultimately helped TWCo create predictive, privacy-friendly experiences that improved the user experience and explain how weather conditions impact consumers’ daily planning or business operations. We also reviewed the architecture design, which helps keep responsibilities modularized between different user personas. Typically, data scientists are only concerned with the science aspect of ML workflows, whereas DevOps and ML engineers focus on the production environments. TWCo reduced infrastructure management time by 90% while also reducing model deployment time by 20%.

This is just one of many ways AWS enables builders to deliver great solutions. We encourage you to get started with Amazon SageMaker today.


About the Authors

Qaish Kanchwala is a ML Engineering Manager and ML Architect at The Weather Company. He has worked on every step of the machine learning lifecycle and designs systems to enable AI use cases. In his spare time, Qaish likes to cook new food and watch movies.

Chezsal Kamaray is a Senior Solutions Architect within the High-Tech Vertical at Amazon Web Services. She works with enterprise customers, helping to accelerate and optimize their workload migration to the AWS Cloud. She is passionate about management and governance in the cloud and helping customers set up a landing zone that is aimed at long-term success. In her spare time, she does woodworking and tries out new recipes while listening to music.

Anila Joshi has more than a decade of experience building AI solutions. As an Applied Science Manager at the AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and guides customers to strategically chart a course into the future of AI.

Kamran Razi is a Machine Learning Engineer at the Amazon Generative AI Innovation Center. With a passion for creating use case-driven solutions, Kamran helps customers harness the full potential of AWS AI/ML services to address real-world business challenges. With a decade of experience as a software developer, he has honed his expertise in diverse areas like embedded systems, cybersecurity solutions, and industrial control systems. Kamran holds a PhD in Electrical Engineering from Queen’s University.

Shuja Sohrawardy is a Senior Manager at AWS’s Generative AI Innovation Center. For over 20 years, Shuja has utilized his technology and financial services acumen to transform financial services enterprises to meet the challenges of a highly competitive and regulated industry. Over the past 4 years at AWS, Shuja has used his deep knowledge in machine learning, resiliency, and cloud adoption strategies, which has resulted in numerous customer success journeys. Shuja holds a BS in Computer Science and Economics from New York University and an MS in Executive Technology Management from Columbia University.

Francisco Calderon is a Data Scientist at the Generative AI Innovation Center (GAIIC). As a member of the GAIIC, he helps discover the art of the possible with AWS customers using generative AI technologies. In his spare time, Francisco likes playing music and guitar, playing soccer with his daughters, and enjoying time with his family.

Eviden scales AWS DeepRacer Global League using AWS DeepRacer Event Manager

Eviden is a next-gen technology leader in data-driven, trusted, and sustainable digital transformation. With a strong portfolio of patented technologies and worldwide leading positions in advanced computing, security, AI, cloud, and digital platforms, Eviden provides deep expertise for a multitude of industries in more than 47 countries. Eviden is an AWS Premier partner, bringing together 47,000 world-class talents and expanding the possibilities of data and technology across the digital continuum, now and for generations to come. Eviden is an Atos Group company with an annual revenue of over €5 billion.

We are passionate about helping our people improve their skills, and we support the development of the next generation of cloud-centered talent. Although fundamental knowledge gained through training and certification is important, there’s no substitute for getting hands-on. We complement individual learning with hands-on opportunities, including Immersion Days, Gamedays, and AWS DeepRacer.

AWS DeepRacer empowers users to train reinforcement learning models on the AWS Cloud and race them around a virtual track. Unlike traditional programming, where you define the desired output, AWS DeepRacer allows you to define rewards for specific behaviors, such as going faster or staying centered on the track. This hands-on experience gives learners an excellent opportunity to engage with the AWS Management Console and develop Python-based reward functions, fostering valuable skills in cloud-centered technologies and machine learning (ML).
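
As an example of what such a reward function looks like, the following minimal sketch rewards the car for staying close to the center line. The params keys used here are part of the standard AWS DeepRacer input parameters, and the tiering thresholds are arbitrary choices to experiment with:

def reward_function(params):
    """Minimal example reward: stay on the track and close to the center line."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    all_wheels_on_track = params["all_wheels_on_track"]

    # Strongly penalize leaving the track.
    if not all_wheels_on_track:
        return 1e-3

    # Tiered reward based on distance from the center line.
    if distance_from_center <= 0.1 * track_width:
        return 1.0
    if distance_from_center <= 0.25 * track_width:
        return 0.5
    if distance_from_center <= 0.5 * track_width:
        return 0.1
    return 1e-3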

To elevate the event experience and streamline the management of their global AWS DeepRacer series, Eviden adopted the open source AWS DeepRacer Event Manager (DREM) solution. In this post, we discuss the benefits of DREM and the experience for racers, event staff, and spectators.

Introducing AWS DeepRacer Event Manager

AWS DeepRacer Event Manager is an innovative application that Eviden has deployed within its own AWS environment. Composed of AWS Cloud-centered services, DREM is designed to simplify the process of hosting in-person AWS DeepRacer events, while also delivering a more engaging and immersive experience for both participants and spectators.

With their prior experience hosting AWS DeepRacer events in the UK, Eviden sought to expand the reach of this exciting initiative globally. By adopting the DREM solution, Eviden’s experienced event staff in the UK were able to seamlessly support their counterparts in hosting AWS DeepRacer events for the first time in locations such as Bydgoszcz, Paris, and Pune. The DREM solution empowers Eviden to seamlessly configure and manage their global AWS DeepRacer events. Within the platform, each event location’s AWS DeepRacer cars are registered, enabling remote configuration and model uploads powered by AWS Systems Manager. Additionally, a Raspberry Pi device is registered at each location to serve as an integrated timing solution, which uses DREM’s data-driven racing capabilities to capture and report critical performance metrics for each racer, such as best lap time, average lap time, and total laps completed.

Furthermore, the deep integration between DREM’s timing solution and the event’s streaming overlay has enabled Eviden to deliver a significantly more engaging experience for both in-person attendees and remote viewers, so everyone can stay fully informed and immersed in the action throughout the event.

The following diagram illustrates the DREM architecture and components.

Architecture diagram for DREM

Racer experience

For racers, the DREM experience begins with registration, during which they are encouraged to upload their AWS DeepRacer models in advance of the event. Authentication is seamlessly handled by Amazon Cognito, with out-of-the-box support for a local Amazon Cognito identity store, as well as the flexibility to integrate with a corporate identity provider solution if required. After the registered racers have uploaded their models, DREM automatically scans for any suspicious content, quarantining it as necessary, before making the verified models available for the racing competition.

screenshot of the racer experience
screenshot of model management in DREM for participants

Event staff experience

For the event staff, the DREM solution greatly simplifies the process of running an AWS DeepRacer competition. User management, model uploading, and naming conventions are all handled seamlessly within the platform, eliminating any potential confusion around model ownership. To further enforce the integrity of the competition, DREM applies MD5 hashing to the uploaded models, preventing any unauthorized model sharing between racers. Additionally, the DREM interface makes it remarkably straightforward and efficient to upload multiple models to the AWS DeepRacer cars, providing a far superior experience compared to the cars’ native graphical UI. DREM also simplifies the management of the AWS DeepRacer car fleets and Raspberry Pi timing devices, empowering the event staff to remotely remove models, restart the AWS DeepRacer service, and even print user-friendly labels for the cars, making it effortless for racers to connect to them using the provided tablets.

event staff experience devices list
screenshot of the tablet interface for event staff

To further streamline the event management process, DREM provides pre-built scripts that enable seamless registration of devices, which can then be fully managed remotely. The timekeeping functionality is automatically handled within DREM, using Raspberry Pi devices, pressure sensors, and pressure sensor trimming—either through custom DIY modifications or the use of the excellent Digital Racing Kings boards. The DREM timekeeping system represents a significant improvement over the alternative solutions Eviden has used in the past. It captures critical race metrics, including remaining time, all lap times, and the fastest lap. Additionally, the system provides an option to invalidate a lap, such as in cases where a car went off-track. After a racer has completed their runs, the data is securely stored in DREM, and the leaderboard is automatically updated to reflect the latest results.

timekeeper screenshot and results view

Spectator experience

As a global systems integrator, Eviden was determined to deliver a truly spectacular experience, not only for the onsite participants in their AWS DeepRacer finals, but also for those in the room observing the event, as well as remote viewers racing using a proxy or simply interested in watching the competition unfold. To achieve this, Eviden took advantage of the DREM solution’s seamlessly integrated streaming overlay and leaderboard capabilities, which kept all attendees, both in-person and online, fully engaged and informed of the current standings throughout the event.

In previous AWS DeepRacer events, participants had to rely on someone in the room to verbally communicate lap times and remaining race time. However, with DREM, both the racers and spectators have instant access to all the critical timing information, keeping everyone fully up to date. This was especially beneficial for remote participants, who could now clearly see which cars were on the track and follow the progress of their fellow racers, with the streaming overlay dynamically updating to display the top positions on the leaderboard.

Impact of lighting on the deepracer track

In addition, DREM offers a dedicated webpage displaying the complete leaderboard, which can be conveniently shown on screens in the event space and allows remote attendees to follow the competition’s progress from other locations throughout the day.

Eviden Deepracer Bydgoszcz results Eviden Deepracer London results Eviden Deepracer Paris results

The significant improvements to the event experience were clearly reflected in the feedback received from this year’s participants:

  • “Joining remotely was superb.”
  • “Remote interaction over teams was much improved this year.”
  • “The physical event was excellent, well attended, and ran very smoothly.”
  • “Loved it, even though I was attending remote.”
  • “The event itself was fantastic, with each stage being well-planned and organized. The on-site atmosphere exceeded my expectations, making it an incredible event to be part of.”

deepracer event participant feedback

Well-architected

The DREM solution has been meticulously designed with a well-architected approach. From the event organizer’s perspective, the peace of mind of knowing that DREM is secured using AWS WAF, Amazon CloudFront, and AWS Shield Standard, with user management seamlessly handled by Amazon Cognito, is invaluable. Additionally, the platform’s role-based access control (RBAC) is managed through AWS Identity and Access Management (IAM), applying least-privilege policies for enhanced security.

The DREM solution is built entirely using AWS Cloud-centered technologies, delivering inherent performance efficiency and reliability. When Eviden is not actively hosting events, there is minimal ongoing activity in the DREM environment, consisting primarily of standing items such as AWS WAF rules, CloudFront distributions, Amazon Simple Storage Service (Amazon S3) buckets, Amazon DynamoDB tables, and Systems Manager fleet configurations. However, during event periods, DREM seamlessly scales to use AWS Lambda, Amazon EventBridge, AWS Step Functions, and other serverless services, dynamically meeting the demands of the event hosting requirements.

Owing to the well-architected nature of the DREM solution, the platform is remarkably cost-effective. When not actively hosting events, the DREM environment incurs a minimal cost of around $6–8 per month, the majority of which is attributed to the AWS WAF protection. During event periods, the costs increase based on the number of users and models uploaded, but typically only rise to around $15 per month. To further optimize ongoing costs, DREM incorporates measures such as an Amazon S3 lifecycle policy that automatically removes uploaded models after a period of 2 weeks.
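
As an illustration of the kind of lifecycle rule described above, the following boto3 sketch expires objects under a models/ prefix after 14 days. The bucket name and prefix are hypothetical placeholders rather than DREM’s actual configuration:

import boto3

s3 = boto3.client("s3")

# Expire uploaded model files after 14 days; bucket name and prefix are
# illustrative placeholders, not DREM's actual configuration.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-drem-model-uploads",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-uploaded-models",
                "Filter": {"Prefix": "models/"},
                "Status": "Enabled",
                "Expiration": {"Days": 14},
            }
        ]
    },
)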

Conclusion

Are you interested in elevating your own AWS DeepRacer events and delivering a more engaging experience for your participants? We encourage you to explore the AWS DeepRacer Event Manager solution and see how it can transform your event management process.

To get started, visit the GitHub repo to learn more about the solution’s features and architecture. You can also reach out to the Eviden team or your local AWS Solutions Architect to discuss how DREM can be tailored to your specific event requirements.

Don’t miss out on the opportunity to take your AWS DeepRacer initiatives to the next level. Explore DREM and join us at an upcoming AWS DeepRacer event today!


About the authors

Sathya Paduchuri is a Senior Partner Solution Architect (PSA) at Amazon Web Services. Sathya helps partners run optimised workloads on AWS, build and develop their cloud practice(s), and develop new offerings.

Mark Ross is a Chief Architect at Eviden and has specialised in AWS for the past 8 years, gaining and maintaining all AWS certifications since 2021. Mark is passionate about helping customers build, migrate to, and exploit AWS. Mark has created and grown a large AWS community within Eviden.

Generate unique images by fine-tuning Stable Diffusion XL with Amazon SageMaker

Stable Diffusion XL by Stability AI is a high-quality text-to-image deep learning model that allows you to generate professional-looking images in various styles. Managed versions of Stable Diffusion XL are already available to you on Amazon SageMaker JumpStart (see Use Stable Diffusion XL with Amazon SageMaker JumpStart in Amazon SageMaker Studio) and Amazon Bedrock (see Stable Diffusion XL in Amazon Bedrock), allowing you to produce creative content in minutes. The base version of Stable Diffusion XL 1.0 assists with the creative process using generic subjects in the image, which enables use cases such as game character design, creative concept generation, film storyboarding, and image upscaling. However, for use cases that require generating images with a unique subject, you can fine-tune Stable Diffusion XL with a custom dataset by using a custom training container with Amazon SageMaker. With this personalized image generation model, you can incorporate your custom subject into the powerful image generation process that is provided by the Stable Diffusion XL base model.

In this post, we provide step-by-step instructions to create a custom, fine-tuned Stable Diffusion XL model using SageMaker to generate unique images. This automated solution helps you get started quickly by providing all the code and configuration necessary to generate your unique images—all you need is images of your subject. This is useful for use cases across various domains such as media and entertainment, games, and retail. Examples include using your custom subject for marketing material for film, character creation for games, and brand-specific images for retail. To explore more AI use cases, visit the AI Use Case Explorer.

Solution overview

The solution is composed of three logical parts:

  • The first part creates a Docker container image with the necessary framework and configuration for the training container.
  • The second part uses the training container to perform model training on your dataset, and outputs a fine-tuned custom Low-Rank Adaptation (LoRA) model. LoRA is an efficient fine-tuning method that doesn’t require adjusting the base model parameters. Instead, it adds a smaller number of parameters that are applied to the base model temporarily.
  • The third part takes the fine-tuned custom model and allows you to generate creative and unique images.

The following diagram illustrates the solution architecture.

architecture diagram

The workflow to create the training container consists of the following services:

  • SageMaker uses Docker containers throughout the ML lifecycle. SageMaker is flexible and allows you to bring your own container to use for model development, training, and inference. For this post, we build a custom container with the appropriate dependencies that will perform the fine-tuning.
  • Kohya SS is a framework that allows you to train Stable Diffusion models. Kohya SS works with different host environments. This solution uses the Docker on Linux environment option. Kohya SS can be used with a GUI. However, this solution uses the equivalent GUI parameters as a pre-configured TOML file to automate the entire Stable Diffusion XL fine-tuning process.
  • AWS CodeCommit is a fully managed source control service that hosts private Git repositories. We use CodeCommit to store the code that is necessary to build the training container (Dockerfile, buildspec.yml), and the training script (train.py) that is invoked when model training is initiated.
  • Amazon EventBridge is a serverless event bus, used to receive, filter, and route events. EventBridge captures any changes to the CodeCommit repository files, and invokes a new Docker container image to be built.
  • Amazon Elastic Container Registry (Amazon ECR) is a fully managed container hosting registry. We use it to store the custom training container image.
  • AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces deployable software packages. We use it to build the custom training container image. CodeBuild then pushes this image to Amazon ECR.

Various methods exist to fine-tune your model. Compared to methods that require training a new full model, the LoRA fine-tuning method doesn’t modify the original model. Instead, think of it as a layer on top of the base model. Not having to train and produce a full model for each subject has its advantages. This lowers the compute requirements for training, reduces the storage size of the models, and decreases the training time required, making the process more cost-effective at scale. In this post, we demonstrate how to create a LoRA model, based on the Stable Diffusion XL 1.0 base model, using your own subject.
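
To make the idea concrete, the following PyTorch sketch shows the general LoRA pattern: the base weights are frozen, and only a small low-rank update (B·A) is trained and added to the layer’s output. This is an illustrative simplification, not the Kohya SS implementation used in this solution:

import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = base(x) + (alpha / rank) * B(A(x)), with the base frozen."""

    def __init__(self, base_linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear
        for param in self.base.parameters():
            param.requires_grad_(False)  # the base model is never modified
        self.lora_a = nn.Linear(base_linear.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base_linear.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # the adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Only the small A and B matrices are trained, which is why LoRA artifacts are
# far smaller than a fully fine-tuned copy of the base model.
layer = LoRALinear(nn.Linear(768, 768), rank=8)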

The training workflow uses the following services and features:

  • Amazon Simple Storage Service (Amazon S3) is a highly durable and scalable object store. Your custom dataset and configuration file will be uploaded to Amazon S3, and then retrieved by the custom Docker container to train on those images.
  • Amazon SageMaker Model Training is a feature of SageMaker that allows you to standardize and manage your training jobs at scale, without the need to manage infrastructure. When the container starts up as part of a training job, the train.py file is invoked (see the sketch after this list). When the training process is complete, the output model that resides in the /opt/ml/model directory is automatically uploaded to the S3 bucket specified in the training job configuration.
  • Amazon SageMaker Pipelines is a workflow orchestration service that allows you to automate ML processes, from data preprocessing to model monitoring. This allows you to initiate a training pipeline, taking in as input the Amazon S3 location of your dataset and configuration file, ECR container image, and infrastructure specifications for the training job.
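
To illustrate how SageMaker Model Training ties these pieces together, the following is a hypothetical sketch of a training entrypoint, not the actual train.py from the solution repository. It shows the SageMaker conventions the solution relies on: the dataset and config file arrive under /opt/ml/input/data, and anything written to /opt/ml/model is uploaded to Amazon S3 when the job finishes. The channel name, command line, and working directory are assumptions for illustration:

import shutil
import subprocess
from pathlib import Path

# SageMaker conventions: channel data is mounted under /opt/ml/input/data, and
# files written to /opt/ml/model are uploaded to the job's S3 output location.
INPUT_DIR = Path("/opt/ml/input/data/training")
CONFIG_PATH = INPUT_DIR / "kohya-sdxl-config.toml"
OUTPUT_DIR = Path("/tmp/kohya-output")  # hypothetical working directory
MODEL_DIR = Path("/opt/ml/model")

def main():
    # Launch the Kohya SS fine-tuning process (hypothetical command line; the
    # real solution wraps the Kohya scripts with its own arguments).
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["python", "sdxl_train_network.py", "--config_file", str(CONFIG_PATH)],
        check=True,
    )

    # Copy the resulting LoRA artifact so SageMaker uploads it to Amazon S3.
    MODEL_DIR.mkdir(parents=True, exist_ok=True)
    for artifact in OUTPUT_DIR.glob("*.safetensors"):
        shutil.copy(artifact, MODEL_DIR / artifact.name)

if __name__ == "__main__":
    main()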

Now you’re ready to prompt your fine-tuned model to generate unique images. SageMaker gives you the flexibility to bring your own container for inference. You can use SageMaker hosting services with your own custom inference container to configure an inference endpoint. However, to demonstrate the Automatic1111 Stable Diffusion UI, we show you how to run inference on an Amazon Elastic Compute Cloud (Amazon EC2) instance (or locally on your own machine).

This solution fully automates the creation of a fine-tuned LoRA model with Stable Diffusion XL 1.0 as the base model. In the following sections, we discuss how to satisfy the prerequisites, download the code, and use the Jupyter notebook in the GitHub repository to deploy the automated solution using an Amazon SageMaker Studio environment.

The code for this end-to-end solution is available in the GitHub repository.

Prerequisites

This solution has been tested in the AWS Region us-west-2, but applies to any Region where these services are available. Make sure you have the following prerequisites:

Download the necessary code in SageMaker Studio

In this section, we walk through the steps to download the necessary code in SageMaker Studio and set up your notebook.

Navigate to the terminal in SageMaker Studio JupyterLab

Complete the following steps to open the terminal:

  1. Log in to your AWS account and open the SageMaker Studio console.
  2. Select your user profile and choose Open Studio to open SageMaker Studio.
  3. Choose JupyterLab to open the JupyterLab application. This environment is where you will run the commands.
  4. If you already have a space created, choose Run to open the space.
  5. If you don’t have a space, choose Create JupyterLab space. Enter a name for the space and choose Create space. Leave the default values and choose Run space.
  6. When the environment shows a status of Running, choose Open JupyterLab to open the new space.
  7. In the JupyterLab Launcher window, choose Terminal.

jupyterlab terminal

Download the code to your SageMaker Studio environment

Run the following commands from the terminal. For this post, you check out just the required directories of the GitHub repo (so you don’t have to download the entire repository).

git clone --no-checkout https://github.com/aws/amazon-sagemaker-examples.git
cd amazon-sagemaker-examples/
git sparse-checkout set use-cases/text-to-image-fine-tuning
git checkout

If successful, you should see the output Your branch is up to date with 'origin/main'.

Open the notebook in SageMaker Studio JupyterLab

Complete the following steps to open the notebook:

  1. In JupyterLab, choose File Browser in the navigation pane.
  2. Navigate to the project directory named amazon-sagemaker-examples/use-cases/text-to-image-fine-tuning.
  3. Open the Jupyter notebook named kohya-ss-fine-tuning.ipynb.
  4. Choose your runtime kernel (it’s set to use Python 3 by default).
  5. Choose Select.

jupyterlab kernel

You now have a kernel that is ready to run commands. In the following steps, we use this notebook to create the necessary resources.

Train a custom Stable Diffusion XL model

In this section, we walk through the steps to train a custom Stable Diffusion XL model.

Set up AWS infrastructure with AWS CloudFormation

For your convenience, an AWS CloudFormation template has been provided to create the necessary AWS resources. Before you create the resources, configure AWS Identity and Access Management (IAM) permissions for your SageMaker IAM role. This role is used by the SageMaker environment, and grants permissions to run certain actions. As with all permissions, make sure you follow the best practice of only granting the permissions necessary to perform your tasks.

  1. On the IAM console, choose Roles in the navigation pane.
  2. Choose the role named AmazonSageMaker-ExecutionRole-<id>. This should be the role that is assigned to your domain.
  3. In the Permissions policies section, choose the policy named AmazonSageMaker-ExecutionPolicy-<id>.
  4. Choose Edit to edit the customer managed policy.
  5. Add the following permissions to the policy, then choose Next.
  6. Choose Save changes to confirm your added permissions.

You now have the proper permissions to run commands in your SageMaker environment.

  1. Navigate back to your notebook named kohya-ss-fine-tuning.ipynb in your JupyterLab environment.
  2. In the notebook step labeled Step One – Create the necessary resources through AWS CloudFormation, run the code cell to create the CloudFormation stack.

Wait for the CloudFormation stack to finish creating before moving on. You can monitor the status of the stack creation on the AWS CloudFormation console. This step should take about 2 minutes.

Set up your custom images and fine-tuning configuration file

In this section, you first upload your fine-tuning configuration file to Amazon S3. The configuration file is specific to the Kohya program. Its purpose is to specify the configuration settings programmatically rather than manually using the Kohya GUI.

This file is provided with opinionated values. You can modify the configuration file with different values if desired. For information about what the parameters mean, refer to LoRA training parameters. You will need to experiment to achieve the desired result. Some parameters rely on underlying hardware and GPU (for example, mixed_precision=bf16 or xformers). Make sure your training instance has the proper hardware configuration to support the parameters you select.

You also need to upload a set of images to Amazon S3. If you don’t have your own dataset and decide to use images from public sources, make sure to adhere to copyright and license restrictions.

The structure of the S3 bucket is as follows:

bucket/0001-dataset/kohya-sdxl-config.toml

bucket/0001-dataset/<asset-folder-name>/     (images and caption files go here)

bucket/0002-dataset/kohya-sdxl-config.toml

bucket/0002-dataset/<asset-folder-name>/     (images and captions files go here)

...

The asset-folder-name uses a special naming convention, which is defined later in this post. Each xxxx-dataset prefix can contain separate datasets with different config file contents. Each pipeline takes a single dataset as input. The config file and asset folder will be downloaded by the SageMaker training job during the training step.

Complete the following steps:

  1. Navigate back to your notebook named kohya-ss-fine-tuning.ipynb in your JupyterLab environment.
  2. In the Notebook step labeled Step Two – Upload the fine-tuning configuration file, run the code cell to upload the config file to Amazon S3.
  3. Verify that you have an S3 bucket named sagemaker-kohya-ss-fine-tuning-<account id>, with a 0001-dataset prefix containing the kohya-sdxl-config.toml file.

Next, you create an asset folder and upload your custom images and caption files to Amazon S3. The asset-folder-name must be named according to the required naming convention. This naming convention is what defines the number of repetitions and the trigger word for the prompt. The trigger word is what identifies your custom subject. For example, a folder name of 60_dwjz signifies 60 repetitions with the trigger prompt word dwjz. Consider using initials or abbreviations of your subject for the trigger word so it doesn’t collide with existing words. For example, if your subject is a tiger, you could use the trigger word tgr. More repetitions don’t always translate to better results. Experiment to achieve your desired result.
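
If you prefer to script the upload instead of using the console steps that follow, a minimal boto3 sketch that follows the same bucket and naming conventions might look like this; the account ID, asset folder name, and local paths are placeholders:

import boto3
from pathlib import Path

s3 = boto3.client("s3")
bucket = "sagemaker-kohya-ss-fine-tuning-123456789012"  # placeholder account ID
dataset_prefix = "0001-dataset"
asset_folder = "60_dwjz"  # <repetitions>_<trigger word>

# Upload the Kohya configuration file for this dataset.
s3.upload_file("kohya-sdxl-config.toml", bucket, f"{dataset_prefix}/kohya-sdxl-config.toml")

# Upload images and optional caption files into the asset folder.
for path in Path("./my-subject-images").iterdir():
    if path.suffix.lower() in {".jpg", ".jpeg", ".png", ".caption"}:
        s3.upload_file(str(path), bucket, f"{dataset_prefix}/{asset_folder}/{path.name}")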

  1. On the S3 console, navigate to the bucket named sagemaker-kohya-ss-fine-tuning-<account id>.
  2. Choose the prefix named 0001-dataset.
  3. Choose Create folder.
  4. Enter a folder name for your assets using the naming convention (for example, 60_dwjz) and choose Create folder.
  5. Choose the prefix. This is where your images and caption files go.
  6. Choose Upload.
  7. Choose Add files, choose your image files, then choose Upload.

When selecting images to use, favor quality over quantity. Some preprocessing of your image assets might be beneficial, such as cropping a person if you are fine-tuning a human subject. For this example, we used approximately 30 images for a human subject with great results. Most of them were high resolution, and cropped to include the human subject only—head and shoulders, half body, and full body images were included but not required.

Optionally, you can use caption files to assist your model in understanding your prompts better. Caption files have the .caption extension, and their contents describe the image (for example, dwjz wearing a vest and sunglasses, serious facial expression, headshot, 50mm). The image file names should match the corresponding (optional) caption file names. Caption files are highly encouraged. Upload your caption files to the same prefix as your images.

At the end of your upload, your S3 prefix structure should look similar to the following:

bucket/0001-dataset/kohya-sdxl-config.toml

bucket/0001-dataset/60_dwjz/

bucket/0001-dataset/60_dwjz/1.jpg

bucket/0001-dataset/60_dwjz/1.caption

bucket/0001-dataset/60_dwjz/2.jpg

bucket/0001-dataset/60_dwjz/2.caption

...

There are many variables in fine-tuning, and as of this writing there are no definitive recommendations for generating great results. To achieve good results, include enough training steps, high-resolution assets, and enough images.

Set up the required code

The code required for this solution is provided and will be uploaded to the CodeCommit repository that was created by the CloudFormation template. This code is used to build the custom training container. Any updates to the code in this repository will invoke the container image to be built and pushed to Amazon ECR through an EventBridge rule.

The code consists of the following components:

  • buildspec.yml – Creates the container image by using the GitHub repository for Kohya SS, and pushes the training image to Amazon ECR
  • Dockerfile – Used to override the Dockerfile in the Kohya SS project, which is slightly modified to be used with SageMaker training
  • train.py – Initiates the Kohya SS program to do the fine-tuning, and is invoked when the SageMaker training job runs

Complete the following steps to create the training container image:

  1. Navigate back to your notebook named kohya-ss-fine-tuning.ipynb in your JupyterLab environment.
  2. In the step labeled Step Three – Upload the necessary code to the AWS CodeCommit repository, run the code cell to upload the required code to the CodeCommit repository.

This event will initiate the process that creates the training container image and uploads the image to Amazon ECR.

  1. On the CodeBuild console, locate the project named kohya-ss-fine-tuning-build-container.

Latest build status should display as In progress. Wait for the build to finish and the status to change to Succeeded. The build takes about 15 minutes.

A new training container image is now available in Amazon ECR. Every time you make a change to the code in the CodeCommit repository, a new container image will be created.

Initiate the model training

Now that you have a training container image, you can use SageMaker Pipelines with a training step to train your model. SageMaker Pipelines enables you to build powerful multi-step pipelines. There are many step types provided for you to extend and orchestrate your workflows, allowing you to evaluate models, register models, consider conditional logic, run custom code, and more. The following steps are used in this pipeline:

  • Condition step – Evaluate input parameters. If successful, proceed with the training step. If not successful, proceed with the fail step. This step validates that the training volume size is at least 50 GB (see the sketch after this list). You could extend this logic to only allow specific instance types, to only allow specific training containers, and add other guardrails if applicable.
  • Training step – Run a SageMaker training job, given the input parameters.
  • Fail step – Stop the pipeline and return an error message.
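
The following sketch shows how these three step types fit together with the SageMaker Python SDK. The parameter name and threshold mirror the 50 GB volume-size check described above, but the estimator settings, image URI, and role are placeholders, and the actual pipeline definition in the repository may differ:

from sagemaker.estimator import Estimator
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.parameters import ParameterInteger
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

# Pipeline parameter supplied when a run is created (name is illustrative).
training_volume_size = ParameterInteger(name="TrainingVolumeSizeGB", default_value=50)

# Training step that runs the custom Kohya SS container (placeholder values).
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-west-2.amazonaws.com/kohya-ss-fine-tuning:latest",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.g5.8xlarge",
    volume_size=training_volume_size,
)
train_step = TrainingStep(name="TrainNewFineTunedModel", estimator=estimator)

fail_step = FailStep(
    name="FailIfVolumeTooSmall",
    error_message="Training volume size must be at least 50 GB.",
)

# Guardrail: only run training when the requested volume size is large enough.
condition_step = ConditionStep(
    name="CheckTrainingVolumeSize",
    conditions=[ConditionGreaterThanOrEqualTo(left=training_volume_size, right=50)],
    if_steps=[train_step],
    else_steps=[fail_step],
)

pipeline = Pipeline(
    name="kohya-ss-fine-tuning-pipeline",
    parameters=[training_volume_size],
    steps=[condition_step],
)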

Complete the following steps to initiate model training:

  1. On the SageMaker Studio console, in the navigation pane, choose Pipelines.
  2. Choose the pipeline named kohya-ss-fine-tuning-pipeline.
  3. Choose Create to create a pipeline run.
  4. Enter a name, description (optional), and any desired parameter values.
  5. You can keep the default settings of using the 0001-dataset for the input data and an ml.g5.8xlarge instance type for training.
  6. Choose Create to invoke the pipeline.

sagemaker pipeline execution

  1. Choose the current pipeline run to view its details.
  2. In the graph, choose the pipeline step named TrainNewFineTunedModel to access the pipeline run information.

The Details tab displays metadata, logs, and the associated training job. The Overview tab displays the output model location in Amazon S3 when training is complete (note this Amazon S3 location for use in later steps). SageMaker processes the training output by uploading the model in the /opt/ml/model directory of the training container to Amazon S3, in the location specified by the training job.

sagemaker pipeline

Wait for the pipeline status to show as Succeeded before proceeding to the next step.

Run inference on a custom Stable Diffusion XL model

There are many options for model hosting. For this post, we demonstrate how to run inference with Automatic1111 Stable Diffusion web UI running on an EC2 instance. This tool enables you to use various image generation features through a user interface. It’s a straightforward way to learn the parameters available in a visual format and experiment with supplementary features. For this reason, we demonstrate using this tool as part of this post. However, you can also use SageMaker to host an inference endpoint, and you have the option to use your own custom inference container.

Install the Automatic1111 Stable Diffusion web UI on Amazon EC2

Complete the following steps to install the web UI:

  1. Create an EC2 Windows instance and connect to it. For instructions, see Get started with Amazon EC2.
  2. Choose Windows Server 2022 Base Amazon Machine Image, a g5.8xlarge instance type, a key pair, and 100 GiB of storage. Alternatively, you can use your local machine.
  3. Install NVIDIA drivers to enable the GPU. This solution has been tested with the Data Center Driver for Windows version 551.78.
  4. Install the Automatic1111 Stable Diffusion web UI using the instructions in the Automatic Installation on Windows section in the GitHub repo. This solution has been tested with version 1.9.3. The last step of installation will ask you to run webui-user.bat, which will install and launch the Stable Diffusion UI in a web browser.

automatic1111 ui

  1. Download the Stable Diffusion XL 1.0 Base model from Hugging Face.
  2. Move the downloaded file sd_xl_base_1.0.safetensors to the directory ../stable-diffusion-webui/models/Stable-diffusion/.
  3. Scroll to the bottom of the page and choose Reload UI.
  4. Choose sd_xl_base_1.0.safetensors on the Stable Diffusion checkpoint dropdown menu.
  5. Adjust the default Width and Height values to 1024 x 1024 for better results.
  6. Experiment with the remaining parameters to achieve your desired result. Specifically, try adjusting the settings for Sampling method, Sampling steps, CFG Scale, and Seed.

The input prompt is extremely important to achieve great results. You can add extensions to assist with your creative workflow. This style selector extension is great at supplementing prompts.

  1. To install this extension, navigate to the Extensions tab, choose Install from URL, enter the style selector extension URL, and choose Install.
  2. Reload the UI for changes to take effect.

You will notice a new section called SDXL Styles, which you can select from to add to your prompts.

  1. Download the fine-tuned model that was created by the SageMaker pipeline training step.

The model is stored in Amazon S3 with the file name model.tar.gz.

  1. You can use the Share with a presigned URL option to share as well.

s3 model location

  1. Unzip the contents of the model.tar.gz file (twice) and copy the custom_lora_model.safetensors LoRA model file to the directory ../stable-diffusion-webui/models/Lora.
  2. Choose the Refresh icon on the Lora tab to verify that your custom_lora_model is available.

automatic1111 lora ui

  1. Choose custom_lora_model, and it will populate the prompt input box with the text <lora:custom_lora_model:1>.
  2. Append a prompt to the text (see examples in the next section).
  3. You can decrease or increase the multiplier of your LoRA model by changing the 1 value. This adjusts the influence of your LoRA model accordingly.
  4. Choose Generate to run inference against your fine-tuned LoRA model.

Example results

These results are from a fine-tuned model trained on 39 high-resolution images of the author, using the provided code and configuration files in this solution. Caption files were written for each of these images, using the trigger word aallzz.

generated image result 1

Prompt: concept art <lora:custom_lora_model:1.0> aallzz professional headshot, cinematic, bokeh, dramatic lighting, shallow depth of field, vignette, highly detailed, high budget, 8k, cinemascope, moody, epic, gorgeous, digital artwork, illustrative, painterly, matte painting

Negative Prompt: photo, photorealistic, realism, anime, abstract, glitch

Sampler: DPM2

Sampling Steps: 90

CFG Scale: 8.5

Width/Height: 1024×1024

generated image result 2

Prompt: cinematic film still <lora:custom_lora_model:1> aallzz eating a burger, cinematic, bokeh, dramatic lighting, shallow depth of field, vignette, highly detailed, high budget, cinemascope, moody, epic, gorgeous, film grain, grainy

Negative Prompt: anime, cartoon, graphic, painting, graphite, abstract, glitch, mutated, disfigured

Sampler: DPM2

Sampling Steps: 70

CFG Scale: 8

Width/Height: 1024×1024

generated image result 3

Prompt: concept art <lora:custom_lora_model:1> aallzz 3D profile picture avatar, vector icon, character, mountain background, sun backlight, digital artwork, illustrative, painterly, matte painting, highly detailed

Negative Prompt: photo, photorealistic, realism, glitch, mutated, disfigured, glasses

Sampler: DPM2

Sampling Steps: 100

CFG Scale: 9

Width/Height: 1024×1024

generated image result 4

Prompt: concept art <lora:custom_lora_model:1> aallzz 3D profile picture avatar, vector icon, vector illustration, vector art, realistic cartoon character, professional attire, digital artwork, illustrative, painterly, matte painting, highly detailed

Negative Prompt: photo, photorealistic, realism, glitch, mutated, disfigured, glasses, hat

Sampler: DPM2

Sampling Steps: 100

CFG Scale: 10

Width/Height: 1024×1024

generated image result 5

Prompt: cinematic photo <lora:custom_lora_model:1> aallzz portrait, sitting, magical elephant with large tusks, wearing safari clothing, majestic scenery in the background, river, natural lighting, 50mm, highly detailed, photograph, film, bokeh, professional, 4k, highly detailed

Negative Prompt: drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, glitch, mutated, disfigured, glasses, hat

Sampler: DPM2

Sampling Steps: 100

CFG Scale: 9.5

Width/Height: 1024×1024

Clean up

To avoid incurring charges, delete the resources you created as part of this solution:

  1. Delete the objects in your S3 bucket. You must delete the objects before deleting the stack.
  2. Delete your container image in Amazon ECR. You must delete the image before deleting the stack.
  3. On the AWS CloudFormation console, delete the stack named kohya-ss-fine-tuning-stack.
  4. If you created an EC2 instance for running inference, stop or delete the instance.
  5. Stop or delete your SageMaker Studio instances, applications, and spaces.

Conclusion

Congratulations! You have successfully fine-tuned a custom LoRA model to be used with Stable Diffusion XL 1.0. We created a custom training Docker container, fine-tuned a custom LoRA model to be used with Stable Diffusion XL, and used the resulting model to generate creative and unique images. The end-to-end training solution was fully automated with a CloudFormation template to help you get started quickly. Now, try creating a custom model with your own subject. To explore more AI use cases, visit the AI Use Case Explorer.


About the Author

Alen Zograbyan is a Sr. Solutions Architect at Amazon Web Services. He currently serves media and entertainment customers, and has expertise in software engineering, DevOps, security, and AI/ML. He has a deep passion for learning, teaching, and photography.

Build your multilingual personal calendar assistant with Amazon Bedrock and AWS Step Functions

Foreigners and expats living outside of their home country deal with a large number of emails in various languages daily. They often find themselves struggling with language barriers when it comes to setting up reminders for events like business gatherings and customer meetings. To solve this problem, this post shows you how to apply AWS services such as Amazon Bedrock, AWS Step Functions, and Amazon Simple Email Service (Amazon SES) to build a fully automated multilingual calendar artificial intelligence (AI) assistant. It understands the incoming messages, translates them to the preferred language, and automatically sets up calendar reminders.

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that’s best suited for your use case. With Amazon Bedrock, you can get started quickly, privately customize FMs with your own data, and easily integrate and deploy them into your applications using AWS tools without having to manage any infrastructure.

AWS Step Functions is a visual workflow service that helps developers build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines. It lets you orchestrate multiple steps in the pipeline. The steps could be AWS Lambda functions that generate prompts, parse foundation models’ output, or send email reminders using Amazon SES. Step Functions can interact with over 220 AWS services, including optimized integrations with Amazon Bedrock. Step Functions pipelines can contain loops, map jobs, parallel jobs, conditions, and human interaction, which can be useful for AI-human interaction scenarios.

This post shows you how to quickly combine the flexibility and capability of both Amazon Bedrock FMs and Step Functions to build a generative AI application in a few steps. You can reuse the same design pattern to implement more generative AI applications with low effort. Both Amazon Bedrock and Step Functions are serverless, so you don’t need to think about managing and scaling the infrastructure.

The source code and deployment instructions are available in the Github repository.

Overview of solution

Figure 1: Solution architecture

As shown in Figure 1, the workflow starts from the Amazon API Gateway, then goes through different steps in the Step Functions state machine. Pay attention to how the original message flows through the pipeline and how it changes. First, the message is added to the prompt. Then, it is transformed into structured JSON by the foundation model. Finally, this structured JSON is used to carry out actions.

  1. The original message (example in Norwegian) is sent to a Step Functions state machine using API Gateway.
  2. A Lambda function generates a prompt that includes system instructions, the original message, and other needed information such as the current date and time. (Here’s the generated prompt from the example message.)
    • Sometimes, the original message might not specify the exact date but instead says something like “please RSVP before this Friday,” implying the date based on the current context. Therefore, the function inserts the current date into the prompt to assist the model in interpreting the correct date for this Friday.
  3. Invoke the Bedrock FM to run the following tasks, as outlined in the prompt, and pass the output to the parser in the next step:
    • Translate and summarize the original message in English.
    • Extract events information such as subject, location, and time from the original message.
    • Generate an action plan list for events. For now, the instruction only asks the FM to generate an action plan for sending calendar reminder emails for attending an event.
  4. Parse the FM output to ensure it has a valid schema. (Here’s the parsed result of the sample message.)
    • Anthropic Claude on Amazon Bedrock can control the output format and generate JSON, but it might still wrap the result in text such as “this is the json {…}.” To enhance robustness, we implement an output parser to ensure adherence to the schema, thereby strengthening this pipeline (a minimal sketch of such a parser follows this list).
  5. Iterate through the action-plan list and perform step 6 for each item. Every action item follows the same schema:
    {
              "tool_name": "create-calendar-reminder",
              "parameters": {
                "body": "Jeff, the CEO, invites employees to ...",
                "raw_body": "Kjære ansatte,nnVi ..",
                "subject": "Winter fun and team building event",
                "start_datetime": "2024-05-30T10:00:00Z",
                "end_datetime": "2024-05-30T23:00:00Z",
                "location": "Holmenkollbakken, Oslo"
              }
    }

  6. Choose the right tool to do the job:
    • If the tool_name equals create-calendar-reminder, then run sub-flow A to send out a calendar reminder email using Lambda Function.
    • For future support of other possible jobs, you can expand the prompt to create a different action plan (assign different values to tool_name), and run the appropriate action outlined in sub-flow B.
  7.  Done.
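
A minimal, hypothetical version of the output parser mentioned in step 4 might look like the following. The real implementation in the repository may differ; the idea is to tolerate extra text around the JSON and validate the fields that the downstream steps rely on (the action_plan key name here is an assumption for illustration):

import json
import re

REQUIRED_ACTION_FIELDS = {"tool_name", "parameters"}

def parse_model_output(raw_output: str) -> list:
    """Extract and validate the action-plan list from a foundation model response."""
    # The model sometimes wraps the JSON in prose such as "this is the json {...}",
    # so pull out the outermost {...} or [...] block instead of parsing the whole string.
    match = re.search(r"(\{.*\}|\[.*\])", raw_output, re.DOTALL)
    if not match:
        raise ValueError("No JSON found in model output")

    payload = json.loads(match.group(1))
    # "action_plan" is an assumed key name for illustration; the actual schema
    # is defined by the prompt in the repository.
    actions = payload["action_plan"] if isinstance(payload, dict) else payload

    for action in actions:
        missing = REQUIRED_ACTION_FIELDS - set(action)
        if missing:
            raise ValueError(f"Action item is missing required fields: {missing}")
    return actions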

Prerequisites

To run this solution, you must have the following prerequisites:

Deployment and testing

Thanks to AWS Cloud Development Kit (AWS CDK), you can deploy the full stack with a single command line by following the deployment instructions from the Github repository. The deployment will output the API Gateway endpoint URL and an API key.

Use a tool such as curl to send messages in different languages to API Gateway for testing:

apigw=[THE_VALUE_OF_GenaiCalendarAgentStack.APIUrl]
apikey=[THE_VALUE_OF_GenaiCalendarAgentStack.GeneratedAPIKeyValue]
curl -v $apigw --header "Content-Type: application/json" --header "x-api-key:$apikey" -d @./doc/sample-inputs/norsk1.json 

Within 1–2 minutes, email invitations should be sent to the recipient from your sender email address, as shown in Figure 2.

Figure 2: Email generated by the solution

Cleaning up

To avoid incurring future charges, delete the resources by running the following command in the root path of the source code:

$ cdk destroy

Future extension of the solution

In the current implementation, the solution only sends calendar reminder emails; the prompt instructs the foundation model to generate action items only where tool_name equals create-calendar-reminder. You can extend the solution to support more actions. For example, you can automatically send an email to the event originator and politely decline the invitation if the event is in July (summer vacation for many); a sketch of such a Lambda handler follows these steps:

  1. Modify the prompt instruction: If the event date is in July, create an action item and set the value of tool_name to send-decline-mail.
  2. Similar to sub-flow A, create a new sub-flow C that runs when tool_name matches send-decline-mail:
    1. Invoke the Amazon Bedrock FM to generate email content explaining that you cannot attend the event because it’s in July (summer vacation).
    2. Invoke a Lambda function to send out the decline email with the generated content.
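The following is a minimal, hypothetical sketch of the Lambda handler behind such a sub-flow C. It is not part of the repository; the model ID, sender address, and the originator_email field are assumptions, and in practice the Step Functions state machine would pass these values to the function.

import boto3

bedrock = boto3.client("bedrock-runtime")
ses = boto3.client("ses")

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # assumption: any Bedrock FM works here
SENDER = "assistant@example.com"                     # placeholder sender address


def handler(event, context):
    """Sub-flow C sketch: politely decline an event that takes place in July."""
    params = event["parameters"]

    # 1. Ask the FM to draft a short, polite decline message.
    prompt = (
        "Write a short, polite email declining the following invitation "
        f"because I am on summer vacation in July:\n\n{params['body']}"
    )
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    decline_text = response["output"]["message"]["content"][0]["text"]

    # 2. Send the generated text back to the event originator with Amazon SES.
    ses.send_email(
        Source=SENDER,
        Destination={"ToAddresses": [event["originator_email"]]},  # assumed field
        Message={
            "Subject": {"Data": f"Re: {params['subject']}"},
            "Body": {"Text": {"Data": decline_text}},
        },
    )
    return {"status": "declined"}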

In addition, you can experiment with different foundation models on Amazon Bedrock, such as Meta Llama 3 or Mistral AI models, for better performance or lower cost. You can also explore Agents for Amazon Bedrock, which can orchestrate and run multistep tasks.

Conclusion

In this post, we explored a solution pattern for using generative AI within a workflow. With the flexibility and capabilities offered by both Amazon Bedrock FMs and AWS Step Functions, you can build a powerful generative AI assistant in a few steps. This assistant can streamline processes, enhance productivity, and handle various tasks efficiently. Because the solution is built on managed services, you can easily modify or upgrade its capabilities without being burdened by operational overhead.

You can find the solution source code in the GitHub repository and deploy your own multilingual calendar assistant by following the deployment instructions.

Check out the following resources to learn more:


About the Author

Feng Lu is a Senior Solutions Architect at AWS with 20 years of professional experience. He is passionate about helping organizations craft scalable, flexible, and resilient architectures that address their business challenges. Currently, he focuses on leveraging Artificial Intelligence (AI) and Internet of Things (IoT) technologies to enhance the intelligence and efficiency of our physical environment.

Read More

Medical content creation in the age of generative AI

Generative AI and transformer-based large language models (LLMs) have been in the top headlines recently. These models demonstrate impressive performance in question answering, text summarization, and code and text generation. Today, LLMs are being used in real settings by companies, including in the heavily regulated healthcare and life sciences (HCLS) industry. The use cases can range from medical information extraction and clinical notes summarization to marketing content generation and medical-legal review (MLR) process automation. In this post, we explore how LLMs can be used to design marketing content for disease awareness.

Marketing content is a key component in the communication strategy of HCLS companies. It’s also a highly non-trivial balancing exercise, because the technical content should be as accurate and precise as possible, yet engaging and empowering for the target audience. The main goal of the marketing content is to raise awareness about certain health conditions and disseminate knowledge of possible therapies among patients and healthcare providers. By accessing up-to-date and accurate information, healthcare providers can adapt their patients’ treatment in a more informed and knowledgeable way. However, because medical content is highly sensitive, the generation process can be relatively slow (from days to weeks) and may go through numerous peer-review cycles, with thorough regulatory compliance and evaluation protocols.

Could LLMs, with their advanced text generation capabilities, help streamline this process by assisting brand managers and medical experts in their generation and review process?

To answer this question, the AWS Generative AI Innovation Center recently developed an AI assistant for medical content generation. The system is built upon Amazon Bedrock and leverages LLM capabilities to generate curated medical content for disease awareness. With this AI assistant, we can effectively reduce the overall generation time from weeks to hours, while giving the subject matter experts (SMEs) more control over the generation process. This is accomplished through an automated revision functionality, which allows the user to interact and send instructions and comments directly to the LLM via an interactive feedback loop. This is especially important since the revision of content is usually the main bottleneck in the process.

Since every piece of medical information can profoundly impact the well-being of patients, medical content generation comes with additional requirements and hinges upon the content’s accuracy and precision. For this reason, our system has been augmented with additional guardrails for fact-checking and rules evaluation. The goal of these modules is to assess the factuality of the generated text and its alignment with pre-specified rules and regulations. With these additional features, you have more transparency and control over the underlying generative logic of the LLM.

This post walks you through the implementation details and design choices, focusing primarily on the content generation and revision modules. Fact-checking and rules evaluation require special coverage and will be discussed in an upcoming post.

Image 1: High-level overview of the AI-assistant and its different components

Architecture

The overall architecture and the main steps in the content creation process are illustrated in Image 2. The solution has been designed using the following services:

Image 2: Content generation steps

The workflow is as follows:

  • In step 1, the user selects a set of medical references and provides rules and additional guidelines on the marketing content in the brief.
  • In step 2, the user interacts with the system through a Streamlit UI, first by uploading the documents and then by selecting the target audience and the language.
  • In step 3, the frontend sends an HTTPS request via the WebSocket API and API Gateway, which triggers the first AWS Lambda function.
  • In step 5, the Lambda function triggers Amazon Textract to parse and extract data from the PDF documents.
  • The extracted data is stored in an S3 bucket and then used as an input to the LLM in the prompts, as shown in steps 6 and 7.
  • In step 8, the Lambda function encodes the logic of the content generation, summarization, and content revision.
  • Optionally, in step 9, the content generated by the LLM can be translated to other languages using Amazon Translate.
  • Finally, the LLM generates new content conditioned on the input data and the prompt. It sends it back to the WebSocket via the Lambda function.

Preparing the generative pipeline’s input data

To generate accurate medical content, the LLM is provided with a set of curated scientific data related to the disease in question, such as medical journals, articles, and websites. These articles are chosen by brand managers, medical experts, and other SMEs with adequate medical expertise.

The input also consists of a brief, which describes the general requirements and rules the generated content should adhere to (tone, style, target audience, number of words, etc.). In the traditional marketing content generation process, this brief is usually sent to content creation agencies.

It is also possible to integrate more elaborate rules or regulations, such as the HIPAA guidelines for the protection of health information privacy and security. Moreover, these rules can either be general and universally applicable, or they can be specific to certain cases. For example, some regulatory requirements may apply only to certain markets or regions, or to a particular disease. Our generative system allows a high degree of personalization, so you can easily tailor and specialize the content to new settings by simply adjusting the input data.

The content should be carefully adapted to the target audience, either patients or healthcare professionals. Indeed, the tone, style, and scientific complexity should be chosen depending on the readers’ familiarity with medical concepts. Content personalization is especially important for HCLS companies with a large geographical footprint, because it enables synergies and yields more efficiencies across regional teams.

From a system design perspective, we may need to process a large number of curated articles and scientific journals. This is especially true if the disease in question requires sophisticated medical knowledge or relies on more recent publications. Moreover, medical references contain a variety of information, structured in either plain text or more complex images, with embedded annotations and tables. To scale the system, it is important to seamlessly parse, extract, and store this information. For this purpose, we use Amazon Textract, a machine learning (ML) service that automatically extracts text and data from documents.
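As an illustration, a minimal sketch of calling Amazon Textract on one of the uploaded PDFs with the AWS SDK for Python (Boto3) could look like the following. The bucket and document key are placeholders, and pagination of the results is omitted for brevity.

import time

import boto3

textract = boto3.client("textract")

BUCKET = "my-medical-references"       # placeholder bucket name
DOCUMENT = "references/article-1.pdf"  # placeholder document key

# PDFs are processed asynchronously: start a job, then poll until it completes.
job = textract.start_document_text_detection(
    DocumentLocation={"S3Object": {"Bucket": BUCKET, "Name": DOCUMENT}}
)

while True:
    result = textract.get_document_text_detection(JobId=job["JobId"])
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

# Keep only the LINE blocks, which carry the extracted plain text
# (large documents return additional pages via NextToken, not handled here).
lines = [b["Text"] for b in result.get("Blocks", []) if b["BlockType"] == "LINE"]
extracted_text = "\n".join(lines)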

Once the input data is processed, it is sent to the LLM as contextual information through API calls. With a context window as large as 200K tokens for Anthropic Claude 3, we can choose to either use the original scientific corpus, hence improving the quality of the generated content (though at the price of increased latency), or summarize the scientific references before using them in the generative pipeline.

Medical reference summarization is an essential step in the overall performance optimization and is achieved by leveraging LLM summarization capabilities. We use prompt engineering to send our summarization instructions to the LLM. Importantly, when performed, summarization should preserve as much of the article’s metadata as possible, such as the title, authors, and date.

Image 3: A simplified version of the summarization prompt
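The prompt actually used is the one shown in Image 3. Purely as an illustration, a simplified, hypothetical summarization instruction that preserves the article metadata could be assembled along these lines:

# Hypothetical template -- a simplified stand-in for the prompt in Image 3.
SUMMARIZATION_TEMPLATE = """You are summarizing a medical reference that will later be used
as context for marketing content generation.

Summarize the article below in at most 300 words. Keep the article metadata (title, authors,
journal, and publication date) verbatim at the top of the summary, and preserve all numerical
results exactly as stated.

<article>
{article_text}
</article>"""

summarization_prompt = SUMMARIZATION_TEMPLATE.format(
    article_text="...text extracted by Amazon Textract..."
)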

To start the generative pipeline, the user can upload their input data to the UI. This triggers the Textract Lambda function and, optionally, the summarization Lambda function, which, upon completion, write the processed data to an S3 bucket. Any subsequent Lambda function can read its input data directly from S3. By reading data from S3, we avoid the throttling issues usually encountered with WebSockets when dealing with large payloads.

Image 4: A high-level schematic of the content generation pipeline

Content Generation

Our solution relies primarily on prompt engineering to interact with the LLMs on Amazon Bedrock. All the inputs (articles, briefs, and rules) are provided as parameters to the LLM via a LangChain PromptTemplate object. We can guide the LLM further with few-shot examples illustrating, for instance, the citation styles. Fine-tuning – in particular, Parameter-Efficient Fine-Tuning techniques – can specialize the LLM further to the medical domain and will be explored at a later stage.

Image 5: A simplified schematic of the content generation prompt
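The schematic of the real prompt is shown in Image 5. As a rough sketch, and not the prompt used in this solution, assembling the inputs with a LangChain PromptTemplate and sending them to a model on Amazon Bedrock could look like the following; the template text, model ID, and placeholder values are assumptions:

import boto3
from langchain.prompts import PromptTemplate

# Hypothetical template -- a simplified stand-in for the prompt in Image 5.
template = PromptTemplate(
    input_variables=["articles", "brief", "rules"],
    template=(
        "You are a medical content writer creating disease-awareness material.\n\n"
        "<references>\n{articles}\n</references>\n\n"
        "<brief>\n{brief}\n</brief>\n\n"
        "<rules>\n{rules}\n</rules>\n\n"
        "Write the marketing content following the brief and the rules, and cite "
        "the references you use."
    ),
)

prompt = template.format(
    articles="...summarized references...",
    brief="Target audience: patients. Tone: empathetic. Length: 800 words.",
    rules="Do not mention brand names. Do not make unverified claims.",
)

bedrock = boto3.client("bedrock-runtime")
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumption: any Claude 3 model
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
generated_content = response["output"]["message"]["content"][0]["text"]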

Our pipeline is multilingual in the sense that it can generate content in different languages. Claude 3, for example, has been trained on dozens of languages besides English and can translate content between them. However, we recognize that in some cases the complexity of the target language may require a specialized tool, in which case we may resort to an additional translation step using Amazon Translate.
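When that extra step is needed, the Amazon Translate call is a single SDK operation. The following is a minimal sketch; the target language code is a placeholder chosen for illustration:

import boto3

translate = boto3.client("translate")

generated_content = "..."  # content produced by the LLM in the previous step

result = translate.translate_text(
    Text=generated_content,
    SourceLanguageCode="en",
    TargetLanguageCode="ja",  # placeholder: the target language selected in the UI
)
translated_content = result["TranslatedText"]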

Image 6: Animation showing the generation of an article on Ehlers-Danlos syndrome, its causes, symptoms, and complications

Content Revision

Revision is an important capability in our solution because it enables you to further tune the generated content by iteratively prompting the LLM with feedback. Since the solution has been designed primarily as an assistant, these feedback loops allow our tool to seamlessly integrate with existing processes, hence effectively assisting SMEs in the design of accurate medical content. The user can, for instance, enforce a rule that has not been perfectly applied by the LLM in a previous version, or simply improve the clarity and accuracy of some sections. The revision can be applied to the whole text.  Alternatively, the user can choose to correct individual paragraphs. In both cases, the revised version and the feedback are appended to a new prompt and sent to the LLM for processing.

Image 7: A simplified version of the content revision prompt

Upon submission of the instructions to the LLM, a Lambda function triggers a new content generation process with the updated prompt. To preserve the overall syntactic coherence, it is preferable to regenerate the whole article while keeping the untouched paragraphs unchanged. However, you can improve the process by regenerating only those sections for which feedback has been provided; in this case, proper attention should be paid to the consistency of the text. This revision process can be applied recursively, by improving upon the previous versions, until the content is deemed satisfactory by the user.
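As an illustration of this feedback loop, a hypothetical revision prompt (a simplified stand-in for the one shown in Image 7) could be assembled as follows before being sent back to the LLM:

# Hypothetical template -- a simplified stand-in for the prompt in Image 7.
REVISION_TEMPLATE = """You previously generated the marketing content below.

<draft>
{previous_draft}
</draft>

A reviewer left the following feedback:

<feedback>
{feedback}
</feedback>

Rewrite the content, applying the feedback. Only change what the feedback requires and
keep every other paragraph exactly as it is."""

revision_prompt = REVISION_TEMPLATE.format(
    previous_draft="...latest version of the article...",
    feedback="Rule 3 was not applied in the second paragraph; please fix it.",
)
# The Lambda function then sends revision_prompt to the LLM, exactly like the initial
# generation prompt, and the loop repeats until the SMEs are satisfied.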

Image 8: Animation showing the revision of the Ehlers-Danlos article. The user can ask, for example, for additional information

Conclusion

With the recent improvements in the quality of LLM-generated text, generative AI has become a transformative technology with the potential to streamline and optimize a wide range of processes and businesses.

Medical content generation for disease awareness is a key illustration of how LLMs can be leveraged to generate curated and high-quality marketing content in hours instead of weeks, hence yielding a substantial operational improvement and enabling more synergies between regional teams. Through its revision feature, our solution can be seamlessly integrated with existing traditional processes, making it a genuine assistant tool empowering medical experts and brand managers.

Marketing content for disease awareness is also a landmark example of a highly regulated use case, where precision and accuracy of the generated content are critically important. To enable SMEs to detect and correct any possible hallucination and erroneous statements, we designed a factuality checking module with the purpose of detecting potential misalignment in the generated text with respect to source references.

Furthermore, our rule evaluation feature can help SMEs with the MLR process by automatically highlighting any inadequate implementation of rules or regulations. With these complementary guardrails, we ensure both scalability and robustness of our generative pipeline, and consequently, the safe and responsible deployment of AI in industrial and real-world settings.

Bibliography

  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, & Illia Polosukhin. (2023). Attention Is All You Need.
  • Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, & Dario Amodei. (2020). Language Models are Few-Shot Learners.
  • Mesko, B., & Topol, E. (2023). The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ digital medicine, 6, 120.
  • Clusmann, J., Kolbinger, F.R., Muti, H.S. et al. The future landscape of large language models in medicine. Commun Med 3, 141 (2023). https://doi.org/10.1038/s43856-023-00370-1
  • Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, & Erik Cambria. (2023). A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics.
  • Mu W, Muriello M, Clemens JL, Wang Y, Smith CH, Tran PT, Rowe PC, Francomano CA, Kline AD, Bodurtha J. Factors affecting quality of life in children and adolescents with hypermobile Ehlers-Danlos syndrome/hypermobility spectrum disorders. Am J Med Genet A. 2019 Apr;179(4):561-569. doi: 10.1002/ajmg.a.61055. Epub 2019 Jan 31. PMID: 30703284; PMCID: PMC7029373.
  • Berglund B, Nordström G, Lützén K. Living a restricted life with Ehlers-Danlos syndrome (EDS). Int J Nurs Stud. 2000 Apr;37(2):111-8. doi: 10.1016/s0020-7489(99)00067-x. PMID: 10684952.

About the authors

Sarah Boufelja Y. is a Sr. Data Scientist with 8+ years of experience in Data Science and Machine Learning. In her role at the GenAII Center, she worked with key stakeholders to address their business problems using the tools of machine learning and generative AI. Her expertise lies at the intersection of Machine Learning, Probability Theory and Optimal Transport.

Liza (Elizaveta) Zinovyeva is an Applied Scientist at AWS Generative AI Innovation Center and is based in Berlin. She helps customers across different industries to integrate Generative AI into their existing applications and workflows. She is passionate about AI/ML, finance and software security topics. In her spare time, she enjoys spending time with her family, sports, learning new technologies, and table quizzes.

Nikita Kozodoi is an Applied Scientist at the AWS Generative AI Innovation Center, where he builds and advances generative AI and ML solutions to solve real-world business problems for customers across industries. In his spare time, he loves playing beach volleyball.

Marion Eigner is a Generative AI Strategist who has led the launch of multiple Generative AI solutions. With expertise across enterprise transformation and product innovation, she specializes in empowering businesses to rapidly prototype, launch, and scale new products and services leveraging Generative AI.

Nuno Castro is a Sr. Applied Science Manager at the AWS Generative AI Innovation Center. He leads Generative AI customer engagements, helping AWS customers find the most impactful use case from ideation and prototype through to production. He has 17 years of experience in the field, in industries such as finance, manufacturing, and travel, and has led ML teams for 10 years.

Aiham Taleb, PhD, is an Applied Scientist at the Generative AI Innovation Center, working directly with AWS enterprise customers to leverage Gen AI across several high-impact use cases. Aiham has a PhD in unsupervised representation learning, and has industry experience that spans across various machine learning applications, including computer vision, natural language processing, and medical imaging.

Read More

Introducing guardrails in Knowledge Bases for Amazon Bedrock

Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you securely connect foundation models (FMs) in Amazon Bedrock to your company data using Retrieval Augmented Generation (RAG). This feature streamlines the entire RAG workflow, from ingestion to retrieval and prompt augmentation, eliminating the need for custom data source integrations and data flow management.

We recently announced the general availability of Guardrails for Amazon Bedrock, which allows you to implement safeguards in your generative artificial intelligence (AI) applications that are customized to your use cases and responsible AI policies. You can create multiple guardrails tailored to various use cases and apply them across multiple FMs, standardizing safety controls across generative AI applications.

Today’s launch of guardrails in Knowledge Bases for Amazon Bedrock brings enhanced safety and compliance to your generative AI RAG applications. This new functionality offers industry-leading safety measures that filter harmful content and protect sensitive information in your documents, improving user experience and aligning with organizational standards.

Solution overview

Knowledge Bases for Amazon Bedrock allows you to configure your RAG applications to query your knowledge base using the RetrieveAndGenerate API, generating responses from the retrieved information.

By default, knowledge bases allow your RAG applications to query the entire vector database, accessing all records and retrieving relevant results. This may lead to the generation of inappropriate or undesirable content or provide sensitive information, which could potentially violate certain policies or guidelines set by your company. Integrating guardrails with your knowledge base provides a mechanism to filter and control the generated output, complying with predefined rules and regulations.

The following diagram illustrates an example workflow.

guardrails in knowledge bases workflow

When you test the knowledge base using the Amazon Bedrock console or call the RetrieveAndGenerate API using one of the AWS SDKs, the system generates a query embedding and performs a semantic search to retrieve similar documents from the vector store.

The query is then augmented with the retrieved document chunks, the prompt, and the guardrails configuration. Guardrails are applied to check for denied topics and filter out harmful content before the augmented query is sent to the InvokeModel API. Finally, the InvokeModel API generates a response from the large language model (LLM), making sure the output is free of undesirable content.
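The RetrieveAndGenerate API shown later in this post performs all of these steps for you. Purely to illustrate the flow just described, a conceptual sketch of the two underlying calls (retrieval from the knowledge base, followed by guarded generation) might look like the following; the IDs, model, and prompt wording are placeholders:

import json

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")

question = "What was the preoperative diagnosis in report 12?"  # placeholder question

# 1. Semantic search against the knowledge base (vector store).
retrieval = agent_runtime.retrieve(
    knowledgeBaseId="your-knowledge-base-id",
    retrievalQuery={"text": question},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
context = "\n\n".join(r["content"]["text"] for r in retrieval["retrievalResults"])

# 2. Augment the query with the retrieved chunks and generate with the guardrail applied.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": f"Use this context:\n{context}\n\nQuestion: {question}"}
    ],
}
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    guardrailIdentifier="your-guardrail-id",
    guardrailVersion="1",
    body=json.dumps(body),
)
answer = json.loads(response["body"].read())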

In the following sections, we demonstrate how to create a knowledge base with guardrails. We also compare query results using the same knowledge base with and without guardrails.

Use cases for guardrails with Knowledge Bases for Amazon Bedrock

The following are common use cases for integrating guardrails in the knowledge base:

  • Internal knowledge management for a legal firm — This helps legal professionals search through case files, legal precedents, and client communications. Guardrails can prevent the retrieval of confidential client information and filter out inappropriate language. For instance, a lawyer might ask, “What are the key points from the latest case law on intellectual property?” and guardrails will make sure no confidential client details or inappropriate language are included in the response, maintaining the integrity and confidentiality of the information.
  • Conversational search for financial services — This enables financial advisors to search through investment portfolios, transaction histories, and market analyses. Guardrails can prevent the retrieval of unauthorized investment advice and filter out content that violates regulatory compliance. An example query could be, “What are the recent performance metrics for our high-net-worth clients?” with guardrails making sure only permissible information is shared.
  • Customer support for an ecommerce platform — This allows customer service representatives to access order histories, customer queries, and product details. Guardrails can block sensitive customer data (like names, emails, or addresses) from being exposed in responses. For example, when a representative asks, “Can you summarize the recent complaints about our new product line?” guardrails will redact any personally identifiable information (PII), enforcing privacy and compliance with data protection regulations.

Prepare a dataset for Knowledge Bases for Amazon Bedrock

For this post, we use a sample dataset containing multiple fictional emergency room reports, such as detailed procedural notes, preoperative and postoperative diagnoses, and patient histories. These records illustrate how to integrate knowledge bases with guardrails and query them effectively.

  1. If you want to follow along in your AWS account, download the file. Each medical record is a Word document.
  2. We store the dataset in an Amazon Simple Storage Service (Amazon S3) bucket. For instructions to create a bucket, see Creating a bucket.
  3. Upload the unzipped files to this S3 bucket (see the upload sketch after these steps).
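If you prefer to script the upload in step 3, a minimal sketch using the AWS SDK for Python (Boto3) could look like the following; the bucket name and local folder are placeholders:

import os

import boto3

s3 = boto3.client("s3")

BUCKET = "my-knowledge-base-dataset"  # placeholder: the bucket created in step 2
LOCAL_FOLDER = "./medical-records"    # placeholder: the folder with the unzipped files

for file_name in os.listdir(LOCAL_FOLDER):
    s3.upload_file(
        Filename=os.path.join(LOCAL_FOLDER, file_name),
        Bucket=BUCKET,
        Key=file_name,
    )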

Create a knowledge base for Amazon Bedrock

For instructions to create a new knowledge base, see Create a knowledge base. For this example, we use the following settings:

  1. On the Configure data source page, under Amazon S3, choose the S3 bucket with your dataset.
  2. Under Chunking strategy, select No chunking because the documents in the dataset are preprocessed to be within a certain length.
  3. In the Embeddings model section, choose the Titan Embeddings G1 – Text model.
  4. In the Vector database section, choose Quick create a new vector store.

Create knowledge bases

Synchronize the dataset with the knowledge base

After you create the knowledge base, and your data files are in an S3 bucket, you can start the incremental ingestion. For instructions, see Sync to ingest your data sources into the knowledge base.

While you wait for the sync job to finish, you can move on to the next section, where you create guardrails.

Create a guardrail on the Amazon Bedrock console

Complete the following steps to create a guardrail on the Amazon Bedrock console (an equivalent SDK-based sketch follows these steps):

  1. On the Amazon Bedrock console, choose Guardrails in the navigation pane.
  2. Choose Create guardrail.
  3. On the Provide guardrail details page, under Guardrail details, provide a name and optional description for the guardrail.
  4. In the Denied topics section, add the information for two topics as shown in the following screenshot.
  5. In the Add sensitive information filters section, under PII types, add all the PII types.
  6. Choose Create guardrail.

Create guardrails
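Alternatively, you can create a similar guardrail programmatically, as sketched below with the AWS SDK for Python (Boto3). The denied topic, the subset of PII types, and the blocked messages shown here are illustrative placeholders rather than the exact settings used in this post.

import boto3

bedrock = boto3.client("bedrock")

guardrail = bedrock.create_guardrail(
    name="medical-kb-guardrail",
    description="Guardrail for the emergency room reports knowledge base",
    topicPolicyConfig={
        "topicsConfig": [
            {
                # Illustrative denied topic only; use your own definitions.
                "name": "Medical advice",
                "definition": "Providing diagnoses, treatment plans, or medication advice.",
                "type": "DENY",
            }
        ]
    },
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
        ]
    },
    blockedInputMessaging="Sorry, I can't discuss that topic.",
    blockedOutputsMessaging="Sorry, I can't share that information.",
)
print(guardrail["guardrailId"], guardrail["version"])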

Query the knowledge base on the Amazon Bedrock console

Let’s now test our knowledge base with guardrails:

  1. On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
  2. Choose the knowledge base you created.
  3. Choose Test knowledge base.
  4. Choose the Configurations icon, then scroll down to Guardrails.

The following screenshots show some side-by-side comparisons of querying a knowledge base without (left) and with (right) guardrails.

The first example illustrates querying against denied topics.

Query with and without guardrails knowledge bases

Next, we query data that contains PII.

Query with and without guardrails knowledge bases

Finally, we query about another denied topic.

Query with and without guardrails knowledge bases

Query the knowledge base using the AWS SDK

You can use the following sample code to query the knowledge base with guardrails using the AWS SDK for Python (Boto3):

import boto3

# The RetrieveAndGenerate API is exposed by the Amazon Bedrock Agents runtime client
client = boto3.client('bedrock-agent-runtime')

response = client.retrieve_and_generate(
    input={
        'text': 'Example input text'
    },
    retrieveAndGenerateConfiguration={
        'knowledgeBaseConfiguration': {
            'generationConfiguration': {
                # Attach the guardrail to the generation step
                'guardrailConfiguration': {
                    'guardrailId': 'your-guardrail-id',
                    'guardrailVersion': 'your-guardrail-version'
                }
            },
            'knowledgeBaseId': 'your-knowledge-base-id',
            'modelArn': 'your-model-arn'
        },
        'type': 'KNOWLEDGE_BASE'
    },
    # Reuse the same sessionId to keep conversation context across calls
    sessionId='your-session-id'
)

Clean up

To clean up your resources, complete the following steps:

  1. Delete the knowledge base:
    1. On the Amazon Bedrock console, choose Knowledge bases under Orchestration in the navigation pane.
    2. Choose the knowledge base you created.
    3. Take note of the AWS Identity and Access Management (IAM) service role name in the Knowledge base overview section.
    4. In the Vector database section, take note of the Amazon OpenSearch Serverless collection ARN.
    5. Choose Delete, then enter delete to confirm.
  2. Delete the vector database:
    1. On the Amazon OpenSearch Service console, choose Collections under Serverless in the navigation pane.
    2. Enter the collection ARN you saved in the search bar.
    3. Select the collection and choose Delete.
    4. Enter confirm in the confirmation prompt, then choose Delete.
  3. Delete the IAM service role:
    1. On the IAM console, choose Roles in the navigation pane.
    2. Search for the role name you noted earlier.
    3. Select the role and choose Delete.
    4. Enter the role name in the confirmation prompt and delete the role.
  4. Delete the sample dataset:
    1. On the Amazon S3 console, navigate to the S3 bucket you used.
    2. Select the prefix and files, then choose Delete.
    3. Enter permanently delete in the confirmation prompt to delete.

Conclusion

In this post, we covered the integration of guardrails with Knowledge Bases for Amazon Bedrock. With this, you can benefit from a robust and customizable safety framework that aligns with your application’s unique requirements and responsible AI practices. This integration aims to enhance the overall security, compliance, and responsible usage of foundation models within the knowledge base ecosystem, providing you with greater control and confidence in your AI-driven applications.

For pricing information, visit Amazon Bedrock Pricing. To get started using Knowledge Bases for Amazon Bedrock, refer to Create a knowledge base. For deep-dive technical content and to learn how our Builder communities are using Amazon Bedrock in their solutions, visit our community.aws website.


About the Authors

Hardik Vasa is a Senior Solutions Architect at AWS. He focuses on Generative AI and Serverless technologies, helping customers make the best use of AWS services. Hardik shares his knowledge at various conferences and workshops. In his free time, he enjoys learning about new tech, playing video games, and spending time with his family.

Bani Sharma is a Sr Solutions Architect with Amazon Web Services (AWS), based out of Denver, Colorado. As a Solutions Architect, she works with a large number of small and medium businesses, and provides technical guidance and solutions on AWS. She has deep expertise in containers and modernization, and is currently building depth in Generative AI. Prior to AWS, Bani worked in various technical roles for a large telecom provider and as a Senior Developer for a multi-national bank.

Read More

Prompt engineering techniques and best practices: Learn by doing with Anthropic’s Claude 3 on Amazon Bedrock

You have likely already had the opportunity to interact with generative artificial intelligence (AI) tools (such as virtual assistants and chatbot applications) and noticed that you don’t always get the answer you are looking for, and that achieving it may not be straightforward. Large language models (LLMs), the models behind the generative AI revolution, receive instructions on what to do, how to do it, and a set of expectations for their response by means of a natural language text called a prompt. The way prompts are crafted greatly impacts the results generated by the LLM. Poorly written prompts will often lead to hallucinations, sub-optimal results, and overall poor quality of the generated response, whereas good-quality prompts will steer the output of the LLM to the output we want.

In this post, we show how to build efficient prompts for your applications. We use the simplicity of Amazon Bedrock playgrounds and the state-of-the-art Anthropic’s Claude 3 family of models to demonstrate how you can build efficient prompts by applying simple techniques.

Prompt engineering

Prompt engineering is the process of carefully designing the prompts or instructions given to generative AI models to produce the desired outputs. Prompts act as guides that provide context and set expectations for the AI. With well-engineered prompts, developers can take advantage of LLMs to generate high-quality, relevant outputs. For instance, we use the following prompt to generate an image with the Amazon Titan Image Generation model:

An illustration of a person talking to a robot. The person looks visibly confused because he can not instruct the robot to do what he wants.

We get the following generated image.

Let’s look at another example. All the examples in this post are run using Claude 3 Haiku in an Amazon Bedrock playground. Although the prompts can be run using any LLM, we discuss best practices for the Claude 3 family of models. In order to get access to the Claude 3 Haiku LLM on Amazon Bedrock, refer to Model access.

We use the following prompt:

What is 10 + 10?

Claude 3 Haiku’s response:


10 + 10 is 20

The request prompt is actually very ambiguous. 10 + 10 may have several valid answers; in this case, Claude 3 Haiku, using its internal knowledge, determined that 10 + 10 is 20. Let’s change the prompt to get a different answer for the same question:

1 + 1 is an addition
1 - 1 is a substraction
1 * 1 is multiplication
1 / 1 is a division

What is 10 + 10?

Claude 3 Haiku’s response:

10 + 10 is an addition. The answer is 20.

The response changed accordingly by specifying that 10 + 10 is an addition. Additionally, although we didn’t request it, the model also provided the result of the operation. Let’s see how, through a very simple prompting technique, we can obtain an even more succinct result:

1 + 1 is an addition
1 - 1 is a substraction
1 * 1 is multiplication
1 / 1 is a division

What is 10 + 10?

Answer only as in the examples provided and 
provide no additional information.

Claude 3 Haiku response:

10 + 10 is an addition.

Well-designed prompts can improve user experience by making AI responses more coherent, accurate, and useful, thereby making generative AI applications more efficient and effective.

The Claude 3 model family

The Claude 3 family is a set of LLMs developed by Anthropic. These models are built upon the latest advancements in natural language processing (NLP) and machine learning (ML), allowing them to understand and generate human-like text with remarkable fluency and coherence. The family is comprised of three models: Haiku, Sonnet, and Opus.

Haiku is the fastest and most cost-effective model on the market. It is a fast, compact model for near-instant responsiveness. For the vast majority of workloads, Sonnet is two times faster than Claude 2 and Claude 2.1, with higher levels of intelligence, and it strikes the ideal balance between intelligence and speed—qualities especially critical for enterprise use cases. Opus is the most advanced, capable, state-of-the-art foundation model (FM) with deep reasoning, advanced math, and coding abilities, with top-level performance on highly complex tasks.

Among the key features of the model family are:

  • Vision capabilities – Claude 3 models have been trained to not only understand text but also images, charts, diagrams, and more.
  • Best-in-class benchmarks – Claude 3 exceeds existing models on standardized evaluations such as math problems, programming exercises, and scientific reasoning. Specifically, Opus outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It exhibits high levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.
  • Reduced hallucination – Claude 3 models mitigate hallucination through constitutional AI techniques that provide transparency into the model’s reasoning, as well as improved accuracy. Claude 3 Opus shows an estimated twofold gain in accuracy over Claude 2.1 on difficult open-ended questions, reducing the likelihood of faulty responses.
  • Long context window – Claude 3 models excel at real-world retrieval tasks with a 200,000-token context window, the equivalent of 500 pages of information.

To learn more about the Claude 3 family, see Unlocking Innovation: AWS and Anthropic push the boundaries of generative AI together, Anthropic’s Claude 3 Sonnet foundation model is now available in Amazon Bedrock, and Anthropic’s Claude 3 Haiku model is now available on Amazon Bedrock.

The anatomy of a prompt

As prompts become more complex, it’s important to identify their various parts. In this section, we present the components that make up a prompt and the recommended order in which they should appear:

  1. Task context: Assign the LLM a role or persona and broadly define the task it is expected to perform.
  2. Tone context: Set a tone for the conversation in this section.
  3. Background data (documents and images): Also known as context. Use this section to provide all the necessary information for the LLM to complete its task.
  4. Detailed task description and rules: Provide detailed rules about the LLM’s interaction with its users.
  5. Examples: Provide examples of the task resolution for the LLM to learn from them.
  6. Conversation history: Provide any past interactions between the user and the LLM, if any.
  7. Immediate task description or request: Describe the specific task to fulfill within the LLM’s assigned roles and tasks.
  8. Think step-by-step: If necessary, ask the LLM to take some time to think or think step by step.
  9. Output formatting: Provide any details about the format of the output.
  10. Prefilled response: If necessary, prefill the LLM’s response to make it more succinct.

The following is an example of a prompt that incorporates all the aforementioned elements:

Human: You are a solutions architect working at Amazon Web Services (AWS) 
named John Doe.

Your goal is to answer customers' questions regarding AWS best architectural
practices and principles.
Customers may be confused if you don't respond in the character of John.
You should maintain a friendly customer service tone.

Answer the customers' questions using the information provided below:
<context>{{CONTEXT}}</context>

Here are some important rules for the interaction:
- Always stay in character, as John, a solutions architect who works
at AWS.
- If you are unsure how to respond, say "Sorry, I didn't understand that.
Could you repeat the question?"
- If someone asks something irrelevant, say, "Sorry, I am John and I give AWS
architectural advice. Do you have an AWS architecture question today I can
help you with?"
Here is an example of how to respond in a standard interaction:
<example>
User: Hi, what do you do?
John: Hello! My name is John, and I can answer your questions about best
architectural practices on AWS. What can I help you with today?
</example>

Here is the conversation history (between the user and you) prior to the
question. It could be empty if there is no history:
<history>{{HISTORY}}</history>

Here is the user's question: <question>{{QUESTION}}</question>
How do you respond to the user's question?

Think about your answer first before you respond. Put your response in
<response></response> tags.

Assistant: <response>

Best prompting practices with Claude 3

In the following sections, we dive deep into Claude 3 best practices for prompt engineering.

Text-only prompts

For prompts that deal only with text, follow this set of best practices to achieve better results:

  • Mark parts of the prompt with XML tags – Claude has been fine-tuned to pay special attention to XML tags. You can take advantage of this characteristic to clearly separate sections of the prompt (instructions, context, examples, and so on). You can use any names you prefer for these tags; the main idea is to delineate in a clear way the content of your prompt. Make sure you include <> and </> for the tags.
  • Always provide good task descriptions – Claude responds well to clear, direct, and detailed instructions. When you give an instruction that can be interpreted in different ways, make sure that you explain to Claude what exactly you mean.
  • Help Claude learn by example – One way to enhance Claude’s performance is by providing examples. Examples serve as demonstrations that allow Claude to learn patterns and generalize appropriate behaviors, much like how humans learn by observation and imitation. Well-crafted examples significantly improve accuracy by clarifying exactly what is expected, increase consistency by providing a template to follow, and boost performance on complex or nuanced tasks. To maximize effectiveness, examples should be relevant, diverse, clear, and provided in sufficient quantity (start with three to five examples and experiment based on your use case).
  • Keep the responses aligned to your desired format – To get Claude to produce output in the format you want, give clear directions, telling it exactly what format to use (like JSON, XML, or markdown).
  • Prefill Claude’s response – Claude tends to be chatty in its answers, and might add some extra sentences at the beginning of the answer despite being instructed in the prompt to respond with a specific format. To improve this behavior, you can use the assistant message to provide the beginning of the output.
  • Always define a persona to set the tone of the response – The responses given by Claude can vary greatly depending on which persona is provided as context for the model. Setting a persona helps Claude set the proper tone and vocabulary that will be used to provide a response to the user. The persona guides how the model will communicate and respond, making the conversation more realistic and tuned to a particular personality. This is especially important when using Claude as the AI behind a chat interface.
  • Give Claude time to think – As recommended by Anthropic’s research team, giving Claude time to think through its response before producing the final answer leads to better performance. The simplest way to encourage this is to include the phrase “Think step by step” in your prompt. You can also capture Claude’s step-by-step thought process by instructing it to “please think about it step-by-step within <thinking></thinking> tags.”
  • Break a complex task into subtasks – When dealing with complex tasks, it’s a good idea to break them down and use prompt chaining with LLMs like Claude. Prompt chaining involves using the output from one prompt as the input for the next, guiding Claude through a series of smaller, more manageable tasks. This improves accuracy and consistency for each step, makes troubleshooting less complicated, and makes sure Claude can fully focus on one subtask at a time. To implement prompt chaining, identify the distinct steps or subtasks in your complex process, create separate prompts for each, and feed the output of one prompt into the next.
  • Take advantage of the long context window – Working with long documents and large amounts of text can be challenging, but Claude’s extended context window of over 200,000 tokens enables it to handle complex tasks that require processing extensive information. This feature is particularly useful with Claude Haiku because it can help provide high-quality responses with a cost-effective model. To take full advantage of this capability, it’s important to structure your prompts effectively.
  • Allow Claude to say “I don’t know” – By explicitly giving Claude permission to acknowledge when it’s unsure or lacks sufficient information, it’s less likely to generate inaccurate responses. This can be achieved by adding a preface to the prompt, such as, “If you are unsure or don’t have enough information to provide a confident answer, simply say ‘I don’t know’ or ‘I’m not sure.’”

Prompts with images

The Claude 3 family offers vision capabilities that can process images and return text outputs. It’s capable of analyzing and understanding charts, graphs, technical diagrams, reports, and other visual assets. The following are best practices when working with images with Claude 3:

  • Image placement and size matters – For optimal performance, when working with Claude 3’s vision capabilities, the ideal placement for images is at the very start of the prompt. Anthropic also recommends resizing an image before uploading and striking a balance between image clarity and image size. For more information, refer to Anthropic’s guidance on image sizing.
  • Apply traditional techniques – When working with images, you can apply the same techniques used for text-only prompts (such as giving Claude time to think or defining a role) to help Claude improve its responses.

Consider the following example, which is an extraction of the picture “a fine gathering” (Author: Ian Kirck, https://en.m.wikipedia.org/wiki/File:A_fine_gathering_(8591897243).jpg).

We ask Claude 3 to count how many birds are in the image:

You are a bird expert who has perfect vision and pay a lot of attention to details. 
How many birds are in this picture? Before providing the number of birds in <answer>
tags, take the time to think in <thinking> tags analyzing each part of the picture 
from left to right.

Claude 3 Haiku’s response:

<thinking>
Alright, let me carefully examine this image as a bird expert with keen visual acuity.
Starting from the left, I see three birds with mottled brown and white plumage, 
which appear to be some type of shorebird like a sandpiper or plover. 
Next, there is a group of four more birds with similar mottled plumage. 
In the center, there is a striking black-headed gull or tern with a white body. 
To the right of the gull, there is another bird with the same distinctive black head and white body. 
Finally, at the far right, I can make out the silhouette of another small bird.
</thinking>

<answer>9 birds</answer>

In this example, we asked Claude to take some time to think and put its reasoning in an XML tag and the final answer in another. Giving Claude time to think, along with clear instructions to pay attention to details, helped it provide the correct response.

  • Take advantage of visual prompts – The ability to use images also enables you to add prompts directly within the image itself instead of providing a separate prompt.

Let’s see an example with the following image:

In this case, the image itself is the prompt:
Claude 3 Haiku’s response:

To solve for the missing side length X, we can use the Pythagorean theorem. 
The Pythagorean theorem states that in a right-angled triangle, the square of the 
length of the hypotenuse (the longest side) is equal to the sum of the squares 
of the lengths of the other two sides.
 
Given:
- One side is 12 centimeters.
- The other side is 10 centimeters.

Let's denote the missing side length as X.

Using the Pythagorean theorem:

(12)^2 + (10)^2 = X^2
144 + 100 = X^2
244 = X^2
X = √244
X ≈ 15.63 centimeters

Therefore, the length of the missing side X is approximately 15.63 centimeters.
  • Examples are also valid using images – You can provide multiple images in the same prompt and take advantage of Claude’s vision capabilities to provide examples and additional valuable information using the images. Make sure you use image tags to clearly identify the different images. Because this question is a reasoning and mathematical question, set the temperature to 0 for a more deterministic response.

Let’s look at the following example:

Prompt:

You are an expert at logical reasoning and counting figures. When you identify figures 
in a given image, you always pay special attention to all the lines and vertices that can 
form distinct smaller figures within the overall large shape.

Using the following images, image 1 and image 2

<image1>

</image1>

<image2>

</image2>

Please tell me how many triangles do you see in <image1> using as reference <image2> 
to identify the vertices of the triangles.

Here is an example of how to list out the identified triangles using the vertices:

Large triangle: 1 (ABC)

Please take the time to analyze the image in detail in <thinking> tags and put your 
response in <response> tags.

Claude 3 Haiku’s response:

<thinking>

Analyzing the image 1 in the tags <image1>, and using the image 2 in the tags <image2> as 
a reference for identifying the vertices, I can see the following triangles:

1. The large triangle formed by the outer lines (vertices A, B, C)
2. The smaller triangle formed by the lines within the larger triangle (vertices C, D, A)
3. The second smaller triangle formed by the lines within the larger triangle (vertices D, B, C)
4. The third smaller triangle formed by the lines, with vertices A, B, and D

So in total, there are 4 distinct triangles visible in the image 1.
</thinking>

<response>

In the image 1 in the tags <image1>, there are 4 triangles:

1. The large triangle ABC
2. The smaller triangle ACD
3. The second smaller triangle DBC
4. The third smaller triangle ABD

</response>
  • Use detailed descriptions when working with complicated charts or graphics – Working with charts or graphics is a relatively straightforward task when using Claude’s models. We simply take advantage of Claude’s vision capabilities, pass the charts or graphics in image format, and then ask questions about the provided images. However, when working with complicated charts that have lots of colors (which look very similar) or a lot of data points, it’s a good practice to help Claude better understand the information with the following methods:
    • Ask Claude to describe in detail each data point that it sees in the image.
    • Ask Claude to first identify the HEX codes of the colors in the graphics to clearly see the difference in colors.

Let’s see an example. We pass to Claude the following map chart in image format (source: https://ourworldindata.org/co2-and-greenhouse-gas-emissions), then we ask about Japan’s greenhouse gas emissions.

Prompt:

What are the greenhouse gas emissions for Japan in 2021? Prior to provide your response, 
please take your time to think in <thinking> tags, identifying the nine color used and 
their corresponding legend using HEX code. Then, provide your response in <response> tags.

Claude 3 Haiku’s response:

<thinking>

Analyzing the distinct colors used in the color map and their corresponding legend:

The color scale on the map uses the following 9 distinct colors:

1. Light yellow: HEX code #FFFFD0, corresponding to "0 t" emissions range
2. Pale orange: HEX code #FFD580, corresponding to "10 million t" emissions range
3. Slightly darker orange: HEX code #FFA852, corresponding to "30 million t" emissions range
4. Bright orange: HEX code #FFA040, corresponding to "100 million t" to "300 million t" emissions range
5. Darker orange: HEX code #FF7F2A, corresponding to "300 million t" to "1 billion t" emissions range
6. Reddish-orange: HEX code #FF6018, corresponding to "1 billion t" to "3 billion t" emissions range
7. Darker reddish-orange: HEX code #E64700, corresponding to "3 billion t" to "10 billion t" emissions range
8. Maroon: HEX code #8B0000, corresponding to "10 billion t" emissions range
9. No data: Shown as a gray color

The color used for Japan is the darker orange shade with HEX code #FF7F2A, which 
corresponds to the "300 million t" to "1 billion t" emissions range.

</thinking>

<response>

Based on the color analysis and the corresponding legend, the greenhouse gas emissions for 
Japan in 2021 are in the range of 300 million to 1 billion tonnes of carbon dioxide-equivalents.

</response>
  • Use narrations when working with slide decks – Similar to the use case with charts and graphics, when you want to work with slide decks, the first step is to convert the original files that you have into images, with the consideration that you need to create one image per slide. Then you’re ready to ask questions. You can also increase the accuracy of the responses or ask questions about the entire content of the slide deck. It’s a good practice to first narrate each slide and consolidate the distinct narration in a final narration. This way, you can have a narration that covers the entire content of the slide deck, which can later serve to respond to questions using a regular text workflow. This includes vector search.

Let’s see an example of narration with the following image (source: Sustainable Development Goals Report 2023, https://unstats.un.org/sdgs/report/2023/The-Sustainable-Development-Goals-Report-2023.pdf):

Prompt:
<image>

</image>

You are presenting the 2023 sustainable development goals report from OECD 
and you are currently in slide 9.

Please narrate the slide 9 in <image> tags that provides information about the ninth 
goal as if you were the presenter.

Avoid discussing any topics, particularly acronyms, unless you are completely confident 
in your understanding of their meaning and do not introduce information that is not 
explicitly presented on the current slide. Make sure you narrate every single number 
showed in the slide and all the details. If you fail to mention every number in the slide, 
some viewers who are visual impaired won’t be able to read the information by themselves.

Put your narration in <narration> tags.

Claude 3 Haiku’s response:

<narration>

This slide focuses on the 9th Sustainable Development Goal, which is to "Build Resilient 
Infrastructure, Promote Inclusive and Sustainable Industrialization and Foster Innovation".

The slide presents information on global manufacturing. It shows that global manufacturing 
growth slowed from 7.4% in 2021 to 3.3% in 2022, due to factors like inflation, energy 
price shocks, supply chain disruptions, and global economic deceleration.

The slide also highlights that CO2 emissions from energy-related sources reached a record 
high of 36.8 billion metric tons in 2022.

Furthermore, the slide indicates that less developed countries, or LDCs, are likely to miss 
their 2030 target of doubling their manufacturing share of GDP. In 2015, this share was 12.1%, 
rising to 14% in 2022, but the 2030 target is 24.2%.

The regional breakdown shows that sub-Saharan Africa has the lowest manufacturing share at 
21.7%, Europe and North America has the highest at 47.1%, and Eastern Asia is in the middle 
at 47.7%.

</narration>

In this example, we were careful to control the content of the narration. We made sure Claude didn’t mention any extra information or discuss anything it wasn’t completely confident about. We also made sure Claude covered all the key details and numbers presented in the slide. This is very important because the information from the narration in text format needs to be precise and accurate in order to be used to respond to questions.

An in-depth prompt example for information extraction

Information extraction is the process of automating the retrieval of specific information related to a specific topic from a collection of texts or documents. LLMs can extract information regarding attributes given a context and a schema. The kinds of documents that can be better analyzed with LLMs are resumes, legal contracts, leases, newspaper articles, and other documents with unstructured text.

The following prompt instructs Claude 3 Haiku to extract information from short text like posts on social media, although it can be used for much longer pieces of text like legal documents or manuals. In the following example, we use the color code defined earlier to highlight the prompt sections:

Human: You are an information extraction system. Your task is to extract key information 
from the text enclosed between <post></post> and put it in JSON.

Here are some basic rules for the task:
- Do not output your reasoning for the extraction
- Always produce complete and valid JSON objects
- If no information can be extracted or you can not produce a valid JSON object output
an empty json object "{}"
Here are some examples of how to extract information from text:
<examples>
<example_1>
<post>
"""Six months ago, Wall Street Journal reporter Evan Gershkovich was detained in Russia
during a reporting trip. He remains in a Moscow prison. We’re offering resources for
those who want to show their support for him. #IStandWithEvan https://wsj.com/Evan"""
</post>
<json>
{
"topic": "detention of a reporter",
"location": "Moscow"
"entities": ["Evan Gershkovich", "Wall Street Journal"],
"keyphrases": ["reporter", "detained", "prison"],
"sentiment": "negative",
"links": ["https://wsj.com/Evan"],
}
</json>
</example_1>

<example_2>
<post>
"""'We’re living an internal war': Once-peaceful Ecuador has become engulfed in the
cocaine trade, and the bodies are piling up."""
</post>
<json>
{
"topic": "drug war",
"location": "Ecuador",
"entities": ["Ecuador"],
"keyphrases": ["drug war", "cocaine trade"],
"sentiment": "negative",
"links": [],
}
</json>
</example_2>

</examples>

Extract information from the following post. Generate only a complete JSON object and put
it in <json></json>.
<post>
"""A postpandemic hiring spree has left airports vulnerable to security gaps as new staff
gain access to secure areas, creating an opening for criminal groups."""
</post>

Use the following JSON object definition to write your answer
<json_definition>
{
"type": "object",
"properties": {
"topic": {
"description": "the main topic of the post",
"type": "string",
"default": ""
},
"location": {
"description": "the location, if exists, where the events occur",
"type": "string",
"default": ""
},
"entities": {
"description": "the entities involved in the post",
"type": "list",
"default": []
},
"keyphrases": {
"description": "the keyphrases in the post",
"type": "list",
"default": []
},
"sentiment": {
"description": "the sentiment of the post",
"type": "string",
"default": ""
},
"links": {
"description": "any links found within the post",
"type": "list",
"default": []
}
}
}
</json_definition>

Assistant:<json>

Claude 3 Haiku’s response:

{
"topic": "airport security gaps",
"location": "",
"entities": ["airports"],
"keyphrases": ["postpandemic hiring spree", "security gaps", 
			"new staff", "secure areas", "criminal groups"],
"sentiment": "negative",
"links": []
}
</json>

The prompt incorporates the following best practices:

  • Define a persona and tone for the LLM – In this case, we specified that the LLM is an information extraction system.
  • Provide clear task descriptions – We were as specific as possible when describing the task to the LLM.
  • Specify the data you want to extract using JSON objects to define the expected output – We provided a full definition of the JSON object we want to obtain.
  • Use few-shot prompting – We showed the LLM pairs of unstructured text and information extracted.
  • Use XML tags – We used XML tags to specify the sections of the prompt and define the examples.
  • Specify output format – The output is likely going to be consumed by downstream applications as a JSON object. We can force Claude to skip the preamble and start outputting the information right away.

An in-depth prompt example for Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) is an approach in natural language generation that combines the strengths of information retrieval and language generation models. In RAG, a retrieval system first finds relevant passages or documents from a large corpus based on the input context or query. Then, a language generation model uses the retrieved information as additional context to generate fluent and coherent text. This approach aims to produce high-quality and informative text by using both the knowledge from the retrieval corpus and the language generation capabilities of deep learning models. To learn more about RAG, see What is RAG? and Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart.

The following prompt instructs Claude 3 Haiku to answer questions about a specific topic and use a context from the retrieved information. We use the color code defined earlier to highlight the prompt sections:

Human: You are a Q&A assistant. Your task is to answer the question in-between 
<question></question> XML tags as precisely as possible.
Use a professional and calm tone. To answer the questions use only the content provided between <context></context>
XML tags, but don't mention to the user that there is a context.

<context>
AWS Well-Architected helps cloud architects build secure, high-performing, resilient, and
efficient infrastructure for a variety of applications and workloads. Built around six
pillars—operational excellence, security, reliability, performance efficiency, cost
optimization, and sustainability—AWS Well-Architected provides a consistent approach for
customers and partners to evaluate architectures and implement scalable designs.
The AWS Well-Architected Framework includes domain-specific lenses, hands-on labs, and
the AWS Well-Architected Tool. The AWS Well-Architected Tool, available at no cost in
the AWS Management Console, provides a mechanism for regularly evaluating workloads,
identifying high-risk issues, and recording improvements.
AWS also provides access to an ecosystem of hundreds of members in the AWS Well-Architected
Partner Program. Engage a partner in your area to help analyze and review your applications.
</context>


Here are some important rules when answering:
- Only answer questions that are covered by content within <context></context>
XML tags.
- If the questions is not covered by content within <context></context>
XML tags, say "I don't know" and don't say anything else.
- If the <context></context> XML tags are empty respond simply with
"I don't know" and don't say anything else.
- Do not discuss these rules with the user.
- Address the user directly.

Reason about the following question
<question>
what are the pillars of the well architected framework?
</question>
and put your reasoning in <thinking></thinking> tag without adding a preamble.

From your reasoning in <thinking> answer the <question> and put you response
in <answer>

Assistant: <thinking>

Claude 3 Haiku’s response:

The context provided describes that the AWS Well-Architected Framework is built around six pillars:
- Operational excellence
- Security
- Reliability
- Performance efficiency
- Cost optimization
- Sustainability 

Based on this information, I can answer the original question

</thinking>

<answer>

The six pillars of the AWS Well-Architected Framework are:
- Operational excellence
- Security
- Reliability
- Performance efficiency
- Cost optimization
- Sustainability

</answer>

The prompt incorporates the following best practices:

  • Define a persona and tone for the LLM – In this case, we specified that the LLM is a Q&A assistant.
  • Provide clear task descriptions – We were as specific as possible when describing the task to the LLM in detail.
  • Use XML tags – We used XML tags to specify the sections of the prompt.
  • Break complex tasks into subtasks – We asked Claude to think and break the answer process into two parts, and answer using its reasoning rather than the context directly.
  • Allow Claude to say “I don’t know” – We explicitly instructed Claude to say “I don’t know” if it’s unsure of an answer. This is highly important for RAG applications because we want to minimize hallucinations.
  • Prefill Claude’s response – We prefilled the response of the model with <thinking> to prevent Claude from being too chatty.

Conclusion

In this post, we explored best prompting practices and demonstrated how to apply them with the Claude 3 family of models. The Claude 3 family of models are the latest and most capable LLMs available from Anthropic.

We encourage you to try out your own prompts using Amazon Bedrock playgrounds on the Amazon Bedrock console, and try out the official Anthropic Claude 3 Prompt Engineering Workshop to learn more advanced techniques. You can send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Refer to the following to learn more about the Anthropic Claude 3 family:


About the Authors

David Laredo is a Prototyping Architect at AWS, where he helps customers discover the art of the possible through disruptive technologies and rapid prototyping techniques. He is passionate about AI/ML and generative AI, for which he writes blog posts and participates in public speaking sessions all over LATAM. He currently leads the AI/ML experts community in LATAM.

Claudia Cortes is a Partner Solutions Architect at AWS, focused on serving Latin American Partners. She is passionate about helping partners understand the transformative potential of innovative technologies like AI/ML and generative AI, and loves to help partners achieve practical use cases. She is responsible for programs such as AWS Latam Black Belt, which aims to empower partners in the Region by equipping them with the necessary knowledge and resources.

Simón Córdova is a Senior Solutions Architect at AWS, focused on bridging the gap between AWS services and customer needs. Driven by an insatiable curiosity and passion for generative AI and AI/ML, he tirelessly explores ways to leverage these cutting-edge technologies to enhance solutions offered to customers.

Gabriel Velazquez is a Sr Generative AI Solutions Architect at AWS, he currently focuses on supporting Anthropic on go-to-market strategy. Prior to working in AI, Gabriel built deep expertise in the telecom industry where he supported the launch of Canada’s first 4G wireless network. He now combines his expertise in connecting a nation with knowledge of generative AI to help customers innovate and scale.

Read More

Improve productivity when processing scanned PDFs using Amazon Q Business

Improve productivity when processing scanned PDFs using Amazon Q Business

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and extract insights directly from the content in digital as well as scanned PDF documents in your enterprise data sources without needing to extract the text first.

Customers across industries such as finance, insurance, healthcare life sciences, and more need to derive insights from various document types, such as receipts, healthcare plans, or tax statements, which are frequently in scanned PDF format. These document types often have a semi-structured or unstructured format, which requires processing to extract text before indexing with Amazon Q Business.

The launch of scanned PDF document support with Amazon Q Business can help you seamlessly process a variety of multi-modal document types through the AWS Management Console and APIs, across all supported Amazon Q Business AWS Regions. You can ingest documents, including scanned PDFs, from your data sources using supported connectors, index them, and then use the documents to answer questions, provide summaries, and generate content securely and accurately from your enterprise systems. This feature eliminates the development effort required to extract text from scanned PDF documents outside of Amazon Q Business, and improves the document processing pipeline for building your generative artificial intelligence (AI) assistant with Amazon Q Business.

In this post, we show how to asynchronously index and run real-time queries with scanned PDF documents using Amazon Q Business.

Solution overview

You can use Amazon Q Business for scanned PDF documents from the console, AWS SDKs, or AWS Command Line Interface (AWS CLI).

Amazon Q Business provides a versatile suite of data connectors that can integrate with a wide range of enterprise data sources, empowering you to develop generative AI solutions with minimal setup and configuration. To learn more, visit Amazon Q Business, now generally available, helps boost workforce productivity with generative AI.

After your Amazon Q Business application is ready to use, you can directly upload the scanned PDFs into an Amazon Q Business index using either the console or the APIs. Amazon Q Business offers multiple data source connectors that can integrate and synchronize data from multiple data repositories into single index. For this post, we demonstrate two scenarios to use documents: one with the direct document upload option, and another using the Amazon Simple Storage Service (Amazon S3) connector. If you need to ingest documents from other data sources, refer to Supported connectors for details on connecting additional data sources.

Index the documents

In this post, we use three scanned PDF documents as examples: an invoice, a health plan summary, and an employment verification form, along with some text documents.

The first step is to index these documents. Complete the following steps to index documents using the direct upload feature of Amazon Q Business. For this example, we upload the scanned PDFs.

  1. On the Amazon Q Business console, choose Applications in the navigation pane and open your application.
  2. Choose Add data source.
  3. Choose Upload Files.
  4. Upload the scanned PDF files.

You can monitor the uploaded files on the Data sources tab. The Upload status changes from Received to Processing to Indexed or Updated, as which point the file has been successfully indexed into the Amazon Q Business data store. The following screenshot shows the successfully indexed PDFs.

Indexed documents in uploaded files section.

The following steps demonstrate how to integrate and synchronize documents using an Amazon S3 connector with Amazon Q Business. For this example, we index the text documents.

  1. On the Amazon Q Business console, choose Applications in the navigation pane and open your application.
  2. Choose Add data source.
  3. Choose Amazon S3 for the connector.
  4. Enter the information for Name, VPC and security group settings, IAM role, and Sync mode.
  5. To finish connecting your data source to Amazon Q Business, choose Add data source.
  6. In the Data source details section of your connector details page, choose Sync now to allow Amazon Q Business to begin syncing (crawling and ingesting) data from your data source.

When the sync job is complete, your data source is ready to use. The following screenshot shows all five documents (scanned and digital PDFs, and text files) are successfully indexed.

Amazon S3 connector

The following screenshot shows a comprehensive view of the two data sources: the directly uploaded documents and the documents ingested through the Amazon S3 connector.

Amazon Q Business data sources.

Now let’s run some queries with Amazon Q Business on our data sources.

Queries on dense, unstructured, scanned PDF documents

Your documents might be dense, unstructured, scanned PDF document types. Amazon Q Business can identify and extract the most salient information-dense text from it. In this example, we use the multi-page health plan summary PDF we indexed earlier. The following screenshot shows an example page.

Health plan summary document.

This is an example of a health plan summary document.

In the Amazon Q Business web UI, we ask “What is the annual total out-of-pocket maximum, mentioned in the health plan summary?”

Amazon Q Business searches the indexed document, retrieves the relevant information, and generates an answer while citing the source for its information. The following screenshot shows the sample output.

Amazon Q Business output

Queries on structured, tabular, scanned PDF documents

Documents might also contain structured data elements in tabular format. Amazon Q Business can automatically identify, extract, and linearize structured data from scanned PDFs to accurately resolve any user queries. In the following example, we use the invoice PDF we indexed earlier. The following screenshot shows an example.

Invoice

This is an example of an invoice.

In the Amazon Q Business web UI, we ask “How much were the headphones charged in the invoice?”

Amazon Q Business searches the indexed document and retrieves the answer with reference to the source document. The following screenshot shows that Amazon Q Business is able to extract bill information from the invoice.

Amazon Q Business output

Queries on semi-structured forms

Your documents might also contain semi-structured data elements in a form, such as key-value pairs. Amazon Q Business can accurately satisfy queries related to these data elements by extracting specific fields or attributes that are meaningful for the queries. In this example, we use the employment verification PDF. The following screenshot shows an example.

Employment verification sample

This is an example of an employment verification form.

In the Amazon Q Business web UI, we ask “What is the applicant’s date of employment in the employment verification form?” Amazon Q Business searches the indexed employment verification document and retrieves the answer with reference to the source document.

Amazon Q Business output

Index documents using the AWS CLI

In this section, we show you how to use the AWS CLI to ingest structured and unstructured documents stored in an S3 bucket into an Amazon Q Business index. You can quickly retrieve detailed information about your documents, including their statuses and any errors occurred during indexing. If you’re an existing Amazon Q Business user and have indexed documents in various formats, such as scanned PDFs and other supported types, and you now want to reindex the scanned documents, complete the following steps:

  1.  Check the status of each document to filter failed documents according to the status "DOCUMENT_FAILED_TO_INDEX". You can filter the documents based on this error message:

"errorMessage": "Document cannot be indexed since it contains no text to index and search on. Document must contain some text."

If you’re a new user and haven’t indexed any documents, you can skip this step.

The following is an example of using the ListDocuments API to filter documents with a specific status and their error messages:

aws qbusiness list-documents --region <region> 
--application-id <application-id> 
--index-id <index-id> 
--query "documentDetailList[?status=='DOCUMENT_FAILED_TO_INDEX'].{DocumentId:documentId, ErrorMessage:error.errorMessage}"
--output json

The following screenshot shows the AWS CLI output with a list of failed documents with error messages.

List of failed documents

Now you batch-process the documents. Amazon Q Business supports adding one or more documents to an Amazon Q Business index.

  1. Use the BatchPutDocument API to ingest multiple scanned documents stored in an S3 bucket into the index:
    aws qbusiness batch-put-document —region <region> 
    --documents '[{ "id":"s3://<your-bucket-path>/<scanned-pdf-document1>","content":{"s3":{"bucket":"<your-bucket> ","key":"<scanned-pdf-document1>"}}}, { "id":"s3://<your-bucket-path>/<scanned-pdf-document2>","content":{"s3":{"bucket":" <your-bucket>","key":"<scanned-pdf-document2>"}}}]' 
    --application-id <application-id> 
    --index-id <index-id> 
    --endpoint-url <application-endpoint-url> 
    --role-arn <role-arn> 
    --no-verify-ssl

The following screenshot shows the AWS CLI output. You should see failed documents as an empty list.

List of failed documents

  1. Finally, use the ListDocuments API again to review if all documents were indexed properly:
    aws qbusiness list-documents --region <region> 
    --application-id <application-id> 
    --index-id <index-id> 
    --endpoint-url <application-endpoint-url> 
    --no-verify-ssl

The following screenshot shows that the documents are indexed in the data source.

List of indexed documents

Clean up

If you created a new Amazon Q Business application and don’t plan to use it further, unsubscribe and remove assigned users from the application and delete it so that your AWS account doesn’t accumulate costs. Moreover, if you don’t need to use the indexed data sources further, refer to Managing Amazon Q Business data sources for instructions to delete your indexed data sources.

Conclusion

This post demonstrated the support for scanned PDF document types with Amazon Q Business. We highlighted the steps to sync, index, and query supported document types—now including scanned PDF documents—using generative AI with Amazon Q Business. We also showed examples of queries on structured, unstructured, or semi-structured multi-modal scanned documents using the Amazon Q Business web UI and AWS CLI.

To learn more about this feature, refer to Supported document formats in Amazon Q Business. Give it a try on the Amazon Q Business console today! For more information, visit Amazon Q Business and the Amazon Q Business User Guide. You can send feedback to AWS re:Post for Amazon Q or through your usual AWS support contacts.


About the Authors

Sonali Sahu is leading the Generative AI Specialist Solutions Architecture team in AWS. She is an author, thought leader, and passionate technologist. Her core area of focus is AI and ML, and she frequently speaks at AI and ML conferences and meetups around the world. She has both breadth and depth of experience in technology and the technology industry, with industry expertise in healthcare, the financial sector, and insurance.

Chinmayee Rane is a Generative AI Specialist Solutions Architect at AWS. She is passionate about applied mathematics and machine learning. She focuses on designing intelligent document processing and generative AI solutions for AWS customers. Outside of work, she enjoys salsa and bachata dancing.

Himesh Kumar is a seasoned Senior Software Engineer, currently working at Amazon Q Business in AWS. He is passionate about building distributed systems in the generative AI/ML space. His expertise extends to develop scalable and efficient systems, ensuring high availability, performance, and reliability. Beyond the technical skills, he is dedicated to continuous learning and staying at the forefront of technological advancements in AI and machine learning.

Qing Wei is a Senior Software Developer for Amazon Q Business team in AWS, and passionate about building modern applications using AWS technologies. He loves community-driven learning and sharing of technology especially for machine learning hosting and inference related topics. His main focus right now is on building serverless and event-driven architectures for RAG data ingestion.

Read More