How GoDaddy built Lighthouse, an interaction analytics solution to generate insights on support interactions using Amazon Bedrock

This post is co-written with Mayur Patel, Nick Koenig, and Karthik Jetti from GoDaddy.

GoDaddy empowers everyday entrepreneurs by providing all the help and tools to succeed online. With 21 million customers worldwide, GoDaddy’s global solutions help seamlessly connect entrepreneurs’ identity and presence with commerce, leading to profitable growth. At GoDaddy, we take pride in being a data-driven company. Our relentless pursuit of valuable insights from data fuels our business decisions and works to achieve customer satisfaction.

In this post, we discuss how GoDaddy’s Care & Services team, in close collaboration with the  AWS GenAI Labs team, built Lighthouse—a generative AI solution powered by Amazon Bedrock. Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the AWS tools without having to manage infrastructure. With Amazon Bedrock, GoDaddy’s Lighthouse mines insights from customer care interactions using crafted prompts to identify top call drivers and reduce friction points in customers’ product and website experiences, leading to improved customer experience.

GoDaddy’s business challenge

Data has always been a competitive advantage for GoDaddy, and the Care & Services team is no exception. We recognized the potential to derive meaningful insights from this data and identify key call drivers and pain points. Before generative AI, however, the technology for mining insights from unstructured data was computationally expensive and challenging to operationalize.

Solution overview

This changed with GoDaddy Lighthouse, a generative AI-powered interaction analytics solution, which unlocks the rich mine of insights sitting within our customer care transcript data. Fed by customer care interaction data, it enables deep and actionable analysis at scale, allowing us to:

  • Detect and size customer friction points in our product and website experiences, leading to improvements in customer experience (CX) and retention
  • Improve customer care operations, including quality assurance and routing optimization, leading to improvements in CX and operational expenditure (OpEx)
  • Deprecate our reliance on costly vendor solutions for voice analytics

The following diagram illustrates the high-level business workflow of Lighthouse.

GoDaddy Lighthouse is an insights solution powered by large language models (LLMs) that allows prompt engineers throughout the company to craft, manage, and evaluate prompts using a portal where they can interact with an LLM of their choice. By engineering prompts that run against an LLM, we can systematically derive powerful and standardized insights across text-based data. Product subject matter experts use the Lighthouse platform UI to test and iterate on generative AI prompts that produce tailored insights about a Care & Services interaction.

The following diagram shows the iterative process of creating and strengthening prompts.

After the prompts are tested and confirmed to work as intended, they are deployed into production, where they are scaled across thousands of interactions. Then, the insights produced for each interaction are aggregated and visualized in dashboards and other analytical tools. Additionally, Lighthouse lets GoDaddy users craft one-time generative AI prompts to reveal rich insights for a highly specific customer scenario.

Let’s dive into how the Lighthouse architecture and features support users in generating insights. The following diagram illustrates the Lighthouse architecture on AWS.

The Lighthouse UI is powered by data generated from Amazon Bedrock LLM calls on thousands of transcripts, using a library of prompts from GoDaddy's internal prompt catalog. The UI lets users select the LLM of their choice, so the solution isn't tied to a single model. These LLM calls are processed sequentially using Amazon EMR and Amazon EMR Serverless. The seamless integration of backend data into the UI is facilitated by Amazon API Gateway and AWS Lambda functions, while the UI/UX is supported by AWS Fargate and Elastic Load Balancing to maintain high availability. For data storage and retrieval, Lighthouse employs a combination of Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), and Amazon Athena. Visual data analysis and representation are achieved through dashboards built on Tableau and Amazon QuickSight.
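To make the workflow concrete, the following is a minimal, hypothetical sketch of how a catalog prompt might be run against a single transcript through the Amazon Bedrock Converse API. The prompt wording, transcript, and model ID are illustrative assumptions, not Lighthouse's actual catalog entries:

```python
def build_messages(prompt_text: str, transcript: str) -> list:
    """Package a catalog prompt and one transcript for the Bedrock Converse API."""
    return [
        {
            "role": "user",
            "content": [{"text": f"{prompt_text}\n\nTranscript:\n{transcript}"}],
        }
    ]


def run_prompt(prompt_text: str, transcript: str,
               model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0") -> str:
    """Run one prompt against one transcript. Requires AWS credentials
    and Amazon Bedrock model access; the model ID is illustrative."""
    import boto3  # AWS SDK, imported lazily so the helpers above stay credential-free

    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=model_id,
        messages=build_messages(prompt_text, transcript),
    )
    return response["output"]["message"]["content"][0]["text"]
```

In production, Lighthouse fans calls like this out across thousands of transcripts on Amazon EMR; this sketch shows only the shape of a single call.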

Prompt evaluation

Lighthouse offers a unique proposition by allowing users to evaluate their one-time generative AI prompts using an LLM of their choice. This feature empowers users to write a new one-time prompt specifically for evaluation purposes. Lighthouse processes this new prompt using the actual transcript and response from a previous LLM call.

This capability is particularly valuable for users seeking to refine their prompts through multiple iterations. By iteratively adjusting and evaluating their prompts, users can progressively enhance and solidify the effectiveness of their queries. This iterative refinement process makes sure that users can achieve the highest-quality outputs tailored to their specific needs.

The flexibility and precision offered by this feature make Lighthouse an indispensable tool for anyone trying to optimize their interactions with LLMs, fostering continuous improvement and innovation in prompt engineering.

The following screenshot illustrates how Lighthouse lets users validate the accuracy of the model response with an evaluation prompt.

After a prompt is evaluated for quality and the user is satisfied with the results, the prompt can be promoted into the prompt catalog.

Response summarization

After the user submits their prompt, Lighthouse processes this prompt against each available transcript, generating a series of responses. The user can then view the generated responses for that query on a dedicated page. This page serves as a valuable resource, allowing users to review the responses in detail and even download them into an Excel sheet for further analysis.

However, the sheer volume of responses can sometimes make this task overwhelming. To address this, Lighthouse offers a feature that allows users to pass these responses through a new prompt for summarization. This functionality enables users to obtain concise, single-line summaries of the responses, significantly simplifying the review process and enhancing efficiency.
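As a sketch of the idea (the instruction wording below is an assumption, not Lighthouse's actual prompt), the per-interaction responses can be folded into a single summarization prompt before being sent to the model:

```python
def build_summarization_prompt(responses: list) -> str:
    """Fold per-transcript responses into one meta-analysis prompt.

    Hypothetical sketch: the instruction text is illustrative only.
    """
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(responses))
    return (
        "Below are responses generated for individual support interactions.\n"
        "Summarize the common themes across them in a single concise line.\n\n"
        f"{numbered}"
    )
```

The resulting string can then be submitted as a new one-time prompt, turning thousands of per-interaction answers into one reviewable summary.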

The following screenshot shows an example of the prompt with which Lighthouse lets users meta-analyze all responses into one, reducing the time needed to review each response individually.

With this summarization tool, users can quickly distill large sets of data into easily digestible insights, streamlining their workflow and making Lighthouse an indispensable tool for data analysis and decision-making.

Insights

Lighthouse generates valuable insights, providing a deeper understanding of key focus areas, opportunities for improvement, and strategic directions. With these insights, GoDaddy can make informed, strategic decisions that enhance operational efficiency and drive revenue growth.

The following screenshot is an example of the dashboard based on insights generated by Lighthouse, showing the distribution of categories in each insight.

Through Lighthouse, we analyzed the distribution of root causes and intents across the vast number of daily calls handled by GoDaddy agents. This analysis identified the most frequent causes of escalations and factors most likely to lead to customer dissatisfaction.

Business value and impact

To date (as of the time of writing), Lighthouse has generated 15 new insights. Most notably, the team used insights from Lighthouse to quantify the impact and cost of the friction within the current process, enabling them to prioritize necessary improvements across multiple departments. This strategic approach led to a streamlined password reset process, reducing support contacts related to the password reset process and shortening resolution times, ultimately providing significant cost savings.

Other insights leading to improvements to the GoDaddy business include:

  • The discovery of call routing flows that were suboptimal for profit per interaction
  • Understanding the root cause of repeat contact interactions

Conclusion

GoDaddy’s Lighthouse, powered by Amazon Bedrock, represents a transformative leap in using generative AI to unlock the value hidden within unstructured customer interaction data. By scaling deep analysis and generating actionable insights, Lighthouse empowers GoDaddy to enhance customer experiences, optimize operations, and drive business growth. As a testament to its success, Lighthouse has already delivered financial and operational improvements, solidifying GoDaddy’s position as a data-driven leader in the industry.


About the Authors

Mayur Patel is a Director, Software Development in the Data & Analytics (DnA) team at GoDaddy, specializing in data engineering and AI-driven solutions. With nearly 20 years of experience in engineering, architecture, and leadership, he has designed and implemented innovative solutions to improve business processes, reduce costs, and increase revenue. His work has enabled companies to achieve their highest potential through data. Passionate about leveraging data and AI, he aims to create solutions that delight customers, enhance operational efficiency, and optimize costs. Outside of his professional life, he enjoys reading, hiking, DIY projects, and exploring new technologies.

Nick Koenig is a Senior Director of Data Analytics and has worked across GoDaddy building data solutions for the last 10 years. His first job at GoDaddy included listening to calls and finding trends, so he’s particularly proud to be involved in building an AI solution for this a decade later.

Karthik Jetti is a Senior Data Engineer in the Data & Analytics organization at GoDaddy. With more than 12 years of experience in engineering and architecture in data technologies, AI, and cloud platforms, he has produced data to support advanced analytics and AI initiatives. His work drives strategy and innovation, focusing on revenue generation and improving efficiency.

Ranjit Rajan is a Principal GenAI Lab Solutions Architect with AWS. Ranjit works with AWS customers to help them design and build data and analytics applications in the cloud.

Satveer Khurpa is a Senior Solutions Architect on the GenAI Labs team at Amazon Web Services. In this role, he uses his expertise in cloud-based architectures to develop innovative generative AI solutions for clients across diverse industries. Satveer’s deep understanding of generative AI technologies allows him to design scalable, secure, and responsible applications that unlock new business opportunities and drive tangible value.

Richa Gupta is a Solutions Architect at Amazon Web Services specializing in generative AI and AI/ML designs. She helps customers implement scalable, cloud-based solutions to use advanced AI technologies and drive business growth. She has also presented generative AI use cases in AWS Summits. Prior to joining AWS, she worked in the capacity of a software engineer and solutions architect, building solutions for large telecom operators.

Read More

Fine-tune multimodal models for vision and text use cases on Amazon SageMaker JumpStart

In the rapidly evolving landscape of AI, generative models have emerged as a transformative technology, empowering users to explore new frontiers of creativity and problem-solving. These advanced AI systems have transcended their traditional text-based capabilities, seamlessly integrating multimodal functionalities that expand their reach into diverse applications and enable a wide range of uses beyond text generation. These models can now create striking images, generate engaging summaries, answer complex questions, and even produce code, all while maintaining a high level of accuracy and coherence. The integration of these multimodal capabilities has unlocked new possibilities for businesses and individuals, revolutionizing fields such as content creation, visual analytics, and software development.

In this post, we showcase how to fine-tune a text and vision model, such as Meta Llama 3.2, to better perform at visual question answering tasks. The Meta Llama 3.2 Vision Instruct models demonstrated impressive performance on the challenging DocVQA benchmark for visual question answering. The non-fine-tuned 11B and 90B models achieved strong ANLS (Aggregated Normalized Levenshtein Similarity) scores of 88.4 and 90.1, respectively, on the DocVQA test set. ANLS is a metric used to evaluate the performance of models on visual question answering tasks, which measures the similarity between the model’s predicted answer and the ground truth answer. However, by using the power of Amazon SageMaker JumpStart, we demonstrate the process of adapting these generative AI models to excel at understanding and responding to natural language questions about images. By fine-tuning these models using SageMaker JumpStart, we were able to further enhance their abilities, boosting the ANLS scores to 91 and 92.4. This significant improvement showcases how the fine-tuning process can equip these powerful multimodal AI systems with specialized skills for excelling at understanding and answering natural language questions about complex, document-based visual information.
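ANLS itself is straightforward to compute. The following sketch implements the standard definition: the normalized edit distance between prediction and ground truth is converted to a similarity, thresholded at 0.5, and averaged over the test set. (This returns a score in 0..1; benchmark results such as 88.4 are that score reported as a percentage.)

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def anls(predictions, ground_truths, tau=0.5):
    """Average Normalized Levenshtein Similarity over a set of questions.

    ground_truths is a list of lists: each question may have several
    acceptable answers, and the best-matching one is scored.
    """
    scores = []
    for pred, gts in zip(predictions, ground_truths):
        best = 0.0
        for gt in gts:
            p, g = pred.strip().lower(), gt.strip().lower()
            nl = levenshtein(p, g) / max(len(p), len(g), 1)
            best = max(best, 1 - nl if nl < tau else 0.0)
        scores.append(best)
    return sum(scores) / len(scores)
```

The threshold at 0.5 means answers that differ from the ground truth by more than half their length score zero, so the metric rewards near-exact extraction.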

For a detailed walkthrough on fine-tuning the Meta Llama 3.2 Vision models, refer to the accompanying notebook.

Overview of Meta Llama 3.2 11B and 90B Vision models

The Meta Llama 3.2 collection of multimodal and multilingual large language models (LLMs) comprises pre-trained and instruction-tuned generative models in a variety of sizes. The 11B and 90B models are multimodal: they support text in/text out and text+image in/text out.

Meta Llama 3.2 11B and 90B are the first Llama models to support vision tasks, with a new model architecture that integrates image encoder representations into the language model. The new models are designed to be more efficient for AI workloads, with reduced latency and improved performance, making them suitable for a wide range of applications. All Meta Llama 3.2 models support a 128,000-token context length, maintaining the expanded token capacity introduced in Meta Llama 3.1. Additionally, the models offer improved multilingual support for eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

DocVQA dataset

The DocVQA (Document Visual Question Answering) dataset is a widely used benchmark for evaluating the performance of multimodal AI models on visual question answering tasks involving document-style images. This dataset consists of a diverse collection of document images paired with a series of natural language questions that require both visual and textual understanding to answer correctly. By fine-tuning a generative AI model like Meta Llama 3.2 on the DocVQA dataset using Amazon SageMaker, you can equip the model with the specialized skills needed to excel at answering questions about the content and structure of complex, document-based visual information.

For more information on the dataset used in this post, see DocVQA – Datasets.

Dataset preparation for visual question and answering tasks

The Meta Llama 3.2 Vision models can be fine-tuned on image-text datasets for vision and language tasks such as visual question answering (VQA). The training data should be structured with the image, the question about the image, and the expected answer. This data format allows the fine-tuning process to adapt the model’s multimodal understanding and reasoning abilities to excel at answering natural language questions about visual content.

The input includes the following:

  • A train and an optional validation directory. Train and validation directories should contain one directory named images hosting all the image data and one JSON Lines (.jsonl) file named metadata.jsonl.
  • In the metadata.jsonl file, each example is a dictionary with three keys: file_name, prompt, and completion. file_name is the path to the image, prompt is the text input prompt, and completion is the expected text completion for that prompt. The following code is an example of the contents of the metadata.jsonl file:
{"file_name": "images/img_0.jpg", "prompt": "what is the date mentioned in this letter?", "completion": "1/8/93"}
{"file_name": "images/img_1.jpg", "prompt": "what is the contact person name mentioned in letter?", "completion": "P. Carter"}
{"file_name": "images/img_2.jpg", "prompt": "Which part of Virginia is this letter sent from", "completion": "Richmond"}
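For illustration, a small helper like the following (hypothetical, not part of the JumpStart SDK) can generate metadata.jsonl in the expected layout; the image files themselves are assumed to already sit under the train directory's images folder:

```python
import json
from pathlib import Path


def write_metadata(examples, train_dir):
    """Write (file_name, prompt, completion) tuples to <train_dir>/metadata.jsonl,
    one JSON object per line, in the layout SageMaker JumpStart expects."""
    train_dir = Path(train_dir)
    train_dir.mkdir(parents=True, exist_ok=True)
    with open(train_dir / "metadata.jsonl", "w") as f:
        for file_name, prompt, completion in examples:
            f.write(json.dumps(
                {"file_name": file_name, "prompt": prompt, "completion": completion}
            ) + "\n")
```

After writing, upload the train directory (images plus metadata.jsonl) to the S3 location you pass to the training job.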

SageMaker JumpStart

SageMaker JumpStart is a powerful feature within the SageMaker machine learning (ML) environment that provides ML practitioners a comprehensive hub of publicly available and proprietary foundation models (FMs). With this managed service, ML practitioners get access to a growing list of cutting-edge models from leading model hubs and providers that they can deploy to dedicated SageMaker instances within a network-isolated environment, and customize using SageMaker for model training and deployment.

Solution overview

In the following sections, we discuss the steps to fine-tune Meta Llama 3.2 Vision models. We cover two approaches: using the Amazon SageMaker Studio UI for a no-code solution, and using the SageMaker Python SDK.

Prerequisites

To try out this solution using SageMaker JumpStart, you need the following prerequisites:

  • An AWS account that will contain all of your AWS resources.
  • An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
  • Access to SageMaker Studio or a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.

No-code fine-tuning through the SageMaker Studio UI

SageMaker JumpStart provides access to publicly available and proprietary FMs from third-party providers. Data scientists and developers can quickly prototype and experiment with various ML use cases, accelerating the development and deployment of ML applications. It helps reduce the time and effort required to build ML models from scratch, allowing teams to focus on fine-tuning and customizing the models for their specific use cases. These models are released under different licenses designated by their respective sources. It's essential to review and adhere to the applicable license terms before downloading or using these models to make sure they're suitable for your intended use case.

You can access the Meta Llama 3.2 FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we cover how to discover these models in SageMaker Studio.

SageMaker Studio is an IDE that offers a web-based visual interface for performing the ML development steps, from data preparation to model building, training, and deployment. For instructions on getting started and setting up SageMaker Studio, refer to Amazon SageMaker Studio.

When you’re in SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.

In the JumpStart view, you're presented with the list of public models offered by SageMaker. You can explore other models from other providers in this view. To start using the Meta Llama 3.2 models, under Providers, choose Meta.

You’re presented with a list of the models available. Choose one of the Vision Instruct models, for example the Meta Llama 3.2 90B Vision Instruct model.

Here you can view the model details, as well as train, deploy, optimize, and evaluate the model. For this demonstration, we choose Train.

On this page, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning. In addition, you can configure deployment configuration, hyperparameters, and security settings for fine-tuning. Choose Submit to start the training job on a SageMaker ML instance.

Deploy the model

After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart. The option to deploy the fine-tuned model will appear when fine-tuning is finished, as shown in the following screenshot.

You can also deploy the model from this view. You can configure endpoint settings such as the instance type, number of instances, and endpoint name. You will need to accept the End User License Agreement (EULA) before you can deploy the model.

Fine-tune using the SageMaker Python SDK

You can also fine-tune Meta Llama 3.2 Vision Instruct models using the SageMaker Python SDK. A sample notebook with the full instructions can be found on GitHub. The following code example demonstrates how to fine-tune the Meta Llama 3.2 11B Vision Instruct model:

import os
import boto3
from sagemaker.jumpstart.estimator import JumpStartEstimator
model_id, model_version = "meta-vlm-llama-3-2-11b-vision-instruct", "*"

from sagemaker import hyperparameters
my_hyperparameters = hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version
)
my_hyperparameters["epoch"] = "1"
estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},  # Set to "true" only after reviewing and accepting the EULA
    disable_output_compression=True,
    instance_type="ml.p5.48xlarge",
    hyperparameters=my_hyperparameters,
)
# train_data_location is the S3 URI of the prepared training dataset
estimator.fit({"training": train_data_location})

The code sets up a SageMaker JumpStart estimator for fine-tuning the Meta Llama 3.2 Vision Instruct model on a custom training dataset. It configures the estimator with the desired model ID, accepts the EULA, sets the number of training epochs as a hyperparameter, and initiates the fine-tuning process.

When the fine-tuning process is complete, you can review the evaluation metrics for the model. These metrics will provide insights into the performance of the fine-tuned model on the validation dataset, allowing you to assess how well the model has adapted. We discuss these metrics more in the following sections.

You can then deploy the fine-tuned model directly from the estimator, as shown in the following code:

# attached_estimator is the estimator for the completed fine-tuning job
# (for example, re-attached with JumpStartEstimator.attach)
estimator = attached_estimator
finetuned_predictor = estimator.deploy()

As part of the deploy settings, you can define the instance type you want to deploy the model on. For the full list of deployment parameters, refer to the deploy parameters in the SageMaker SDK documentation.

After the endpoint is up and running, you can perform an inference request against it using the predictor object as follows:

# `each` is one example from the validation metadata.jsonl file;
# get_image_decode_64base and formulate_payload are helpers defined
# in the accompanying notebook
q, a = each["prompt"], each["completion"]
image = get_image_decode_64base(image_path=f"./docvqa/validation/{each['file_name']}")
payload = formulate_payload(q=q, image=image, instruct=is_chat_template)

ft_response = finetuned_predictor.predict(
    JumpStartSerializablePayload(payload)
)

For the full list of predictor parameters, refer to the predictor object in the SageMaker SDK documentation.
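The helpers used above are defined in the accompanying notebook; as an illustration of what the image-encoding step involves, a minimal version might look like the following (the actual helper name and payload schema in the notebook may differ):

```python
import base64


def encode_image_base64(image_path: str) -> str:
    """Base64-encode an image file so it can be embedded in a JSON
    inference payload. A minimal sketch of what a helper like
    get_image_decode_64base might do."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```

The encoded string is then combined with the question text into the payload format the endpoint expects.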

Fine-tuning quantitative metrics

SageMaker JumpStart automatically outputs various training and validation metrics, such as loss, during the fine-tuning process to help evaluate the model’s performance.

As shown in the following table, the non-fine-tuned Meta Llama 3.2 11B and 90B models achieved ANLS scores of 88.4 and 90.1, respectively, on the DocVQA test set, as reported in the post Llama 3.2: Revolutionizing edge AI and vision with open, customizable models on the Meta AI website. After fine-tuning the 11B and 90B Vision Instruct models using SageMaker JumpStart, the fine-tuned models achieved improved ANLS scores of 91 and 92.4, demonstrating that the fine-tuning process significantly enhanced the models' ability to understand and answer natural language questions about complex document-based visual information.

| DocVQA test set (5,138 examples, metric: ANLS) | 11B-Instruct | 90B-Instruct |
| --- | --- | --- |
| Non-fine-tuned | 88.4 | 90.1 |
| SageMaker JumpStart fine-tuned | 91.0 | 92.4 |

For the fine-tuning results shown in the table, the models were trained using the DeepSpeed framework on a single ml.p5.48xlarge instance with multi-GPU distributed training. The fine-tuning process used Low-Rank Adaptation (LoRA) on all linear layers, with a LoRA alpha of 8, LoRA dropout of 0.05, and a LoRA rank of 16. The 90B Instruct model was trained for 6 epochs, while the 11B Instruct model was trained for 4 epochs. Both models used a learning rate of 5e-5 with a linear learning rate schedule. Importantly, the Instruct models were fine-tuned using the built-in chat template format, where the loss was computed on the last turn of the conversation (the assistant's response).

For base model fine-tuning, you can choose between the chat completion format and the text completion format, controlled by the chat_template hyperparameter. For text completion, the input is simply a concatenation of the image token, prompt, and completion, where the prompt and completion parts are connected by the response key ### Response:\n\n and loss values are computed on the completion part only.

Fine-tuning qualitative results

In addition to the quantitative evaluation metrics, you can observe qualitative differences in the model’s outputs after the fine-tuning process.

For the non-Instruct models, the fine-tuning was performed using a specific prompt template that doesn’t use the chat format. The prompt template was structured as follows:

prompt = f"![]({image})<|image|><|begin_of_text|>Read the text in the image carefully and answer the question with the text as seen exactly in the image. For yes/no questions, just respond Yes or No. If the answer is numeric, just respond with the number and nothing else. If the answer has multiple words, just respond with the words and absolutely nothing else. Never respond in a sentence or a phrase.\n Question: {q}### Response:\n\n"

This prompt template required the model to generate a direct, concise response based on the visual information in the image, without producing additional context or commentary. The results of fine-tuning an 11B Vision non-Instruct base model using this prompt template are shown in the following qualitative examples, demonstrating how the fine-tuning process improved the model's ability to accurately extract and reproduce the relevant information from the document images.

(The Image column from the original post is omitted here; each row corresponds to a scanned document image.)

| Input prompt | Pre-trained response | Fine-tuned response | Ground truth |
| --- | --- | --- | --- |
| What is the name of the company? | Repeated "### Response:" fragments | ITC Limited | itc limited |
| Where is the company located? | 1) Opening Stock : a) Cigarette Filter Rods Current Year Previous year b) Poly Propelene | CHENNAI | chennai |
| What the location address of NSDA? | Source: https://www.industrydocuments.ucsf.edu/docs/qqvf0227. <OCR/> The best thing between | 1128 SIXTEENTH ST., N. W., WASHINGTON, D. C. 20036 | 1128 SIXTEENTH ST., N. W., WASHINGTON, D. C. 20036 |
| What is the 'no. of persons present' for the sustainability committee meeting held on 5th April, 2012? | 1 2 3 4 5 6 7 8 9 10 11 12 13 | 6 | 6 |

Clean up

After you’re done running the notebook, make sure to delete all the resources that you created in the process so your billing is stopped:

# Delete resources
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()

Conclusion

In this post, we discussed fine-tuning Meta Llama 3.2 Vision Instruct models using SageMaker JumpStart. We showed that you can use the SageMaker JumpStart console in SageMaker Studio or the SageMaker Python SDK to fine-tune and deploy these models. We also discussed the fine-tuning technique, instance types, and supported hyperparameters. Finally, we showcased both the quantitative metrics and qualitative results of fine-tuning the Meta Llama 3.2 Vision model on the DocVQA dataset, highlighting the model’s improved performance on visual question answering tasks involving complex document-style images.

As a next step, you can try fine-tuning these models on your own dataset using the code provided in the notebook to test and benchmark the results for your use cases.


About the Authors

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Dr. Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A.


Appendix

Language models such as Meta Llama can exceed 10 GB or even 100 GB in size. Fine-tuning such large models requires instances with significantly higher CUDA memory. Furthermore, training these models can be very slow due to their size. Therefore, for efficient fine-tuning, we use the following optimizations:

  • Low-Rank Adaptation (LoRA) – To efficiently fine-tune the LLM, we employ LoRA, a type of parameter-efficient fine-tuning (PEFT) technique. Instead of training all the model parameters, LoRA introduces a small set of adaptable parameters that are added to the pre-trained model. This significantly reduces the memory footprint and training time compared to fine-tuning the entire model.
  • Mixed precision training (bf16) – To further optimize memory usage, we use mixed precision training using bfloat16 (bf16) data type. bf16 provides similar performance to full-precision float32 while using only half the memory, enabling us to train larger batch sizes and fit the model on available hardware.
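To see why LoRA is so much cheaper, consider the trainable parameter counts for a single linear layer. LoRA freezes the pre-trained weight W and learns a low-rank update (alpha / r) * B @ A, so the trainable count drops from d_in * d_out to r * (d_in + d_out):

```python
def lora_param_counts(d_in: int, d_out: int, rank: int):
    """Compare trainable parameters for full fine-tuning vs. LoRA on one
    d_out x d_in linear layer. LoRA trains only the low-rank factors
    B (d_out x rank) and A (rank x d_in)."""
    full = d_in * d_out           # full fine-tuning updates every weight
    lora = rank * (d_in + d_out)  # LoRA updates only B and A
    return full, lora
```

For a 4096 x 4096 layer at rank 16, this works out to 131,072 trainable parameters instead of 16,777,216, roughly 0.8% of the full count, which is why LoRA cuts both memory footprint and training time so sharply.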

The default hyperparameters are as follows:

  • Peft Type: lora – LoRA fine-tuning, which can efficiently adapt a pre-trained language model to a specific task
  • Chat Template: True – Enables the use of a chat-based template for the fine-tuning process
  • Gradient Checkpointing: True – Reduces the memory footprint during training by recomputing the activations during the backward pass, rather than storing them during the forward pass
  • Per Device Train Batch Size: 2 – The batch size for training on each device
  • Per Device Evaluation Batch Size: 2 – The batch size for evaluation on each device
  • Gradient Accumulation Steps: 2 – The number of steps to accumulate gradients for before performing an update
  • Bf16 16-Bit (Mixed) Precision Training: True – Enables the use of bfloat16 (bf16) data type for mixed precision training, which can speed up training and reduce memory usage
  • Fp16 16-Bit (Mixed) Precision Training: False – Disables the use of float16 (fp16) data type for mixed precision training
  • Deepspeed: True – Enables the use of the Deepspeed library for efficient distributed training
  • Epochs: 10 – The number of training epochs
  • Learning Rate: 6e-06 – The learning rate to be used during training
  • Lora R: 64 – The rank parameter for the LoRA fine-tuning
  • Lora Alpha: 16 – The alpha parameter for the LoRA fine-tuning
  • Lora Dropout: 0 – The dropout rate for the LoRA fine-tuning
  • Warmup Ratio: 0.1 – The ratio of the total number of steps to use for a linear warmup from 0 to the learning rate
  • Evaluation Strategy: steps – The strategy for evaluating the model during training
  • Evaluation Steps: 20 – The number of steps to use for evaluating the model during training
  • Logging Steps: 20 – The number of steps between logging training metrics
  • Weight Decay: 0.2 – The weight decay to be used during training
  • Load Best Model At End: False – Disables loading the best performing model at the end of training
  • Seed: 42 – The random seed to use for reproducibility
  • Max Input Length: -1 – The maximum length of the input sequence (-1 selects a model-appropriate default)
  • Validation Split Ratio: 0.2 – The ratio of the training dataset to use for validation
  • Train Data Split Seed: 0 – The random seed to use for splitting the training data
  • Preprocessing Num Workers: None – The number of worker processes to use for data preprocessing
  • Max Steps: -1 – The maximum number of training steps to perform (-1 means no step limit; training runs for the specified number of epochs)
  • Adam Beta1: 0.9 – The beta1 parameter for the Adam optimizer
  • Adam Beta2: 0.999 – The beta2 parameter for the Adam optimizer
  • Adam Epsilon: 1e-08 – The epsilon parameter for the Adam optimizer
  • Max Grad Norm: 1.0 – The maximum gradient norm to be used for gradient clipping
  • Label Smoothing Factor: 0 – The label smoothing factor to be used during training
  • Logging First Step: False – Disables logging the first step of training
  • Logging Nan Inf Filter: True – Enables filtering out NaN and Inf values from the training logs
  • Saving Strategy: no – Disables automatic saving of the model during training
  • Save Steps: 500 – The number of steps between saving the model during training
  • Save Total Limit: 1 – The maximum number of saved models to keep
  • Dataloader Drop Last: False – Disables dropping the last incomplete batch during data loading
  • Dataloader Num Workers: 32 – The number of worker processes to use for data loading
  • Eval Accumulation Steps: None – The number of prediction steps to accumulate output tensors for before moving results to the CPU during evaluation
  • Auto Find Batch Size: False – Disables automatically finding the optimal batch size
  • Lr Scheduler Type: constant_with_warmup – The type of learning rate scheduler to use (for example, constant with warmup)
  • Warmup Steps: 0 – The number of steps to use for linear warmup from 0 to the learning rate
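
The defaults above can be collected into a plain hyperparameter dictionary of the kind passed to a fine-tuning job. The key names below are illustrative for readability, not the exact strings any particular SDK expects, and only a representative subset is shown:

```python
# Illustrative hyperparameter dictionary mirroring the defaults above.
# Key names are assumptions, not an exact SDK schema.
hyperparameters = {
    "peft_type": "lora",
    "chat_template": True,
    "gradient_checkpointing": True,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "bf16": True,
    "fp16": False,
    "deepspeed": True,
    "epochs": 10,
    "learning_rate": 6e-06,
    "lora_r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "warmup_ratio": 0.1,
    "evaluation_strategy": "steps",
    "eval_steps": 20,
    "weight_decay": 0.2,
    "seed": 42,
    "validation_split_ratio": 0.2,
    "max_grad_norm": 1.0,
    "lr_scheduler_type": "constant_with_warmup",
}

# With gradient accumulation, the effective batch size per optimizer
# step on one device is the product of these two settings:
effective_batch = (hyperparameters["per_device_train_batch_size"]
                   * hyperparameters["gradient_accumulation_steps"])
print(effective_batch)  # → 4
```

Gradient accumulation is how a larger effective batch fits on limited memory: each optimizer update sees batch size 2 × 2 = 4 samples per device while only 2 are ever resident at once.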

GraphRAG: Improving global search via dynamic community selection

Retrieval-augmented generation (RAG) allows AI systems to provide additional information and context to a large language model (LLM) when generating a response to a user query. However, traditional RAG-based methods can struggle to retrieve information that requires high-level knowledge of the entire dataset, especially with abstract and global questions such as the keywordless query: “Catch me up on the last two weeks of updates.” These types of queries are known as “global” queries, as they require a holistic understanding of the dataset to answer.

GraphRAG aims to tackle these questions in two main steps: indexing and query. The indexing engine first breaks down a collection of text documents into segments, which are then clustered into hierarchical communities, with entities and relationships connecting each segment up through higher levels of abstraction. We then use an LLM to generate a summary of each community, known as a community report. The indexing engine thus creates a hierarchical knowledge graph of the dataset, with each level in the hierarchy representing a different level of abstraction and summarization of the original material.

In the query step, GraphRAG uses this structured knowledge to provide additional context to the LLM to help answer the question. In this blog post, we show a new method for conducting “global” queries that efficiently utilizes the knowledge graph representation and optimizes the performance of global search in GraphRAG.

Static vs. dynamic global search

The global search algorithm in GraphRAG aims to answer abstract questions that require knowledge of the entire dataset. It generates answers by searching over communities at a predetermined level in the knowledge graph. Then the LLM combines and summarizes all the community reports at this level of abstraction. Finally, the summary is used as additional context for the LLM to generate the response to the user question. This map-reduce process allows the LLM to select relevant text from all the community reports to generate its final answer. This static approach is expensive and inefficient because it includes many lower-level reports that are not informative to the user query. Since it is unlikely that all community reports, especially at a high level, are relevant in answering the query, an approach that first considers the relevancy of the report prior to the resource-intensive map-reduce operation is highly desirable.
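
The static map-reduce flow just described can be sketched as follows, with toy stand-ins for the two LLM calls (the function names and data here are hypothetical, not GraphRAG's actual API):

```python
from typing import Callable

def static_global_search(reports: list[str],
                         question: str,
                         summarize: Callable[[str, str], str],
                         answer: Callable[[str, str], str]) -> str:
    """Map: extract question-relevant text from every report at the chosen level.
    Reduce: combine the partial summaries into one context for the final answer."""
    partials = [summarize(r, question) for r in reports]   # map step: one LLM call per report
    combined = "\n".join(p for p in partials if p)         # reduce step: keep non-empty extracts
    return answer(combined, question)                      # final generation over the combined context

# Toy stand-ins for the LLM calls:
summarize = lambda report, q: report if "vaccin" in report else ""
answer = lambda context, q: f"Answer based on {len(context.splitlines())} report(s)."

reports = ["vaccination rates fell", "sports scores", "vaccine policy news"]
print(static_global_search(reports, "vaccination trends?", summarize, answer))
# → Answer based on 2 report(s).
```

Note that every report at the chosen level incurs a summarize call whether or not it is relevant, which is exactly the cost the dynamic approach below avoids.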

Here, we introduce dynamic community selection to the global search algorithm, which leverages the knowledge graph structure of the indexed dataset. Starting from the root of the knowledge graph, we use an LLM to rate how relevant a community report is in answering the user question. If the report is deemed irrelevant, we simply remove it and its nodes (or sub-communities) from the search process. On the other hand, if the report is deemed relevant, we then traverse down its child nodes and repeat the operation. Finally, only relevant reports are passed to the map-reduce operation to generate the response to the user. Figure 1 illustrates the dynamic community selection process in action. 

An image that shows the workflow of dynamic community selection in global search. Each node illustrates a community report, and the arrow indicates the rate operation.
Figure 1: Dynamic community selection workflow

The dynamic global search approach has two main benefits. First, it prunes irrelevant reports early on, reducing the total number of community reports to be considered in the map-reduce operation. Second, it enables users to search the entire knowledge graph instead of a predefined static community level, collecting information at various levels of abstraction and potentially producing more detailed answers. Moreover, the rating operation is a classification problem, which is considerably easier than summarization and text generation, so a less complex model can be used. In our experiments leveraging OpenAI’s models, a GPT-4o-mini rater achieved a retrieval rate very similar to a GPT-4o rater, while operating at a fraction of both cost and time. Overall, we use the smaller and more cost-effective model, GPT-4o-mini, in the rate operation to prune irrelevant community reports, then we use GPT-4o to perform the map-reduce operation that generates the final response.

Dynamic community selection on the AP News dataset

To demonstrate the cost saving that dynamic global search brings while maintaining a similar response quality, we evaluated the two methods side by side on a dataset from AP News. We tested static and dynamic search on 50 global questions and assessed the final response quality using an LLM evaluator. Moreover, we compared the total token cost of the two methods. To compare the two methods directly, we constrained the maximum search depth on dynamic global search so that both methods used the same underlying information.

We use an LLM evaluator to select the best response (that is, the win rate) on three key metrics:

  • Comprehensiveness: How much detail does the answer provide to cover all the aspects and details of the question?
  • Diversity: How varied and rich is the answer in providing different perspectives and insights on the question?
  • Empowerment: How well does the answer help the reader understand and make informed judgements about the topic?
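
Given the judge's pairwise preferences, the win rate per metric is simply the fraction of questions on which one method's response was preferred. A minimal sketch with made-up judgments (the evaluator call itself is omitted):

```python
from collections import Counter

def win_rate(preferences: list[str], method: str = "dynamic") -> float:
    """preferences: one entry per question, naming which method's response
    the LLM evaluator preferred (e.g. 'dynamic' or 'static')."""
    counts = Counter(preferences)
    return counts[method] / len(preferences)

# Toy judgments for one metric across 10 questions:
judged = ["dynamic"] * 6 + ["static"] * 4
print(f"{win_rate(judged):.0%}")  # → 60%
```

A win rate near 50% on a metric therefore means the evaluator showed no systematic preference between the two methods.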

Significant cost reduction while maintaining output quality

The quality of responses generated from dynamic community selection is comparable to its static counterpart while reducing the total token cost. Our LLM evaluation shows that the output quality of the two methods is similar on the three key metrics across the 50 global questions on the AP News dataset, with no statistically significant difference between them. More importantly, we observed a significant reduction in total token cost when using the new method, with an average cost reduction of 77% over the existing static global search at community level 1. This is because the rating process eliminates a large number of community reports, so fewer prompt and output tokens are needed in the map-reduce operation. For instance, the existing static global search method processes about 1,500 level 1 community reports in the map-reduce operation, while only 470 community reports on average are selected in dynamic search to generate the final answer.

Moreover, if we allow dynamic search to continue the rating process down to deeper-level community reports, we observe an improvement in its final responses. Here, we conducted the same experiment but allowed dynamic search to continue until community level 3. Out of the 50 global questions, 29 included more community reports than our static search baseline, suggesting that some community reports at deeper levels are relevant to the user question. Indeed, we observed a moderate and statistically significant improvement in both comprehensiveness and empowerment. Using an LLM evaluator to score pairs of responses, we observe that dynamic global search achieves win rates of 58.8% and 60.0% on these two metrics, respectively, against static search at level 1. Nevertheless, while the rating operation is performed by a smaller model and hence incurs negligible cost, it can still lead to a higher overall cost due to the increased number of community reports that the map-reduce operation processes. In this experiment, the total cost with dynamic search at level 3 is 34% higher on average. Table 1 summarizes the results of static search at level 1 against dynamic search at levels 1 and 3.

| Dynamic search | Num. queries | Comprehensiveness win rate | Diversity win rate | Empowerment win rate | Relative cost |
|---|---|---|---|---|---|
| Level 1 | 50 | 49.5% | 46.0% | 48.0% | -77% |
| Level 3 | 29* | 58.8% | 48.0% | 60.0% | +34% |

Table 1. LLM evaluation of dynamic search vs. static search at different community levels on global questions. We show the win rates of dynamic search at level 1 and level 3 against static search at level 1, that is, the number of times the LLM evaluator prefers the generated responses from dynamic search over static search. In the original figure, a green background indicates statistically significant win rates and grey cells indicate non-significant ones. The right-most column shows the average relative token cost against static search at level 1. * Note that we only evaluate answers from dynamic search at community level 3 that contain more community reports than static search at level 1.

Overall, dynamic community selection achieved a cost reduction of 77% while achieving output quality similar to the baseline static approach. These appear to be promising numbers, but what of the result quality? Let’s examine some direct side-by-side comparisons with an example: 

Comparable response quality and detail

Query: Across the dataset, describe the common trends in vaccination rates for major diseases.
Static global search at level 1:

### Common Trends in Vaccination Rates for Major Diseases

#### Decline in Vaccination Rates

A significant trend observed across various regions is the decline in vaccination rates for major diseases. For instance, Minnesota has experienced a decrease in immunization rates, leading to recent measles outbreaks [Data: Reports (335)]. Similarly, the COVID-19 pandemic has exacerbated this issue, causing a drop in childhood immunization rates in Michigan and other areas [Data: Reports (1231, 915)]. This decline is often attributed to misinformation about vaccines, particularly the false claim linking the MMR vaccine to autism [Data: Reports (912, 1335)].

#### Impact of COVID-19

The COVID-19 pandemic has had a profound impact on vaccination rates. Efforts to increase COVID-19 vaccination rates have been a major focus of public health policies, but the pandemic has also led to a decline in routine vaccinations among children [Data: Reports (920, 194)]. The pandemic has influenced public attitudes towards health measures and vaccinations, contributing to broader public health challenges [Data: Reports (920)].

#### Regional Variations

Vaccination rates vary significantly by region. West Virginia, for example, has one of the strictest school vaccination policies in the U.S., resulting in high child immunization rates [Data: Reports (1199)]. In contrast, Idaho has the highest overall childhood vaccination exemption rate in the U.S., posing significant public health risks [Data: Reports (998)]. Illinois faces challenges with low vaccination rates in schools, leading to measles outbreaks [Data: Reports (108, 82)].

#### Public Health Initiatives

Various public health initiatives have been implemented to address vaccination rates. The CDC plays a crucial role in monitoring and reporting on vaccination rates for diseases such as measles, mumps, and polio, and provides vital information on preventable diseases [Data: Reports (914)]. Programs like the ‘Do It For Babydog’ sweepstakes in West Virginia encourage vaccination among residents [Data: Reports (1199)]. Additionally, the Bill & Melinda Gates Foundation is heavily involved in funding and supporting vaccination programs for major diseases such as polio, malaria, HIV, and cholera [Data: Reports (1091)].

### Conclusion

The dataset reveals a complex landscape of vaccination rates influenced by regional policies, public health initiatives, and the impact of the COVID-19 pandemic. While some regions have managed to maintain high vaccination rates through strict policies and public health campaigns, others face significant challenges due to misinformation and exemption rates. Global efforts and collaborations remain crucial in addressing these challenges and improving vaccination coverage worldwide.

Global search with dynamic community selection at level 3:

### Common Trends in Vaccination Rates for Major Diseases

#### Decline in Vaccination Rates

A significant trend observed across the dataset is the decline in vaccination rates for various diseases, including measles, mumps, rubella (MMR), and polio. This decline is particularly evident among U.S. kindergartners, where vaccination coverage has dropped from 95.2% during the 2019-2020 school year to 93.1% in the 2022-2023 school year. This reduction falls below the 95% threshold needed to achieve herd immunity, putting approximately 250,000 kindergartners at risk each year [Data: Reports (5668, 15469, 15703, 15481, 15471, +more)].

#### Impact of the COVID-19 Pandemic

The COVID-19 pandemic has had a profound impact on vaccination rates. Efforts to promote COVID-19 vaccinations have been a major focus, with initiatives in regions like Michigan and Illinois offering free meals to students and promoting updated COVID-19 shots [Data: Reports (19125, 15466)]. However, the pandemic has also disrupted routine vaccination programs, leading to lower coverage for diseases such as measles and contributing to recent outbreaks [Data: Reports (15464, 15674, 15482, 15481, +more)].

#### Regional Variations and Legislative Measures

There are notable regional variations in vaccination rates and exemption rates across the United States. For example, West Virginia has implemented strict immunization requirements for children entering school, resulting in high child immunization rates [Data: Reports (5674, 18874, 18899)]. In contrast, states like Minnesota and Illinois have seen increases in non-medical exemptions, contributing to lower vaccination rates and recent outbreaks [Data: Reports (15483, 15481, 108, 2705, +more)].

#### Efforts to Improve Vaccination Rates

Various initiatives and legislative measures have been introduced to address declining vaccination rates. For instance, the Government of Sindh introduced a polio vaccination bill that includes provisions for imprisonment for parents who do not vaccinate their children [Data: Reports (15398)]. In the U.S., the CDC has recommended new COVID-19 shots for everyone aged 6 months and older and has launched initiatives to ensure equitable access to vaccines, especially in developing countries [Data: Reports (15847, 15571, 15691, 15694, +more)].

### Conclusion

The dataset reveals a complex landscape of vaccination rates influenced by the COVID-19 pandemic, vaccine hesitancy, misinformation, and regional variations. While efforts to improve vaccination rates are ongoing, the decline in immunization coverage poses significant public health risks, highlighting the need for continued vigilance and proactive measures to ensure high vaccination rates and prevent outbreaks of vaccine-preventable diseases.

Table 2. Generated response from static search (level 1) and dynamic search (level 3) to the same global question on the AP News dataset. 

Table 2 shows example outputs from static search at level 1 and dynamic search at level 3 for the same question. While the two outputs cover similar high-level topics, the response from dynamic search provides specific data, such as the reduction of vaccination rates in certain demographics. We also notice that the response from dynamic search makes significantly more references to the source material, indicated by “[Data: Reports]” in the text. By selectively providing only information that is relevant to the question, dynamic search relieves the map-reduce operation of having to filter and process all the community reports at once, allowing it to generate a response that is more comprehensive and specific to the user question.

Overall, dynamic community selection offers an alternative way to perform global search in GraphRAG, leveraging the indexed knowledge graph and cheaper LLMs for the relevance-rating operation. These changes lead to a lower total token cost and potential improvements in response detail and quality.

Availability

You can experiment with dynamic global search on the GraphRAG GitHub repository. 

Dynamic global search is the second of several major optimizations to GraphRAG that are being explored. If you are interested in optimizations for local questions, please check out our recent blog post on DRIFT search. Stay tuned for our upcoming work, where we explore a radically different approach to graph-enabled RAG that is significantly more cost-efficient while improving answer quality for both local and global questions. 

Principal Financial Group uses QnABot on AWS and Amazon Q Business to enhance workforce productivity with generative AI

Principal is a global financial company with nearly 20,000 employees passionate about improving the wealth and well-being of people and businesses. In business for 145 years, Principal is helping approximately 64 million customers (as of Q2, 2024) plan, protect, invest, and retire, while working to support the communities where it does business and build a diverse, inclusive workforce.

As Principal grew, its internal support knowledge base considerably expanded. This wealth of content provides an opportunity to streamline access to information in a compliant and responsible way. Principal wanted to use existing internal FAQs, documentation, and unstructured data and build an intelligent chatbot that could provide quick access to the right information for different roles. With the QnABot on AWS (QnABot), integrated with Microsoft Azure Entra ID access controls, Principal launched an intelligent self-service solution rooted in generative AI. Now, employees at Principal can receive role-based answers in real time through a conversational chatbot interface. The chatbot improved access to enterprise data and increased productivity across the organization.

In this post, we explore how Principal used QnABot paired with Amazon Q Business and Amazon Bedrock to create Principal AI Generative Experience: a user-friendly, secure internal chatbot for faster access to information.

QnABot is a multilanguage, multichannel conversational interface (chatbot) that responds to customers’ questions, answers, and feedback. It allows companies to deploy a fully functional chatbot integrated with generative AI offerings from Amazon, including Amazon Bedrock and Amazon Q Business, and with intelligent search and natural language understanding (NLU) services such as Amazon OpenSearch Service and Amazon Bedrock Knowledge Bases. With QnABot, companies have the flexibility to tier questions and answers based on need, from static FAQs to generating answers on the fly based on documents, webpages, indexed data, operational manuals, and more.

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Challenges, opportunities, and constraints

Principal team members need insights from vast amounts of unstructured data to serve their customers. This data includes manuals, communications, documents, and other content across various systems like SharePoint, OneNote, and the company’s intranet. The information exists in various formats such as Word documents, ASPX pages, PDFs, Excel spreadsheets, and PowerPoint presentations that were previously difficult to systematically search and analyze. Principal sought to develop natural language processing (NLP) and question-answering capabilities to accurately query and summarize this unstructured data at scale.

This solution would allow for greater understanding of a wide range of employee questions by searching internal documentation for responses and suggesting answers, all through a user-friendly interface. The solution had to adhere to compliance, privacy, and ethics regulations and brand standards and use existing compliance-approved responses without additional summarization. It was important for Principal to maintain fine-grained access controls and make sure all data and sources remained secure within its environment.

Principal needed a solution that could be rapidly deployed without extensive custom coding. It also wanted a flexible platform that it could own and customize for the long term. As a leader in financial services, Principal wanted to make sure all data and responses adhered to strict risk management and responsible AI guidelines. This included preventing any data from leaving its source or being accessible to third parties.

The chatbot solution deployed by Principal had to address two use cases. The first use case, treated as a proof of concept, was to respond to customers’ request for proposal (RFP) inquiries. This first use case was chosen because the RFP process relies on reviewing multiple types of information to generate an accurate response based on the most up-to-date information, which can be time-consuming.

The second use case applied to Principal employees in charge of responding to customer inquiries using a vast well of SharePoint data. The extensive amount of data employees had to search to find appropriate answers for customers made navigation difficult and time-consuming. It is estimated these employees collectively spent hundreds of hours each year searching for information. As the volume and complexity of customer requests grew, costs were projected to rise significantly without a solution to enhance search capabilities.

Principal saw an opportunity for an internal generic AI assistant to allow employees to use AI in their daily work without risking exposure of sensitive information through any unapproved or unregulated external AI vendors.

The solution: Principal AI Generative Experience with QnABot

Principal began its development of an AI assistant by using the core question-answering capabilities in QnABot. Within QnABot, company subject matter experts authored hard-coded questions and answers using the QnABot editor. Principal also used the AWS open source repository Lex Web UI to build a frontend chat interface with Principal branding.

Initially, Principal relied on the built-in capabilities of QnABot, using Anthropic’s Claude on Amazon Bedrock for information summarization and retrieval. Upon the release of Amazon Q Business in preview, Principal integrated QnABot with Amazon Q Business to take advantage of its advanced response aggregation algorithms and more complete AI assistant features. Integration enhanced the solution by providing a more human-like interaction for end-users.

Principal implemented several measures to improve the security, governance, and performance of its conversational AI platform. By integrating QnABot with Microsoft Entra ID (formerly Azure Active Directory), Principal facilitated single sign-on capabilities and role-based access controls, allowing fine-tuned management of user access to content and systems. Generative AI models (for example, Amazon Titan) hosted on Amazon Bedrock were used for query disambiguation and semantic matching for answer lookups and responses.
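
Semantic matching for answer lookup typically embeds the user query and each candidate answer, then ranks candidates by cosine similarity. The following toy sketch uses hand-rolled 3-d vectors in place of real embeddings from a model such as Amazon Titan; the embedding call itself and the entry data are illustrative assumptions:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_answer(query_vec, entries):
    """entries: list of (answer_text, embedding) pairs; returns the closest answer."""
    return max(entries, key=lambda e: cosine(query_vec, e[1]))[0]

# Toy 3-d "embeddings" standing in for real model output:
entries = [
    ("How to reset your password", [0.9, 0.1, 0.0]),
    ("RFP response guidelines",    [0.1, 0.9, 0.2]),
]
query = [0.85, 0.2, 0.05]  # pretend embedding of "I forgot my password"
print(best_answer(query, entries))  # → How to reset your password
```

Because similarity is computed in embedding space rather than on keywords, a query phrased differently from the stored question can still match the right answer, which is the point of semantic matching over exact FAQ lookup.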

Usability and continual improvement were top priorities, and Principal enhanced the standard user feedback from QnABot to gain input from end-users on answer accuracy, outdated content, and relevance. This input made it straightforward for administrators and developers to identify and improve answer relevancy. Custom monitoring dashboards in Amazon OpenSearch Service provided real-time visibility into platform performance. Additional integrations with services like Amazon Data Firehose, AWS Glue, and Amazon Athena allowed for historical reporting, user activity analytics, and sentiment trends over time through Amazon QuickSight.

Adherence to responsible and ethical AI practices was a priority for Principal. The Principal AI Enablement team, which was building the generative AI experience, consulted with governance and security teams to make sure security and data privacy standards were met. Model monitoring of key NLP metrics was incorporated and controls were implemented to prevent unsafe, unethical, or off-topic responses. The flexible, scalable nature of AWS services makes it straightforward to continually refine the platform through improvements to the machine learning models and addition of new features.

The initial proof of concept was deployed in a preproduction environment within 3 months. The first data source connected was an Amazon Simple Storage Service (Amazon S3) bucket, where a 100-page RFP manual was uploaded for natural language querying by users. The data source allowed accurate results to be returned based on indexed content.

The first large-scale use case directly interfaced with SharePoint data, indexing over 8,000 pages. The Principal team partnered with Amazon Q Business data connector developers to implement improvements to the SharePoint connector. Improvements included the ability to index pages in SharePoint lists and add data security features. The use case was piloted with 10 users during 1 month, while working to onboard an additional 300 users over the next 3 months.

During the initial pilot, the Principal AI Enablement team worked with business users to gather feedback. The first round of testers needed more training on fine-tuning the prompts to improve returned results. The enablement team took this feedback and partnered with training and development teams to design learning plans to help new users more quickly gain proficiency with the AI assistant. The goal was to onboard future users faster through improved guidance on how to properly frame questions for the assistant and additional coaching resources for those who needed more guidance to learn the system.

Some users lacked access to corporate data, but they used the platform’s initial generic entitlement to securely attach internal-use documentation and query it in real time, or to ask questions of the model’s foundational knowledge, without risk of data leaving the tenant. Queries from users were also analyzed to identify beneficial future features to implement.

The following diagram illustrates the Principal generative AI chatbot architecture with AWS services.

Principal-AWS-GenAI-Architecture
Principal started by deploying QnABot, which draws on numerous services including Amazon Bedrock, Amazon Q Business, QuickSight, and others. All AWS services are high-performing, secure, scalable, and purpose-built, designed to meet specific industry, cross-industry, and technology use cases. AWS solutions (for example, QnABot) bring together AWS services into preconfigured deployable products, with architecture diagrams and implementation guides. Developed, maintained, and supported by AWS, these solutions simplify the deployment of optimized infrastructure tailored to customer use cases.

Principal strategically worked with the Amazon Q Business and QnABot teams to test and improve the Amazon Q Business conversational AI platform. The QnABot team worked closely with the Principal AI Enablement team on the implementation of QnABot, helping to define and build out capabilities to meet the incoming use cases. As an early adopter of Amazon Q Business, engineers from Principal worked directly with the Amazon Q Business team to validate updates and new features. When Amazon Q Business became generally available, Principal collaborated with the team to implement the integration of AWS IAM Identity Center, helping to define the process for IAM Identity Center implementation and software development kit (SDK) integration. The results of the IAM Identity Center integration were contributed back to the QnABot Amazon Q Business plugin repository so other customers could benefit from this work.

Results

The initial proof of concept was highly successful in driving efficiencies for users. It achieved an estimated 50% reduction in time required for users to respond to client inquiries and requests for proposals. This reduction in time stemmed from the platform’s ability to search for and summarize the data needed to quickly and accurately respond to inquiries. This early success demonstrated the solution’s effectiveness, generating excitement within the organization to broaden use cases.

The initial generic entitlement option allowed users to attach files to their chat sessions and dynamically query content. This option proved popular because of large productivity gains across various roles, including project management, enterprise architecture, communications, and education. Users interacting with the application in their daily work have received it well, with some reporting up to a 50% reduction in time spent on rote work. Reducing the time spent on routine work allows employees to focus on judgment-based and strategic decisions.

The platform has delivered strong results across several key metrics. Over 95% of queries received answers users accepted or built upon, with only 4% of answers receiving negative feedback. For queries earning negative feedback, less than 1% involved answers or documentation deemed irrelevant to the original question. Over 99% of documents provided through the system were evaluated as relevant and containing up-to-date information. 56% of total queries were addressed through either sourcing documentation related to the question or having the user attach a relevant file through the chat interface. The remaining queries were answered based on foundational knowledge built into the platform or from the current session’s chat history. These results indicate that users benefit from both Retrieval Augmented Generation (RAG) functionality and Amazon Q Business foundational knowledge, which provide helpful responses based on past experiences.

The positive feedback validates the application’s ability to deliver timely, accurate information to users, optimizing processes and empowering employees with data-driven insights. Metrics indicate a high level of success in delivering the right information, reducing time spent on client inquiries and tasks and producing significant savings in hours and dollars. The platform effectively and quickly resolves issues by surfacing relevant information faster than manual searches, improving processes, productivity, and the customer experience. As usage expands, it is expected that the benefits will multiply for both users and stakeholders. This initial proof of concept provides a strong foundation for continued optimization and value, with potential expansion to maximize benefits for Principal employees.

Roadmap

The Principal AI Enablement team has an ambitious roadmap for 2024 focused on expanding the capabilities of its conversational AI platform. There is a commitment to scale and accelerate development of generative AI technology to meet the growing needs of the enterprise.

Numerous use cases are currently in development by the AI Enablement team. Many future use cases are expected to use the Principal AI Generative Experience application due to its success in automating processes and empowering users with self-service insights. As adoption increases, ongoing feature additions will further strengthen the platform’s value.

At Principal, the roadmap reflects a commitment to continual innovation, which will drive further optimization of operations and workflows and could create new opportunities to enhance customer and employee experiences through advanced AI applications.

Principal is well positioned to build upon early successes by fulfilling its vision. The roadmap provides a strategic framework to maximize the platform’s business impact and differentiate solutions in the years ahead.

Conclusion

Principal used QnABot on AWS paired with Amazon Q Business and Amazon Bedrock to deliver a generative AI experience for its users, reducing manual time spent on client inquiries and tasks and producing significant savings in hours and dollars. Using generative AI, Principal's employees can now focus on deeper, judgment-based decisions instead of manually scouring data sources for answers. Get started with QnABot on AWS, Amazon Q Business, and Amazon Bedrock.


About the Authors

Ajay Swamy is the Global Product Leader for Data, AI/ML, and Generative AI AWS Solutions. He specializes in building AWS Solutions (production-ready software packages) that deliver compelling value to customers by solving for their unique business needs. Other than QnABot on AWS, he manages Generative AI Application Builder, Enhanced Document Understanding, Discovering Hot Topics using Machine Learning, and other AWS Solutions. He lives with his wife (Tina) and dog (Figaro), in New York, NY.

Dr. Nicki Susman is a Senior Machine Learning Engineer and the Technical Lead of the Principal AI Enablement team. She has extensive experience in data and analytics, application development, infrastructure engineering, and DevSecOps.

Joel Elscott is a Senior Data Engineer on the Principal AI Enablement team. He has over 20 years of software development experience in the financial services industry, specializing in ML/AI application development and cloud data architecture. Joel lives in Des Moines, Iowa, with his wife and five children, and is also a group fitness instructor.

Bob Strahan is a Principal Solutions Architect in the AWS Generative AI Innovation Center team.

Austin Johnson is a Solutions Architect, maintaining the Lex Web UI open source library.


The subject matter in this communication is educational only and provided with the understanding that Principal is not endorsing, or necessarily recommending use of artificial intelligence. You should consult with appropriate counsel, compliance, and information security for your business needs.

Insurance products and plan administrative services provided through Principal Life Insurance Company, a member of the Principal Financial Group, Des Moines, IA 50392.
© 2024, Principal Financial Services, Inc.
3778998-082024

Read More

Governing ML lifecycle at scale: Best practices to set up cost and usage visibility of ML workloads in multi-account environments

Governing ML lifecycle at scale: Best practices to set up cost and usage visibility of ML workloads in multi-account environments

Cloud costs can significantly impact your business operations. Gaining real-time visibility into infrastructure expenses, usage patterns, and cost drivers is essential. This insight enables agile decision-making and optimized scalability, and maximizes the value derived from cloud investments, providing cost-effective and efficient cloud utilization for your organization's future growth. Cost visibility is even more important in the cloud because cloud usage is dynamic. This requires continuous cost reporting and monitoring to make sure costs don't exceed expectations and you only pay for the usage you need. Additionally, you can measure the value the cloud delivers to your organization by quantifying the associated cloud costs.

For a multi-account environment, you can track costs at an AWS account level to associate expenses. However, to allocate costs to cloud resources, a tagging strategy is essential. A combination of an AWS account and tags provides the best results. Implementing a cost allocation strategy early is critical for managing your expenses and future optimization activities that will reduce your spend.

This post outlines steps you can take to implement a comprehensive tagging governance strategy across accounts, using AWS tools and services that provide visibility and control. By setting up automated policy enforcement and checks, you can achieve cost optimization across your machine learning (ML) environment.

Implement a tagging strategy

A tag is a label you assign to an AWS resource. Tags consist of a customer-defined key and an optional value to help manage, search for, and filter resources. Both tag keys and tag values (for example, Production) are case sensitive.

It’s important to define a tagging strategy for your resources as soon as possible when establishing your cloud foundation. Tagging is an effective scaling mechanism for implementing cloud management and governance strategies. When defining your tagging strategy, you need to determine the right tags that will gather all the necessary information in your environment. You can remove tags when they’re no longer needed and apply new tags whenever required.
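Tags can be applied and removed programmatically as your strategy evolves. The following sketch uses the Resource Groups Tagging API through boto3; the helper at the top is pure Python, and the ARNs and tag keys are illustrative, not from the post.

```python
def build_tag_set(tags: dict) -> list:
    """Convert {'key': 'value'} into the [{'Key': ..., 'Value': ...}] shape
    many AWS APIs (for example, SageMaker and EC2) expect for tags."""
    return [{"Key": k, "Value": v} for k, v in tags.items()]

def apply_tags(resource_arns: list, tags: dict):
    """Tag up to 20 resources in one call via the Resource Groups Tagging API.
    boto3 is imported lazily so the helper above works without it installed."""
    import boto3
    client = boto3.client("resourcegroupstaggingapi")
    # tag_resources accepts a plain {key: value} mapping rather than a TagSet
    return client.tag_resources(ResourceARNList=resource_arns, Tags=tags)
```

Removing stale tags works the same way with `untag_resources`, which takes the ARN list and a list of tag keys to delete.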

Categories for designing tags

Some of the common categories used for designing tags are as follows:

  • Cost allocation tags – These help track costs by different attributes like department, environment, or application. This allows reporting and filtering costs in billing consoles based on tags.
  • Automation tags – These are used during resource creation or management workflows. For example, tagging resources with their environment allows automating tasks like stopping non-production instances after hours.
  • Access control tags – These enable restricting access and permissions based on tags. AWS Identity and Access Management (IAM) roles and policies can reference tags to control which users or services can access specific tagged resources.
  • Technical tags – These provide metadata about resources. For example, tags like environment or owner help identify technical attributes. Tags with the AWS reserved prefix aws: provide additional metadata tracked by AWS.
  • Compliance tags – These may be needed to adhere to regulatory requirements, such as tagging with classification levels or whether data is encrypted or not.
  • Business tags – These represent business-related attributes, not technical metadata, such as cost centers, business lines, and products. This helps track spending for cost allocation purposes.

A tagging strategy also defines a standardized convention and implementation of tags across all resource types.

When defining tags, use the following conventions:

  • Use all lowercase for consistency and to avoid confusion
  • Separate words with hyphens
  • Use a prefix to identify and separate AWS generated tags from third-party tool generated tags
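The conventions above can be enforced in CI or in provisioning scripts with a simple validator. This is a minimal sketch; the regular expression encodes only the three rules listed (lowercase, hyphen-separated words, optional colon-delimited prefix segments) and would need extending for any additional organizational rules.

```python
import re

# Lowercase alphanumeric words separated by hyphens, with optional
# colon-delimited prefix segments such as "anycompany:workload:".
TAG_KEY_PATTERN = re.compile(r"^(?:[a-z0-9]+:)*[a-z0-9]+(?:-[a-z0-9]+)*$")

def is_valid_tag_key(key: str) -> bool:
    """Return True if the tag key follows the naming conventions above."""
    return bool(TAG_KEY_PATTERN.match(key))
```

For example, `anycompany:workload:application-id` passes, while a mixed-case key such as `Environment` fails.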

Tagging dictionary

When defining a tagging dictionary, delineate between mandatory and discretionary tags. Mandatory tags help identify resources and their metadata, regardless of purpose. Discretionary tags are the tags that your tagging strategy defines, and they should be made available to assign to resources as needed. The following table provides examples of a tagging dictionary used for tagging ML resources.

| Tag Type | Tag Key | Purpose | Cost Allocation | Mandatory |
| --- | --- | --- | --- | --- |
| Workload | anycompany:workload:application-id | Identifies disparate resources that are related to a specific application | Y | Y |
| Workload | anycompany:workload:environment | Distinguishes between dev, test, and production | Y | Y |
| Financial | anycompany:finance:owner | Indicates who is responsible for the resource, for example SecurityLead, SecOps, Workload-1-Development-team | Y | Y |
| Financial | anycompany:finance:business-unit | Identifies the business unit the resource belongs to, for example Finance, Retail, Sales, DevOps, Shared | Y | Y |
| Financial | anycompany:finance:cost-center | Indicates cost allocation and tracking, for example 5045, Sales-5045, HR-2045 | Y | Y |
| Security | anycompany:security:data-classification | Indicates data confidentiality that the resource supports | N | Y |
| Automation | anycompany:automation:encryption | Indicates if the resource needs to store encrypted data | N | N |
| Workload | anycompany:workload:name | Identifies an individual resource | N | N |
| Workload | anycompany:workload:cluster | Identifies resources that share a common configuration or perform a specific function for the application | N | N |
| Workload | anycompany:workload:version | Distinguishes between different versions of a resource or application component | N | N |
| Operations | anycompany:operations:backup | Identifies if the resource needs to be backed up based on the type of workload and the data that it manages | N | N |
| Regulatory | anycompany:regulatory:framework | Requirements for compliance to specific standards and frameworks, for example NIST, HIPAA, or GDPR | N | N |

You need to define what resources require tagging and implement mechanisms to enforce mandatory tags on all necessary resources. For multiple accounts, assign mandatory tags to each one, identifying its purpose and the owner responsible. Avoid personally identifiable information (PII) when labeling resources because tags remain unencrypted and visible.

Tagging ML workloads on AWS

When running ML workloads on AWS, primary costs are incurred from compute resources required, such as Amazon Elastic Compute Cloud (Amazon EC2) instances for hosting notebooks, running training jobs, or deploying hosted models. You also incur storage costs for datasets, notebooks, models, and so on stored in Amazon Simple Storage Service (Amazon S3).

A reference architecture for the ML platform with various AWS services is shown in the following diagram. This framework considers multiple personas and services to govern the ML lifecycle at scale. For more information about the reference architecture in detail, see Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker.

Machine Learning Platform Reference Architecture

The reference architecture includes a landing zone and multi-account landing zone accounts. These should be tagged to track costs for governance and shared services.

The key contributors towards recurring ML cost that should be tagged and tracked are as follows:

  • Amazon DataZone – Amazon DataZone allows you to catalog, discover, govern, share, and analyze data across various AWS services. Tags can be added at the Amazon DataZone domain level and used for organizing data assets, users, and projects. Usage of data is tracked through the data consumers, such as Amazon Athena, Amazon Redshift, or Amazon SageMaker.
  • AWS Lake Formation – AWS Lake Formation helps manage data lakes and integrate them with other AWS analytics services. You can define metadata tags and assign them to resources like databases and tables. This identifies teams or cost centers responsible for those resources. Automating resource tags when creating databases or tables with the AWS Command Line Interface (AWS CLI) or SDKs provides consistent tagging. This enables accurate tracking of costs incurred by different teams.
  • Amazon SageMaker – Amazon SageMaker uses a domain to provide access to an environment and resources. When a domain is created, SageMaker automatically generates a tag with a DomainId key, and administrators can add a custom ProjectId tag. Together, these tags can be used for project-level resource isolation. Tags on a SageMaker domain are automatically propagated to any SageMaker resources created in the domain.
  • Amazon SageMaker Feature Store – Amazon SageMaker Feature Store allows you to tag your feature groups and search for feature groups using tags. You can add tags when creating a new feature group or edit the tags of an existing feature group.
  • Amazon SageMaker resources – When you tag SageMaker resources such as jobs or endpoints, you can track spending based on attributes like project, team, or environment. For example, you can specify tags when creating the SageMaker Estimator that launches a training job.

Using tags allows you to attribute costs to the business needs that incur them. Monitoring expenses this way gives insight into how budgets are consumed.
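As an illustration of the last bullet, the sketch below builds the mandatory tag set from the tagging dictionary and passes it when creating a SageMaker training job. The tag keys follow the illustrative anycompany: prefix from the dictionary; the job configuration itself (role, algorithm, channels) is assumed to be assembled elsewhere.

```python
def mandatory_workload_tags(application_id: str, environment: str,
                            cost_center: str) -> list:
    """Build the mandatory tags from the tagging dictionary
    (keys are illustrative, using the anycompany: prefix)."""
    return [
        {"Key": "anycompany:workload:application-id", "Value": application_id},
        {"Key": "anycompany:workload:environment", "Value": environment},
        {"Key": "anycompany:finance:cost-center", "Value": cost_center},
    ]

def launch_tagged_training_job(job_config: dict, tags: list):
    """Create a training job with tags attached at creation time, so cost
    reports can break spend down by project, team, or environment."""
    import boto3
    sm = boto3.client("sagemaker")
    # job_config holds TrainingJobName, RoleArn, AlgorithmSpecification, etc.
    return sm.create_training_job(**job_config, Tags=tags)
```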

Enforce a tagging strategy

An effective tagging strategy uses mandatory tags and applies them consistently and programmatically across AWS resources. You can use both reactive and proactive approaches for governing tags in your AWS environment.

Proactive governance uses tools such as AWS CloudFormation, AWS Service Catalog, tag policies in AWS Organizations, or IAM resource-level permissions to make sure you apply mandatory tags consistently at resource creation. For example, you can use the CloudFormation Resource Tags property to apply tags to resource types. In Service Catalog, you can add tags that automatically apply when you launch the service.

Reactive governance is for finding resources that lack proper tags using tools such as the AWS Resource Groups tagging API, AWS Config rules, and custom scripts. To find resources manually, you can use Tag Editor and detailed billing reports.

Proactive governance

Proactive governance uses the following tools:

  • Service Catalog – You can apply tags to all resources created when a product launches from Service Catalog. Service Catalog provides a TagOptions library; use it to define the tag key-value pairs to associate with the product.
  • CloudFormation Resource Tags – You can apply tags to resources using the AWS CloudFormation Resource Tags property. Tag only those resources that support tagging through AWS CloudFormation.
  • Tag policies – Tag policies standardize tags across your organization’s account resources. Define tagging rules in a tag policy that apply when resources get tagged. For example, specify that a CostCenter tag attached to a resource must match the case and values the policy defines. You can also specify that compliance is enforced for tagging operations on some resource types, preventing noncompliant requests from completing. The policy doesn’t evaluate untagged resources or undefined tags for compliance. Tag policies involve working with multiple AWS services:
    • To enable the tag policies feature, use AWS Organizations. You can create tag policies and then attach those policies to organization entities to put the tagging rules into effect.
    • Use AWS Resource Groups to find noncompliant tags on account resources. Correct the noncompliant tags in the AWS service where you created the resource.
  • Service control policies – You can restrict the creation of AWS resources that lack proper tags. Use service control policies (SCPs) to set guardrails around requests to create resources. SCPs allow you to enforce tagging policies on resource creation. To create an SCP, navigate to the AWS Organizations console, choose Policies in the navigation pane, then choose Service Control Policies.
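A tag policy body is JSON following the tag policy syntax (`tags`, `tag_key`, `tag_value`, and `enforced_for` operators with `@@assign`). The sketch below builds a minimal policy document and creates it through the Organizations API; the key, allowed values, and enforced services are illustrative, and the exact grammar should be checked against the tag policy syntax reference.

```python
import json

def tag_policy_document(key: str, allowed_values: list,
                        enforced_for: list) -> str:
    """Build a minimal tag policy body restricting one key to a value list."""
    return json.dumps({
        "tags": {
            key: {
                "tag_key": {"@@assign": key},
                "tag_value": {"@@assign": allowed_values},
                # Services/resource types where noncompliant tagging is blocked
                "enforced_for": {"@@assign": enforced_for},
            }
        }
    })

def create_tag_policy(name: str, document: str):
    """Register the policy with AWS Organizations; attach it to a root, OU,
    or account afterwards with attach_policy."""
    import boto3
    org = boto3.client("organizations")
    return org.create_policy(Name=name, Description="Mandatory tag enforcement",
                             Content=document, Type="TAG_POLICY")
```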

Reactive governance

Reactive governance uses the following tools:

  • AWS Config rules – Check resources regularly for improper tagging. The AWS Config rule required-tags examines resources to make sure they contain specified tags. You should take action when resources lack necessary tags.
  • AWS Resource Groups tagging API – The AWS Resource Groups Tagging API lets you tag or untag resources. It also enables searching for resources in a specified AWS Region or account using tag-based filters. Additionally, you can search for existing tags in a Region or account, or find existing values for a key within a specific Region or account. To create a resource tag group, refer to Creating query-based groups in AWS Resource Groups.
  • Tag Editor – With Tag Editor, you build a query to find resources in one or more Regions that are available for tagging. To find resources to tag, see Finding resources to tag.
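The Resource Groups Tagging API lends itself to a simple reactive audit script. The sketch below pages through `get_resources` and reports ARNs missing any mandatory key; the required-key list is whatever your tagging dictionary mandates, and the scan logic is kept as a pure helper.

```python
def find_untagged(resources: list, required_keys: list) -> list:
    """Given (arn, {tag_key: tag_value}) pairs, return the ARNs that are
    missing at least one required tag key."""
    return [arn for arn, tags in resources
            if not set(required_keys) <= set(tags)]

def scan_account(required_keys: list) -> list:
    """Collect all taggable resources in the current Region and audit them."""
    import boto3
    client = boto3.client("resourcegroupstaggingapi")
    pairs = []
    for page in client.get_paginator("get_resources").paginate():
        for mapping in page["ResourceTagMappingList"]:
            tags = {t["Key"]: t["Value"] for t in mapping.get("Tags", [])}
            pairs.append((mapping["ResourceARN"], tags))
    return find_untagged(pairs, required_keys)
```

The resulting list can feed a ticketing workflow or an automated remediation step that applies default tags.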

SageMaker tag propagation

Amazon SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. SageMaker Studio automatically copies and assigns tags to the SageMaker Studio notebooks created by users, so you can track and categorize the cost of SageMaker Studio notebooks.

Amazon SageMaker Pipelines allows you to create end-to-end workflows for managing and deploying SageMaker jobs. Each pipeline is composed of a sequence of steps that transform data into a trained model. Tags can be applied to pipelines similarly to how they are used for other SageMaker resources. When a pipeline is run, its tags can potentially propagate to the underlying jobs launched as part of the pipeline steps.

When models are registered in Amazon SageMaker Model Registry, tags can be propagated from model packages to other related resources like endpoints. Model packages in the registry can be tagged when registering a model version. These tags become associated with the model package. Tags on model packages can potentially propagate to other resources that reference the model, such as endpoints created using the model.

Tag policy quotas

The number of policies that you can attach to an entity (root, OU, or account) is subject to quotas for AWS Organizations. See Quotas and service limits for AWS Organizations for the policy and tag limits that apply.

Monitor resources

To achieve financial success and accelerate business value realization in the cloud, you need complete, near real-time visibility of cost and usage information to make informed decisions.

Cost organization

You can apply meaningful metadata to your AWS usage with AWS cost allocation tags. Use AWS Cost Categories to create rules that logically group cost and usage information by account, tags, service, charge type, or other categories. Access the metadata and groupings in services like AWS Cost Explorer, AWS Cost and Usage Reports, and AWS Budgets to trace costs and usage back to specific teams, projects, and business initiatives.

Cost visualization

You can view and analyze your AWS costs and usage over the past 13 months using Cost Explorer. You can also forecast your likely spending for the next 12 months and receive recommendations for Reserved Instance purchases that may reduce your costs. Using Cost Explorer enables you to identify areas needing further inquiry and to view trends to understand your costs. For more detailed cost and usage data, use AWS Data Exports to create exports of your billing and cost management data by selecting SQL columns and rows to filter the data you want to receive. Data exports get delivered on a recurring basis to your S3 bucket for you to use with your business intelligence (BI) or data analytics solutions.
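Cost Explorer data is also available programmatically through the `GetCostAndUsage` API, which can group spend by an activated cost-allocation tag. The sketch below builds such a request and sends it with boto3; dates and the tag key are placeholders.

```python
def monthly_cost_request(start: str, end: str, tag_key: str) -> dict:
    """Build a GetCostAndUsage request grouping unblended cost by a tag key.
    Dates are ISO strings, for example '2024-01-01'."""
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }

def get_costs_by_tag(start: str, end: str, tag_key: str):
    """Query Cost Explorer; the tag key must be activated as a
    cost allocation tag in the Billing console first."""
    import boto3
    ce = boto3.client("ce")  # Cost Explorer
    return ce.get_cost_and_usage(**monthly_cost_request(start, end, tag_key))
```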

You can use AWS Budgets to set custom budgets that track cost and usage for simple or complex use cases. AWS Budgets also lets you enable email or Amazon Simple Notification Service (Amazon SNS) notifications when actual or forecasted cost and usage exceed your set budget threshold. In addition, AWS Budgets integrates with Cost Explorer.

Cost allocation

Cost Explorer enables you to view and analyze your costs and usage data over time, up to 13 months, through the AWS Management Console. It provides premade views displaying quick information about your cost trends to help you customize views suiting your needs. You can apply various available filters to view specific costs. Also, you can save any view as a report.

Monitoring in a multi-account setup

SageMaker supports cross-account lineage tracking. This allows you to associate and query lineage entities, like models and training jobs, owned by different accounts. It helps you track related resources and costs across accounts. Use the AWS Cost and Usage Report to track costs for SageMaker and other services across accounts. The report aggregates usage and costs based on tags, resources, and more so you can analyze spending per team, project, or other criteria spanning multiple accounts.

Cost Explorer allows you to visualize and analyze SageMaker costs from different accounts. You can filter costs by tags, resources, or other dimensions. You can also export the data to third-party BI tools for customized reporting.

Conclusion

In this post, we discussed how to implement a comprehensive tagging strategy to track costs for ML workloads across multiple accounts. We discussed implementing tagging best practices by logically grouping resources and tracking costs by dimensions like environment, application, team, and more. We also looked at enforcing the tagging strategy using proactive and reactive approaches. Additionally, we explored the capabilities within SageMaker to apply tags. Lastly, we examined approaches to provide visibility of cost and usage for your ML workloads.

For more information about how to govern your ML lifecycle, see Part 1 and Part 2 of this series.


About the authors

Gunjan JainGunjan Jain, an AWS Solutions Architect based in Southern California, specializes in guiding large financial services companies through their cloud transformation journeys. He expertly facilitates cloud adoption, optimization, and implementation of Well-Architected best practices. Gunjan’s professional focus extends to machine learning and cloud resilience, areas where he demonstrates particular enthusiasm. Outside of his professional commitments, he finds balance by spending time in nature.

Ram Vittal is a Principal Generative AI Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, reliable and scalable GenAI/ML systems to help enterprise customers improve their business outcomes. In his spare time, he rides motorcycle and enjoys walking with his dogs!

Read More

Automate invoice processing with Streamlit and Amazon Bedrock

Automate invoice processing with Streamlit and Amazon Bedrock

Invoice processing is a critical yet often cumbersome task for businesses of all sizes, especially for large enterprises dealing with invoices from multiple vendors with varying formats. The sheer volume of data, coupled with the need for accuracy and efficiency, can make invoice processing a significant challenge. Invoices can vary widely in format, structure, and content, making efficient processing at scale difficult. Traditional methods relying on manual data entry or custom scripts for each vendor’s format can not only lead to inefficiencies, but can also increase the potential for errors, resulting in financial discrepancies, operational bottlenecks, and backlogs.

To extract key details such as invoice numbers, dates, and amounts, we use Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

In this post, we provide a step-by-step guide with the building blocks needed for creating a Streamlit application to process and review invoices from multiple vendors. Streamlit is an open source framework for data scientists to efficiently create interactive web-based data applications in pure Python. We use Anthropic’s Claude 3 Sonnet model in Amazon Bedrock and Streamlit for building the application front-end.

Solution overview

This solution uses the Amazon Bedrock Knowledge Bases chat with document feature to analyze and extract key details from your invoices, without needing a knowledge base. The results are shown in a Streamlit app, with the invoices and extracted information displayed side-by-side for quick review. Importantly, your document and data are not stored after processing.

The storage layer uses Amazon Simple Storage Service (Amazon S3) to hold the invoices that business users upload. After uploading, you can set up a regular batch job to process these invoices, extract key information, and save the results in a JSON file. In this post, we save the data in JSON format, but you can also choose to store it in your preferred SQL or NoSQL database.

The application layer uses Streamlit to display the PDF invoices alongside the extracted data from Amazon Bedrock. For simplicity, we deploy the app locally, but you can also run it on Amazon SageMaker Studio, Amazon Elastic Compute Cloud (Amazon EC2), or Amazon Elastic Container Service (Amazon ECS) if needed.

Prerequisites

To implement this solution, complete the following prerequisites:

Install dependencies and clone the example

To get started, install the necessary packages on your local machine or on an EC2 instance. If you’re new to Amazon EC2, refer to the Amazon EC2 User Guide. In this tutorial, we use the local machine for project setup.

To install dependencies and clone the example, follow these steps:

  1. Clone the repository into a local folder:
    git clone https://github.com/aws-samples/genai-invoice-processor.git

  2. Install Python dependencies
    • Navigate to the project directory:
      cd </path/to/your/folder>/genai-invoice-processor

    • Upgrade pip
      python3 -m pip install --upgrade pip

    • (Optional) Create a virtual environment to isolate dependencies:
      python3 -m venv venv

    • Activate the virtual environment:
      1. Mac/Linux:
        source venv/bin/activate

      2. Windows:
        venv\Scripts\activate

  3. In the cloned directory, invoke the following to install the necessary Python packages:
    pip install -r requirements.txt

    This will install the necessary packages, including Boto3 (AWS SDK for Python), Streamlit, and other dependencies.

  4. Update the region in the config.yaml file to the same Region set for your AWS CLI where Amazon Bedrock and Anthropic’s Claude 3 Sonnet model are available.

After completing these steps, the invoice processor code will be set up in your local environment and will be ready for the next stages to process invoices using Amazon Bedrock.

Process invoices using Amazon Bedrock

Now that the environment setup is done, you’re ready to start processing invoices and deploying the Streamlit app. To process invoices using Amazon Bedrock, follow these steps:

Store invoices in Amazon S3

Store invoices from different vendors in an S3 bucket. You can upload them directly using the console, API, or as part of your regular business process. Follow these steps to upload using the CLI:

  1. Create an S3 bucket:
    aws s3 mb s3://<your-bucket-name> --region <your-region>

    Replace your-bucket-name with a globally unique name for your bucket and your-region with the Region set for your AWS CLI and in config.yaml (for example, us-east-1)

  2. Upload invoices to S3 bucket. Use one of the following commands to upload the invoice to S3.
    • To upload invoices to the root of the bucket:
      aws s3 cp </path/to/your/folder> s3://<your-bucket-name>/ --recursive

    • To upload invoices to a specific folder (for example, invoices):
      aws s3 cp </path/to/your/folder> s3://<your-bucket-name>/<prefix>/ --recursive

    • Validate the upload:
      aws s3 ls s3://<your-bucket-name>/

Process invoices with Amazon Bedrock

In this section, you will process the invoices in Amazon S3 and store the results in a JSON file (processed_invoice_output.json). You will extract the key details from the invoices (such as invoice numbers, dates, and amounts) and generate summaries.

You can trigger the processing of these invoices using the AWS CLI or automate the process with an Amazon EventBridge rule or AWS Lambda trigger. For this walkthrough, we will use the AWS CLI to trigger the processing.
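For the automated alternative, a scheduled EventBridge rule can invoke a Lambda function that runs the same processing logic. This is a hedged sketch, not part of the sample repository: the rule name, schedule, and Lambda ARN are placeholders you would replace with your own.

```python
def schedule_rule(name: str, cron: str) -> dict:
    """Request body for an EventBridge scheduled rule."""
    return {"Name": name, "ScheduleExpression": cron, "State": "ENABLED"}

def schedule_invoice_processing(lambda_arn: str, rule_name: str = "nightly-invoice-processing"):
    """Create a nightly schedule (02:00 UTC) targeting a Lambda function
    that wraps the invoice-processing logic."""
    import boto3
    events = boto3.client("events")
    events.put_rule(**schedule_rule(rule_name, "cron(0 2 * * ? *)"))
    events.put_targets(Rule=rule_name,
                       Targets=[{"Id": "invoice-processor", "Arn": lambda_arn}])
```

The Lambda function would also need a resource-based permission allowing `events.amazonaws.com` to invoke it.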

We packaged the processing logic in the Python script invoices_processor.py, which can be run as follows:

python invoices_processor.py --bucket_name=<your-bucket-name> --prefix=<your-folder>

The --prefix argument is optional. If omitted, all of the PDFs in the bucket will be processed. For example:

python invoices_processor.py --bucket_name='gen_ai_demo_bucket'

or

python invoices_processor.py --bucket_name='gen_ai_demo_bucket' --prefix='invoice'

Use the solution

This section examines the invoices_processor.py code. You can chat with your document either on the Amazon Bedrock console or by using the Amazon Bedrock RetrieveAndGenerate API (SDK). In this tutorial, we use the API approach.

    1. Initialize the environment: The script imports the necessary libraries and initializes the Amazon Bedrock and Amazon S3 clients.
      import boto3
      import os
      import json
      import shutil
      import argparse
      import time
      import datetime
      import yaml
      from typing import Dict, Any, Tuple
      from concurrent.futures import ThreadPoolExecutor, as_completed
      from threading import Lock
      # The script talks to the bedrock-agent-runtime service, so alias its typed client
      from mypy_boto3_bedrock_agent_runtime.client import AgentsforBedrockRuntimeClient as BedrockRuntimeClient
      from mypy_boto3_s3.client import S3Client
      
      # Load configuration from YAML file
      def load_config():
          """
          Load and return the configuration from the 'config.yaml' file.
          """
          with open('config.yaml', 'r') as file:
              return yaml.safe_load(file)
      
      CONFIG = load_config()
      
      write_lock = Lock() # Lock for managing concurrent writes to the output file
      
      def initialize_aws_clients() -> Tuple[S3Client, BedrockRuntimeClient]:
          """
          Initialize and return AWS S3 and Bedrock clients.
      
          Returns:
              Tuple[S3Client, BedrockRuntimeClient]
          """
          return (
              boto3.client('s3', region_name=CONFIG['aws']['region_name']),
              boto3.client(service_name='bedrock-agent-runtime', 
                           region_name=CONFIG['aws']['region_name'])
          )

    2. Configure: The config.yaml file specifies the model ID, Region, prompts for entity extraction, and the output file location for processing.
      aws: 
          region_name: us-west-2 
          model_id: anthropic.claude-3-sonnet-20240229-v1:0
          prompts: 
              full: Extract data from attached invoice in key-value format. 
              structured: | 
                  Process the pdf invoice and list all metadata and values in json format for the variables with descriptions in <variables></variables> tags. The result should be returned as JSON as given in the <output></output> tags. 
      
                  <variables> 
                      Vendor: Name of the company or entity the invoice is from. 
                      InvoiceDate: Date the invoice was created.
                      DueDate: Date the invoice is due and needs to be paid by. 
                      CurrencyCode: Currency code for the invoice amount based on the symbol and vendor details.
                      TotalAmountDue: Total amount due for the invoice
                      Description: a concise summary of the invoice description within 20 words 
                  </variables> 
      
                  Format your analysis as a JSON object in following structure: 
                      <output> {
                      "Vendor": "<vendor name>", 
                      "InvoiceDate":"<DD-MM-YYYY>", 
                      "DueDate":"<DD-MM-YYYY>",
                      "CurrencyCode":"<Currency code based on the symbol and vendor details>", 
                      "TotalAmountDue":"<100.90>", # should be a decimal number in string 
                      "Description":"<Concise summary of the invoice description within 20 words>" 
                      } </output> 
                  Please proceed with the analysis based on the above instructions. Please don't state "Based on the .."
              summary: Process the pdf invoice and summarize the invoice under 3 lines 
      
      processing: 
          output_file: processed_invoice_output.json
          local_download_folder: invoices

    3. Set up API calls: The RetrieveAndGenerate API fetches the invoice from Amazon S3 and processes it using the FM. It takes several parameters, such as prompt, source type (S3), model ID, AWS Region, and S3 URI of the invoice.
      def retrieve_and_generate(bedrock_client: BedrockRuntimeClient, input_prompt: str, document_s3_uri: str) -> Dict[str, Any]: 
          """ 
          Use AWS Bedrock to retrieve and generate invoice data based on the provided prompt and S3 document URI.
      
          Args: 
              bedrock_client (BedrockRuntimeClient): AWS Bedrock client 
              input_prompt (str): Prompt for the AI model
              document_s3_uri (str): S3 URI of the invoice document 
      
          Returns: 
              Dict[str, Any]: Generated data from Bedrock 
          """ 
          model_arn = f'arn:aws:bedrock:{CONFIG["aws"]["region_name"]}::foundation-model/{CONFIG["aws"]["model_id"]}' 
          return bedrock_client.retrieve_and_generate( 
              input={'text': input_prompt}, retrieveAndGenerateConfiguration={ 
                  'type': 'EXTERNAL_SOURCES',
                  'externalSourcesConfiguration': { 
                      'modelArn': model_arn, 
                      'sources': [ 
                          { 
                              "sourceType": "S3", 
                              "s3Location": {"uri": document_s3_uri} 
                          }
                      ] 
                  } 
              } 
          )

    4. Batch processing: The batch_process_s3_bucket_invoices function batch-processes the invoices in the specified S3 bucket in parallel and writes the results to the output file (processed_invoice_output.json, as specified by output_file in config.yaml). It relies on the process_invoice function, which calls the Amazon Bedrock RetrieveAndGenerate API for each invoice and prompt.
      def process_invoice(s3_client: S3Client, bedrock_client: BedrockRuntimeClient, bucket_name: str, pdf_file_key: str) -> Dict[str, str]: 
          """ 
          Process a single invoice by downloading it from S3 and using Bedrock to analyze it. 
      
          Args: 
              s3_client (S3Client): AWS S3 client 
              bedrock_client (BedrockRuntimeClient): AWS Bedrock client 
              bucket_name (str): Name of the S3 bucket
              pdf_file_key (str): S3 key of the PDF invoice 
      
          Returns: 
              Dict[str, Any]: Processed invoice data 
          """ 
          document_uri = f"s3://{bucket_name}/{pdf_file_key}"
          local_file_path = os.path.join(CONFIG['processing']['local_download_folder'], pdf_file_key) 
      
          # Ensure the local directory exists and download the invoice from S3
          os.makedirs(os.path.dirname(local_file_path), exist_ok=True) 
          s3_client.download_file(bucket_name, pdf_file_key, local_file_path) 
      
          # Process invoice with different prompts 
          results = {} 
          for prompt_name in ["full", "structured", "summary"]:
              response = retrieve_and_generate(bedrock_client, CONFIG['aws']['prompts'][prompt_name], document_uri)
              results[prompt_name] = response['output']['text']
      
          return results

      def batch_process_s3_bucket_invoices(s3_client: S3Client, bedrock_client: BedrockRuntimeClient, bucket_name: str, prefix: str = "") -> int: 
          """ 
          Batch process all invoices in an S3 bucket or a specific prefix within the bucket. 
      
          Args: 
              s3_client (S3Client): AWS S3 client 
              bedrock_client (BedrockRuntimeClient): AWS Bedrock client 
              bucket_name (str): Name of the S3 bucket 
              prefix (str, optional): S3 prefix to filter invoices. Defaults to "". 
      
          Returns: 
              int: Number of processed invoices 
          """ 
          # Clear and recreate local download folder
          shutil.rmtree(CONFIG['processing']['local_download_folder'], ignore_errors=True)
          os.makedirs(CONFIG['processing']['local_download_folder'], exist_ok=True) 
      
          # Prepare to iterate through all objects in the S3 bucket
          continuation_token = None # Pagination handling
          pdf_file_keys = [] 
      
          while True: 
              list_kwargs = {'Bucket': bucket_name, 'Prefix': prefix}
              if continuation_token:
                  list_kwargs['ContinuationToken'] = continuation_token 
      
              response = s3_client.list_objects_v2(**list_kwargs)
      
              for obj in response.get('Contents', []): 
                  pdf_file_key = obj['Key'] 
                  if pdf_file_key.lower().endswith('.pdf'): # Skip folders or non-PDF files
                      pdf_file_keys.append(pdf_file_key) 
      
              if not response.get('IsTruncated'): 
                  break 
              continuation_token = response.get('NextContinuationToken')  # Advance to the next page
      
          # Process invoices in parallel 
          processed_count = 0 
          with ThreadPoolExecutor() as executor: 
              future_to_key = { 
                  executor.submit(process_invoice, s3_client, bedrock_client, bucket_name, pdf_file_key): pdf_file_key
                  for pdf_file_key in pdf_file_keys 
              } 
      
              for future in as_completed(future_to_key):
                  pdf_file_key = future_to_key[future] 
                  try: 
                      result = future.result() 
                      # Write result to the JSON output file as soon as it's available 
                      write_to_json_file(CONFIG['processing']['output_file'], {pdf_file_key: result}) 
                      processed_count += 1 
                      print(f"Processed file: s3://{bucket_name}/{pdf_file_key}") 
                  except Exception as e: 
                      print(f"Failed to process s3://{bucket_name}/{pdf_file_key}: {str(e)}") 
      
          return processed_count

    5. Post-processing: The extracted data in processed_invoice_output.json can be further structured or customized to suit your needs.

This approach allows invoice handling from multiple vendors, each with its own unique format and structure. By using large language models (LLMs), it extracts important details such as invoice numbers, dates, amounts, and vendor information without requiring custom scripts for each vendor format.

Run the Streamlit demo

Now that you have the components in place and the invoices processed using Amazon Bedrock, it’s time to deploy the Streamlit application. You can launch the app by invoking the following command:

streamlit run review-invoice-data.py

or

python -m streamlit run review-invoice-data.py

When the app is up, it will open in your default web browser. From there, you can review the invoices and the extracted data side-by-side. Use the Previous and Next arrows to seamlessly navigate through the processed invoices so you can interact with and analyze the results efficiently. The following screenshot shows the UI.

There are quotas for Amazon Bedrock (some of which are adjustable) that you need to consider when building at scale.
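Because the script processes invoices in parallel, throttling against those quotas is possible. One defensive pattern (a sketch, not part of the sample code) is to wrap each API call in exponential backoff with jitter. In real code you would catch botocore's ClientError and inspect the error code; the helper below accepts a generic exception tuple so it stays dependency-free:

```python
import random
import time


def with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0,
                 retryable: tuple = (Exception,)):
    """Call fn(), retrying with exponential backoff plus jitter.

    `retryable` would normally be narrowed to throttling errors only;
    it is kept generic here so the sketch has no AWS dependencies.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # Give up after the final attempt
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))


# Hypothetical usage inside process_invoice:
# response = with_backoff(lambda: retrieve_and_generate(bedrock_client, prompt, uri))
```

You can also cap the ThreadPoolExecutor's max_workers to keep the request rate within your account's quota.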

Cleanup

To clean up after running the demo, follow these steps:

  • Delete the S3 bucket containing your invoices using the command
    aws s3 rb s3://<your-bucket-name> --force

  • If you set up a virtual environment, deactivate it by invoking deactivate
  • Remove any local files created during the process, including the cloned repository and output files
  • If you used any AWS resources such as an EC2 instance, terminate them to avoid unnecessary charges

Conclusion

In this post, we walked through a step-by-step guide to automating invoice processing using Streamlit and Amazon Bedrock, addressing the challenge of handling invoices from multiple vendors with different formats. We showed how to set up the environment, process invoices stored in Amazon S3, and deploy a user-friendly Streamlit application to review and interact with the processed data.

If you are looking to further enhance this solution, consider integrating additional features or deploying the app on scalable AWS services such as Amazon SageMaker, Amazon EC2, or Amazon ECS. Due to this flexibility, your invoice processing solution can evolve with your business, providing long-term value and efficiency.

We encourage you to learn more by exploring Amazon Bedrock, Access Amazon Bedrock foundation models, RetrieveAndGenerate API, and Quotas for Amazon Bedrock and building a solution using the sample implementation provided in this post and a dataset relevant to your business. If you have questions or suggestions, leave a comment.


About the Authors

Deepika Kumar is a Solution Architect at AWS. She has over 13 years of experience in the technology industry and has helped enterprises and SaaS organizations build and securely deploy their workloads on the cloud. She is passionate about using generative AI responsibly, whether that is driving product innovation, boosting productivity, or enhancing customer experiences.

Jobandeep Singh is an Associate Solution Architect at AWS specializing in Machine Learning. He supports customers across a wide range of industries in using AWS to drive innovation and efficiency in their operations. In his free time, he enjoys playing sports, with a particular love for hockey.

Ratan Kumar is a solutions architect based out of Auckland, New Zealand. He works with large enterprise customers, helping them design and build secure, cost-effective, and reliable internet-scale applications using the AWS Cloud. He is passionate about technology and likes sharing knowledge through blog posts and Twitch sessions.

Read More

Orca-AgentInstruct: Agentic flows can be effective synthetic-data generators


Our work on Orca and Orca 2 demonstrated the power of using synthetic data for the post-training of small language models and getting them to levels of performance previously found only in much larger language models. Orca-AgentInstruct is another step in this direction, where we explore using agentic flows to generate diverse and high-quality data at scale. Orca-AgentInstruct is an agentic solution for synthetic-data generation. By leveraging an agentic framework, AgentInstruct can generate tailored datasets, comprising both prompts and responses, from raw data sources, paving the way to building a synthetic data factory for model fine-tuning.  

The efficacy of this approach is exemplified by the substantial improvement observed by fine-tuning a base Mistral 7-billion-parameter model and using AgentInstruct to generate a 25-million-pair dataset. The fine-tuned model (which we refer to as Orca-3-Mistral) showcases a notable performance gain across multiple benchmarks. For example, it shows 40% improvement on AGIEval, 19% improvement on MMLU, 54% improvement on GSM8K, 38% improvement on BBH, 45% improvement on AlpacaEval, and a 31.34% reduction of inaccurate or unreliable results across multiple summarization benchmarks.

We are making a 1-million-pair subset (orca-agentinstruct-1M) of this dataset publicly available, along with a report describing the data generation procedure, to encourage research on synthetic data generation and finetuning of language models. 

Figure 1: Effect of using AgentInstruct for post-training Mistral-7B. 
The figure shows the three flows used in AgentInstruct: 1) Content Transformation Flow converts the raw seed into an intermediate representation that simplifies the creation of instructions tailored to specific objectives. 2) Seed Instruction Generation Flow, comprising multiple agents, takes as input the transformed seed from the Content Transformation Flow and generates a set of diverse instructions. 3) Instruction Refinement Flow takes as input the instructions from the Seed Instruction Flow and iteratively enhances their complexity and quality.
Figure 2. This figure provides a thematic overview of the roles played by different groups of agents. Content Transformation Flow converts the seed into an intermediate representation that makes it easier to create high-quality and diverse data. Seed Instruction Generation Flow creates instances of the target tasks following a taxonomy. Instruction Refinement Flow explores the space further by starting from these initial data points and exploring the neighborhood. The expectation is that by picking a random seed we will be able to cover the entire region of data points. 

Synthetic Data Accelerated LLM Development: Over the past year, using synthetic data has greatly advanced the training of large language models (LLMs). It sped up model training at all stages, from pre-training (e.g., Phi-3) to instruction-tuning (e.g., Orca and WizardLM) and reinforcement learning from human feedback (e.g., Direct Nash Optimization). 

Generating high-quality synthetic data is hard: On the other hand, research indicates that pre-training models on synthetic data produced by other models can result in model collapse, causing models to progressively degrade. Similar concerns have been raised regarding the use of synthetic data for post-training, suggesting that it might lead to an imitation process where the trained model learns only stylistic features rather than actual capabilities. 

This discrepancy may be attributed to the challenge of generating high-quality and diverse synthetic data.  Successful use of synthetic data involves significant human effort in curating and filtering the data to ensure high quality. 

Synthetic data meets agents: Another major development we witnessed during the past year is the rise of agentic (especially multi-agent) workflows, such as with AutoGen. Agentic workflows can generate high-quality data, which surpasses the capabilities of the underlying LLMs, by using flows with reflection and iteration that enable agents to look back at solutions, generate critiques, and improve solutions. They can also use tools like search APIs, calculators, and code interpreters to address LLM limitations. 

Multi-agent workflows bring in additional benefits as well, such as simulating scenarios where we can generate both new prompts and the corresponding responses. They also enable automation of data-generation workflows, reducing or eliminating the need for unnecessary human intervention on some tasks. 

AgentInstruct: Generating synthetic data for post-training or finetuning often relies on an existing prompt set that is either used as is or as seeds for generating more instructions. In this work, we generalize the problem settings to a broader objective of generating an abundant amount of diverse, challenging, and high-quality data to teach a particular skill to an AI model. We refer to this setting as generative teaching.   

AgentInstruct is an agentic solution for generative teaching. AgentInstruct uses raw documents as input to create demonstration and feedback data. When generic data is used as seeds, AgentInstruct can be used to teach an LLM a general capability, such as writing, reasoning, or retrieval-augmented generation (RAG). Domain specific data, like retail or finance, can also be used as seeds to improve the model in a certain specialization. AgentInstruct can create: 

  1. High-quality data: AgentInstruct uses GPT-4, coupled with tools like search and code interpreters, to create high-quality data.  
  2. Diverse data: AgentInstruct creates prompts and responses using a set of specialized agents (with powerful LLMs, tools, and reflection flows) and a taxonomy of more than 100 subcategories, ensuring diversity and quality.
  3. Large quantities of data: AgentInstruct can run autonomously and apply flows for verification and data filtering. It does not require seed prompts; instead, it uses raw documents for seeding. 

Using raw data as seeds offers two advantages: it is plentiful, allowing AgentInstruct to generate large-scale and diverse datasets, and it encourages learning general skills instead of benchmark-specific ones by avoiding using existing prompts.


We anticipate agentic flows becoming increasingly important throughout the model-training lifecycle, including pre-training, post-training, and specialization, and ultimately enabling the creation of a synthetic data factory for model customization and continuous improvement. This has the potential to drive AI advances across multiple industries by making high-quality model training more efficient and accessible. 

Contributors:

Arindam Mitra, Luciano Del Corro, Guoqing Zheng, Shweti Mahajan, Dany Rouhana, Andres Codas, Yadong Lu, Wei-ge Chen, Olga Vrousgou, Corby Rosset, Fillipe Silva, Hamed Khanpour, Yash Lara, and Ahmed Awadallah

The post Orca-AgentInstruct: Agentic flows can be effective synthetic-data generators appeared first on Microsoft Research.

Read More

Centralize model governance with SageMaker Model Registry Resource Access Manager sharing


We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM), making it easier to securely share and discover machine learning (ML) models across your AWS accounts.

Customers find it challenging to share and access ML models across AWS accounts because they have to set up complex AWS Identity and Access Management (IAM) policies and create custom integrations. With this launch, customers can now seamlessly share and access ML models registered in SageMaker Model Registry between different AWS accounts.

Customers can use the SageMaker Studio UI or APIs to specify the SageMaker Model Registry model to be shared and grant access to specific AWS accounts or to everyone in the organization. Authorized users can then quickly discover and use those shared models in their own AWS accounts. This streamlines the ML workflows, enables better visibility and governance, and accelerates the adoption of ML models across the organization.

In this post, we will show you how to use this new cross-account model sharing feature to build your own centralized model governance capability, which is often needed for centralized model approval, deployment, auditing, and monitoring workflows. Before we dive into the details of the architecture for sharing models, let’s review what use case and model governance are and why they’re needed.

Use case governance is essential to help ensure that AI systems are developed and used in ways that respect values, rights, and regulations. According to the EU AI Act, use case governance refers to the process of overseeing and managing the development, deployment, and use of AI systems in specific contexts or applications. This includes:

  • Risk assessment: Identifying and evaluating potential risks associated with AI systems.
  • Mitigation strategies: Implementing measures to minimize or eliminate risks.
  • Transparency and explainability: Making sure that AI systems are transparent, explainable, and accountable.
  • Human oversight: Including human involvement in AI decision-making processes.
  • Monitoring and evaluation: Continuously monitoring and evaluating AI systems to help ensure compliance with regulations and ethical standards.

Model governance involves overseeing the development, deployment, and maintenance of ML models to help ensure that they meet business objectives and are accurate, fair, and compliant with regulations. It includes processes for monitoring model performance, managing risks, ensuring data quality, and maintaining transparency and accountability throughout the model’s lifecycle. In AWS, these model lifecycle activities can be performed over multiple AWS accounts (for example, development, test, and production accounts) at the use case or business unit level. However, model governance functions in an organization are centralized and to perform those functions, teams need access to metadata about model lifecycle activities across those accounts for validation, approval, auditing, and monitoring to manage risk and compliance.

Use case and model governance plays a crucial role in implementing responsible AI and helps with the reliability, fairness, compliance, and risk management of ML models across use cases in the organization. It helps prevent biases, manage risks, protect against misuse, and maintain transparency. By establishing robust oversight, organizations can build trust, meet regulatory requirements, and help ensure ethical use of AI technologies.

Use case and model lifecycle governance overview

In the context of regulations such as the European Union’s Artificial Intelligence Act (EU AI Act), a use case refers to a specific application or scenario where AI is used to achieve a particular goal or solve a problem. The EU AI Act proposes to regulate AI systems based on their intended use cases, which are categorized into four levels of risk:

  1. Unacceptable risk: Significant threat to safety, livelihoods, or rights
  2. High risk: Significant impacts on lives (for example, use of AI in healthcare and transportation)
  3. Limited risk: Minimal impacts (for example, chatbots and virtual assistants)
  4. Minimal risk: Negligible risks (for example, entertainment and gaming)

An AI system is built to satisfy a use case such as credit risk, which can be composed of workflows orchestrated with one or more ML models—such as credit risk and fraud detection models. You can build a use case (or AI system) using existing models, newly built models, or a combination of both. Regardless of how the AI system is built, governance will be applied at the AI system level where use case decisions (for example, denying a loan application) are being made. However, explaining why that decision was made requires next-level detailed reports from each affected model component of that AI system. Therefore, governance applies both at the use case and model level and is driven by each of their lifecycle stages.

Use case lifecycle stages

A use case has its own set of lifecycle stages from development through deployment to production, shown in the following figure. A use case typically starts with an experimentation or proof-of-concept (POC) stage where the idea is explored for feasibility. When the use case is determined to be feasible, it’s approved and moves to the next stage for development. The use case is then developed using various components including ML models and unit testing, and then moved to the next stage—quality assurance (QA)—after approval. Next, the use case is tested, validated, and approved to be moved to the pre-production stage where it’s A/B tested with production-like settings and approved for the next stage. Now, the use case is deployed and operational in production. When the use case is no longer needed for business, it’s retired and decommissioned. Even though these stages are depicted as linear in the diagram, they are frequently iterative.

Model lifecycle stages

When an ML model is developed it goes through a similar set of lifecycle stages as a use case. In the case of an ML model, shown in the following figure, the lifecycle starts with the development or candidate model. Prior to that stage, there would be several experiments performed to build the candidate model. From a governance perspective, tracking starts from the candidate or dev model stage. After approval in dev, the model moves into the QA stage where it’s validated and integration tested to make sure that it meets the use case requirements and then is approved for promotion to the next stage. The model is then A/B tested along with the use case in pre-production with production-like data settings and approved for deployment to the next stage. The model is finally deployed to production. When the model is no longer needed, it’s retired and removed from deployed endpoints.

Stage status types

In the preceding use case and model stages discussion, we mentioned approving the model to go to the next stage. However, there are two other possible states—pending and rejected, as depicted in the following figure. These stages are applicable to both use case and model stages. For example, a use case that’s been moved from the QA stage to pre-production could be rejected and sent back to the development stage for rework because of missing documentation related to meeting certain regulatory controls.
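To make the interplay of stages and statuses concrete, here is a minimal sketch. The stage and status names follow the post; the promotion rules are an illustrative assumption, since in practice transitions are decided by human approvers:

```python
from enum import Enum


class Stage(Enum):
    DEV = "development"
    QA = "qa"
    PREPROD = "pre-production"
    PROD = "production"
    RETIRED = "retired"


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


PROMOTION_ORDER = [Stage.DEV, Stage.QA, Stage.PREPROD, Stage.PROD]


def next_stage(current: Stage, status: Status) -> Stage:
    """Promote on approval, send back to development on rejection,
    and stay put while pending. An illustrative simplification."""
    if status is Status.APPROVED:
        i = PROMOTION_ORDER.index(current)
        return PROMOTION_ORDER[min(i + 1, len(PROMOTION_ORDER) - 1)]
    if status is Status.REJECTED:
        return Stage.DEV  # Rework starts from development
    return current
```

For example, a rejected pre-production use case returns to development, matching the scenario described above.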

Multi-account architecture for sharing models

A multi-account strategy improves security, scalability, and reliability of your systems. It also helps achieve data, project, and team isolation while supporting software development lifecycle best practices. Cross-account model sharing supports a multi-account strategy, removing the overhead of assuming roles into multiple accounts. Furthermore, sharing model resources directly across multiple accounts helps improve ML model approval, deployment, and auditing.

The following diagram depicts an architecture for centralizing model governance using AWS RAM for sharing models using a SageMaker Model Group, a core construct within SageMaker Model Registry where you register your model version.

Figure 1:  Centralizing Model Governance using AWS RAM Share

Figure 1:  Centralizing Model Governance using AWS RAM Share

In the architecture presented in the preceding figure, the use case stakeholder, data scientist (DS), and ML engineer (MLE) perform the following steps:

  1. The use case stakeholder (that is, the DS team lead) receives the request to build an AI use case such as credit risk from their line of business lead.
    • The DS team lead records the credit risk use case in the POC stage in the stage governance table.
    • The MLE is notified to set up a model group for new model development. The MLE creates the necessary infrastructure pipeline to set up a new model group.
  2. The MLE sets up the pipeline to share the model group with the necessary permissions (create and update the model version) to the ML project team’s development account. Optionally, this model group can also be shared with their test and production accounts if local account access to model versions is needed.
  3. The DS uses SageMaker Training jobs to generate model metrics, selects a candidate model, and registers the model version inside the shared model group in their local model registry.
  4. Because this is a shared model group, the actual model version will be recorded in the shared services account model registry and a link will be maintained in the development account. The Amazon S3 model artifacts associated to the model will be copied to the shared services account when the model is registered in the shared services model registry.
  5. The model group and associated model version will be synced into the model stage governance Amazon DynamoDB table with attributes such as model group, model version, model stage (development, test, production, and so on), model status (pending, approved, or rejected), and model metrics (in JSON format). The ML admin sets up this table with the necessary attributes based on their central governance requirements.
  6. The model version is approved for deployment into the test stage and is deployed into the test account along with the necessary infrastructure for invoking the model, such as Amazon API Gateway and AWS Lambda functions.
  7. The model is integration tested in the test environment, and model test metrics are updated in the model stage governance table.
  8. Model test results are validated, and the model version is approved for deployment into the production stage and is deployed into the production account along with the necessary infrastructure for invoking the model such as an API gateway and Lambda functions.
  9. The model is A/B tested or optionally shadow tested in the production environment and model production metrics are updated in the model stage governance table. When satisfactory production results are attained, the model version is rolled out in the production environment.
  10. The model governance (compliance) officer uses the governance dashboard to act on model governance functions such as reviewing the model to validate compliance and monitoring for risk mitigation.
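Step 5 above relies on a model stage governance table in Amazon DynamoDB that the ML admin sets up. The following is a minimal sketch of what that setup could look like; the table name, attribute names, and composite key layout are illustrative assumptions, not a prescribed schema, so adjust them to your own governance requirements.

```python
def governance_table_definition(table_name="model-stage-governance"):
    """Build a CreateTable request for a model stage governance table.

    The composite key keeps one record per model version and stage, for
    example partition key "credit-risk-group#3" and sort key "test".
    The remaining attributes (model status, owner, metrics JSON, and so
    on) are schemaless in DynamoDB and can be written on each item.
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "model_group_version", "AttributeType": "S"},
            {"AttributeName": "model_stage", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "model_group_version", "KeyType": "HASH"},
            {"AttributeName": "model_stage", "KeyType": "RANGE"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_governance_table():
    # Requires AWS credentials with dynamodb:CreateTable permission
    import boto3

    return boto3.client("dynamodb").create_table(**governance_table_definition())
```

On-demand billing is used here because governance writes are infrequent and bursty; switch to provisioned capacity if your audit volume is predictable.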

Building a central model registry using model group resource sharing

Model group resource sharing makes it possible to build a central model registry with a few clicks or API calls, without needing to write complex IAM policies. We will demonstrate how to set up a central model registry based on the architecture we described in the previous sections. We will start by using the SageMaker Studio UI and then by using APIs. In both cases, we will demonstrate how to create a model package group in the ML Shared Services account (Account A) and share it with the ML Dev account (Account B) so that any updates to model versions in Account B automatically update the corresponding model versions in Account A.

Prerequisites

You need to have the following prerequisites in place to implement the solution in this post.

After you have the prerequisites set up, start by creating and sharing a model group across accounts. The basic steps are:

  1. In Account A, create a model group.
  2. In Account A, create a resource share for the model group, and then attach permissions and specify the target account to share the resource. Permissions can be standard or custom.
  3. Account B should accept the resource sharing invitation to start using the shared resource from Account A.
  4. Optionally, if Account A and Account B are part of the same organization in AWS Organizations and resource sharing is enabled within the organization, the resource sharing invitation is auto-accepted without any manual intervention.

Create and share a model group across accounts using SageMaker Studio

The following section shows how to use SageMaker Studio to share models in a multi-account environment to build a central model registry. The instructions below use the AWS Management Console for SageMaker Studio to create a model package group in the shared services account and share it, with the necessary permissions, with the ML Dev account.

To use the console to create and share a model package:

  1. In the SageMaker Studio console, sign in to Account A and navigate to the model registry, select the model package group (in this example, the credit-risk-package-group-1724904598), and choose Share.
  2. In Account A, select the appropriate permissions to share the model package group with Account B. If you need to allow custom policy, navigate to the AWS RAM console and create the policy.
  3. After selecting the permission policy, specify Account B (and any other accounts) to share the resource, then choose Share.
  4. In Account B, navigate to the model registry, choose Shared with me, and then choose View pending approvals to see the model shared from Account A.
  5. Accept the model invitation from Account A to access the shared model package group and its versions. When accounts are set up in the same organization, invitations will be accepted without requiring user intervention.

Create and share the model group across accounts using APIs

The following section shows how to use APIs to share models in a multi-account environment to build a central model registry. Create a model package group in the ML Shared Services account (Account A) and share it with the ML Dev account (Account B).

The following steps use APIs to create and share a model package group across accounts.

  1. In Account A, create a model package group.
  2. In Account A, if needed, create custom sharing permissions; otherwise use standard sharing permissions.
  3. In Account A, create a resource share for the model package group, attach permissions, and specify the target account to share the resource.
  4. In Account B, accept the resource sharing invitation to start using the resource.
  5. If Account A and B are part of the same organization, then the resource sharing invitation can be accepted without any manual intervention.

Run the following code in the ML Shared Services account (Account A).

import json
import time
import os
import boto3

region = boto3.Session().region_name

sm_client = boto3.client('sagemaker', region_name=region)

# Replace model package group name as per use case
model_package_group_name = "model-group-" + str(round(time.time()))
model_package_group_input_dict = {
 "ModelPackageGroupName" : model_package_group_name,
 "ModelPackageGroupDescription" : "Sample model package group"
}

# Create the model package group with the SageMaker client
create_model_package_group_response = sm_client.create_model_package_group(**model_package_group_input_dict)
model_package_group_arn = create_model_package_group_response['ModelPackageGroupArn']
print('ModelPackageGroup Arn : {}'.format(model_package_group_arn))

ram_client = boto3.client('ram')

# # Use this code path to create a custom permission
# # Custom permission template resource policy string
# policy_template = '{\n\t"Effect": "Allow",\n\t"Action": [\n\t\t"sagemaker:DescribeModelPackageGroup"\n\t]\n}'
# permission = ram_client.create_permission(
#     name = "custom-permission" + str(round(time.time())),
#     resourceType = "sagemaker:ModelPackageGroup",
#     policyTemplate = policy_template
# )
# print('Created Permission: {}'.format(permission['permission']['arn']))
# permission = permission['permission']['arn']


# Use this code path to use managed Permission
# It can be one of:
# 1. arn:aws:ram::aws:permission/AWSRAMDefaultPermissionSageMakerModelPackageGroup
# 2. arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerModelPackageGroupAllowDeploy
# 3. arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerModelPackageGroupAllowRegister
# More details : 
permission = 'arn:aws:ram::aws:permission/AWSRAMDefaultPermissionSageMakerModelPackageGroup'

# Principals can be IAM User, Role, Account or Organization ID. Ref: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ram/client/create_resource_share.html
response = ram_client.create_resource_share(
    name="model-group-resource-share",
    resourceArns=[create_model_package_group_response['ModelPackageGroupArn']],
    principals=['12345'],  # replace with the target AWS account ID (Account B)
    permissionArns = [permission]
)

resource_share_arn = response['resourceShare']['resourceShareArn']
print('Resource Share Arn : {}'.format(resource_share_arn))

Run the following code in the ML Dev account (Account B).

import json
import os
import boto3
from time import gmtime, strftime 

region = boto3.Session().region_name

ram_client = boto3.client('ram')
response = ram_client.get_resource_share_invitations()
pending_invitations = []
# Review all pending invitations
for i in response['resourceShareInvitations']:
    if i['status'] == "PENDING":
        pending_invitations.append(i)
print(*pending_invitations, sep='\n')

# Accept the resource share invitation from central account
# Replace with the intended invitation arn for acceptance from the central account
if pending_invitations:
    response = ram_client.accept_resource_share_invitation(resourceShareInvitationArn=pending_invitations[0]['resourceShareInvitationArn'])
    print(response) 

sm_client = boto3.client('sagemaker', region_name=region)

response = sm_client.list_model_package_groups(CrossAccountFilterOption="CrossAccount")
print(response['ModelPackageGroupSummaryList'])

MLflow experimentation with the shared model group

The following section shows you how to use Amazon SageMaker with MLflow to track your experiments in the development account and save candidate models in the shared model group while developing a credit risk model. It’s a binary classification problem where the goal is to predict whether a customer is a credit risk. If you want to run the code in your own environment, check out the notebook in this GitHub repository.

SageMaker with MLflow is a capability of SageMaker that you can use to create, manage, analyze, and compare your ML experiments. To get started with MLflow, you need to set up an MLflow tracking server to monitor your experiments and runs. You can set up the server programmatically or by using the SageMaker Studio UI. It can take up to 20 minutes for the setup to complete. The following code snippet shows how to create a tracking server.

sagemaker_client = boto3.client("sagemaker")
timestamp = strftime('%d-%H-%M-%S', gmtime())
server_name = f"mlflow-{domain_id}-{timestamp}"
response = sagemaker_client.create_mlflow_tracking_server(
    TrackingServerName=server_name,
    ArtifactStoreUri=f"s3://{bucket_name}/mlflow/{timestamp}",
    RoleArn=sm_role,
    AutomaticModelRegistration=True,
)

mlflow_arn = response['TrackingServerArn']

To set up an MLflow tracking server in SageMaker Studio, choose the MLflow application icon. When your server is running, choose the ellipsis button and then choose Open MLflow to open the MLflow UI.

Now that your MLflow tracking server is running, you can start tracking your experiments. MLflow tracking allows you to programmatically track the inputs, parameters, configurations, and models of your iterations as experiments and runs.

  • Runs are executions of some piece of data science code and record metadata and generated artifacts.
  • An experiment collects multiple runs with the same objective.

The following code shows you how to set up an experiment and track your executions while developing the credit risk model.

Data preparation

For this example, you will use the open source South German Credit dataset. To use the dataset to train the model, you first need to do some pre-processing. You can run the pre-processing code in your JupyterLab application or on a SageMaker ephemeral cluster as a SageMaker Training job using the @remote decorator. In both cases, you can track your experiments using MLflow.

The following code demonstrates how to track your experiments when executing your code on a SageMaker ephemeral cluster using the @remote decorator. To get started, set up a name for your experiment.

from time import gmtime, strftime
experiment_suffix = strftime('%d-%H-%M-%S', gmtime())
experiment_name = f"credit-risk-model-experiment-{experiment_suffix}"

The processing script creates a new MLflow active experiment by calling the mlflow.set_experiment() method with the experiment name above. After that, it invokes mlflow.start_run() to launch an MLflow run under that experiment.

@remote(s3_root_uri=f"s3://{bucket_name}/{prefix}", dependencies=f"requirements.txt", instance_type="ml.m5.large")
def preprocess(df, experiment_name, mlflow_arn, bucket_name, prefix, run_id=None): 
    try:
        suffix = strftime('%d-%H-%M-%S', gmtime())
        mlflow.set_tracking_uri(mlflow_arn)
        mlflow.set_experiment(experiment_name=experiment_name if experiment_name else f"credit-risk-model-experiment-{suffix}")
        run = mlflow.start_run(run_id=run_id) if run_id else mlflow.start_run(run_name=f"remote-processing-{suffix}", nested=True)
        .....
    except Exception as e:
        print(f"Exception in processing script: {e}")
        raise e
    finally:
        mlflow.end_run()

You can also log the input dataset and the sklearn model used to fit the training set during pre-processing as part of the same script.

model_dataset = mlflow.data.from_pandas(df)
mlflow.log_input(model_dataset, context="model_dataset")

.....

featurizer_model = transformer.fit(X)
features = featurizer_model.transform(X)
labels = LabelEncoder().fit_transform(y)

.....

mlflow.sklearn.log_model(
    sk_model=featurizer_model,
    artifact_path=f"processing/model",
    registered_model_name="sk-learn-model",
)

In the MLflow UI, use the Experiments page to locate your experiment. Its name should start with “credit-risk-model-experiment”.

Click on the experiment name to reveal the table with the associated Runs and then click on the Run whose name starts with “remote-processing”. You will see its details as shown in the following figure.

Click on the Artifacts tab to see the generated MLflow model.

Model training

You can continue experimenting with different feature engineering techniques in your JupyterLab environment and track your experiments in MLflow. After you have completed the data preparation step, it’s time to train the classification model. You can use the xgboost algorithm for this purpose and run your code either in your JupyterLab environment or as a SageMaker Training job. Again, you can track your experiments using MLflow in both cases. The following example shows how to use MLflow with a SageMaker Training job in your code. You can use the method mlflow.autolog() to log metrics, parameters, and models without the need for explicit log statements.

import xgboost
import pickle as pkl
import os
import mlflow
import tarfile

@remote(s3_root_uri=f"s3://{bucket_name}/{prefix}", dependencies=f"requirements.txt", instance_type="ml.m5.large")
def train(X, val_X, y, val_y, num_round, params, mlflow_arn, experiment_name,run_id=None):
    output_path = "/opt/ml/model"
    mlflow.set_tracking_uri(mlflow_arn)
    mlflow.autolog()
    
    suffix = strftime('%d-%H-%M-%S', gmtime())
    mlflow.set_experiment(experiment_name=experiment_name if experiment_name else f"credit-risk-model-experiment-{suffix}")
    run = mlflow.start_run(run_id=run_id) if run_id else mlflow.start_run(run_name=f"remote-training-{suffix}", nested=True)

    try:
        os.makedirs(output_path, exist_ok=True)
        print(f"Directory '{output_path}' created successfully.")
    except OSError as e:
        print(f"Error creating directory '{output_path}': {e}")
        
    dtrain = xgboost.DMatrix(X, label=y)
    dval = xgboost.DMatrix(val_X, label=val_y)

    watchlist = [(dtrain, "train"), (dval, "validation")]
    mlflow.log_params(params)

    print("Training the model")
    evaluation_results = {}
    bst = xgboost.train(
        params=params, dtrain=dtrain, evals=watchlist, num_boost_round=num_round,
        evals_result=evaluation_results
    )
    pkl.dump(bst, open(output_path + "/model.bin", "wb"))

    # Compress the model.bin artifact into a tar file
    tar_filename = f"{output_path}/model.tar.gz"
    with tarfile.open(tar_filename, "w:gz") as tar:
        tar.add(f"{output_path}/model.bin", arcname="model.bin")

    mlflow.log_artifact(local_path=tar_filename)
    mlflow.end_run()

In addition, you can use the mlflow.log_artifact() method to save the model.tar.gz file in MLflow so that you can directly use it later when you register the model to the model registry.
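If you later need the archive locally, for example to inspect it before registration, you can retrieve it from the MLflow artifact store. The following is a sketch; the helper names are hypothetical, and it assumes the mlflow package is installed and that the run's artifact URI and the tracking server ARN from the earlier setup are available.

```python
def model_archive_uri(run_artifact_uri: str) -> str:
    # The training run logged model.tar.gz at the run's artifact root
    return run_artifact_uri.rstrip("/") + "/model.tar.gz"


def fetch_model_archive(run_artifact_uri: str, tracking_arn: str, dst: str = "/tmp/model") -> str:
    """Download the logged model.tar.gz locally and return the local path."""
    import mlflow  # requires mlflow and access to the tracking server

    mlflow.set_tracking_uri(tracking_arn)
    return mlflow.artifacts.download_artifacts(
        artifact_uri=model_archive_uri(run_artifact_uri), dst_path=dst
    )
```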

Navigate back to the MLflow UI. Click on the name of your experiment at the top of your screen starting with “credit-risk-model-experiment” to see the updated Runs table. Click on the name of your remote-training Run to see the overview of the training run including the associated hyperparameters, model metrics, and generated model artifacts.

The following figure shows the overview of a training run.

Click on the Model metrics tab to view the metrics tracked during the training run. The figure below shows the metrics of a training run.

Click on the Artifacts tab to view the artifacts generated during the training run. The following figure shows an example of the generated artifacts.

Registering the model to the model registry

ML experimentation is an iterative process and you typically end up with a number of candidate models. With MLflow, you can compare these models to identify the one that you want to move to quality assurance for approval. The following is an example of how to retrieve the best candidate using the MLflow API based on a specific metric.

from mlflow.entities import ViewType

run_filter = """
attributes.run_name LIKE "%training%"
AND attributes.status = 'FINISHED'
"""

runs_with_filter = mlflow.search_runs(
    experiment_names=[experiment_name],
    run_view_type=ViewType.ACTIVE_ONLY,
    filter_string=run_filter,
    order_by=["metrics.`validation-auc` DESC"],
)
best_run = runs_with_filter[:1]
artifact_uri = best_run['artifact_uri'][0]

After you have selected a model, you can register it to the shared model group in the shared services account. You can discover the model groups that are available to you either through the SageMaker Studio UI or programmatically.
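For the programmatic route, the cross-account filter shown earlier in the Account B listing code can be reused to look up the shared group's ARN. The sketch below is illustrative; the function names are hypothetical and it assumes credentials in the development account.

```python
def pick_shared_group(summaries, name_contains):
    """Return the ARN of the first shared model group whose name matches."""
    for group in summaries:
        if name_contains in group["ModelPackageGroupName"]:
            return group["ModelPackageGroupArn"]
    raise ValueError(f"No shared model group matching '{name_contains}'")


def find_shared_group_arn(name_contains):
    import boto3  # requires sagemaker:ListModelPackageGroups

    sm = boto3.client("sagemaker")
    resp = sm.list_model_package_groups(CrossAccountFilterOption="CrossAccount")
    return pick_shared_group(resp["ModelPackageGroupSummaryList"], name_contains)


# Example: model_package_group_arn = find_shared_group_arn("credit-risk")
```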

The final step is to register the candidate model to the model group as a new model version.

modelpackage_inference_specification = {
    "InferenceSpecification": {
      "Containers": [
         {
            "Image": "885854791233.dkr.ecr.us-east-1.amazonaws.com/sagemaker-distribution-prod@sha256:9e7622bbe2f3ee9dd516797bfe3ed310983b96190eeefbdeeeea69519d3946fe",
            "ModelDataUrl": f"{artifact_uri}/model.tar.gz"
         }
      ],
      "SupportedContentTypes": [ "text/csv" ],
      "SupportedResponseMIMETypes": [ "text/csv" ],
   }
}

# Register the new version in the shared model group by using the group's ARN
create_model_package_input_dict = {
    "ModelPackageGroupName" : model_package_group_arn,
    "ModelPackageDescription" : "Model to detect credit risk",
    "ModelApprovalStatus" : "PendingManualApproval"
}
create_model_package_input_dict.update(modelpackage_inference_specification)

create_model_package_response = sagemaker_client.create_model_package(**create_model_package_input_dict)
model_package_arn = create_model_package_response["ModelPackageArn"]
print('ModelPackage Version ARN : {}'.format(model_package_arn))

Design considerations for use case and model stage governance

Use case and model stage governance is a construct to track governance information of a use case or model across the various stages in its journey to production. It also involves periodically tracking key model performance and drift metrics to surface them for governance functions.

There are several use case and model stage governance attributes that need to be tracked, such as the following:

  1. Use case ID: Unique ID of the use case.
  2. Use case name: Name of the use case.
  3. Use case stage: Current stage of the use case. For example, proof of concept, development, QA, and so on.
  4. Model group: SageMaker model group name.
  5. Model version: SageMaker model version name.
  6. Model owner: Person or entity who owns the model.
  7. Model LoB: Model owner’s line of business.
  8. Model project: Project or use case that the model is part of.
  9. Model stage: Stage where the model version is deployed. For example, development, test, or production.
  10. Model status: Status of the model version in a given stage. For example, pending or approved.
  11. Model risk: Risk categorization of the model version. For example, high, medium, or low.
  12. Model validation metrics: Model validation metrics in JSON format.
  13. Model monitoring metrics: Model monitoring metrics in JSON format. These need to include the endpoint from which the metrics were captured.
  14. Model audit timestamp: Timestamp when this record was updated.
  15. Model audit user: User who updated this record.

Create a use case or model stage governance construct with the preceding set of attributes and drive your deployment and governance workflows using this table. Next, we will describe the design considerations for deployment and governance workflows.
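As an illustration, a stage record combining the attributes above could be written to the governance table like this. The table name, attribute names, and helper functions are hypothetical; map them to whatever schema your governance function standardizes on.

```python
import json
from datetime import datetime, timezone


def build_stage_record(model_group, model_version, stage, status,
                       owner, lob, use_case, risk, metrics, user):
    """Assemble one model stage governance record as a flat item."""
    return {
        "model_group_version": f"{model_group}#{model_version}",
        "model_stage": stage,              # development | test | production
        "model_status": status,            # pending | approved | rejected
        "model_owner": owner,
        "model_lob": lob,
        "model_project": use_case,
        "model_risk": risk,                # high | medium | low
        "model_validation_metrics": json.dumps(metrics),
        "model_audit_timestamp": datetime.now(timezone.utc).isoformat(),
        "model_audit_user": user,
    }


def put_stage_record(record, table_name="model-stage-governance"):
    # Requires AWS credentials with dynamodb:PutItem permission
    import boto3

    boto3.resource("dynamodb").Table(table_name).put_item(Item=record)
```

Storing the metrics as a JSON string keeps the item flat for the governance dashboard while preserving the full metric payload.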

Design considerations for deployment and governance workflows

The following are the design considerations for the deployment and governance workflows:

  1. The model version is built in the development account and registered with pending status in the central model registry or model group.
  2. A sync process is triggered to capture the key model attributes, derive additional governance attributes, and create a development stage record in the model governance stage table. Model artifacts from the development account are synced into the central model registry account.
  3. The model owner approves the model version in the development stage for deployment to the test stage in the central model registry.
  4. A deployment pipeline is triggered and the model is deployed to the test environment and a new test stage record is created for that model version.
  5. The model version is tested and validated in the test environment and validation metrics are captured in the test stage record in the model governance stage construct.
  6. The governance officer verifies the model validation results and approves the model version for deployment to production. The production stage record is created for the model version in the model governance stage table.
  7. A deployment pipeline is triggered and the model is deployed to the production environment and the production stage record model status is updated to deployed for that model version.
  8. After the model monitoring jobs are set up, model inference metrics are periodically captured and aggregated and model metrics are updated in model stage governance table.
  9. The use case stage value is updated to the next stage when all models for that use case are approved in the previous stage.
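In steps 3 and 6 above, the approval itself can be recorded on the model version through the SageMaker UpdateModelPackage API. The following is a minimal sketch; the validation of allowed statuses and the default approval note are illustrative additions, not part of the API.

```python
VALID_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def approval_request(model_package_arn, status, note):
    """Build the UpdateModelPackage request used to promote or reject a version."""
    if status not in VALID_STATUSES:
        raise ValueError(f"Invalid approval status: {status}")
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": status,
        "ApprovalDescription": note,
    }


def approve_model_version(model_package_arn, note="Validation metrics reviewed"):
    # Requires AWS credentials with sagemaker:UpdateModelPackage permission
    import boto3

    sm = boto3.client("sagemaker")
    return sm.update_model_package(**approval_request(model_package_arn, "Approved", note))
```

The same request shape with a Rejected status can drive the rejection path, and the resulting status change can then be synced into the stage governance table.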

Conclusion

In this post, we have discussed how to centralize your use case and model governance function in a multi-account environment using the new model group sharing feature of SageMaker Model Registry. We shared an architecture for setting up central use case and model governance and walked through the steps involved in building that architecture. We provided practical guidance for setting up cross-account model group sharing using SageMaker Studio and APIs. Finally, we discussed key design considerations for building the centralized use case and model governance functions to extend the native SageMaker capabilities. We encourage you to try this model-sharing feature along with centralizing your use case and model governance functions. You can leave feedback in the comments section.


About the authors

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle.

Anastasia Tzeveleka is a Senior Generative AI/ML Specialist Solutions Architect at AWS. As part of her work, she helps customers across EMEA build foundation models and create scalable generative AI and machine learning solutions using AWS services.

Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.

Madhubalasri B. is a Software Development Engineer at Amazon Web Services (AWS), focusing on the SageMaker Model Registry and machine learning governance domain. She has expertise in cross-account access and model sharing, ensuring secure, scalable, and compliant deployment of machine learning models. Madhubalasri is dedicated to driving innovation in ML governance and optimizing model management processes.

Saumitra Vikaram is a Senior Software Engineer at AWS. He is focused on AI/ML technology, ML model management, ML governance, and MLOps to improve overall organizational efficiency and productivity.

Keshav Chandak is a Software Engineer at AWS with a focus on the SageMaker Repository Service. He specializes in developing capabilities to enhance governance and management of ML models.

Open for Development: NVIDIA Works With Cloud-Native Community to Advance AI and ML

Cloud-native technologies have become crucial for developers to create and implement scalable applications in dynamic cloud environments.

This week at KubeCon + CloudNativeCon North America 2024, one of the most-attended conferences focused on open-source technologies, Chris Lamb, vice president of computing software platforms at NVIDIA, delivered a keynote outlining the benefits of open source for developers and enterprises alike — and NVIDIA offered nearly 20 interactive sessions with engineers and experts.

The Cloud Native Computing Foundation (CNCF), part of the Linux Foundation and host of KubeCon, is at the forefront of championing a robust ecosystem to foster collaboration among industry leaders, developers and end users.

As a member of CNCF since 2018, NVIDIA is working across the developer community to contribute to and sustain cloud-native open-source projects. Our open-source software and more than 750 NVIDIA-led open-source projects help democratize access to tools that accelerate AI development and innovation.

Empowering Cloud-Native Ecosystems

NVIDIA has benefited from the many open-source projects under CNCF and has made contributions to dozens of them over the past decade. These actions help developers as they build applications and microservice architectures aligned with managing AI and machine learning workloads.

Kubernetes, the cornerstone of cloud-native computing, is undergoing a transformation to meet the challenges of AI and machine learning workloads. As organizations increasingly adopt large language models and other AI technologies, robust infrastructure becomes paramount.

NVIDIA has been working closely with the Kubernetes community to address these challenges. This includes:

  • Work on dynamic resource allocation (DRA) that allows for more flexible and nuanced resource management. This is crucial for AI workloads, which often require specialized hardware. NVIDIA engineers played a key role in designing and implementing this feature.
  • Leading efforts in KubeVirt, an open-source project extending Kubernetes to manage virtual machines alongside containers. This provides a unified, cloud-native approach to managing hybrid infrastructure.
  • Development of NVIDIA GPU Operator, which automates the lifecycle management of NVIDIA GPUs in Kubernetes clusters. This software simplifies the deployment and configuration of GPU drivers, runtime and monitoring tools, allowing organizations to focus on building AI applications rather than managing infrastructure.

The company’s open-source efforts extend beyond Kubernetes to other CNCF projects:

  • NVIDIA is a key contributor to Kubeflow, a comprehensive toolkit that makes it easier for data scientists and engineers to build and manage ML systems on Kubernetes. Kubeflow reduces the complexity of infrastructure management and allows users to focus on developing and improving ML models.
  • NVIDIA has contributed to the development of the Cluster Network Addons Operator (CNAO), which manages the lifecycle of host networks in Kubernetes clusters.
  • NVIDIA has also added to Node Health Check, which provides virtual machine high availability.

And NVIDIA has assisted with projects that address the observability, performance and other critical areas of cloud-native computing, such as:

  • Prometheus: Enhancing monitoring and alerting capabilities
  • Envoy: Improving distributed proxy performance
  • OpenTelemetry: Advancing observability in complex, distributed systems
  • Argo: Facilitating Kubernetes-native workflows and application management

Community Engagement 

NVIDIA engages the cloud-native ecosystem by participating in CNCF events and activities, including:

  • Collaboration with cloud service providers to help them onboard new workloads.
  • Participation in CNCF’s special interest groups and working groups on AI discussions.
  • Participation in industry events such as KubeCon + CloudNativeCon, where it shares insights on GPU acceleration for AI workloads.
  • Work with CNCF-adjacent projects in the Linux Foundation as well as many partners.

This translates into extended benefits for developers, such as improved efficiency in managing AI and ML workloads; enhanced scalability and performance of cloud-native applications; better resource utilization, which can lead to cost savings; and simplified deployment and management of complex AI infrastructures.

As AI and machine learning continue to transform industries, NVIDIA is helping advance cloud-native technologies to support compute-intensive workloads. This includes facilitating the migration of legacy applications and supporting the development of new ones.

These contributions to the open-source community help developers harness the full potential of AI technologies and strengthen Kubernetes and other CNCF projects as the tools of choice for AI compute workloads.

Check out NVIDIA’s keynote at KubeCon + CloudNativeCon North America 2024 delivered by Chris Lamb, where he discusses the importance of CNCF projects in building and delivering AI in the cloud and NVIDIA’s contributions to the community to push the AI revolution forward.
