Use Llama 3.1 405B to generate synthetic data for fine-tuning tasks

Today, we are excited to announce the availability of the Llama 3.1 405B model on Amazon SageMaker JumpStart, and Amazon Bedrock in preview. The Llama 3.1 models are a collection of state-of-the-art pre-trained and instruct fine-tuned generative artificial intelligence (AI) models in 8B, 70B, and 405B sizes. Amazon SageMaker JumpStart is a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. Amazon Bedrock offers a straightforward way to build and scale generative AI applications with Meta Llama models, using a single API.

In this post, we show how to use Llama 3.1 405B to generate data (labels for a sample dataset), and how to use the generated data to fine-tune a smaller model like Llama 3 8B to generate better responses compared to the non-fine-tuned model.

Overview of Llama 3.1 405B

The Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pre-trained and instruction tuned generative models in 8B, 70B, and 405B sizes (text in/text out). All models support long context length (128,000) and are optimized for inference with support for grouped query attention (GQA). The Llama 3.1 instruction tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the publicly available chat models on common industry benchmarks.

Llama 3.1 405B is the first publicly available model that rivals the top models in AI when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation. There are some unique ways to use it—in addition to direct inference, you can use the Llama 3.1 405B model to generate synthetic data to improve smaller models, and it can be a powerful domain-specific model by acting as the base model for domain-specific fine-tuning.

Llama 3.1 models are available today for inference on SageMaker JumpStart and Amazon Bedrock. On SageMaker JumpStart, they are rolling out to all AWS Regions where SageMaker JumpStart is available and where the required instance types are supported. Llama 3.1 405B requires P5 instances on Amazon SageMaker. The Llama 3.1 models are also available today in the us-west-2 Region on Amazon Bedrock, with expanded Regional availability planned.

Prerequisites

The following prerequisites are needed to implement the steps outlined in this post:

Responses from the Llama 3 8B Instruct model

First, we perform inference with the Llama 3 8B model, either directly through Amazon Bedrock or through an endpoint deployed with SageMaker JumpStart. With Llama 3 Instruct models, which are optimized for dialogue use cases, the input to the model endpoints is the previous history between the chat assistant and the user. You can ask context-aware questions about the conversation so far, using specific formatting for the input text (described in our earlier Llama 3 release posts, Meta Llama 3 models are now available in Amazon Bedrock and Meta Llama 3 models are now available in Amazon SageMaker JumpStart).

In the following example, the user has a conversation with the assistant about tourist sites in Paris. The assistant generated four different recommendation options, and then the user inquires about the first option:

Input: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nParis, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is so great about #1?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n

Output: The Eiffel Tower! It's an iconic symbol of Paris and one of the most recognizable landmarks in the world. Here are some reasons why it's so great:\n\n1. **Engineering marvel**: When it was built for the 1889 World's Fair, the Eiffel Tower was a groundbreaking feat of engineering. It was the tallest structure in the world at the time, standing at 324 meters (1,063 feet).\n2. **Panoramic views**: The Eiffel Tower offers breathtaking 360-degree views of the city from its observation decks. You can see many of Paris' famous landmarks, such as the Seine River, the Louvre, and Montmartre.\n3. **Romantic atmosphere**: The Eiffel Tower is often associated with romance and is a popular spot for proposals, weddings, and romantic dinners. The tower's iron latticework and twinkling lights at night add to its romantic ambiance.\n4. **Historical significance**: The Eiffel Tower has played a significant role in French history, serving as a communication tower during World War I and a symbol of French culture and engineering.\n5. **Accessibility**: The Eiffel Tower is easily accessible by public transportation or on foot, and there are various ways to experience it, including taking the stairs, elevator, or a guided tour.\n6. **Nighttime spectacle**: The Eiffel Tower is even more magical at night, when it's illuminated with thousands of twinkling lights. The light show, known as the "Eiffel Tower Sparkle," takes place every hour on the hour from dusk to midnight.\n7. **Iconic status**: The Eiffel Tower has been featured in countless films, literature, and art, making it an instantly recognizable symbol of Paris and France.\n\nOverall, the Eiffel Tower is a must-visit attraction in Paris, offering a unique combination of history, engineering, romance, and stunning views.

The Llama 3 8B model is able to generate answers for the questions without issues.
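
For reference, a minimal sketch of such an inference call through Amazon Bedrock might look like the following. The model ID, request fields, and response key are assumptions based on the Bedrock request format for Meta Llama models, so verify them against the Amazon Bedrock documentation for your Region:

import json
import boto3

# Sketch: invoke Llama 3 8B Instruct on Amazon Bedrock with a Llama 3 chat-formatted prompt
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "I am going to Paris, what should I see?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

body = json.dumps({
    "prompt": prompt,
    "max_gen_len": 512,
    "temperature": 0.6,
    "top_p": 0.9,
})

# Model ID assumed for Llama 3 8B Instruct; check the Bedrock console for the exact ID
response = bedrock_runtime.invoke_model(modelId="meta.llama3-8b-instruct-v1:0", body=body)
print(json.loads(response["body"].read())["generation"])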

Next, let’s test the ability of Llama 3 8B to answer logical and arithmetic questions (derived from Hugging Face’s AQUA-RAT dataset—instead of multiple choice options, we ask for full answers) as follows:

Input:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
A board 7ft. 9 inches long is divided into 3 equal parts .
What is the length of each part??<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Output:
A nice simple problem!

Let's convert the length of the board to inches:
7 ft 9 inches = 95 inches (since 1 foot = 12 inches).

Since the board is divided into 3 equal parts, each part will have the same length.
To find the length of each part, we'll divide the total length (95 inches) by 3:

95 inches ÷ 3 = 31.67 inches

So each part of the board is approximately 31.67 inches long.

This answer looks almost correct, but not quite. The correct answer is 31 inches (7 feet 9 inches is 93 inches, and 93 ÷ 3 = 31). The Llama 3 8B model answers similar logical questions incorrectly as well.

In order for the Llama 3 8B model to improve its logical question answering capability, we want to fine-tune the model with data from the AQUA-RAT dataset. As we already mentioned, the AQUA-RAT dataset contains multiple choice options for the LLM to choose from. Because we don’t have the full answers for this dataset, we use the Llama 3.1 405B model to generate the verbal answer to the questions, and use that dataset to fine-tune the Llama 3 8B model.

Generate label data using Llama 3.1 405B

Because Llama 3.1 405B is the most capable of the Llama 3.1 collection of models, and because of its state-of-the-art math and general knowledge capabilities, we run direct inference of the questions in the AQUA-RAT dataset on Llama 3.1 405B using either SageMaker JumpStart or Amazon Bedrock. This helps us generate the answers we want to use to fine-tune the smaller Llama 3 8B models. In essence, we’re using Llama 3.1 405B as an alternative to human annotation to generate labels for the dataset. The following are example inference outputs from the 405B model:

Input:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
A board 7ft. 9 inches long is divided into 3 equal parts .
What is the length of each part??<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Output:
A nice simple problem!

Let's break it down: The board is 7ft 9in long.
We can convert this to inches only:
7ft = 7 × 12 = 84in (since there are 12 inches in a foot)
Add the 9 inches: 84in + 9in = 93in

Now, the board is divided into 3 equal parts, so we need to divide the total length (93in) by 3:

93in ÷ 3 = 31in

So, each part is 31 inches long.

We can clearly see that the 405B answer is logically and mathematically correct, and we can use this answer in place of human annotation for fine-tuning purposes. We concatenate the answers from the 405B model for approximately 2,000 questions from the AQUA-RAT dataset, which becomes our training dataset. Our test dataset consists of the extracted answers for approximately 4,000 more questions.
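
A minimal sketch of this label-generation loop might look like the following. It assumes a 405B endpoint deployed through SageMaker JumpStart, the Hugging Face datasets library, and the deepmind/aqua_rat dataset ID; the dataset ID, field names, and response parsing are assumptions to verify against your environment:

import json
from datasets import load_dataset
from sagemaker.jumpstart.model import JumpStartModel

# Deploy the 405B endpoint once (or attach to an existing endpoint)
predictor = JumpStartModel(model_id="meta-llama-3-1-405b-instruct-fp8").deploy(accept_eula=True)

# Pull ~2,000 AQUA-RAT questions and ask the 405B endpoint for full verbal answers
dataset = load_dataset("deepmind/aqua_rat", split="train").select(range(2000))

def build_prompt(question):
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

with open("train.jsonl", "w") as f:
    for row in dataset:
        payload = {
            "inputs": build_prompt(row["question"]),
            "parameters": {"max_new_tokens": 512, "temperature": 0.1},
        }
        response = predictor.predict(payload)
        answer = response[0]["generated_text"]  # adjust to your endpoint's response schema
        f.write(json.dumps({"instruction": row["question"], "response": answer}) + "\n")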

The training data is constructed as a JSON object as follows:

{"instruction": "The entrance fee for a fair is $5 for persons under the age of 18, and 20% more for persons older. Each ride at the fair costs $0.50. If Joe goes with her 6 years old twin brothers, and they each took 3 rides in total. How much money does Joe end up spending at the fair?", "response": "The entrance fee for the 2 brothers is 5 x 2 = 10 dollars. The entrance fee for Joe is 5 x 1.2 = 6 dollars. So the total entrance fee is 10 + 6 = 16 dollars. The cost of the rides is 3 x 0.5 x 3 = 4.5 dollars. So the total cost is 16 + 4.5 = 20.5 dollars. The answer is 20.5."}

{"instruction": "John found that the average of 15 numbers is 40. If 10 is added to each number then the mean of number is?", "response": "The average of 15 numbers is 40. So the sum of the 15 numbers is 15 x 40 = 600. If 10 is added to each number, then the sum becomes 600 + 15 x 10 = 600 + 150 = 750. The average of the new numbers is 750 / 15 = 50.n#### 50"}

{"instruction": "A person is traveling at 20 km/hr and reached his destiny in 2.5 hr then find the distance?", "response": "A simple problem!nnLet's use the formula:nnDistance = Speed u00d7 TimennGiven:nSpeed (v) = 20 km/hrnTime (t) = 2.5 hrnnPlugging in the values, we get:nnDistance = 20 km/hr u00d7 2.5 hrn= 50 kmnnSo, the person traveled a distance of 50 km."}

The instruction and response dataset are then used to fine-tune the Llama 3 8B model in SageMaker JumpStart.

Fine-tune the Llama 3 8B model with the generated labels

For instructions on fine-tuning this model, refer to Fine-tune Code Llama on Amazon SageMaker JumpStart. We used an ml.g5.12xlarge instance with the instruction fine-tuning option for five epochs, and observed that the loss improved on both the training and test sets.
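
The following is a minimal sketch of that fine-tuning job using the SageMaker Python SDK. The JumpStart model ID, hyperparameter names, and S3 path are assumptions based on the referenced fine-tuning post, so adapt them to your environment:

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Instruction fine-tune Llama 3 8B on the labels generated by Llama 3.1 405B
estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-8b",   # assumed JumpStart model ID for Llama 3 8B
    environment={"accept_eula": "true"},         # accept the Meta EULA for training
    instance_type="ml.g5.12xlarge",
)
estimator.set_hyperparameters(instruction_tuned="True", epoch="5")

# S3 prefix containing train.jsonl (and a template.json describing the instruction/response format)
estimator.fit({"training": "s3://your-bucket/aqua-rat-labels/"})

# Deploy the fine-tuned model for inference
finetuned_predictor = estimator.deploy(instance_type="ml.g5.12xlarge")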

When the fine-tuning was complete, we deployed the fine-tuned model for inference using SageMaker JumpStart on an ml.g5.12xlarge instance (you can deploy on any other supported instance based on your throughput and latency requirements). Refer to Meta’s End User License Agreement (EULA) for requirements when using models fine-tuned on synthetic data for any use case.

Then we performed inference on another 1,000 unseen queries from the AQUA-RAT dataset. The following table shows a few of the queries for comparison.

Query Answer Llama 3 8B Responses Before Fine-Tuning Llama 3 8B Responses After Fine-Tuning
Tim has 350 pounds of cement in 100, 50, and 25 pound bags. He has an equal number of each size bag. How many bags of cement does Tim have? 6 We know there are 3 types of bags. 100, 50, and 25 pounds. He has an equal number of each. So if x is the number of each type of bag, then you know 100x + 50x + 25x = 350. This is because each type of bag has x number of bags, and there is 100 pounds of cement per bag in the first case, 50 in the second, and 25 in the third. 175x = 350. So if we divide both sides by 175 we get x = 2. The answer is 2 bags of each. Let’s assume Tim has X 100 pound bags, X 50 pound bags, and X 25 pound bags. He has in total 100X + 50X + 25X = 175X pounds of cement. Since he has 350 pounds total, we know 175X = 350, so X = 2. That means he has 2 bags of each type, so in total he has 2 + 2 + 2 = 6 bags. The answer is 6.
The purchase price of an article is $48. In order to include 30% of cost for overhead and to provide $12 of net profit, what should the markup be? $55

To solve this problem, we can start by identifying the cost of the article, which is the purchase price of $48. We are also given that the markup should be enough to include 30% of the cost for overhead and provide a net profit of $12.

Let’s break this down step by step:

1. Calculate the cost of the article: $48 (purchase price)
2. Calculate the overhead cost: 30% of the cost = 0.3 x $48 = $14.40
3. Calculate the total cost: cost + overhead cost = $48 + $14.40 = $62.40
4. Calculate the selling price: total cost + net profit = $62.40 + $12 = $74.40

Therefore, the markup should be $74.40 – $48 = $26.40. This means that the selling price should be $26.40 more than the purchase price to include 30% of the cost for overhead and provide a net profit of $12.

Step 1: Calculate the cost price of the article. The cost price of the article is $48.
Step 2: Calculate the overhead cost as a percentage of the cost price. The overhead cost is 30% of the cost price, which is 0.3 * $48 = $14.40.
Step 3: Calculate the selling price of the article. The selling price of the article is the cost price plus the overhead cost plus the net profit. The net profit is $12.
Step 4: Calculate the selling price. The selling price is $48 + $14.40 + $12 = $74.40.
Step 5: Calculate the markup as a percentage of the cost price. The markup is the difference between the selling price and the cost price, divided by the cost price. The markup is ($74.40 – $48) / $48 = $26.40 / $48 = 55%.
The final answer is: $55
The slant height of a cone is 35 cm and radius of the base is 14cm, find the curved surface of the cone. 1540 A nice problem!
The slant height of a cone is the distance from the center of the base to the vertex of the cone. Let’s call this distance “l”. We are given that the slant height is 35 cm, so: l = 35 cm
The radius of the base is given as 14 cm, so the diameter of the base is 28 cm. The formula to find the curved surface area of a cone is: Curved Surface Area = πr(l + r) where r is the radius of the base and l is the slant height. Plugging in the values, we get:
Curved Surface Area = π(14)(35 + 14) = π(14)(49) = 686π
Now, we can simplify this expression by multiplying the numbers:
Curved Surface Area = 686 × 3.14 (approximately) = 2153.24 cm²
So, the curved surface area of the cone is approximately 2153.24 cm².
The formula for the curved surface area of a cone is:
Curved Surface Area = πrl
Where r is the radius of the base and l is the slant height.
Given: r = 14 cm  l = 35 cm
Plugging in the values, we get:
Curved Surface Area = π(14)(35) = 14π(35) = 490π = 490 × 3.14 = 1537.4 cm²
Therefore, the curved surface area of the cone is approximately 1537.4 cm².
The speed at which a man can row a boat in still water is 25 kmph. If he rows downstream, where the speed of current is 11 kmph, what time will he take to cover 80 metres? 8 seconds The speed of the man in still water is 25 kmph. The speed of the current is 11 kmph. So the speed downstream is 25 + 11 = 36 kmph. 80 meters is 0.02 km. So the time taken is 0.02 / 36 = 0.0056 hours. The answer is 5.6 minutes. Speed of the boat in still water = 25 kmph
Speed of the current = 11 kmph
Speed downstream = 25 + 11 = 36 kmph
Speed downstream in m/s = (36 * 5/18) = 10 m/s
Distance = 80 m
Time = Distance / Speed = 80 / 10 = 8 seconds.

We can qualitatively see that the answers for the math problems and logical questions are improved after fine-tuning the model. We will perform additional quantitative metric evaluation in future posts. This exercise demonstrates how you can use the Llama 3.1 405B model to efficiently generate datasets in an accelerated fashion and then use those datasets to significantly improve the task-specific capabilities of smaller models.

Conclusion

In this post, we showed how you can use the new Llama 3.1 405B model to synthesize and generate data labels to improve the performance of a much smaller model (Llama 3 8B in this case). We showed that the responses generated by the fine-tuned model are much improved compared to those from the model without fine-tuning, and we provided the code notebook that you can use to run and test the solution.

As a next step, we encourage you to use this idea along with the Llama 3.1 405B model in your use case to generate labels, or even unlabeled data, that a smaller model can then use downstream to help solve your use case.


About the Authors

Sebastian Bustillo is an Enterprise Solutions Architect at AWS. He focuses on AI/ML technologies with a profound passion for generative AI and compute accelerators. At AWS, he helps customers unlock business value through cloud technologies and AI/ML. When he’s not at work, he enjoys brewing a perfect cup of specialty coffee and riding his MTB.

Dr. Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.

Dr. Natarajan Chennimalai Kumar is a Principal Solutions Architect in the 3rd Party Model Provider team at AWS, working closely with the Llama partner engineering team at Meta to enable AWS customers to use Meta’s Llama models. He holds a PhD from the University of Illinois at Urbana-Champaign. He is based in the Bay Area in California. Outside of work, he enjoys watching shows with his kids, playing tennis, and traveling with his family.

Madhur Prashant is an AI and ML Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, writing blogs, hiking, spending time with his twin, and playing the guitar.

Dr. Nikita Ivkin is a Senior Applied Scientist for Amazon SageMaker. He focuses on inference acceleration for foundation models and scalable ML algorithms in general. His research interests are in the area of inference acceleration, streaming algorithms, and federated learning, and he has published in a variety of machine learning and computer science venues such as NeurIPS, ICML, ICLR, STOC, PODS, and others.

Supriya Puragundla is a Senior Solutions Architect at AWS. She has over 15 years of IT experience in software development, design, and architecture. She helps key customer accounts on their data, generative AI, and AI/ML journeys. She is passionate about data-driven AI and deep technical expertise in ML and generative AI.

Dr. Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, and KDD conferences, and Royal Statistical Society: Series A.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Karl Albertsen leads the product management and partnership teams for Amazon SageMaker. He is focused on making AI accessible, cost-effective, and high-performing for business applications.

Christopher Whitten is an SDE with the SageMaker JumpStart team leading model onboarding and deeper integration with SageMaker services. Chris is passionate about accelerating the ubiquity of AI in practical business applications. His technical interests include agentic workflows and MLOps.

Hemant Singh is an Applied Scientist with experience in Amazon SageMaker JumpStart. He got his master’s from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.

Evan Kravitz is a software engineer at Amazon Web Services, working on SageMaker JumpStart. He is interested in the confluence of machine learning with cloud computing. Evan received his undergraduate degree from Cornell University and master’s degree from the University of California, Berkeley. In 2021, he presented a paper on adversarial neural networks at the ICLR conference. In his free time, Evan enjoys cooking, traveling, and going on runs in New York City.

Read More

Llama 3.1 models are now available in Amazon SageMaker JumpStart

Today, we are excited to announce that the state-of-the-art Llama 3.1 collection of multilingual large language models (LLMs), which includes pre-trained and instruction tuned generative AI models in 8B, 70B, and 405B sizes, is available through Amazon SageMaker JumpStart to deploy for inference. Llama is a publicly accessible LLM designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative artificial intelligence (AI) ideas. In this post, we walk through how to discover and deploy Llama 3.1 models using SageMaker JumpStart.

Overview of Llama 3.1

The Llama 3.1 multilingual LLMs are a collection of pre-trained and instruction tuned generative models in 8B, 70B, and 405B sizes (text in/text and code out). All models support long context length (128,000) and are optimized for inference with support for grouped query attention (GQA). The Llama 3.1 instruction tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the publicly available chat models on common industry benchmarks.

At its core, Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. Architecturally, the core LLM for Llama 3 and Llama 3.1 is the same dense architecture.

Llama 3.1 also offers instruct variants, and the instruct model is fine-tuned for tool use. The model has been trained to generate calls for a few specific tools for capabilities like search, image generation, code execution, and mathematical reasoning. In addition, the model supports zero-shot tool use.

The responsible use guide from Meta can assist you in performing additional fine-tuning that may be necessary to customize and optimize the models with appropriate safety mitigations.

Overview of SageMaker JumpStart

SageMaker JumpStart offers access to a broad selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.

With SageMaker JumpStart, you can deploy models in a secure environment. The models are provisioned on dedicated SageMaker Inference instances, including AWS Trainium and AWS Inferentia powered instances, and are isolated within your virtual private cloud (VPC). This enforces data security and compliance, because the models operate under your own VPC controls, rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune it using the extensive capabilities of Amazon SageMaker, including SageMaker Inference for deploying models and container logs for improved observability. With SageMaker, you can streamline the entire model deployment process.

Discover Llama 3.1 models in SageMaker JumpStart

SageMaker JumpStart provides FMs through two primary interfaces: Amazon SageMaker Studio and the SageMaker Python SDK. This provides multiple options to discover and use hundreds of models for your specific use case.

SageMaker Studio is a comprehensive integrated development environment (IDE) that offers a unified, web-based interface for performing all aspects of the machine learning (ML) development lifecycle. From preparing data to building, training, and deploying models, SageMaker Studio provides purpose-built tools to streamline the entire process. In SageMaker Studio, you can access SageMaker JumpStart to discover and explore the extensive catalog of FMs available to deploy for inference on SageMaker Inference.

Alternatively, you can use the SageMaker Python SDK to programmatically access and utilize SageMaker JumpStart models. This approach allows for greater flexibility and integration with existing AI and ML workflows and pipelines. By providing multiple access points, SageMaker JumpStart helps you seamlessly incorporate pre-trained models into your AI and ML development efforts, regardless of your preferred interface or workflow.

Deploy Llama 3.1 models for inference using SageMaker JumpStart

On the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. You can find the Llama 3.1 models in the Foundation Models: Text Generation carousel.

If you don’t see the Llama 3.1 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Classic Apps.

The following table lists the Llama 3.1 models you can access in SageMaker JumpStart.

Model Name Description Key Capabilities
Meta-Llama-3.1-8B Llama-3.1-8B is a state-of-the-art publicly accessible model that excels at language nuances, contextual understanding, and complex tasks like translation and dialogue generation in 8 languages. Top capabilities include multilingual support and stronger reasoning capabilities, enabling advanced use cases like long-form text summarization and multilingual conversational agents.
Meta-Llama-3.1-8B-Instruct Llama-3.1-8B-Instruct is an update to Meta-Llama-3-8B-Instruct, an assistant-like chat model, that includes an expanded 128K context length, multilinguality, and improved reasoning capabilities. Top capabilities include the ability to follow instructions and tasks, improved reasoning and understanding of nuances and context, and multilingual translation.
Meta-Llama-3.1-70B Llama-3.1-70B is a state-of-the-art publicly accessible model that excels at language nuances, contextual understanding, and complex tasks like translation and dialogue generation in 8 languages. Top capabilities include multilingual support and stronger reasoning capabilities, enabling advanced use cases like long-form text summarization, and multilingual conversational agents.
Meta-Llama-3.1-70B-Instruct Llama-3.1-70B-Instruct is an update to Llama-3-70B-Instruct, an assistant-like chat model, that includes an expanded 128K context length, multilinguality, and improved reasoning capabilities. Top capabilities include the ability to follow instructions and tasks, improved reasoning and understanding of nuances and context, and multilingual translation.
Meta-Llama-3.1-405B Llama-3.1-405B is the largest, most capable publicly available FM, unlocking new applications and innovations, and paving the way for groundbreaking technologies like synthetic data generation and model distillation. Llama-3.1-405B unlocks innovation with capabilities like general knowledge, steerability, math, tool use, and multilingual translation, enabling new possibilities for innovation and development.
Meta-Llama-3.1-405B-Instruct Llama-3.1-405B-Instruct is the largest and most powerful of the Llama 3.1 Instruct models. It’s a highly advanced model for conversational inference and reasoning, synthetic data generation, and a base to do specialized continual pre-training or fine-tuning on a specific domain. Llama-3.1-405B unlocks innovation with capabilities like general knowledge, steerability, math, tool use, and multilingual translation, enabling new possibilities for innovation and development.
Meta-Llama-3.1-405B-FP8 This is FP8 Quantized Version of Llama-3.1-405B. Llama-3.1-405B unlocks innovation with capabilities like general knowledge, steerability, math, tool use, and multilingual translation, enabling new possibilities for innovation and development.
Meta-Llama-3.1-405B-Instruct-FP8 This is FP8 Quantized Version of Llama-3.1-405B-Instruct. Llama-3.1-405B unlocks innovation with capabilities like general knowledge, steerability, math, tool use, and multilingual translation, enabling new possibilities for innovation and development.

You can choose the model card to view details about the model such as license, data used to train, and how to use. You can also find two buttons, Deploy and Open Notebook, which help you use the model.

When you choose either button, a pop-up window will show the End-User License Agreement (EULA) and acceptable use policy for you to accept.

Upon acceptance, you will proceed to the next step to use the model.

Deploy Llama 3.1 models for inference using the Python SDK

When you choose Deploy and accept the terms, model deployment will start. Alternatively, you can deploy through the example notebook by choosing Open Notebook. The notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using a notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker.

You can deploy a Llama 3.1 405B model in FP8 using SageMaker JumpStart with the following SageMaker Python SDK code:

from sagemaker.jumpstart.model import JumpStartModel

accept_eula = True  # you must explicitly set accept_eula=True to accept the Meta EULA
model = JumpStartModel(model_id="meta-llama-3-1-405b-fp8")
predictor = model.deploy(accept_eula=accept_eula)

This deploys the model on SageMaker with default configurations, including default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To successfully deploy the model, you must manually set accept_eula=True as a deploy method argument. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {
    "inputs": "The color of the sky is blue but sometimes it can also be ",
    "parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}
response = predictor.predict(payload)

The following table lists all the Llama models available in SageMaker JumpStart along with the model_ids, default instance types, and the maximum number of total tokens (sum of number of input tokens and number of generated tokens) supported for each of these models. For increased context length, customers can modify the default instance type in the SageMaker JumpStart UI.

Model Name Model ID Default instance type Supported instance types
Meta-Llama-3.1-8B meta-llama-3-1-8b ml.g5.4xlarge (2,000 context length ) ml.g5.4xlarge, ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.g5.4xlarge, ml.g5.8xlarge, ml.g6.12xlarge, ml.p4d.24xlarge, ml.p5.48xlarge
Meta-Llama-3.1-8B-Instruct meta-llama-3-1-8b-instruct ml.g5.4xlarge (2,000 context length ) Same as Llama-3.1-8B
Meta-Llama-3.1-70B meta-llama-3-1-70b ml.p4d.24xlarge (12,000 context length on 8 A100s) ml.g5.48xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p5.48xlarge
Meta-Llama-3.1-70B-Instruct meta-llama-3-1-70b-instruct ml.p4d.24xlarge (12,000 context length on 8 A100s) Same as Llama-3.1-70B
Meta-Llama-3.1-405B meta-llama-3-1-405b ml.p5.48xlarge 2x ml.p5.48xlarge
Meta-Llama-3.1-405B-Instruct meta-llama-3-1-405b-instruct ml.p5.48xlarge 2x ml.p5.48xlarge
Meta-Llama-3.1-405B-FP8 meta-llama-3-1-405b-fp8 ml.p5.48xlarge (8,000 context length on 8 H100s) ml.p5.48xlarge
Meta-Llama-3.1-405B-Instruct-FP8 meta-llama-3-1-405b-instruct-fp8 ml.p5.48xlarge (8,000 context length on 8 H100s) ml.p5.48xlarge

Inference and example prompts for Llama-3.1-405B-Instruct

You can use Llama models for text completion for any piece of text. Through text generation, you can perform a variety of tasks, such as question answering, language translation, sentiment analysis, and more. Input payload to the endpoint looks like the following code:

<|begin_of_text|><|start_header_id|>[ROLE_1]<|end_header_id|>

[ROLE_1_MESSAGE]<|eot_id|><|start_header_id|>[ROLE_2]<|end_header_id|>

[ROLE_2_MESSAGE]<|eot_id|><|start_header_id|>[ROLE_3]<|end_header_id|>

...

The roles should alternate between user and assistant while optionally starting with a system role.
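
If you construct these prompts programmatically, a small helper like the following can assemble the template from a list of role/content messages. This is a hypothetical utility, not part of the SageMaker SDK:

def format_llama3_prompt(messages):
    """Build a Llama 3.1 chat prompt from [{"role": ..., "content": ...}] messages."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # End with the assistant header so the model generates the next assistant turn
    return prompt + "<|start_header_id|>assistant<|end_header_id|>\n\n"

payload = {
    "inputs": format_llama3_prompt([
        {"role": "user", "content": "I am going to Paris, what should I see?"},
    ]),
    "parameters": {"max_new_tokens": 256, "temperature": 0.6},
}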

In the next example, we show how to use Llama Instruct models within a conversational context, where a multi-turn chat is occurring between a user and an assistant. The first few rounds of the conversation are provided as input to the model:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

I am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris: 1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows. These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|eot_id|><|start_header_id|>user<|end_header_id|> What is so great about #1?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

This produces the following response:

The Eiffel Tower! It's an iconic symbol of Paris and one of the most recognizable landmarks in the world. Here are some reasons why it's so great:

1. **Engineering marvel**: When it was built for the 1889 World's Fair, the Eiffel Tower was a groundbreaking feat of engineering. It was the tallest structure in the world at the time, standing at 324 meters (1,063 feet).
2. **Breathtaking views**: The Eiffel Tower offers stunning views of the city from its observation decks. On a clear day, you can see for miles in every direction, taking in the beautiful Parisian landscape.
3. **Romantic atmosphere**: The Eiffel Tower is often associated with romance and love. It's a popular spot for proposals, weddings, and romantic getaways.
4. **Historical significance**: The Eiffel Tower has played a significant role in history, serving as a communication tower during World War I and a symbol of French resistance during World War II.
5. **Iconic design**: The Eiffel Tower's lattice-like design is instantly recognizable and has been imitated and parodied countless times in art, architecture, and popular culture.

Llama Guard

You can also use the Llama Guard model to help add guardrails for these models. Llama Guard provides input and output guardrails for LLM deployments. Llama Guard is a publicly available model that performs competitively on common open benchmarks and provides developers with a pre-trained model to help defend against generating potentially risky outputs. This model has been trained on a mix of publicly available datasets to enable detection of common types of potentially risky or violating content that may be relevant to a number of developer use cases.

You can use Llama Guard as a supplemental tool for developers to integrate into their own mitigation strategies, such as for chatbots, content moderation, customer service, social media monitoring, and education. By passing user-generated content through Llama Guard before publishing or responding to it, developers can flag unsafe or inappropriate language and take action to maintain a safe and respectful environment. Llama Guard is available on SageMaker JumpStart.

Conclusion

In this post, we explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and run a wide range of pre-trained FMs for inference, including Meta’s most advanced and capable models to date. Llama 3.1 models are available today in SageMaker JumpStart initially in the US East (N. Virginia), US East (Ohio), and US West (Oregon) AWS Regions. Get started with SageMaker JumpStart and Llama 3.1 models today.

Resources

For additional resources, refer to the following:


About the Authors

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends. You can find him on LinkedIn.

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker built-in algorithms team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Jonathan Guinegagne is a Senior Software Engineer with Amazon SageMaker JumpStart at AWS. He got his master’s degree from Columbia University. His interests span machine learning, distributed systems, and cloud computing, as well as democratizing the use of AI. Jonathan is originally from France and now lives in Brooklyn, NY.

Christopher Whitten is a software developer on the JumpStart team. He helps scale model selection and integrate models with other SageMaker services. Chris is passionate about accelerating the ubiquity of AI across a variety of business domains.

Read More

Intelligent document processing using Amazon Bedrock and Anthropic Claude

Generative artificial intelligence (AI) not only empowers innovation through ideation, content creation, and enhanced customer service, but also streamlines operations and boosts productivity across various domains. To effectively harness this transformative technology, Amazon Bedrock offers a fully managed service that integrates high-performing foundation models (FMs) from leading AI companies, such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, Mistral AI, and Amazon. By providing access to these advanced models through a single API and supporting the development of generative AI applications with an emphasis on security, privacy, and responsible AI, Amazon Bedrock enables you to use AI to explore new avenues for innovation and improve overall offerings.

Enterprise customers can unlock significant value by harnessing the power of intelligent document processing (IDP) augmented with generative AI. By infusing IDP solutions with generative AI capabilities, organizations can revolutionize their document processing workflows, achieving exceptional levels of automation and reliability. This combination enables advanced document understanding, highly effective structured data extraction, automated document classification, and seamless information retrieval from unstructured text. With these capabilities, organizations can achieve scalable, efficient, and high-value document processing that drives business transformation and competitiveness, ultimately leading to improved productivity, reduced costs, and enhanced decision-making.

In this post, we show how to develop an IDP solution using Anthropic Claude 3 Sonnet on Amazon Bedrock. We demonstrate how to extract data from a scanned document and insert it into a database.

The Anthropic Claude 3 Sonnet model is optimized for speed and efficiency, making it an excellent choice for intelligent tasks—particularly for enterprise workloads. It also possesses sophisticated vision capabilities, demonstrating a strong aptitude for understanding a wide range of visual formats, including photos, charts, graphs, and technical diagrams. Although we demonstrate this solution using the Anthropic Claude 3 Sonnet model, you can alternatively use the Haiku and Opus models if your use case requires them.

Solution overview

The proposed solution uses Amazon Bedrock and the powerful Anthropic Claude 3 Sonnet model to enable IDP capabilities. The architecture consists of several AWS services seamlessly integrated with Amazon Bedrock, enabling efficient and accurate extraction of data from scanned documents.

The following diagram illustrates our solution architecture.

The solution consists of the following steps:

  1. The process begins with scanned documents being uploaded and stored in an Amazon Simple Storage Service (Amazon S3) bucket, which invokes an S3 Event Notification on object upload.
  2. This event invokes an AWS Lambda function, responsible for invoking the Anthropic Claude 3 Sonnet model on Amazon Bedrock.
  3. The Anthropic Claude 3 Sonnet model, with its advanced multimodal capabilities, processes the scanned documents and extracts relevant data in a structured JSON format.
  4. The extracted data from the Anthropic Claude 3 model is sent to an Amazon Simple Queue Service (Amazon SQS) queue. Amazon SQS acts as a buffer, allowing components to send and receive messages reliably without being directly coupled, providing scalability and fault tolerance in the system.
  5. Another Lambda function consumes the messages from the SQS queue, parses the JSON data, and stores the extracted key-value pairs in an Amazon DynamoDB table for retrieval and further processing.

This serverless architecture takes advantage of the scalability and cost-effectiveness of AWS services while harnessing the cutting-edge intelligence of Anthropic Claude 3 Sonnet. By combining the robust infrastructure of AWS with Anthropic’s FMs, this solution enables organizations to streamline their document processing workflows, extract valuable insights, and enhance overall operational efficiency.

The solution uses the following services and features:

  • Amazon Bedrock is a fully managed service that provides access to large language models (LLMs), allowing developers to build and deploy their own customized AI applications.
  • The Anthropic Claude 3 family offers a versatile range of models tailored to meet diverse needs. With three options—Opus, Sonnet, and Haiku—you can choose the perfect balance of intelligence, speed, and cost. These models excel at understanding complex enterprise content, including charts, graphs, technical diagrams, and reports.
  • Amazon DynamoDB is a fully managed, serverless, NoSQL database service.
  • AWS Lambda is a serverless computing service that allows you to run code without provisioning or managing servers.
  • Amazon SQS is a fully managed message queuing service.
  • Amazon S3 is a highly scalable, durable, and secure object storage service.

In this solution, we use the generative AI capabilities in Amazon Bedrock to efficiently extract data. As of this writing, Anthropic Claude 3 Sonnet only accepts images as input. The supported file types are GIF, JPEG, PNG, and WebP. You can choose to save images during the scanning process or convert the PDF to images.
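
For example, if your source documents are PDFs, you could convert each page to an image before uploading it. The following sketch uses the pdf2image library (not part of this solution, and it requires the poppler utilities) purely as an illustration:

from pdf2image import convert_from_path  # pip install pdf2image (requires poppler)

# Convert each page of a scanned PDF into a JPEG that the model can accept
pages = convert_from_path("birth_certificate_application.pdf", dpi=300)
for i, page in enumerate(pages):
    page.save(f"birth_certificate_application_page_{i + 1}.jpeg", "JPEG")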

You can also enhance this solution by implementing human-in-the-loop and model evaluation features. The goal of this post is to demonstrate how you can build an IDP solution using Amazon Bedrock, but to use this as a production-scale solution, additional considerations should be taken into account, such as testing for edge case scenarios, better exception handling, trying additional prompting techniques, model fine-tuning, model evaluation, throughput requirements, number of concurrent requests to be supported, and carefully considering cost and latency implications.

Prerequisites

You need the following prerequisites before you can proceed with this solution. For this post, we use the us-east-1 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.

Use case and dataset

For our example use case, let’s look at a state agency responsible for issuing birth certificates. The agency may receive birth certificate applications through various methods, such as online applications, forms completed at a physical location, and mailed-in completed paper applications. Today, most agencies spend a considerable amount of time and resources to manually extract the application details. The process begins with scanning the application forms, manually extracting the details, and then entering them into an application that eventually stores the data into a database. This process is time-consuming, inefficient, not scalable, and error-prone. Additionally, it adds complexity if the application form is in a different language (such as Spanish).

For this demonstration, we use sample scanned images of birth certificate application forms. These forms don’t contain any real personal data. Two examples are provided: one in English (handwritten) and another in Spanish (printed). Save these images as .jpeg files to your computer. You need them later for testing the solution.

Create an S3 bucket

On the Amazon S3 console, create a new bucket with a unique name (for example, bedrock-claude3-idp-{random characters to make it globally unique}) and leave the other settings as default. Within the bucket, create a folder named images and a sub-folder named birth_certificates.
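
If you prefer to script this step instead of using the console, a boto3 sketch might look like the following; the bucket name and Region are placeholders:

import boto3

s3 = boto3.client("s3", region_name="us-east-1")
bucket_name = "bedrock-claude3-idp-example-123456"  # must be globally unique

# In us-east-1, create_bucket takes no CreateBucketConfiguration
s3.create_bucket(Bucket=bucket_name)

# S3 has no true folders; a zero-byte object key ending in "/" acts as one
s3.put_object(Bucket=bucket_name, Key="images/birth_certificates/")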

Create an SQS queue

On the Amazon SQS console, create a queue with the Standard queue type, provide a name (for example, bedrock-idp-extracted-data), and leave the other settings as default.
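
Similarly, the queue can be created with boto3 if you script the setup; this sketch is illustrative, and you should note the returned queue URL for the Lambda function later:

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# Standard queue with default settings
response = sqs.create_queue(QueueName="bedrock-idp-extracted-data")
print(response["QueueUrl"])  # use this value for {SQS URL} in the Lambda function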

Create a Lambda function to invoke the Amazon Bedrock model

On the Lambda console, create a function (for example, invoke_bedrock_claude3), choose Python 3.12 for the runtime, and leave the remaining settings as default. Later, you configure this function to be invoked every time a new image is uploaded into the S3 bucket. You can download the entire Lambda function code from invoke_bedrock_claude3.py. Replace the contents of the lambda_function.py file with the code from the downloaded file. Make sure to substitute {SQS URL} with the URL of the SQS queue you created earlier, then choose Deploy.

The Lambda function should perform the following actions:

import base64
import json
import boto3

s3 = boto3.client('s3')
sqs = boto3.client('sqs')
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
QUEUE_URL = "{SQS URL}"  # replace with the URL of the SQS queue you created earlier
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

The following code gets the image from the S3 bucket using the get_object method and converts it to base64 data:

image_data = s3.get_object(Bucket=bucket_name, Key=object_key)['Body'].read()
base64_image = base64.b64encode(image_data).decode('utf-8')

Prompt engineering is a critical factor in unlocking the full potential of generative AI applications like IDP. Crafting well-structured prompts makes sure that the AI system’s outputs are accurate, relevant, and aligned with your objectives, while mitigating potential risks.

With the Anthropic Claude 3 model integrated into the Amazon Bedrock IDP solution, you can use the model’s impressive visual understanding capabilities to effortlessly extract data from documents. Simply provide the image or document as input, and Anthropic Claude 3 will comprehend its contents, seamlessly extracting the desired information and presenting it in a human-readable format. All Anthropic Claude 3 models are capable of understanding non-English languages such as Spanish, Japanese, and French. In this particular use case, we demonstrate how to translate Spanish application forms into English by providing the appropriate prompt instructions.

However, LLMs like Anthropic Claude 3 can exhibit variability in their response formats. To achieve consistent and structured output, you can tailor your prompts to instruct the model to return the extracted data in a specific format, such as JSON with predefined keys. This approach enhances the interoperability of the model’s output with downstream applications and streamlines data processing workflows.

The following is the prompt with the specific JSON output format:

prompt = """
This image shows a birth certificate application form. 
Please precisely copy all the relevant information from the form.
Leave the field blank if there is no information in the corresponding field.
If the image is not a birth certificate application form, simply return an empty JSON object. 
If the application form is not filled, leave the fees attributes blank. 
Translate any non-English text to English. 
Organize and return the extracted data in a JSON format with the following keys:
{
    "applicantDetails":{
        "applicantName": "",
        "dayPhoneNumber": "",
        "address": "",
        "city": "",
        "state": "",
        "zipCode": "",
        "email":""
    },
    "mailingAddress":{
        "mailingAddressApplicantName": "",
        "mailingAddress": "",
        "mailingAddressCity": "",
        "mailingAddressState": "",
        "mailingAddressZipCode": ""
    },
    "relationToApplicant":[""],
    "purposeOfRequest": "",
    
    "BirthCertificateDetails":
    {
        "nameOnBirthCertificate": "",
        "dateOfBirth": "",
        "sex": "",
        "cityOfBirth": "",
        "countyOfBirth": "",
        "mothersMaidenName": "",
        "fathersName": "",
        "mothersPlaceOfBirth": "",
        "fathersPlaceOfBirth": "",
        "parentsMarriedAtBirth": "",
        "numberOfChildrenBornInSCToMother": "",
        "diffNameAtBirth":""
    },
    "fees":{
        "searchFee": "",
        "eachAdditionalCopy": "",
        "expediteFee": "",
        "totalFees": ""
    } 
  }
""" 

Invoke the Anthropic Claude 3 Sonnet model using the Amazon Bedrock API. Pass the prompt and the base64 image data as parameters:

def invoke_claude_3_multimodal(prompt, base64_image_data):
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt,
                    },
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": base64_image_data,
                        },
                    },
                ],
            }
        ],
    }

    try:
        response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(request_body))
        return json.loads(response['body'].read())
    except bedrock.exceptions.ClientError as err:
        print(f"Couldn't invoke Claude 3 Sonnet. Here's why: {err.response['Error']['Code']}: {err.response['Error']['Message']}")
        raise

Send the Amazon Bedrock API response to the SQS queue using the send_message method:

def send_message_to_sqs(message_body):
    try:
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message_body))
    except sqs.exceptions.ClientError as e:
        print(f"Error sending message to SQS: {e.response['Error']['Code']}: {e.response['Error']['Message']}")

Next, modify the IAM role of the Lambda function to grant the required permissions:

  1. On the Lambda console, navigate to the function.
  2. On the Configuration tab, choose Permissions in the left pane.
  3. Choose the IAM role (for example, invoke_bedrock_claude3-role-{random chars}).

This will open the role in a new tab.

  1. In the Permissions policies section, choose Add permissions and Create inline policy.
  2. On the Create policy page, switch to the JSON tab in the policy editor.
  3. Enter the policy from the following code block, replacing {AWS Account ID} with your AWS account ID and {S3 Bucket Name} with your S3 bucket name.
  4. Choose Next.
  5. Enter a name for the policy (for example, invoke_bedrock_claude3-role-policy), and choose Create policy.
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "bedrock:InvokeModel",
        "Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"
    }, {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::{S3 Bucket Name}/*"
    }, {
        "Effect": "Allow",
        "Action": "sqs:SendMessage",
        "Resource": "arn:aws:sqs:us-east-1:{AWS Account ID}:bedrock-idp-extracted-data"
    }]
}

The policy will grant the following permissions:

  • Invoke model access to Amazon Bedrock FMs
  • Retrieve objects from the bedrock-claude3-idp... S3 bucket
  • Send messages to the bedrock-idp-extracted-data SQS queue for processing the extracted data

Additionally, modify the Lambda function’s timeout to 2 minutes. By default, it’s set to 3 seconds.

Create an S3 Event Notification

To create an S3 Event Notification, complete the following steps:

  1. On the Amazon S3 console, open the bedrock-claude3-idp... S3 bucket.
  2. Navigate to Properties, and in the Event notifications section, create an event notification.
  3. Enter a name for Event name (for example, bedrock-claude3-idp-event-notification).
  4. Enter images/birth_certificates/ for the prefix.
  5. For Event Type, select Put in the Object creation section.
  6. For Destination, select Lambda function and choose invoke_bedrock_claude3.
  7. Choose Save changes.

Create a DynamoDB table

To store the extracted data in DynamoDB, you need to create a table. On the DynamoDB console, create a table called birth_certificates with Id as the partition key, and keep the remaining settings as default.
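
If you script this step, a boto3 sketch of the table creation might look like the following; it uses on-demand capacity, which you can adjust to your needs:

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Table with Id as the partition key and on-demand (pay-per-request) billing
dynamodb.create_table(
    TableName="birth_certificates",
    KeySchema=[{"AttributeName": "Id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "Id", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)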

Create a Lambda function to insert records into the DynamoDB table

On the Lambda console, create a Lambda function (for example, insert_into_dynamodb), choose Python 3.12 for the runtime, and leave the remaining settings as default. You can download the entire Lambda function code from insert_into_dynamodb.py. Replace the contents of the lambda_function.py file with the code from the downloaded file and choose Deploy.

The Lambda function should perform the following actions:

Get the message from the SQS queue that contains the response from the Anthropic Claude 3 Sonnet model:

data = json.loads(event['Records'][0]['body'])['content'][0]['text']
event_id = event['Records'][0]['messageId']
data = json.loads(data)

Create objects representing DynamoDB and its table:

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('birth_certificates')

Get the key objects from the JSON data:

applicant_details = data.get('applicantDetails', {})
mailing_address = data.get('mailingAddress', {})
relation_to_applicant = data.get('relationToApplicant', [])
birth_certificate_details = data.get('BirthCertificateDetails', {})
fees = data.get('fees', {})

Insert the extracted data into the DynamoDB table using the put_item() method:

table.put_item(Item={
    'Id': event_id,
    'applicantName': applicant_details.get('applicantName', ''),
    'dayPhoneNumber': applicant_details.get('dayPhoneNumber', ''),
    'address': applicant_details.get('address', ''),
    'city': applicant_details.get('city', ''),
    'state': applicant_details.get('state', ''),
    'zipCode': applicant_details.get('zipCode', ''),
    'email': applicant_details.get('email', ''),
    'mailingAddressApplicantName': mailing_address.get('mailingAddressApplicantName', ''),
    'mailingAddress': mailing_address.get('mailingAddress', ''),
    'mailingAddressCity': mailing_address.get('mailingAddressCity', ''),
    'mailingAddressState': mailing_address.get('mailingAddressState', ''),
    'mailingAddressZipCode': mailing_address.get('mailingAddressZipCode', ''),
    'relationToApplicant': ', '.join(relation_to_applicant),
    'purposeOfRequest': data.get('purposeOfRequest', ''),
    'nameOnBirthCertificate': birth_certificate_details.get('nameOnBirthCertificate', ''),
    'dateOfBirth': birth_certificate_details.get('dateOfBirth', ''),
    'sex': birth_certificate_details.get('sex', ''),
    'cityOfBirth': birth_certificate_details.get('cityOfBirth', ''),
    'countyOfBirth': birth_certificate_details.get('countyOfBirth', ''),
    'mothersMaidenName': birth_certificate_details.get('mothersMaidenName', ''),
    'fathersName': birth_certificate_details.get('fathersName', ''),
    'mothersPlaceOfBirth': birth_certificate_details.get('mothersPlaceOfBirth', ''),
    'fathersPlaceOfBirth': birth_certificate_details.get('fathersPlaceOfBirth', ''),
    'parentsMarriedAtBirth': birth_certificate_details.get('parentsMarriedAtBirth', ''),
    'numberOfChildrenBornInSCToMother': birth_certificate_details.get('numberOfChildrenBornInSCToMother', ''),
    'diffNameAtBirth': birth_certificate_details.get('diffNameAtBirth', ''),
    'searchFee': fees.get('searchFee', ''),
    'eachAdditionalCopy': fees.get('eachAdditionalCopy', ''),
    'expediteFee': fees.get('expediteFee', ''),
    'totalFees': fees.get('totalFees', '')
})

Next, modify the IAM role of the Lambda function to grant the required permissions. Follow the same steps you used to modify the permissions for the invoke_bedrock_claude3 Lambda function, but enter the following JSON as the inline policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "dynamodb:PutItem",
            "Resource": "arn:aws:dynamodb:us-east-1::{AWS Account ID}:table/birth_certificates"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteMessage",
                "sqs:ReceiveMessage",
                "sqs:GetQueueAttributes"
            ],
            "Resource": "arn:aws:sqs:us-east-1::{AWS Account ID}:bedrock-idp-extracted-data"
        }
    ]
}

Enter a policy name (for example, insert_into_dynamodb-role-policy) and choose Create policy.

The policy will grant the following permissions:

  • Put records into the DynamoDB table
  • Read and delete messages from the SQS queue

Configure the Lambda function trigger for SQS

Complete the following steps to create a trigger for the Lambda function (a programmatic alternative is shown after the list):

  1. On the Amazon SQS console, open the bedrock-idp-extracted-data queue.
  2. On the Lambda triggers tab, choose Configure Lambda function trigger.
  3. Select the insert_into_dynamodb Lambda function and choose Save.
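The equivalent event source mapping can be created with boto3, as in the following sketch; the queue and function names match those used earlier in this post, and the batch size is an illustrative choice:

import boto3

lambda_client = boto3.client("lambda")
sqs = boto3.client("sqs")

# Look up the queue ARN by name
queue_url = sqs.get_queue_url(QueueName="bedrock-idp-extracted-data")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(QueueUrl=queue_url, AttributeNames=["QueueArn"])["Attributes"]["QueueArn"]

# Map the SQS queue to the Lambda function so each message triggers an invocation
lambda_client.create_event_source_mapping(
    EventSourceArn=queue_arn,
    FunctionName="insert_into_dynamodb",
    BatchSize=1,
)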

Test the solution

Now that you have created all the necessary resources, permissions, and code, it’s time to test the solution.

Upload the two scanned images that you downloaded earlier to the images/birth_certificates folder in the S3 bucket. Then open the DynamoDB console and explore the items in the birth_certificates table.

If everything is configured properly, you should see two items in DynamoDB in just a few seconds, as shown in the following screenshots. For the Spanish form, Anthropic Claude 3 automatically translated the keys and labels from Spanish to English based on the prompt.

Troubleshooting

If you don’t see the extracted data in the DynamoDB table, you can investigate the issue:

  • Check CloudWatch logs – Review the Amazon CloudWatch log streams of the Lambda functions involved in the data extraction and ingestion process. Look for any error messages or exceptions that may indicate the root cause of the issue.
  • Identify missing permissions – In many cases, errors can occur due to missing permissions. Confirm that the Lambda functions have the necessary permissions to access the required AWS resources, such as DynamoDB tables, S3 buckets, or other services involved in the solution.
  • Implement a dead-letter queue – In a production-scale solution, it is recommended to implement a dead-letter queue (DLQ) to catch and handle any events or messages that fail to process or encounter errors; a minimal sketch follows this list.
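The following is a minimal boto3 sketch of such a DLQ setup; the DLQ name and maxReceiveCount are illustrative choices, not values prescribed by this solution:

import json
import boto3

sqs = boto3.client("sqs")

# Create a dead-letter queue and attach it to the main queue via a redrive policy
dlq_url = sqs.create_queue(QueueName="bedrock-idp-extracted-data-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(QueueUrl=dlq_url, AttributeNames=["QueueArn"])["Attributes"]["QueueArn"]

main_queue_url = sqs.get_queue_url(QueueName="bedrock-idp-extracted-data")["QueueUrl"]
sqs.set_queue_attributes(
    QueueUrl=main_queue_url,
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": "3",  # move a message to the DLQ after 3 failed receives
        })
    },
)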

Clean up

Clean up the resources created as part of this post to avoid incurring ongoing charges:

  1. Delete all the objects from the bedrock-claude3-idp... S3 bucket, then delete the bucket.
  2. Delete the two Lambda functions named invoke_bedrock_claude3 and insert_into_dynamodb.
  3. Delete the SQS queue named bedrock-idp-extracted-data.
  4. Delete the DynamoDB table named birth_certificates.

Example use cases and business value

The generative AI-powered IDP solution demonstrated in this post can benefit organizations across various industries, such as:

  • Government and public sector – Process and extract data from citizen applications, immigration documents, legal contracts, and other government-related forms, enabling faster turnaround times and improved service delivery
  • Healthcare – Extract and organize patient information, medical records, insurance claims, and other health-related documents, improving data accuracy and accessibility for better patient care
  • Finance and banking – Automate the extraction and processing of financial documents, loan applications, tax forms, and regulatory filings, reducing manual effort and increasing operational efficiency
  • Logistics and supply chain – Extract and organize data from shipping documents, invoices, purchase orders, and inventory records, streamlining operations and enhancing supply chain visibility
  • Retail and ecommerce – Automate the extraction and processing of customer orders, product catalogs, and marketing materials, enabling personalized experiences and efficient order fulfillment

By using the power of generative AI and Amazon Bedrock, organizations can unlock the true potential of their data, driving operational excellence, enhancing customer experiences, and fostering continuous innovation.

Conclusion

In this post, we demonstrated how to use Amazon Bedrock and the powerful Anthropic Claude 3 Sonnet model to develop an IDP solution. By harnessing the advanced multimodal capabilities of Anthropic Claude 3, we were able to accurately extract data from scanned documents and store it in a structured format in a DynamoDB table.

Although this solution showcases the potential of generative AI in IDP, it may not be suitable for all IDP use cases. The effectiveness of the solution may vary depending on the complexity and quality of the documents, the amount of training data available, and the specific requirements of the organization.

To further enhance the solution, consider implementing a human-in-the-loop workflow to review and validate the extracted data, especially for mission-critical or sensitive applications. This will provide data accuracy and compliance with regulatory requirements. You can also explore the model evaluation feature in Amazon Bedrock to compare model outputs, and then choose the model best suited for your downstream generative AI applications.

For further exploration and learning, we recommend checking out the following resources:


About the Authors

Govind Palanisamy is a Solutions Architect at AWS, where he helps government agencies migrate and modernize their workloads to improve the citizen experience. He is passionate about technology and transformation, and he helps customers transform their businesses using AI/ML and generative AI-based solutions.

Bharath Gunapati is a Sr. Solutions Architect at AWS, where he helps clinicians, researchers, and staff at academic medical centers adopt and use cloud technologies. He is passionate about technology and the impact it can make on healthcare and research.

Read More

Metadata filtering for tabular data with Knowledge Bases for Amazon Bedrock

Metadata filtering for tabular data with Knowledge Bases for Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. To equip FMs with up-to-date and proprietary information, organizations use Retrieval Augmented Generation (RAG), a technique that fetches data from company data sources and enriches the prompt to provide more relevant and accurate responses. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation. However, information about a dataset is often stored in a separate dataset, called metadata. Without using metadata, your retrieval process can return unrelated results, thereby decreasing FM accuracy and increasing the cost of FM prompt tokens.

On March 27, 2024, Amazon Bedrock announced a key new feature called metadata filtering and also changed the default engine. This change allows you to use metadata fields during the retrieval process. However, the metadata fields need to be configured during the knowledge base ingestion process. Often, you might have tabular data where details about one field are available in another field. Also, you could have a requirement to cite the exact text document or text field to prevent hallucination. In this post, we show you how to use the new metadata filtering feature with Knowledge Bases for Amazon Bedrock for such tabular data.

Solution overview

The solution consists of the following high-level steps:

  1. Prepare data for metadata filtering.
  2. Create and ingest data and metadata into the knowledge base.
  3. Retrieve data from the knowledge base using metadata filtering.

Prepare data for metadata filtering

As of this writing, Knowledge Bases for Amazon Bedrock supports Amazon OpenSearch Serverless, Amazon Aurora, Pinecone, Redis Enterprise, and MongoDB Atlas as underlying vector store providers. In this post, we create and access an OpenSearch Serverless vector store using the Amazon Bedrock Boto3 SDK. For more details, see Set up a vector index for your knowledge base in a supported vector store.

For this post, we create a knowledge base using the public dataset Food.com – Recipes and Reviews. The following screenshot shows an example of the dataset.

The TotalTime is in ISO 8601 format. You can convert that to minutes using the following logic:

import re

# Function to convert an ISO 8601 duration (for example, PT1H30M) to minutes
def convert_to_minutes(duration):
    hours = 0
    minutes = 0

    # Find hours and minutes using regex
    match = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?', duration)

    if match:
        if match.group(1):
            hours = int(match.group(1))
        if match.group(2):
            minutes = int(match.group(2))

    # Convert total time to minutes
    total_minutes = hours * 60 + minutes
    return total_minutes

df['TotalTimeInMinutes'] = df['TotalTime'].apply(convert_to_minutes)

After converting some of the features like CholesterolContent, SugarContent, and RecipeInstructions, the data frame looks like the following screenshot.

To enable the FM to point to a specific menu with a link (cite the document), we split each row of the tabular data into a single text file, with each file containing RecipeInstructions as the data field and TotalTimeInMinutes, CholesterolContent, and SugarContent as metadata. The metadata should be kept in a separate JSON file with the same name as the data file and .metadata.json added to its name. For example, if the data file name is 100.txt, the metadata file name should be 100.txt.metadata.json. For more details, see Add metadata to your files to allow for filtering. Also, the content in the metadata file should be in the following format:

{
"metadataAttributes": {
"${attribute1}": "${value1}",
"${attribute2}": "${value2}",
...
}
}

For the sake of simplicity, we only process the top 2,000 rows to create the knowledge base.

  1. After you import the necessary libraries, create a local directory using the following Python code:
    import pandas as pd
    import os, json, tqdm, boto3
    
    metafolder = 'multi_file_recipe_data'
    os.mkdir(metafolder)

  2. Iterate over the top 2,000 rows to create data and metadata files to store in the local folder:
    for i in tqdm.trange(2000):
        desc = str(df['RecipeInstructions'][i])
        meta = {
            "metadataAttributes": {
                "Name": str(df['Name'][i]),
                "TotalTimeInMinutes": str(df['TotalTimeInMinutes'][i]),
                "CholesterolContent": str(df['CholesterolContent'][i]),
                "SugarContent": str(df['SugarContent'][i]),
            }
        }
        filename = metafolder + '/' + str(i+1) + '.txt'
        with open(filename, 'w') as f:
            f.write(desc)
        metafilename = filename + '.metadata.json'
        with open(metafilename, 'w') as f:
            json.dump(meta, f)

  3. Create an Amazon Simple Storage Service (Amazon S3) bucket named recipe-kb (the name used in the following code) and upload the files:
    # Upload data to s3
    s3_client = boto3.client("s3")
    bucket_name = "recipe-kb"
    data_root = metafolder+'/'
    def uploadDirectory(path,bucket_name):
        for root,dirs,files in os.walk(path):
            for file in tqdm.tqdm(files):
                s3_client.upload_file(os.path.join(root,file),bucket_name,file)
    
    uploadDirectory(data_root, bucket_name)

Create and ingest data and metadata into the knowledge base

When the S3 folder is ready, you can create the knowledge base on the Amazon Bedrock console or by using the SDK, as shown in this example notebook.
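For reference, the following is a condensed sketch of the SDK calls involved; kb_execution_role_arn, collection_arn, index_name, the field mapping, and the Region in the embedding model ARN are placeholders that the example notebook sets up (an OpenSearch Serverless collection, vector index, and execution role must already exist):

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Create the knowledge base backed by the existing OpenSearch Serverless collection
kb = bedrock_agent.create_knowledge_base(
    name="recipe-kb",
    roleArn=kb_execution_role_arn,
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1"
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": collection_arn,
            "vectorIndexName": index_name,
            "fieldMapping": {
                "vectorField": "vector",
                "textField": "text",
                "metadataField": "metadata",
            },
        },
    },
)["knowledgeBase"]

# Point a data source at the S3 bucket that holds the .txt and .metadata.json files
ds = bedrock_agent.create_data_source(
    knowledgeBaseId=kb["knowledgeBaseId"],
    name="recipe-data",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::recipe-kb"},
    },
)["dataSource"]

# Start an ingestion job to chunk, embed, and index the documents with their metadata
bedrock_agent.start_ingestion_job(
    knowledgeBaseId=kb["knowledgeBaseId"],
    dataSourceId=ds["dataSourceId"],
)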

Retrieve data from the knowledge base using metadata filtering

Now let’s retrieve some data from the knowledge base. For this post, we use Anthropic Claude 3 Sonnet on Amazon Bedrock for our FM, but you can choose from a variety of Amazon Bedrock models. First, you need to set the following variables, where kb_id is the ID of your knowledge base. The knowledge base ID can be found programmatically, as shown in the example notebook, or from the Amazon Bedrock console by navigating to the individual knowledge base, as shown in the following screenshot.
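For example, a quick way to look up the ID programmatically with the bedrock-agent client (illustrative sketch):

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Print the name and ID of each knowledge base in the account and Region
for kb in bedrock_agent.list_knowledge_bases()["knowledgeBaseSummaries"]:
    print(kb["name"], kb["knowledgeBaseId"])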

Set the required Amazon Bedrock parameters using the following code:

import boto3
import pprint
from botocore.client import Config
import json

pp = pprint.PrettyPrinter(indent=2)
session = boto3.session.Session()
region = session.region_name
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime', region_name = region)
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              config=bedrock_config, region_name = region)
kb_id = "EIBBXVFDQP"
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

# retrieve api for fetching only the relevant context.

query = " Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10 "

relevant_documents = bedrock_agent_client.retrieve(
    retrievalQuery= {
        'text': query
    },
    knowledgeBaseId=kb_id,
    retrievalConfiguration= {
        'vectorSearchConfiguration': {
            'numberOfResults': 2 
        }
    }
)
pp.pprint(relevant_documents["retrievalResults"])

The following is the output of the retrieval from the knowledge base without metadata filtering for the query “Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10.” As we can see, out of the two recipes, the preparation durations are 30 and 480 minutes, respectively, and the cholesterol contents are 86 and 112.4, respectively. Therefore, the retrieval isn’t following the query accurately.

The following code demonstrates how to use the Retrieve API with the metadata filters set to a cholesterol content less than 10 and minutes of preparation less than 30 for the same query:

def retrieve(query, kbId, numberOfResults=5):
    return bedrock_agent_client.retrieve(
        retrievalQuery={
            'text': query
        },
        knowledgeBaseId=kbId,
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': numberOfResults,
                'filter': {
                    'andAll': [
                        {
                            'lessThan': {
                                'key': 'CholesterolContent',
                                'value': 10
                            }
                        },
                        {
                            'lessThan': {
                                'key': 'TotalTimeInMinutes',
                                'value': 30
                            }
                        }
                    ]
                }
            }
        }
    )

query = "Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10"
response = retrieve(query, kb_id, 2)
retrievalResults = response['retrievalResults']
pp.pprint(retrievalResults)

As we can see in the following results, out of the two recipes, the preparation times are 27 and 20 minutes, respectively, and the cholesterol contents are both 0. With the use of metadata filtering, we get more accurate results.

The following code shows how to get accurate output using the same metadata filtering with the retrieve_and_generate API. First, we set the prompt, then we set up the API with metadata filtering:

prompt = f"""
Human: You have great knowledge about food, so provide answers to questions by using fact. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Assistant:"""

def retrieve_and_generate(query, kb_id, model_id, numberOfResults=10):
    return bedrock_agent_client.retrieve_and_generate(
        input={
            'text': query,
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': model_id,
                'generationConfiguration': {
                    'promptTemplate': {
                        'textPromptTemplate': f"{prompt} $search_results$"
                    }
                },
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': numberOfResults,
                        'overrideSearchType': 'HYBRID',
                        'filter': {
                            'andAll': [
                                {
                                    'lessThan': {
                                        'key': 'CholesterolContent',
                                        'value': 10
                                    }
                                },
                                {
                                    'lessThan': {
                                        'key': 'TotalTimeInMinutes',
                                        'value': 30
                                    }
                                }
                            ]
                        },
                    }
                }
            }
        }
    )

query = "Tell me a recipe that I can make under 30 minutes and has cholesterol less than 10"
response = retrieve_and_generate(query, kb_id, model_id, numberOfResults=10)
pp.pprint(response['output']['text'])

As we can see in the following output, the model returns a detailed recipe that follows the instructed metadata filtering of less than 30 minutes of preparation time and a cholesterol content less than 10.

Clean up

Comment out the following section if you plan to keep using the knowledge base that you created for building your RAG application. If you only wanted to try out creating the knowledge base using the SDK, make sure to delete all the resources that were created, because you will incur costs for storing documents in the OpenSearch Serverless index. See the following code:

bedrock_agent_client.delete_data_source(dataSourceId = ds["dataSourceId"], knowledgeBaseId=kb['knowledgeBaseId'])
bedrock_agent_client.delete_knowledge_base(knowledgeBaseId=kb['knowledgeBaseId'])
oss_client.indices.delete(index=index_name)
aoss_client.delete_collection(id=collection_id)
aoss_client.delete_access_policy(type="data", name=access_policy['accessPolicyDetail']['name'])
aoss_client.delete_security_policy(type="network", name=network_policy['securityPolicyDetail']['name'])
aoss_client.delete_security_policy(type="encryption", name=encryption_policy['securityPolicyDetail']['name'])
# Delete roles and polices 
iam_client.delete_role(RoleName=bedrock_kb_execution_role)
iam_client.delete_policy(PolicyArn=policy_arn)

Conclusion

In this post, we explained how to split a large tabular dataset into rows to set up a knowledge base with metadata for each of those records, and how to then retrieve outputs with metadata filtering. We also showed how retrieving results with metadata is more accurate than retrieving results without metadata filtering. Lastly, we showed how to use the result with an FM to get accurate results.

To further explore the capabilities of Knowledge Bases for Amazon Bedrock, refer to the following resources:


About the Author

Tanay Chowdhury is a Data Scientist at the Generative AI Innovation Center at Amazon Web Services. He helps customers solve their business problems using generative AI and machine learning.

Read More

Secure AccountantAI Chatbot: Lili’s journey with Amazon Bedrock

Secure AccountantAI Chatbot: Lili’s journey with Amazon Bedrock

This post was written in collaboration with Liran Zelkha and Eyal Solnik from Lili.

Small business proprietors tend to prioritize the operational aspects of their enterprises over administrative tasks, such as maintaining financial records and accounting. While hiring a professional accountant can provide valuable guidance and expertise, it can be cost-prohibitive for many small businesses. Moreover, the availability of accountants might not always align with the immediate needs of business owners, leaving them with unanswered questions or delayed decision-making processes.

In the rapidly evolving world of large language models (LLMs) and generative artificial intelligence (AI), Lili recognized an opportunity to use this technology to address the financial advisory needs of their small business customers. Using Anthropic’s Claude 3 Haiku on Amazon Bedrock, Lili developed an intelligent AccountantAI chatbot capable of providing on-demand accounting advice tailored to each customer’s financial history and unique business requirements. The AccountantAI chatbot serves as a virtual assistant, offering affordable and readily available financial guidance, empowering small business owners to focus on their core expertise while ensuring the financial health of their operations.

About Lili

Lili is a financial platform designed specifically for businesses, offering a combination of advanced business banking with built-in accounting and tax preparation software.

By consolidating financial tools into a user-friendly interface, Lili streamlines and simplifies managing business finances and makes it an attractive solution for business owners seeking a centralized and efficient way to manage their financial operations.

In this post, we explore how Lili used Amazon Bedrock to build a secure and intelligent AccountantAI chatbot for small business owners. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like Anthropic, Meta, Mistral AI, Stability AI, Cohere, AI21 Labs, and Amazon through a single API, along with a broad set of capabilities that you need to build generative AI applications with security, privacy, and responsible AI.

Solution overview

The AccountantAI chatbot provides small business owners with accurate and relevant financial accounting advice in a secure manner. To achieve this, the solution is designed to address two key requirements:

  • Question validation: Implementing guardrails to ensure that the user’s input is a valid and a legitimate financial accounting question. This step helps filter out irrelevant or inappropriate queries, maintaining the integrity of the system.
  • Context enrichment: Augmenting the user’s question with relevant contextual data, such as up-to-date accounting information and user-specific financial data. This step ensures that the chatbot’s responses are tailored to the individual user’s business and financial situation, providing more personalized and actionable advice.

To address the two key requirements of question validation and context enrichment, the AccountantAI solution employs a two-stage architecture comprising an ingestion workflow and a retrieval workflow.

Ingestion workflow


The ingestion workflow is an offline process that prepares the system for serving customer queries. For this stage, Lili curated a comprehensive golden collection of financial accounting questions, drawing from common inquiries as well as real-world questions from their customer base over the years. This diverse and high-quality collection serves as a reference corpus, ensuring that the chatbot can handle a wide range of relevant queries. The ingestion workflow transforms these curated questions into vector embeddings using the Amazon Titan Text Embeddings model. This process occurs over AWS PrivateLink for Amazon Bedrock, a protected and private connection in your VPC. The vector embeddings are persisted in the application’s in-memory vector store. These vectors will help to validate user input during the retrieval workflow.
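Lili’s exact implementation isn’t public, but the following sketch illustrates the kind of call involved in this step, using the Amazon Titan Text Embeddings model through the Amazon Bedrock runtime API; the model ID and the in-memory dictionary are illustrative assumptions:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def embed_question(question):
    # Return the Titan Text Embeddings vector for a curated question
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": question}),
    )
    return json.loads(response["body"].read())["embedding"]

# Example: embed a golden question and keep it in an in-memory store
golden_store = {"How do I categorize a business expense?": None}
for q in golden_store:
    golden_store[q] = embed_question(q)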

Each curated vector embedding is paired with a matching prompt template that was evaluated during testing to be the most effective.

Example prompt template

<role>
Provides context about the agent's role as Lili's AI assistant for financial questions and outlines the general guidelines applied to all queries.
</role>

<about>
Provides details on Lili platform.
</about>

<features>
Lists out all of Lili's product features in detail. This section aims to explain Lili's features in detail, ensuring that answers are aligned with the Lili platform. For instance, when addressing questions about tax reduction management, highlight the relevant features that Lili offers, which customers should be familiar with.
</features>

<output_format>
Outlines the required formatting for the response to ensure it meets the expected structure.
</output_format>

<user_data>
Data relevant to answering the customer's question.
</user_data>

<knowledge>
Specific accounting knowledge that is relevant to the question and the model is not familiar with, such as updated data for 2024.
</knowledge>

<question>
Contains the user's actual question.
</question>

<instructions>
Provides the core instructions on how to approach answering the question appropriately and meet expectations. It also defines the steps in providing a detailed and high-quality answer.
</instructions>

<reminders>
Important guidelines to remind the agent and make sure it follows them, such as the exact format of the answer.
</reminders>

Retrieval workflow


Lili’s web chatbot interface allows users to submit queries and receive real-time responses. When a customer asks a question, it’s sent to the backend system for processing.

  1. The system first converts the query into a vector embedding using the Amazon Titan Text Embeddings model API, which is accessed securely through PrivateLink.
  2. Next, the system performs a similarity search on the pre-computed embeddings of the golden collection, to find the most relevant matches for the user’s query. The system evaluates the similarity scores of the search results against a predetermined threshold. If the user’s question yields matches with low similarity scores, it’s deemed malformed or unclear, and the user is prompted to rephrase or refine their query.
  3. However, if the user’s question produces matches with high similarity scores, it’s considered a legitimate query. In this case, Lili’s backend system proceeds with further processing using the golden question that has the highest similarity score to the user’s query.
  4. Based on the golden question with the highest similarity score, the system retrieves the corresponding prompt template.

This template is augmented with up-to-date accounting information and the customer’s specific financial data from external sources such as Amazon RDS for MySQL. The resulting contextualized prompt is sent to Anthropic’s Claude 3 Haiku on Amazon Bedrock, which generates a tailored response addressing the customer’s query within their unique business context.
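Continuing the illustrative ingestion sketch, the validation step (step 2) can be approximated as a cosine-similarity search against the golden collection with a score threshold; the threshold value here is an assumption, not Lili’s actual setting:

import numpy as np

SIMILARITY_THRESHOLD = 0.80  # assumption; Lili's actual threshold is not public

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def validate_question(user_embedding, golden_store):
    # Return the best-matching golden question and its score,
    # or (None, score) if the query looks malformed or unclear
    best_question, best_score = None, -1.0
    for question, embedding in golden_store.items():
        score = cosine_similarity(user_embedding, embedding)
        if score > best_score:
            best_question, best_score = question, score
    if best_score < SIMILARITY_THRESHOLD:
        return None, best_score  # prompt the user to rephrase
    return best_question, best_score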

Because model providers continually enhance their offerings with innovative updates, Amazon Bedrock simplifies the ability to adopt emerging advancements in generative AI across multiple model providers. This approach has demonstrated its advantages right from the initial rollout of AccountantAI. Lili transitioned from Anthropic’s Claude Instant to Claude 3 within two weeks of its official release on the Amazon Bedrock environment and three weeks after its general availability.

Lili selected Anthropic’s Claude model family for AccountantAI after reviewing industry benchmarks and conducting their own quality assessment. Anthropic Claude on Amazon Bedrock consistently outperformed other models in understanding financial concepts, generating coherent natural language, and providing accurate, tailored recommendations.

After the initial release of AccountantAI, Amazon Bedrock introduced Anthropic’s Claude 3 Haiku model, which Lili evaluated against the Anthropic Claude Instant version. The Anthropic Claude 3 Haiku model demonstrated significant improvements across three key evaluation metrics:

  • Quality – Anthropic Claude 3 Haiku delivered higher quality outputs, providing more detailed and better-phrased responses compared to its predecessor.
  • Response time – Anthropic Claude 3 Haiku exhibited a 10 percent to 20 percent improvement in response times over Claude Instant, offering faster performance.
  • Cost – Anthropic Claude 3 Haiku on Amazon Bedrock is the most cost-effective choice. For instance, it is up to 68 percent less costly per 1,000 input/output tokens compared to Anthropic Claude Instant, while delivering higher levels of intelligence and performance. See Anthropic’s Claude 3 models on Amazon Bedrock for more information.

For customers like Lili, this underscores the importance of having access to a fully managed service like Amazon Bedrock, which offers a choice of high-performing foundation models to meet diverse enterprise AI needs. There is no “one size fits all” model, and the ability to select from a range of cutting-edge FMs is crucial for organizations seeking to use the latest advancements in generative AI effectively and cost-efficiently.

Conclusion

The AccountantAI feature, exclusively available to Lili customers, reduces the need for hiring a professional accountant. While professional accountants can provide valuable guidance and expertise, their services can be cost-prohibitive for many small businesses. AccountantAI has already answered thousands of questions, delivering real value to businesses and providing quality responses to financial, tax, and accounting inquiries.

Using Amazon Bedrock for easy, secure, and reliable access to high-performing foundation models from leading AI companies, Lili integrates accounting knowledge at scale with each customer’s unique data. This innovative solution offers affordable expertise on optimizing cash flow, streamlining tax planning, and enabling informed decisions to drive growth. AccountantAI bridges the gap in accounting resources, democratizing access to high-quality financial intelligence for every business.

Explore Lili’s AccountantAI feature powered by Amazon Bedrock to gain affordable and accessible financial intelligence for your business today, or use Amazon Bedrock Playgrounds to experiment with running inference on different models on your data.


About the authors

Doron Bleiberg is a senior AWS Startups Solution Architect helping Fintech customers in their cloud journey.

Liran Zelkha is the co-founder and CTO at Lili, leading our development and data efforts.

Eyal Solnik is the head of Data at Lili and leads our AccountantAI product.

Read More

How Mend.io unlocked hidden patterns in CVE data with Anthropic Claude on Amazon Bedrock

How Mend.io unlocked hidden patterns in CVE data with Anthropic Claude on Amazon Bedrock

This post is co-written with Maciej Mensfeld from Mend.io.

In the ever-evolving landscape of cybersecurity, the ability to effectively analyze and categorize Common Vulnerabilities and Exposures (CVEs) is crucial. This post explores how Mend.io, a cybersecurity firm, used Anthropic Claude on Amazon Bedrock to classify and identify CVEs containing specific attack requirement details. By using the power of large language models (LLMs), Mend.io streamlined the analysis of over 70,000 vulnerabilities, automating a process that would have been nearly impossible to accomplish manually. With this capability, they managed to save 200 days of human experts’ work. It also allows them to provide higher-quality verdicts to their customers, who can then prioritize vulnerabilities more effectively, and gives Mend.io a competitive advantage. This initiative not only underscores the transformative potential of AI in cybersecurity, but also provides valuable insights into the challenges and best practices for integrating LLMs into real-world applications.

The post delves into the challenges faced, such as managing quota limitations, estimating costs, and handling unexpected model responses. We also provide insights into the model selection process, results analysis, conclusions, recommendations, and Mend.io’s future outlook on integrating artificial intelligence (AI) in cybersecurity.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Mend.io is a cybersecurity company dedicated to safeguarding digital ecosystems through innovative solutions. With a deep commitment to using cutting-edge technologies, Mend.io has been at the forefront of integrating AI and machine learning (ML) capabilities into its operations. By continuously pushing the boundaries of what’s possible, Mend.io empowers organizations to stay ahead of evolving cyber threats and maintain a proactive, intelligent approach to security.

Uncovering attack requirements in CVE data

In the cybersecurity domain, the constant influx of CVEs presents a significant challenge. Each year, thousands of new vulnerabilities are reported, with descriptions varying in clarity, completeness, and structure. These reports, often contributed by a diverse global community, can be concise, ambiguous, or lack crucial details, burying critical information such as attack requirements, potential impact, and suggested mitigation steps. The unstructured nature of CVE reports poses a significant obstacle in extracting actionable insights. Automated systems struggle to accurately parse and comprehend the inconsistent and complex narratives, increasing the risk of overlooking or misinterpreting vital details—a scenario with severe implications for security postures.

For cybersecurity professionals, one of the most daunting tasks is identifying the attack requirements (the specific conditions and prerequisites needed for a vulnerability to be successfully exploited) from these vast and highly variable natural language descriptions. Determining whether attack requirements are present or absent is equally crucial, as this information is vital for assessing and mitigating potential risks. With tens of thousands of CVE reports to analyze, manually sifting through each description to extract this nuanced information is impractical and nearly impossible, given the sheer volume of data involved.

The decision to use Anthropic Claude on Amazon Bedrock and the advantages it offered

In the face of this daunting challenge, the power of LLMs offered a promising solution. These advanced generative AI models are great at understanding and analyzing vast amounts of text, making them the perfect tool for sifting through the flood of CVE reports to pinpoint those containing attack requirement details.

The decision to use Anthropic Claude on Amazon Bedrock was a strategic one. During evaluations, Mend.io found that although other LLMs like GPT-4 also showed strong performance in analyzing CVE descriptions, Mend.io’s specific requirements were better aligned with Anthropic Claude’s capabilities. Mend.io structured its prompts with tags like <example-attack-requirement>. When Mend.io evaluated other models with both structured and unstructured prompts, Anthropic Claude’s ability to precisely follow the structured prompts and include the expected tags made it a better fit for Mend.io’s use case during their testing.

Anthropic Claude’s unique capabilities, which allows the recognition of XML tags within prompts, gave it a distinct advantage. This capability enabled Mend.io to structure the prompts in a way that improved precision and value, ensuring that Anthropic Claude’s analysis was tailored to Mend.io’s specific needs. Furthermore, the seamless integration with Amazon Bedrock provided a robust and secure platform for handling sensitive data. The proven security infrastructure of AWS strengthens confidence, allowing Mend.io to process and analyze CVE information without compromising data privacy and security—a critical consideration in the world of cybersecurity.

Crafting the prompt

Crafting the perfect prompt for Anthropic Claude was both an art and a science. It required a deep understanding of the model’s capabilities and a thorough process to make sure Anthropic Claude’s analysis was precise and grounded in practical applications. They composed the prompt with rich context, provided examples, and clearly defined the differences between attack complexity and attack requirements as defined in the Common Vulnerability Scoring System (CVSS) v4.0. This level of detail was crucial to make sure Anthropic Claude could accurately identify the nuanced details within CVE descriptions.

The use of XML tags was a game-changer in structuring the prompt. These tags allowed them to isolate different sections, guiding Anthropic Claude’s focus and improving the accuracy of its responses. With this unique capability, Mend.io could direct the model’s attention to specific aspects of the CVE data, streamlining the analysis process and increasing the value of the insights derived.

With a well-crafted prompt and the power of XML tags, Mend.io equipped Anthropic Claude with the context and structure necessary to navigate the intricate world of CVE descriptions, enabling it to pinpoint the critical attack requirement details that would arm security teams with invaluable insights for prioritizing vulnerabilities and fortifying defenses.

The following example illustrates how to craft a prompt effectively using tags with the goal of identifying phishing emails:

<Instructions>
        Analyze emails to identify potential spam or phishing threats. Users should provide the full email content, including headers, by copy-pasting or uploading the email file directly.
</Instructions>
<AnalysisProcess>
        <StepOne>
            <Title>Analyze Sender Information</Title>
            <Description>Verify the sender's email address and domain. Assess additional contacts, date, and time to evaluate potential legitimacy and context.</Description>
        </StepOne>
        <StepTwo>
            <Title>Examine Email Content</Title>
            <Description>Analyze the subject line and body content for relevance and legitimacy. Caution against quick offers. Evaluate personalization and sender legitimacy.</Description>
        </StepTwo>
        <StepThree>
            <Title>Check for Unsolicited Attachments or Links</Title>
            <Description>Identify and scrutinize hyperlinks for potential phishing or spam indicators. Advise on verifying link legitimacy without direct interaction. Use tools like VirusTotal or Google Safe Browsing for safety checks.</Description>
        </StepThree>
</AnalysisProcess>
<Conclusion>
        Based on the analysis, provide an estimation of the email's likelihood of being spam or phishing, expressed as a percentage to indicate the assessed risk level. This comprehensive analysis helps users make informed decisions about the email's authenticity while emphasizing security and privacy.
</Conclusion>
<DataHandling>
         Refer to uploaded documents as 'knowledge source'. Strictly adhere to facts provided, avoiding speculation. Prioritize documented information over baseline knowledge or external sources. If no answer is found within the documents, state this explicitly.
</DataHandling>

The challenges

While using Anthropic Claude, Mend.io experienced the flexibility and scalability of the service firsthand. As the analysis workload grew to encompass 70,000 CVEs, they encountered opportunities to optimize their usage of the service’s features and cost management capabilities. When using the on-demand model deployment of Amazon Bedrock across AWS Regions, Mend.io proactively managed the API request per minute (RPM) and tokens per minute (TPM) quotas by parallelizing model requests and adjusting the degree of parallelization to operate within the quota limits. They also took advantage of the built-in retry logic in the Boto3 Python library to handle any occasional throttling scenarios seamlessly. For workloads requiring even higher quotas, the Amazon Bedrock Provisioned Throughput option offers a straightforward solution, though it didn’t align with Mend.io’s specific usage pattern in this case.
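The following sketch shows the general pattern of combining the Boto3 retry configuration with bounded parallelism; the worker count, retry settings, and the classification stub are illustrative rather than Mend.io’s actual values:

import boto3
from botocore.config import Config
from concurrent.futures import ThreadPoolExecutor

# Built-in boto3 retry logic absorbs occasional throttling; the worker count is tuned
# to stay within the account's RPM/TPM quotas (all values here are illustrative)
bedrock_runtime = boto3.client(
    "bedrock-runtime",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)

def classify_cve(description):
    # The actual Anthropic Claude prompt and invoke_model call are omitted here;
    # this stub only marks where each CVE description would be classified
    return "YES" if "attack requirement" in description.lower() else "NO"

cve_descriptions = ["Example CVE description ..."]  # placeholder input

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(classify_cve, cve_descriptions))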

Although the initial estimate for classifying all 70,000 CVEs was lower, the final cost came in higher due to more complex input data resulting in longer input and output sequences. This highlighted the importance of comprehensive testing and benchmarking. The flexible pricing models in Amazon Bedrock allow organizations to optimize costs by considering alternative model options or data partitioning strategies, where simpler cases can be processed by more cost-effective models, while reserving higher-capacity models for the most challenging instances.

When working with advanced language models like those provided by AWS, it’s crucial to craft prompts that align precisely with the desired output format. In Mend.io’s case, their expectation was to receive straightforward YES/NO answers to their prompts, which would streamline subsequent data curation steps. However, the model often provided additional context, justifications, or explanations beyond the anticipated succinct responses. Although these expanded responses offered valuable insights, they introduced unanticipated complexity into Mend.io’s data processing workflow. This experience highlighted the importance of prompt refinement to make sure the model’s output aligns closely with the specific requirements of the use case. By iterating on prompt formulation and fine-tuning the prompts, organizations can optimize their model’s responses to better match their desired response format, ultimately enhancing the efficiency and effectiveness of their data processing pipelines.

Results

Despite the challenges Mend.io faced, their diligent efforts paid off. They successfully identified CVEs with attack requirement details, arming security teams with precious insights for prioritizing vulnerabilities and fortifying defenses. This outcome was a significant achievement, because understanding the specific prerequisites for a vulnerability to be exploited is crucial in assessing risk and developing effective mitigation strategies. By using the power of Anthropic Claude, Mend.io was able to sift through tens of thousands of CVE reports, extracting the nuanced information about attack requirements that would have been nearly impossible to obtain through manual analysis. This feat not only saved valuable time and resources but also provided cybersecurity teams with a comprehensive view of the threat landscape, enabling them to make informed decisions and prioritize their efforts effectively.

Mend.io conducted an extensive evaluation of Anthropic Claude, issuing 68,378 requests without considering any quota limitations. Based on their initial experiment of analyzing a sample of 100 vulnerabilities to understand attack vectors, they could determine the accuracy of Claude’s direct YES or NO answers. As shown in the following table, Anthropic Claude demonstrated exceptional performance, providing direct YES or NO answers for 99.9883% of the requests. In the few instances where a straightforward answer was not given, Anthropic Claude still provided sufficient information to determine the appropriate response. This evaluation highlights Anthropic Claude’s robust capabilities in handling a wide range of queries with high accuracy and reliability.

Character count of the prompt (without CVE-specific details) – 13,935
Number of tokens for the prompt (without CVE-specific details) – 2,733
Total requests – 68,378
Unexpected answers – 8
Failures (quota limitations excluded) – 0
Answer quality success rate – 99.9883%

Future plans

The successful application of Anthropic Claude in identifying attack requirement details from CVE data is just the beginning of the vast potential that generative AI holds for the cybersecurity domain. As these advanced models continue to evolve and mature, their capabilities will expand, opening up new frontiers in automating vulnerability analysis, threat detection, and incident response. One promising avenue is the use of generative AI for automating vulnerability categorization and prioritization. By using these models’ ability to analyze and comprehend technical descriptions, organizations can streamline the process of identifying and addressing the most critical vulnerabilities, making sure limited resources are allocated effectively. Furthermore, generative AI models can be trained to detect and flag potential malicious code signatures within software repositories or network traffic. This proactive approach can help cybersecurity teams stay ahead of emerging threats, enabling them to respond swiftly and mitigate risks before they can be exploited.

Beyond vulnerability management and threat detection, generative AI also holds promise in incident response and forensic analysis. These models can assist in parsing and making sense of vast amounts of log data, network traffic records, and other security-related information, accelerating the identification of root causes and enabling more effective remediation efforts. As generative AI continues to advance, its integration with other cutting-edge technologies, such as ML and data analytics, will unlock even more powerful applications in the cybersecurity domain. The ability to process and understand natural language data at scale, combined with the predictive power of ML algorithms, could revolutionize threat intelligence gathering, enabling organizations to anticipate and proactively defend against emerging cyber threats.

Conclusion

As the field of cybersecurity continually advances, the integration of generative AI models like Anthropic Claude, powered by the robust infrastructure of Amazon Bedrock, represents a significant step forward in digital defense. Mend.io’s successful application of this technology in extracting attack requirement details from CVE data is a testament to the transformative potential of language AI in the vulnerability management and threat analysis domains. By utilizing the power of these advanced models, Mend.io has demonstrated that the complex task of sifting through vast amounts of unstructured data can be tackled with precision and efficiency. This initiative not only empowers security teams with crucial insights for prioritizing vulnerabilities, but also paves the way for future innovations in automating vulnerability analysis, threat detection, and incident response. Anthropic and AWS have played a pivotal role in enabling organizations like Mend.io to take advantage of these cutting-edge technologies.

Looking ahead, the possibilities are truly exciting. As language models continue to evolve and integrate with other emerging technologies, such as ML and data analytics, the potential for revolutionizing threat intelligence gathering and proactive defense becomes increasingly tangible.

If you’re a cybersecurity professional looking to unlock the full potential of language AI in your organization, we encourage you to explore the capabilities of Amazon Bedrock and the Anthropic Claude models. By integrating these cutting-edge technologies into your security operations, you can streamline your vulnerability management processes, enhance threat detection, and bolster your overall cybersecurity posture. Take the first step today and discover how Mend.io’s success can inspire your own journey towards a more secure digital future.


About the Authors

Hemmy Yona is a Solutions Architect at Amazon Web Services based in Israel. With 20 years of experience in software development and group management, Hemmy is passionate about helping customers build innovative, scalable, and cost-effective solutions. Outside of work, you’ll find Hemmy enjoying sports and traveling with family.

Tzahi Mizrahi is a Solutions Architect at Amazon Web Services, specializing in container solutions with over 10 years of experience in development and DevOps lifecycle processes. His expertise includes designing scalable, container-based architectures and optimizing deployment workflows. In his free time, he enjoys music and plays the guitar.

Gili Nachum is a Principal Solutions Architect at AWS, specializing in generative AI and machine learning. Gili helps AWS customers build new foundation models and leverage LLMs to innovate in their business. In his spare time, Gili enjoys family time and calisthenics.

Maciej Mensfeld is a principal product architect at Mend, focusing on data acquisition, aggregation, and AI/LLM security research. He’s the creator of diffend.io (acquired by Mend) and Karafka. As a Software Architect, Security Researcher, and conference speaker, he teaches Ruby, Rails, and Kafka. Passionate about OSS, Maciej actively contributes to various projects, including Karafka, and is a member of the RubyGems security team.

Read More

How Deloitte Italy built a digital payments fraud detection solution using quantum machine learning and Amazon Braket

How Deloitte Italy built a digital payments fraud detection solution using quantum machine learning and Amazon Braket

As digital commerce expands, fraud detection has become critical in protecting businesses and consumers engaging in online transactions. Implementing machine learning (ML) algorithms enables real-time analysis of high-volume transactional data to rapidly identify fraudulent activity. This advanced capability helps mitigate financial risks and safeguard customer privacy within expanding digital markets.

Deloitte is a strategic global systems integrator with over 19,000 certified AWS practitioners across the globe. It continues to raise the bar through participation in the AWS Competency Program with 29 competencies, including Machine Learning.

This post demonstrates the potential for quantum computing algorithms paired with ML models to revolutionize fraud detection within digital payment platforms. We share how Deloitte built a hybrid quantum neural network solution with Amazon Braket to demonstrate the possible gains coming from this emerging technology.

The promise of quantum computing

Quantum computers harbor the potential to radically overhaul financial systems, enabling much faster and more precise solutions. Compared to classical computers, quantum computers are expected in the long run to have advantages in the areas of simulation, optimization, and ML. Whether quantum computers can provide a meaningful speedup to ML is an active topic of research.

Quantum computing can perform efficient near real-time simulations in critical areas such as pricing and risk management. Optimization models are key activities in financial institutions, aimed at determining the best investment strategy for a portfolio of assets, allocating capital, or achieving productivity improvements. Some of these optimization problems are nearly impossible for traditional computers to tackle, so approximations are used to solve the problems in a reasonable amount of time. Quantum computers could perform faster and more accurate optimizations without using any approximations.

Despite the long-term horizon, the potentially disruptive nature of this technology means that financial institutions are looking to get an early foothold in this technology by building in-house quantum research teams, expanding their existing ML COEs to include quantum computing, or engaging with partners such as Deloitte.

At this early stage, customers seek access to a choice of different quantum hardware and simulation capabilities in order to run experiments and build expertise. Braket is a fully managed quantum computing service that lets you explore quantum computing. It provides access to quantum hardware from IonQ, OQC, QuEra, Rigetti, and IQM, a variety of local and on-demand simulators including GPU-enabled simulations, and infrastructure for running hybrid quantum-classical algorithms such as quantum ML. Braket is fully integrated with AWS services such as Amazon Simple Storage Service (Amazon S3) for data storage and AWS Identity and Access Management (IAM) for identity management, and you only pay for what you use.

In this post, we demonstrate how to implement a quantum neural network-based fraud detection solution using Braket and AWS native services. Although quantum computers can’t be used in production today, our solution provides a workflow that will seamlessly adapt and function as a plug-and-play system in the future, when commercially viable quantum devices become available.

Solution overview

The goal of this post is to explore the potential of quantum ML and present a conceptual workflow that could serve as a plug-and-play system when the technology matures. Quantum ML is still in its early stages, and this post aims to showcase the art of the possible without delving into specific security considerations. As quantum ML technology advances and becomes ready for production deployments, robust security measures will be essential. However, for now, the focus is on outlining a high-level conceptual architecture that can seamlessly adapt and function in the future when the technology is ready.

The following diagram shows the solution architecture for the implementation of a neural network-based fraud detection solution using AWS services. The solution is implemented using a hybrid quantum neural network. The neural network is built using the Keras library; the quantum component is implemented using PennyLane.
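Deloitte’s exact network isn’t reproduced here, but the following minimal sketch shows how a PennyLane quantum circuit can be wrapped as a Keras layer to form such a hybrid model; the qubit count, circuit template, weight shapes, and surrounding dense layers are illustrative assumptions:

import pennylane as qml
import tensorflow as tf

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def circuit(inputs, weights):
    # Encode classical features into qubit rotations, then apply trainable entangling layers
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (3, n_qubits)}
quantum_layer = qml.qnn.KerasLayer(circuit, weight_shapes, output_dim=n_qubits)

# The quantum layer can be stacked after classical dense layers in a Keras model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_qubits, activation="relu"),
    quantum_layer,
    tf.keras.layers.Dense(2, activation="softmax"),
])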

The workflow includes the following key components for inference (A–F) and training (G–I):

  1. Ingestion – Real-time financial transactions are ingested through Amazon Kinesis Data Streams
  2. Preprocessing – AWS Glue streaming extract, transform, and load (ETL) jobs consume the stream to do preprocessing and light transforms
  3. Storage – Amazon S3 is used to store output artifacts
  4. Endpoint deployment – We use an Amazon SageMaker endpoint to deploy the models
  5. Analysis – Transactions along with the model inferences are stored in Amazon Redshift
  6. Data visualization – Amazon QuickSight is used to visualize the results of fraud detection
  7. Training data – Amazon S3 is used to store the training data
  8. Modeling – A Braket environment produces a model for inference
  9. Governance – Amazon CloudWatch, IAM, and AWS CloudTrail are used for observability, governance, and auditability, respectively

Dataset

For training the model, we used open source data available on Kaggle. The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset records transactions that occurred over a span of 2 days, during which there were 492 instances of fraud detected out of a total of 284,807 transactions. The dataset exhibits a significant class imbalance, with fraudulent transactions accounting for just 0.172% of the entire dataset. Because the data is highly imbalanced, various measures have been taken during data preparation and model development.

The dataset exclusively comprises numerical input variables, which have undergone a Principal Component Analysis (PCA) transformation for confidentiality reasons.

In addition to the PCA-transformed features, the dataset includes three key fields:

  • Time – Time elapsed between each transaction and the first transaction in the dataset
  • Amount – Transaction amount
  • Class – Target variable, 1 for fraud or 0 for non-fraud

Data preparation

We split the data into training, validation, and test sets, and we define the target and the features sets, where Class is the target variable:

y_train = df_train['Class']
x_train = df_train.drop(['Class'], axis=1)
y_validation = df_validation['Class']
x_validation = df_validation.drop(['Class'], axis=1)
y_test = df_test['Class']
x_test = df_test.drop(['Class'], axis=1)
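
The creation of df_train, df_validation, and df_test isn't shown in this post; the following is a minimal sketch of how the Kaggle CSV could be loaded and split, where the file name, split ratios, and random seed are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: a local copy of the Kaggle credit card fraud dataset
df = pd.read_csv("creditcard.csv")

# Carve out a test set first, then split the remainder into train and validation,
# stratifying on Class to preserve the 0.172% fraud ratio in every split
df_trainval, df_test = train_test_split(df, test_size=0.15, stratify=df["Class"], random_state=42)
df_train, df_validation = train_test_split(df_trainval, test_size=0.15 / 0.85,
                                           stratify=df_trainval["Class"], random_state=42)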

The Class field takes the values 0 and 1. Because the neural network produces a two-dimensional output, we encode the labels in the y sets accordingly:

from sklearn.preprocessing import LabelEncoder
import tensorflow as tf

lbl_clf = LabelEncoder()
y_train = lbl_clf.fit_transform(y_train)
y_train = tf.keras.utils.to_categorical(y_train)

The encoding maps 0 to [1,0] and 1 to [0,1].

Finally, we apply scaling that standardizes the features by removing the mean and scaling to unit variance:

from sklearn.preprocessing import StandardScaler

std_clf = StandardScaler()
x_train = std_clf.fit_transform(x_train)
# Fit the scaler on the training data only, then apply it to the validation and test sets
x_validation = std_clf.transform(x_validation)
x_test = std_clf.transform(x_test)

The LabelEncoder and StandardScaler classes are available in the scikit-learn Python library.

After all the transformations are applied, the dataset is ready to be the input of the neural network.

Neural network architecture

Based on several empirical tests, we composed the neural network architecture with the following layers:

  • A first dense layer with 32 nodes
  • A second dense layer with 9 nodes
  • A quantum layer as the neural network output
  • Dropout layers with a rate of 0.3

We apply L2 regularization on the first layer and both L1 and L2 regularization on the second one to avoid overfitting. We initialize all the kernels using the he_normal initializer. The dropout layers are meant to reduce overfitting as well.

from tensorflow.keras.layers import Dense, Dropout
import tensorflow as tf

hidden = Dense(32, activation="relu", kernel_initializer='he_normal', kernel_regularizer=tf.keras.regularizers.l2(0.01))
out_2 = Dense(9, activation="relu", kernel_initializer='he_normal', kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.001, l2=0.001))
do = Dropout(0.3)

Quantum circuit

The first step to obtain the layer is to build the quantum circuit (or the quantum node). To accomplish this task, we used the Python library PennyLane.

PennyLane is an open source library that seamlessly integrates quantum computing with ML. It allows you to create and train quantum-classical hybrid models, where quantum circuits act as layers within classical neural networks. By merging quantum mechanics with classical ML frameworks like PyTorch, TensorFlow, and Keras, PennyLane lets you explore the frontier of quantum ML.

The design of the circuit is the most important part of the overall solution. The predictive power of the model depends entirely on how the circuit is built.

Qubits, the fundamental units of information in quantum computing, are entities that behave quite differently from classical bits. Unlike classical bits that can only represent 0 or 1, qubits can exist in a superposition of both states simultaneously, enabling quantum parallelism and faster calculations for certain problems.

We decided to use only three qubits: a small number, but sufficient for our use case.

We instantiate the qubits as follows:

import pennylane as qml

num_wires = 3
dev = qml.device('default.qubit', wires=num_wires)

‘default.qubit’ is PennyLane’s built-in qubit simulator. To access qubits on a real quantum computer, you can replace the second line with the following code:

device_arn = "arn:aws:braket:eu-west-2::device/qpu/ionq/Aria-1"
dev = qml.device('braket.aws.qubit',device_arn=device_arn, wires=num_wires)

device_arn can be the ARN of any device supported by Braket (for a list of supported devices, refer to Amazon Braket supported devices).
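
Between the local simulator and real QPUs, you can also target Braket’s managed on-demand simulators mentioned earlier. The following is a minimal sketch using the SV1 state vector simulator; results storage follows your account defaults:

# Sketch: run the same circuit on Braket's managed SV1 state vector simulator
sv1_arn = "arn:aws:braket:::device/quantum-simulator/amazon/sv1"
dev = qml.device('braket.aws.qubit', device_arn=sv1_arn, wires=num_wires)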

We defined the quantum node as follows:

@qml.qnode(dev, interface="tf", diff_method="backprop")
def quantum_nn(inputs, weights):
    # Encode the first three inputs as rotations around the Y axis
    qml.RY(inputs[0], wires=0)
    qml.RY(inputs[1], wires=1)
    qml.RY(inputs[2], wires=2)
    # Apply trainable rotations whose angles are modulated by the remaining inputs
    qml.Rot(weights[0] * inputs[3], weights[1] * inputs[4], weights[2] * inputs[5], wires=1)
    qml.Rot(weights[3] * inputs[6], weights[4] * inputs[7], weights[5] * inputs[8], wires=2)
    # Entangle the qubits with CNOT gates and apply a final trainable rotation
    qml.CNOT(wires=[1, 2])
    qml.RY(weights[6], wires=2)
    qml.CNOT(wires=[0, 2])
    qml.CNOT(wires=[1, 2])
    # Measure the expectation value of Pauli-Z on qubits 0 and 2
    return [qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(2))]

The inputs are the values yielded as output from the previous layer of the neural network, and the weights are the actual weights of the quantum circuit.

RY and Rot are rotation gates applied to the qubits; CNOT is a controlled bit-flip gate that allows us to entangle the qubits.

qml.expval(qml.PauliZ(0)) and qml.expval(qml.PauliZ(2)) are the measurements applied to qubit 0 and qubit 2, respectively, and these values become the neural network output.

Diagrammatically, the circuit can be displayed as:

0: ──RY(1.00)──────────────────────────────────────╭●────┤  <Z>
1: ──RY(2.00)──Rot(4.00,10.00,18.00)──╭●───────────│──╭●─┤
2: ──RY(3.00)──Rot(28.00,40.00,54.00)─╰X──RY(7.00)─╰X─╰X─┤  <Z>

The transformations applied to qubit 0 are fewer than those applied to qubit 2. This choice is deliberate: applying different transformations to the qubits drives them into distinct states, so that the measurements yield different values. This behavior stems from the principles of superposition and entanglement inherent in quantum mechanics.
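
The diagram above can be reproduced with PennyLane’s drawing utility. The following is a minimal sketch that assumes example inputs of 1 through 9 and weights of 1 through 7, which produce the rotation angles shown in the diagram:

import numpy as np

example_inputs = np.arange(1.0, 10.0)   # nine activations from the preceding dense layer
example_weights = np.arange(1.0, 8.0)   # seven trainable circuit weights
print(qml.draw(quantum_nn)(example_inputs, example_weights))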

After we define the quantum circuit, we define the quantum hybrid neural network:

def hybrid_model(num_layers, num_wires):
    # Expose the seven trainable circuit weights to Keras as a single weight tensor
    weight_shapes = {"weights": (7,)}
    qlayer = qml.qnn.KerasLayer(quantum_nn, weight_shapes, output_dim=2)
    hybrid_model = tf.keras.Sequential([hidden, do, out_2, do, qlayer])
    return hybrid_model

KerasLayer is the PennyLane function that turns the quantum circuit into a Keras layer.
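
As a minimal sketch (note that num_layers is unused by the definition above and is kept only for the function signature), the hybrid model can be instantiated and inspected as follows:

model = hybrid_model(num_layers=1, num_wires=num_wires)
model.build(input_shape=(None, x_train.shape[1]))  # 30 input features after preprocessing
model.summary()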

Model training

After we have preprocessed the data and defined the model, it’s time to train the network.

A preliminary step is needed in order to deal with the unbalanced dataset. We define a weight for each class according to the inverse root rule:

import numpy as np

class_counts = np.bincount(y_train_list)  # y_train_list holds the integer labels before one-hot encoding
class_frequencies = class_counts / float(len(y_train))
class_weights = 1 / np.sqrt(class_frequencies)
class_weights = dict(enumerate(class_weights))  # Keras expects class_weight as a dict of class index to weight

The weight of each class is the inverse of the square root of its relative frequency in the training set.

We compile the model next:

model.compile(optimizer='adam', loss='MSE', metrics=[custom_metric])

custom_metric is a modified version of the precision metric: a custom subroutine that postprocesses the quantum layer’s output into a form compatible with the metric calculation.

For evaluating model performance on imbalanced data, precision is a more reliable metric than accuracy, so we optimize for precision. In fraud detection, missing a fraudulent transaction (a false negative) has serious financial consequences, but a flood of false alarms is also costly to investigate. Precision evaluates the proportion of fraud alerts that are true positives, keeping costly false alarms to a minimum.
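
The post doesn’t show the implementation of custom_metric; the following is one possible sketch, under the assumption that the two expectation values are rescaled from [-1, 1] to [0, 1] and converted to a class index before computing precision for the fraud class:

import tensorflow as tf

def custom_metric(y_true, y_pred):
    # Hypothetical sketch: rescale the quantum expectation values, pick the predicted class,
    # and compute precision for the fraud class (index 1)
    y_pred = (y_pred + 1.0) / 2.0
    pred_cls = tf.argmax(y_pred, axis=1)
    true_cls = tf.argmax(y_true, axis=1)
    true_pos = tf.reduce_sum(tf.cast(tf.logical_and(tf.equal(pred_cls, 1), tf.equal(true_cls, 1)), tf.float32))
    pred_pos = tf.reduce_sum(tf.cast(tf.equal(pred_cls, 1), tf.float32))
    return true_pos / (pred_pos + tf.keras.backend.epsilon())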

Finally, we fit the model:

history = model.fit(x_train, y_train, epochs=30, batch_size=200, validation_data=(x_validation, y_validation), class_weight=class_weights, shuffle=True)

At each epoch, the weights of both the classic and quantum layer are updated in order to reach higher accuracy. At the end of the training, the network showed a loss of 0.0353 on the training set and 0.0119 on the validation set. When the fit is complete, the trained model is saved in .h5 format.
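
For example, the trained model could be persisted with the standard Keras save call (the file name here is a placeholder):

model.save("hybrid_fraud_model.h5")  # placeholder file name; saved in .h5 format as noted above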

Model results and analysis

Evaluating the model is vital to gauge its capabilities and limitations, providing insights into the predictive quality and value derived from the quantum techniques.

To test the model, we make predictions on the test set:

preds = model.predict(x_test)

Because the neural network is a regression model, it yields for each record of x_test a two-element array, where each component can take values between 0 and 1. Because we’re essentially dealing with a binary classification problem, the outputs should be as follows:

  • [1,0] – No fraud
  • [0,1] – Fraud

To convert the continuous values into binary classification, a threshold is necessary. Predictions that are equal to or above the threshold are assigned 1, and those below the threshold are assigned 0.

To align with our goal of optimizing precision, we chose the threshold value that results in the highest precision.
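
The following is a minimal sketch of this thresholding and precision calculation, assuming the second output component is the fraud score and y_test holds the original 0/1 labels:

import numpy as np
from sklearn.metrics import precision_score

threshold = 0.65
pred_fraud = (preds[:, 1] >= threshold).astype(int)   # 1 = fraud, 0 = no fraud
print(precision_score(y_test, pred_fraud))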

The following table summarizes the mapping between various threshold values and the precision.

Class       Threshold = 0.65   Threshold = 0.70   Threshold = 0.75
No Fraud    1.00               1.00               1.00
Fraud       0.87               0.89               0.92

The model demonstrates almost flawless performance on the predominant non-fraud class, with precision and recall scores close to a perfect 1. Despite having far fewer examples to learn from, the model achieves a precision of 0.87 for detecting the minority fraud class at a 0.65 threshold, underscoring its performance even on sparse data. To efficiently identify fraud while minimizing incorrect fraud reports, we decided to prioritize precision over recall.

We also wanted to compare this model with a classic neural network only model to see if we are exploiting the gains coming from the quantum application. We built and trained an identical model in which the quantum layer is replaced by the following:

Dense(2, activation="softmax")
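
For reference, the following is a minimal sketch of this classical-only baseline, assuming fresh copies of the dense and dropout layers defined earlier and the same training configuration:

classic_model = tf.keras.Sequential([
    Dense(32, activation="relu", kernel_initializer='he_normal',
          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    Dropout(0.3),
    Dense(9, activation="relu", kernel_initializer='he_normal',
          kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.001, l2=0.001)),
    Dropout(0.3),
    Dense(2, activation="softmax"),  # replaces the quantum layer
])
classic_model.compile(optimizer='adam', loss='MSE', metrics=[custom_metric])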

In the last epoch, the loss was 0.0119 and the validation loss was 0.0051.

The following table summarizes the mapping between various threshold values and the precision for the classic neural network model.

Class       Threshold = 0.65   Threshold = 0.70   Threshold = 0.75
No Fraud    1.00               1.00               1.00
Fraud       0.83               0.84               0.86

Like the quantum hybrid model, the model performance is almost perfect for the majority class and very good for the minority class.

The hybrid neural network has 1,296 parameters, whereas the classic one has 1,329. Comparing precision values, the quantum solution provides better results. The hybrid model, which inherits from the quantum layer the ability to explore high-dimensional spaces along with additional non-linearity, generalizes the problem better using fewer parameters, resulting in better performance.

Challenges of a quantum solution

Although the adoption of quantum technology shows promise in providing organizations with numerous benefits, practical implementation on large-scale, fault-tolerant quantum computers is a complex task and an active area of research. Therefore, we should be mindful of the challenges it poses:

  • Sensitivity to noise – Quantum computers are extremely sensitive to external factors (such as atmospheric temperature) and require more attention and maintenance than traditional computers, and their behavior can drift over time. One way to minimize the effects of drift is to take advantage of parametric compilation: the ability to compile a parametric circuit such as the one used here only once, and feed it fresh parameters at runtime, avoiding repeated compilation steps. Braket does this for you automatically.
  • Dimensional complexity – The inherent nature of qubits, the fundamental units of quantum computing, introduces a higher level of intricacy compared to traditional binary bits employed in conventional computers. By harnessing the principles of superposition and entanglement, qubits possess an elevated degree of complexity in their design. This intricate architecture renders the evaluation of computational capacity a formidable challenge, because the multidimensional aspects of qubits demand a more nuanced approach to assessing their computational prowess.
  • Computational errors – Increased calculation errors are intrinsic to quantum computing’s probabilistic nature during the sampling phase. These errors could impact accuracy and reliability of the results obtained through quantum sampling. Techniques such as error mitigation and error suppression are actively being developed in order to minimize the effects of errors resulting from noisy qubits. To learn more about error mitigation, see Enabling state-of-the-art quantum algorithms with Qedma’s error mitigation and IonQ, using Braket Direct.

Conclusion

The results discussed in this post suggest that quantum computing holds substantial promise for fraud detection in the financial services industry. The hybrid quantum neural network demonstrated superior performance in accurately identifying fraudulent transactions, highlighting the potential gains offered by quantum technology. As quantum computing continues to advance, its role in revolutionizing fraud detection and other critical financial processes will become increasingly evident. You can extend the results of the simulation by using real qubits and testing various outcomes on real hardware available on Braket, such as those from IQM, IonQ, and Rigetti, all on demand, with pay-as-you-go pricing and no upfront commitments.

To prepare for the future of quantum computing, organizations must stay informed on the latest advancements in quantum technology. Adopting quantum-ready cloud solutions now is a strategic priority, allowing a smooth transition to quantum when hardware reaches commercial viability. This forward-thinking approach will provide both a technological edge and rapid adaptation to quantum computing’s transformative potential across industries. With an integrated cloud strategy, businesses can proactively get quantum-ready, primed to capitalize on quantum capabilities at the right moment. To accelerate your learning journey and earn a digital badge in quantum computing fundamentals, see Introducing the Amazon Braket Learning Plan and Digital Badge.

Connect with Deloitte to pilot this solution for your enterprise on AWS.


About the authors

Federica Marini is a Manager in the Deloitte Italy AI & Data practice with strong experience as a business advisor and technical expert in the fields of AI, generative AI, ML, and data. She addresses research and customer business needs with tailored data-driven solutions that provide meaningful results. She is passionate about innovation and believes digital disruption will require a human-centered approach to achieve its full potential.

Matteo Capozi is a Data and AI expert in Deloitte Italy, specializing in the design and implementation of advanced AI and GenAI models and quantum computing solutions. With a strong background on cutting-edge technologies, Matteo excels in helping organizations harness the power of AI to drive innovation and solve complex problems. His expertise spans across industries, where he collaborates closely with executive stakeholders to achieve strategic goals and performance improvements.

Kasi Muthu is a senior partner solutions architect focusing on generative AI and data at AWS based out of Dallas, TX. He is passionate about helping partners and customers accelerate their cloud journey. He is a trusted advisor in this field and has plenty of experience architecting and building scalable, resilient, and performant workloads in the cloud. Outside of work, he enjoys spending time with his family.

Kuldeep Singh is a Principal Global AI/ML leader at AWS with over 20 years in tech. He skillfully combines his sales and entrepreneurship expertise with a deep understanding of AI, ML, and cybersecurity. He excels in forging strategic global partnerships, driving transformative solutions and strategies across various industries with a focus on generative AI and GSIs.

Read More

Amazon SageMaker unveils the Cohere Command R fine-tuning model

Amazon SageMaker unveils the Cohere Command R fine-tuning model

AWS announced the availability of the Cohere Command R fine-tuning model on Amazon SageMaker. This latest addition to the SageMaker suite of machine learning (ML) capabilities empowers enterprises to harness the power of large language models (LLMs) and unlock their full potential for a wide range of applications.

Cohere Command R is a scalable, frontier LLM designed to handle enterprise-grade workloads with ease. Cohere Command R is optimized for conversational interaction and long context tasks. It targets the scalable category of models that balance high performance with strong accuracy, enabling companies to move beyond proof of concept and into production. The model boasts high precision on Retrieval Augmented Generation (RAG) and tool use tasks, low latency and high throughput, a long 128,000-token context length, and strong capabilities across 10 key languages.

In this post, we explore the reasons for fine-tuning a model and the process of how to accomplish it with Cohere Command R.

Fine-tuning: Tailoring LLMs for specific use cases

Fine-tuning is an effective technique to adapt LLMs like Cohere Command R to specific domains and tasks, leading to significant performance improvements over the base model. Evaluations of the fine-tuned Cohere Command R model have demonstrated improved performance by over 20% across various enterprise use cases in industries such as financial services, technology, retail, healthcare, and legal. Because of its smaller size, a fine-tuned Cohere Command R model can be served more efficiently compared to models much larger than its class.

The recommendation is to use a dataset that contains at least 100 examples.

Cohere Command R uses a RAG approach, retrieving relevant context from an external knowledge base to improve outputs. However, fine-tuning allows you to specialize the model even further. Fine-tuning text generation models like Cohere Command R is crucial for achieving ultimate performance in several scenarios:

  •  Domain-specific adaptation – RAG models may not perform optimally in highly specialized domains like finance, law, or medicine. Fine-tuning allows you to adapt the model to these domains’ nuances for improved accuracy.
  • Data augmentation – Fine-tuning enables incorporating additional data sources or techniques, augmenting the model’s knowledge base for increased robustness, especially with sparse data.
  • Fine-grained control – Although RAG offers impressive general capabilities, fine-tuning permits fine-grained control over model behavior, tailoring it precisely to your desired task for ultimate precision.

The combined power of RAG and fine-tuned LLMs empowers you to tackle diverse challenges with unparalleled versatility and effectiveness. With the introduction of Cohere Command R fine-tuning on SageMaker, enterprises can now customize and optimize the model’s performance for their unique requirements. By fine-tuning on domain-specific data, businesses can enhance Cohere Command R’s accuracy, relevance, and effectiveness for their use cases, such as natural language processing, text generation, and question answering.

By combining the scalability and robustness of Cohere Command R with the ability to fine-tune its performance on SageMaker, AWS empowers enterprises to navigate the complexities of AI adoption and use its transformative power to drive innovation and growth across various industries and domains.

Customer data, including prompts, completions, custom models, and data used for fine-tuning or continued pre-training, remains private to customer AWS accounts and is never shared with third-party model providers.

Solution overview

In the following sections, we walk through the steps to fine-tune the Cohere Command R model on SageMaker. This includes preparing the data, deploying a model, preparing for fine-tuning, creating an endpoint for inference, and performing inference.

Prepare the fine-tuning data

Before you can start a fine-tuning job, you need to upload a dataset with training and (optionally) evaluation data.

First, make sure your data is in JSONL format. It should have the following structure:

  • messages – This contains a list of messages of the conversation. A message consists of the following parts:
    • role – This specifies the current speaker. You can pick from System, User, or Chatbot.
    • content – This contains the content of the message.

The following is an example that trains a chatbot to answer questions. For the sake of readability, the JSON document spans multiple lines. For your dataset, make sure that each line contains one whole example.

{
  "messages": [
    {
      "role": "System",
      "content": "You are a large language model trained by Cohere."
    },
    {
      "role": "User",
      "content": "Hi! What were Time magazines top 10 cover stories in the last 10 years?"
    },
    {
      "role": "Chatbot",
      "content": "Time magazines top 10 cover stories in the last 10 years were:\n\n1. Volodymyr Zelenskyy\n2. Elon Musk\n3. Martin Luther King Jr.\n4. How Earth Survived\n5. Her Lasting Impact\n6. Nothing to See Here\n7. Meltdown\n8. Deal With It\n9. The Top of America\n10. Bitter Pill"
    },
    {
      "role": "User",
      "content": "Who is Volodymyr Zelenskyy?"
    },
    {
      "role": "Chatbot",
      "content": "Volodymyr Zelenskyy is a Ukrainian politician who was elected President of Ukraine in 2019. He previously served as the Minister of Internal Affairs in the government of Prime Minister Volodymyr Groysman."
    },
    {
      "role": "User",
      "content": "Thank you!"
    }
  ]
}

Deploy a model

Complete the following steps to deploy the model:

  1. On AWS Marketplace, subscribe to the Cohere Command R model.

After you subscribe to the model, you can configure it and create a training job.

  2. Choose View in Amazon SageMaker.
  3. Follow the instructions in the UI to create a training job.

Alternatively, you can use the following example notebook to create the training job.

Prepare for fine-tuning

To fine-tune the model, you need the following:

  • Product ARN – This will be provided to you after you subscribe to the product.
  • Training dataset and evaluation dataset – Prepare your datasets for fine-tuning.
  • Amazon S3 location – Specify the Amazon Simple Storage Service (Amazon S3) location that stores the training and evaluation datasets (a minimal upload sketch follows this list).
  • Hyperparameters – Fine-tuning typically involves adjusting various hyperparameters like learning rate, batch size, number of epochs, and so on. You need to specify the appropriate hyperparameter ranges or values for your fine-tuning task.
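
The following is a minimal boto3 sketch for staging the training and evaluation datasets in Amazon S3; the bucket name, key prefixes, and local file names are placeholders:

import boto3

s3 = boto3.client("s3")
bucket = "my-cohere-finetune-bucket"  # placeholder bucket name

# Upload the JSONL files prepared earlier
s3.upload_file("train.jsonl", bucket, "command-r-finetune/train.jsonl")
s3.upload_file("eval.jsonl", bucket, "command-r-finetune/eval.jsonl")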

Create an endpoint for inference

When the fine-tuning is complete, you can create an endpoint for inference with the fine-tuned model. To create the endpoint, use the create_endpoint method. If the endpoint already exists, you can connect to it using the connect_to_endpoint method.
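
For example, with the cohere_aws SDK used in Cohere’s example notebooks, connecting to an existing endpoint could look like the following sketch; the endpoint name is a placeholder, and the exact client interface may differ by SDK version:

# Sketch: connect a Cohere client to an existing SageMaker endpoint
# (assumes the cohere_aws package; the endpoint name is a placeholder)
from cohere_aws import Client, Mode

co = Client(mode=Mode.SAGEMAKER)
co.connect_to_endpoint(endpoint_name="cohere-command-r-ft-endpoint")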

Perform inference

You can now perform real-time inference using the endpoint. The following is the sample message that you use for input:

message = "Classify the following text as either very negative, negative, neutral, positive or very positive: mr. deeds is , as comedy goes , very silly -- and in the best way."
result = co.chat(message=message)
print(result)

The following screenshot shows the output of the fine-tuned model.


Optionally, you can also test the accuracy of the model using the evaluation data (sample_finetune_scienceQA_eval.jsonl).

Clean up

After you have completed running the notebook and experimenting with the Cohere Command R fine-tuned model, it is crucial to clean up the resources you have provisioned. Failing to do so may result in unnecessary charges accruing on your account. To prevent this, use the following code to delete the resources and stop the billing process:

co.delete_endpoint()
co.close()

Summary

Cohere Command R with fine-tuning allows you to customize your models to be performant for your business, domain, and industry. Alongside the fine-tuned model, users also benefit from Cohere Command R’s proficiency in the most commonly used business languages (10 languages) and RAG with citations for accurate and verified information. Cohere Command R with fine-tuning achieves high levels of performance with less resource usage on targeted use cases. Enterprises can see lower operational costs, improved latency, and increased throughput without extensive computational demands.

Start building with Cohere’s fine-tuning model in SageMaker today.


About the Authors

Shashi Raina is a Senior Partner Solutions Architect at Amazon Web Services (AWS), where he specializes in supporting generative AI (GenAI) startups. With close to 6 years of experience at AWS, Shashi has developed deep expertise across a range of domains, including DevOps, analytics, and generative AI.

James Yi is a Senior AI/ML Partner Solutions Architect in the Emerging Technologies team at Amazon Web Services. He is passionate about working with enterprise customers and partners to design, deploy and scale AI/ML applications to derive their business values. Outside of work, he enjoys playing soccer, traveling and spending time with his family.

Pradeep Prabhakaran is a Customer Solutions Architect at Cohere. In his current role at Cohere, Pradeep acts as a trusted technical advisor to customers and partners, providing guidance and strategies to help them realize the full potential of Cohere’s cutting-edge Generative AI platform. Prior to joining Cohere, Pradeep was a Principal Customer Solutions Manager at Amazon Web Services, where he led Enterprise Cloud transformation programs for large enterprises. Prior to AWS, Pradeep has held various leadership positions at consulting companies such as Slalom, Deloitte, and Wipro. Pradeep holds a Bachelor’s degree in Engineering and is based in Dallas, TX.

Read More