Fine-tune Falcon 7B and other LLMs on Amazon SageMaker with @remote decorator

Today, generative AI models cover a variety of tasks, from text summarization and question answering to image and video generation. To improve the quality of the output, approaches like n-shot learning, prompt engineering, Retrieval Augmented Generation (RAG), and fine-tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific tasks.

With Amazon SageMaker, you can now run a SageMaker training job simply by annotating your Python code with the @remote decorator. The SageMaker Python SDK automatically translates your existing workspace environment, along with any associated data processing code and datasets, into a SageMaker training job that runs on the training platform. This lets you write code in a more natural, object-oriented way while still using SageMaker capabilities to run training jobs on a remote cluster with minimal changes.

In this post, we showcase how to fine-tune the Falcon-7B Foundation Model (FM) using the @remote decorator from the SageMaker Python SDK. The solution also uses Hugging Face’s parameter-efficient fine-tuning (PEFT) library and quantization techniques through bitsandbytes to support fine-tuning. The code presented in this post can also be used to fine-tune other FMs, such as Llama-2 13B.

The full-precision representation of this model might not fit into the memory of a single or even several Graphics Processing Units (GPUs), or might need a bigger instance. Hence, to fine-tune this model without increasing cost, we use the technique known as Quantized LLMs with Low-Rank Adapters (QLoRA). QLoRA is an efficient fine-tuning approach that reduces the memory usage of LLMs while maintaining very good performance.

Advantages of using @remote decorator

Before going further, let’s understand how the @remote decorator improves developer productivity while working with SageMaker:

  • The @remote decorator triggers a training job directly from native Python code, without the explicit invocation of SageMaker Estimators and SageMaker input channels (see the minimal example after this list).
  • Low barrier to entry for developers training models on SageMaker.
  • No need to switch integrated development environments (IDEs). Continue writing code in your IDE of choice and invoke SageMaker training jobs from there.
  • No need to learn about containers. Continue providing dependencies in a requirements.txt file and supply it to the remote decorator.
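
As a minimal illustration of the first point, and assuming the SageMaker Python SDK is installed and your AWS credentials and execution role are configured, decorating a plain Python function is all it takes to run it as a SageMaker training job (the instance type here is only an example):

from sagemaker.remote_function import remote

# Dependencies are optional; any requirements.txt you supply is installed in the training job.
@remote(instance_type="ml.m5.xlarge", dependencies="./requirements.txt")
def add(a, b):
    return a + b

# Runs remotely as a SageMaker training job; the return value is sent back to the caller.
print(add(2, 3))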

Prerequisites

An AWS account is needed with an AWS Identity and Access Management (AWS IAM) role that has permissions to manage resources created as part of the solution. For details, refer to Creating an AWS account.

In this post, we use Amazon SageMaker Studio with the Data Science 3.0 image and a ml.t3.medium fast launch instance. However, you can use any integrated development environment (IDE) of your choice. You just need to set up your AWS Command Line Interface (AWS CLI) credentials correctly. For more information, refer to Configure the AWS CLI.

For fine-tuning the Falcon-7B model, an ml.g5.12xlarge instance is used in this post. Make sure you have sufficient capacity for this instance type in your AWS account.

You need to clone this GitHub repository to replicate the solution demonstrated in this post.

Solution overview

  1. Install prerequisites for fine-tuning the Falcon-7B model
  2. Set up remote decorator configurations
  3. Preprocess the dataset containing AWS services FAQs
  4. Fine-tune Falcon-7B on AWS services FAQs
  5. Test the fine-tuned model on sample questions related to AWS services

1. Install prerequisites for fine-tuning the Falcon-7B model

Launch the notebook falcon-7b-qlora-remote-decorator_qa.ipynb in SageMaker Studio, selecting Data Science as the image and Python 3 as the kernel. Install all the required libraries mentioned in requirements.txt. A few of the libraries need to be installed on the notebook instance itself, because they are used for dataset processing and for triggering the SageMaker training job.

%pip install -r requirements.txt

%pip install -q -U transformers==4.31.0
%pip install -q -U datasets==2.13.1
%pip install -q -U peft==0.4.0
%pip install -q -U accelerate==0.21.0
%pip install -q -U bitsandbytes==0.40.2
%pip install -q -U boto3
%pip install -q -U sagemaker==2.154.0
%pip install -q -U scikit-learn

2. Set up remote decorator configurations

Create a configuration file where all the configurations related to the Amazon SageMaker training job are specified. This file is read by the @remote decorator when running the training job. It contains settings like dependencies, training image, instance type, and the execution role to be used for the training job. For a detailed reference of all the settings supported by the config file, check out Configuring and using defaults with the SageMaker Python SDK.

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: ./requirements.txt
        ImageUri: '{aws_account_id}.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04'
        InstanceType: ml.g5.12xlarge
        RoleArn: arn:aws:iam::111122223333:role/ExampleSageMakerRole

It’s not mandatory to use the config.yaml file in order to work with the @remote decorator; it’s just a cleaner way to supply all configurations to the @remote decorator. This keeps SageMaker and AWS related parameters outside of the code, with a one-time effort to set up a config file that can be used across team members. All the configurations could also be supplied directly in the decorator arguments, but that reduces readability and maintainability of changes in the long run. The configuration file can also be created by an administrator and shared with all the users in an environment.
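
For reference, here is a sketch of how the same settings from the config.yaml shown previously could be passed directly as decorator arguments instead (the image URI and role ARN are placeholders):

from sagemaker.remote_function import remote

@remote(
    dependencies="./requirements.txt",
    image_uri="<huggingface-pytorch-training-image-uri>",  # placeholder training image
    instance_type="ml.g5.12xlarge",
    role="arn:aws:iam::111122223333:role/ExampleSageMakerRole",  # placeholder role ARN
)
def train_fn(model_name):
    ...  # training logic goes here

In this post, we keep these settings in config.yaml so the code itself stays free of account-specific values.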

3. Preprocess the dataset containing AWS services FAQs

The next step is to load and preprocess the dataset to make it ready for the training job. First, let’s have a look at the dataset:

The dataset contains FAQ entries (question and answer pairs) for AWS services. In addition to QLoRA, bitsandbytes is used to quantize the frozen LLM to 4-bit precision and attach LoRA adapters on it.

Create a prompt template to convert each FAQ sample to a prompt format:

from random import randint

# custom instruct prompt start
prompt_template = f"{{question}}\n---\nAnswer:\n{{answer}}{{eos_token}}"

# template dataset to add prompt to each sample
def template_dataset(sample):
    sample["text"] = prompt_template.format(question=sample["question"],
                                            answer=sample["answers"],
                                            eos_token=tokenizer.eos_token)
    return sample

The next step is to convert the inputs (text) to token IDs. This is done by a Hugging Face Transformers tokenizer.

from transformers import AutoTokenizer

model_id = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Set the Falcon tokenizer
tokenizer.pad_token = tokenizer.eos_token

Now use the template_dataset function to convert all the FAQ entries to the prompt format and set up the train and test datasets.
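
A minimal sketch of this step is shown below, assuming the FAQ data is available locally as a CSV file with question and answers columns (the file name and split ratio are illustrative):

from datasets import load_dataset

# Load the FAQ data; "aws_faqs.csv" is a placeholder file name.
dataset = load_dataset("csv", data_files="aws_faqs.csv", split="train")

# Apply the prompt template to every sample and create train and test splits.
dataset = dataset.map(template_dataset)
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset, test_dataset = splits["train"], splits["test"]

print(train_dataset[0]["text"])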

4. Fine-tune Falcon-7B on AWS services FAQs

Now you can prepare the training script, define the training function train_fn, and apply the @remote decorator to the function.

The training function does the following:

  • Tokenizes and chunks the dataset
  • Sets up BitsAndBytesConfig, which specifies that the model should be loaded in 4-bit but computation should be performed in bfloat16
  • Loads the model
  • Finds the target modules and updates the necessary matrices by using the utility method find_all_linear_names
  • Creates LoRA configurations that specify the rank of the update matrices (lora_r), scaling factor (lora_alpha), the modules to apply the LoRA update matrices to (target_modules), dropout probability for LoRA layers (lora_dropout), task_type, and so on
  • Starts the training and evaluation

import bitsandbytes as bnb

def find_all_linear_names(hf_model):
    lora_module_names = set()
    for name, module in hf_model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:
        lora_module_names.remove("lm_head")
    return list(lora_module_names)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from sagemaker.remote_function import remote
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import transformers

# Start training
@remote(volume_size=50)
def train_fn(
        model_name,
        train_ds,
        test_ds,
        lora_r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        learning_rate=2e-4,
        num_train_epochs=1
):
    # tokenize and chunk dataset
    lm_train_dataset = train_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, batch_size=24, remove_columns=list(train_ds.features)
    )


    lm_test_dataset = test_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(test_ds.features)
    )

    # Print total number of samples
    print(f"Total number of train samples: {len(lm_train_dataset)}")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
    # Falcon requires you to allow remote code execution. This is because the model uses a new architecture that is not part of transformers yet.
    # The code is provided by the model authors in the repo.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        quantization_config=bnb_config,
        device_map="auto")

    model.gradient_checkpointing_enable()
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    # get lora target modules
    modules = find_all_linear_names(model)
    print(f"Found {len(modules)} modules to quantize: {modules}")

    config = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        target_modules=modules,
        lora_dropout=lora_dropout,
        bias="none",
        task_type="CAUSAL_LM"
    )

    model = get_peft_model(model, config)
    print_trainable_parameters(model)

    trainer = transformers.Trainer(
        model=model,
        train_dataset=lm_train_dataset,
        eval_dataset=lm_test_dataset,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=per_device_train_batch_size,
            per_device_eval_batch_size=per_device_eval_batch_size,
            logging_steps=2,
            num_train_epochs=num_train_epochs,
            learning_rate=learning_rate,
            bf16=True,
            save_strategy="no",
            output_dir="outputs"
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False

    trainer.train()
    trainer.evaluate()

    model.save_pretrained("/opt/ml/model")

Finally, invoke the train_fn():

train_fn(model_id, train_dataset, test_dataset)

The tuning job runs on the Amazon SageMaker training cluster. Wait for the tuning job to finish.

5. Test the fine-tuned model on sample questions related to AWS services

Now, it’s time to run some tests on the model. First, let us load the model:

from peft import PeftModel, PeftConfig
import torch
from transformers import AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

config = PeftConfig.from_pretrained("./model")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "./model")
model.to(device)

Now load a sample question from the training dataset to see the original answer, and then ask the same question to the tuned model to compare the answers.

Here is a sample question from the training set and the original answer:

Now, the same question is asked to the tuned Falcon-7B model:
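
A minimal generation sketch for this comparison is shown below. It assumes the tokenizer and train_dataset from the earlier steps are still in memory, rebuilds the prompt the same way the training template did, and uses illustrative generation parameters:

from random import randint

# Pick a random sample from the training set and rebuild the prompt up to "Answer:".
sample = train_dataset[randint(0, len(train_dataset) - 1)]
prompt = sample["text"].split("Answer:")[0] + "Answer:\n"
print(f"Original answer:\n{sample['text'].split('Answer:')[1]}")

inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=100,
        pad_token_id=tokenizer.eos_token_id,
    )

print(f"Tuned model answer:\n{tokenizer.decode(outputs[0], skip_special_tokens=True)}")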

This concludes the implementation of fine tuning Falcon-7B on AWS services FAQ dataset using @remote decorator from Amazon SageMaker Python SDK.

Cleaning up

Complete the following steps to clean up your resources:

  • Shut down the Amazon SageMaker Studio instances to avoid incurring additional costs.
  • Clean up your Amazon Elastic File System (Amazon EFS) directory by clearing the Hugging Face cache directory:
    rm -R ~/.cache/huggingface/hub

Conclusion

In this post, we showed you how to use the @remote decorator’s capabilities to fine-tune the Falcon-7B model using QLoRA and Hugging Face PEFT with bitsandbytes, without applying significant changes in the training notebook, and how to use Amazon SageMaker capabilities to run training jobs on a remote cluster.

All the code shown as part of this post to fine-tune Falcon-7B is available in the GitHub repository. The repository also contains a notebook showing how to fine-tune Llama-2 13B.

As a next step, we encourage you to check out the @remote decorator functionality and the Python SDK API and use it in your choice of environment and IDE. Additional examples are available in the amazon-sagemaker-examples repository to get you started quickly.


About the Authors

Bruno Pistone is an AI/ML Specialist Solutions Architect for AWS based in Milan. He works with large customers helping them to deeply understand their technical needs and design AI and machine learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His areas of expertise include machine learning end to end, machine learning industrialization, and generative AI. He enjoys spending time with his friends and exploring new places, as well as travelling to new destinations.

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Simplify access to internal information using Retrieval Augmented Generation and LangChain Agents

This post takes you through the most common challenges that customers face when searching internal documents, and gives you concrete guidance on how AWS services can be used to create a generative AI conversational bot that makes internal information more useful.

Unstructured data accounts for 80% of all the data found within organizations, consisting of repositories of manuals, PDFs, FAQs, emails, and other documents that grow daily. Businesses today rely on continuously growing repositories of internal information, and problems arise when the amount of unstructured data becomes unmanageable. Often, users find themselves reading and checking many different internal sources to find the answers they need.

Internal question and answer forums can help users get highly specific answers but also require longer wait times. In the case of company-specific internal FAQs, long wait times result in lower employee productivity. Question and answer forums are difficult to scale because they rely on manually written answers. With generative AI, there is currently a paradigm shift in how users search and find information. The next logical step is to use generative AI to condense large documents into smaller, bite-sized pieces of information for easier user consumption. Instead of spending a long time reading text or waiting for answers, users can generate summaries in real time based on multiple existing repositories of internal information.

Solution overview

The solution allows customers to retrieve curated responses to questions asked about internal documents by using a transformer model to generate answers to questions about data that it has not been trained on, a technique known as zero-shot prompting. By adopting this solution, customers can gain the following benefits:

  • Find accurate answers to questions based on existing sources of internal documents
  • Reduce the time users spend searching for answers by using Large Language Models (LLMs) to provide near-immediate answers to complex queries using documents with the most updated information
  • Search previously answered questions through a centralized dashboard
  • Reduce stress caused by spending time manually reading information to look for answers

Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) reduces some of the shortcomings of LLM-based queries by finding the answers from your knowledge base and using the LLM to summarize the documents into concise responses. Please read this post to learn how to implement the RAG approach with Amazon Kendra. The following risks and limitations are associated with LLM-based queries that a RAG approach with Amazon Kendra addresses:

  • Hallucinations and traceability – LLMs are trained on large datasets and generate responses based on probabilities. This can lead to inaccurate answers, which are known as hallucinations.
  • Multiple data silos – In order to reference data from multiple sources within your response, one needs to set up a connector ecosystem to aggregate the data. Accessing multiple repositories is manual and time-consuming.
  • Security – Security and privacy are critical considerations when deploying conversational bots powered by RAG and LLMs. Despite using Amazon Comprehend to filter out personal data that may be provided through user queries, there remains a possibility of unintentionally surfacing personal or sensitive information, depending on the ingested data. This means that controlling access to the chatbot is crucial to prevent unintended access to sensitive information.
  • Data relevance – LLMs are trained on data up to a certain date, which means information is often not current. The cost associated with training models on recent data is high. To ensure accurate and up-to-date responses, organizations bear the responsibility of regularly updating and enriching the content of the indexed documents.
  • Cost – The cost associated with deploying this solution should be a consideration for businesses. Businesses need to carefully assess their budget and performance requirements when implementing this solution. Running LLMs can require substantial computational resources, which may increase operational costs. These costs can become a limitation for applications that need to operate at a large scale. However, one of the benefits of the AWS Cloud is the flexibility to only pay for what you use. AWS offers a simple, consistent, pay-as-you-go pricing model, so you are charged only for the resources you consume.

Usage of Amazon SageMaker JumpStart

For transformer-based language models, organizations can benefit from using Amazon SageMaker JumpStart, which offers a collection of pre-built machine learning models. Amazon SageMaker JumpStart offers a wide range of text generation and question-answering (Q&A) foundational models that can be easily deployed and utilized. This solution integrates a FLAN T5-XL Amazon SageMaker JumpStart model, but there are different aspects to keep in mind when choosing a foundation model.

Integrating security in our workflow

Following the best practices of the Security Pillar of the Well-Architected Framework, Amazon Cognito is used for authentication. Amazon Cognito User Pools can be integrated with third-party identity providers that support several frameworks used for access control, including Open Authorization (OAuth), OpenID Connect (OIDC), or Security Assertion Markup Language (SAML). Identifying users and their actions allows the solution to maintain traceability. The solution also uses the Amazon Comprehend personally identifiable information (PII) detection feature to automatically identify and redact PII. Redacted PII includes addresses, social security numbers, email addresses, and other sensitive information. This design ensures that any PII provided by the user through the input query is redacted. The PII is not stored, used by Amazon Kendra, or fed to the LLM.
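
As an illustration of this redaction step, the following is a minimal boto3 sketch; the sample query is made up, and the offset-based replacement is one possible redaction strategy:

import boto3

comprehend = boto3.client("comprehend")

query = "My name is Jane Doe, my email is jane@example.com. How do I reset my password?"

# Detect PII entities in the user query.
response = comprehend.detect_pii_entities(Text=query, LanguageCode="en")

# Redact detected spans, working backwards so character offsets stay valid.
redacted = query
for entity in sorted(response["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
    redacted = redacted[: entity["BeginOffset"]] + f"[{entity['Type']}]" + redacted[entity["EndOffset"]:]

print(redacted)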

Solution Walkthrough

The following steps describe the workflow of the question answering over documents solution:

  1. Users send a query through a web interface.
  2. Amazon Cognito is used for authentication, ensuring secure access to the web application.
  3. The web application front-end is hosted on AWS Amplify.
  4. Amazon API Gateway hosts a REST API with various endpoints to handle user requests that are authenticated using Amazon Cognito.
  5. PII redaction with Amazon Comprehend:
    • User Query Processing: When a user submits a query or input, it is first passed through Amazon Comprehend. The service analyzes the text and identifies any PII entities present within the query.
    • PII Extraction: Amazon Comprehend extracts the detected PII entities from the user query.
  6. Relevant Information Retrieval with Amazon Kendra:
    • Amazon Kendra is used to manage an index of documents that contains the information used to generate answers to the user’s queries.
    • The LangChain QA retrieval module is used to build a conversation chain that has relevant information about the user’s queries.
  7. Integration with Amazon SageMaker JumpStart:
    • The AWS Lambda function uses the LangChain library and connects to the Amazon SageMaker JumpStart endpoint with a context-stuffed query. The Amazon SageMaker JumpStart endpoint serves as the interface of the LLM used for inference.
  8. Storing responses and returning it to the user:
    • The response from the LLM is stored in Amazon DynamoDB along with the user’s query, the timestamp, a unique identifier, and other arbitrary identifiers for the item such as question category. Storing the question and answer as discrete items allows the AWS Lambda function to easily recreate a user’s conversation history based on the time when questions were asked.
    • Finally, the response is sent back to the user via an HTTPS request through the Amazon API Gateway REST API integration response.

The following steps describe the AWS Lambda functions and their flow through the process (a minimal code sketch follows the list):

  1. Check and redact any PII / Sensitive info
  2. LangChain QA Retrieval Chain
    • Search and retrieve relevant info
  3. Context Stuffing & Prompt Engineering
    • LangChain
  4. Inference with LLM
  5. Return response & Save it
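
The following sketch illustrates steps 2–4, assuming a LangChain version that includes the AmazonKendraRetriever and SagemakerEndpoint integrations; the Kendra index ID, endpoint name, Region, and the request/response format of the content handler are placeholders that depend on your deployment:

import json

from langchain.chains import RetrievalQA
from langchain.llms.sagemaker_endpoint import SagemakerEndpoint, LLMContentHandler
from langchain.retrievers import AmazonKendraRetriever

KENDRA_INDEX_ID = "<kendra-index-id>"          # placeholder
ENDPOINT_NAME = "<sagemaker-jumpstart-endpoint>"  # placeholder
REGION = "us-east-1"

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt, model_kwargs):
        # Request format assumed for a FLAN-T5 JumpStart endpoint; adjust for other models.
        return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")

    def transform_output(self, output):
        return json.loads(output.read().decode("utf-8"))["generated_texts"][0]

llm = SagemakerEndpoint(
    endpoint_name=ENDPOINT_NAME,
    region_name=REGION,
    model_kwargs={"max_length": 300, "temperature": 0.1},
    content_handler=ContentHandler(),
)

retriever = AmazonKendraRetriever(index_id=KENDRA_INDEX_ID, region_name=REGION)

# "stuff" concatenates the retrieved passages into the prompt, which is the
# context stuffing approach described earlier.
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
print(qa_chain.run("What is our parental leave policy?"))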

Use cases

There are many business use cases where customers can use this workflow. The following section explains how the workflow can be used in different industries and verticals.

Employee Assistance

Well-designed corporate training can improve employee satisfaction and reduce the time required for onboarding new employees. As organizations grow and complexity increases, employees find it difficult to understand the many sources of internal documents. Internal documents in this context include company guidelines, policies, and standard operating procedures. For this scenario, an employee has a question on how to proceed with and edit an internal issue ticket. The employee can access and use the generative artificial intelligence (AI) conversational bot to ask about and execute the next steps for a specific ticket.

Specific use case: Automate issue resolution for employees based on corporate guidelines.

The following steps describe the AWS Lambda functions and their flow through the process:

  1. LangChain agent to identify the intent
  2. Send notification based on employee request
  3. Modify ticket status

In this architecture diagram, corporate training videos can be ingested through Amazon Transcribe to collect a log of these video scripts. Additionally, corporate training content stored in various sources (for example, Confluence, Microsoft SharePoint, Google Drive, and Jira) can be used to create indexes through Amazon Kendra connectors. Read this article to learn more about the collection of native connectors you can utilize in Amazon Kendra as a source point. The Amazon Kendra crawler is then able to use both the corporate training video scripts and documentation stored in these other sources to assist the conversational bot in answering questions specific to company corporate training guidelines. The LangChain agent verifies permissions, modifies ticket status, and notifies the correct individuals using Amazon Simple Notification Service (Amazon SNS).

Customer Support Teams

Quickly resolving customer queries improves the customer experience and encourages brand loyalty. A loyal customer base helps drive sales, which contributes to the bottom line and increases customer engagement. Customer support teams spend lots of energy referencing many internal documents and customer relationship management software to answer customer queries about products and services. Internal documents in this context can include generic customer support call scripts, playbooks, escalation guidelines, and business information. The generative AI conversational bot helps with cost optimization because it handles queries on behalf of the customer support team.

Specific use case: Handling an oil change request based on service history and customer service plan purchased.

In this architecture diagram, the customer is routed to either the generative AI conversational bot or the Amazon Connect contact center. This decision can be based on the level of support needed or the availability of customer support agents. The LangChain agent identifies the customer’s intent and verifies identity. The LangChain agent also checks the service history and purchased support plan.

The following steps describe the AWS Lambda functions and their flow through the process:

  1. LangChain agent identifies the intent
  2. Retrieve Customer Information
  3. Check customer service history and warranty information
  4. Book appointment, provide more information, or route to contact center
  5. Send email confirmation

Amazon Connect is used to collect the voice and chat logs, and Amazon Comprehend is used to remove personally identifiable information (PII) from these logs. The Amazon Kendra crawler is then able to use the redacted voice and chat logs, customer call scripts, and customer service support plan policies to create the index. Once a decision is made, the generative AI conversational bot decides whether to book an appointment, provide more information, or route the customer to the contact center for further assistance. For cost optimization, the LangChain agent can also generate answers using fewer tokens and a less expensive large language model for lower priority customer queries.

Financial Services

Financial services companies rely on timely use of information to stay competitive and comply with financial regulations. Using a generative AI conversational bot, financial analysts and advisors can interact with textual information in a conversational manner and reduce the time and effort it takes to make better informed decisions. Outside of investment and market research, a generative AI conversational bot can also augment human capabilities by handling tasks that would traditionally require more human effort and time. For example, a financial institution specializing in personal loans can increase the rate at which loans are processed while providing better transparency to customers.

Specific use case: Use customer financial history and previous loan applications to decide and explain loan decision.

The following steps describe the AWS Lambda functions and their flow through the process:

  1. LangChain agent to identify the intent
  2. Check customer financial and credit score history
  3. Check internal customer relationship management system
  4. Check standard loan policies and suggest decision for employee qualifying the loan
  5. Send notification to customer

This architecture incorporates customer financial data stored in a database and data stored in a customer relationship management (CRM) tool. These data points are used to inform a decision based on the company’s internal loan policies. The customer is able to ask clarifying questions to understand what loans they qualify for and the terms of the loans they can accept. If the generative AI conversational bot is unable to approve a loan application, the user can still ask questions about improving credit scores or alternative financing options.

Government

Generative AI conversational bots can greatly benefit government institutions by improving communication, efficiency, and decision-making processes. Generative AI conversational bots can also provide instant access to internal knowledge bases to help government employees quickly retrieve information, policies, and procedures (for example, eligibility criteria, application processes, and citizen services and support). One solution is an interactive system that allows taxpayers and tax professionals to easily find tax-related details and benefits. It can be used to understand user questions, summarize tax documents, and provide clear answers through interactive conversations.

Users can ask questions such as:

  • How does inheritance tax work and what are the tax thresholds?
  • Can you explain the concept of income tax?
  • What are the tax implications when selling a second property?

Additionally, users can have the convenience of submitting tax forms to a system, which can help verify the correctness of the information provided.

This architecture illustrates how users can upload completed tax forms to the solution and use it for interactive verification and guidance on how to accurately complete the necessary information.

Healthcare

Healthcare businesses have the opportunity to automate the use of large amounts of internal patient information, while also addressing common questions regarding use cases such as treatment options, insurance claims, clinical trials, and pharmaceutical research. Using a generative AI conversational bot enables quick and accurate generation of answers about health information from the provided knowledge base. For example, some healthcare professionals spend a lot of time filling in forms to file insurance claims.

In similar settings, clinical trial administrators and researchers need to find information about treatment options. A generative AI conversational bot can use the pre-built connectors in Amazon Kendra to retrieve the most relevant information from the millions of documents published through ongoing research conducted by pharmaceutical companies and universities.

Specific use case: Reduce the errors and time needed to fill out and send insurance forms.

In this architecture diagram, a healthcare professional is able to use the generative AI conversational bot to figure out what forms need to be filled out for the insurance. The LangChain agent is then able to retrieve the right forms and add the needed information for a patient as well as giving responses for descriptive parts of the forms based on insurance policies and previous forms. The healthcare professional can edit the responses given by the LLM before approving and having the form delivered to the insurance portal.

The following steps describe the AWS Lambda functions and their flow through the process:

  1. LangChain agent to identify the intent
  2. Retrieve the patient information needed
  3. Fill out the insurance form based on the patient information and form guideline
  4. Submit the form to the insurance portal after user approval

AWS HealthLake is used to securely store the health data, including previous insurance forms and patient information, and Amazon Comprehend is used to remove personally identifiable information (PII) from the previous insurance forms. The Amazon Kendra crawler is then able to use the set of insurance forms and guidelines to create the index. After the forms are filled out by the generative AI and reviewed by the medical professional, they can be sent to the insurance portal.

Cost estimate

The cost of deploying the base solution as a proof-of-concept is shown in the following table. Since the base solution is considered a proof-of-concept, Amazon Kendra Developer Edition was used as a low-cost option since the workload would not be in production. Our assumption for Amazon Kendra Developer Edition was 730 active hours for the month.

For Amazon SageMaker, we made an assumption that the customer would be using the ml.g4dn.2xlarge instance for real-time inference, with a single inference endpoint per instance. You can find more information on Amazon SageMaker pricing and available inference instance types here.

Service | Resources Consumed | Cost Estimate Per Month in USD
AWS Amplify | 150 build minutes, 1 GB of data served, 500,000 requests | 15.71
Amazon API Gateway | 1M REST API calls | 3.50
AWS Lambda | 1 million requests, 5 seconds duration per request, 2 GB memory allocated | 160.23
Amazon DynamoDB | 1 million reads, 1 million writes, 100 GB storage | 26.38
Amazon SageMaker | Real-time inference with ml.g4dn.2xlarge | 676.80
Amazon Kendra | Developer Edition with 730 hours/month, 10,000 documents scanned, 5,000 queries/day | 821.25
Total | | 1,703.87

*  Amazon Cognito has a free tier of 50,000 Monthly Active Users who use Cognito User Pools or 50 Monthly Active Users who use SAML 2.0 identity providers

Clean Up

To save costs, delete all the resources you deployed as part of the tutorial. You can delete any SageMaker endpoints you may have created via the SageMaker console. Remember, deleting an Amazon Kendra index doesn’t remove the original documents from your storage.

Conclusion

In this post, we showed you how to simplify access to internal information by summarizing from multiple repositories in real time. After the recent developments of commercially available LLMs, the possibilities of generative AI have become more apparent. We showcased ways to use AWS services to create a serverless chatbot that uses generative AI to answer questions. This approach incorporates an authentication layer and Amazon Comprehend’s PII detection to filter out any sensitive information provided in the user’s query. Whether it be individuals in healthcare understanding the nuances of filing insurance claims or HR understanding specific company-wide regulations, there are multiple industries and verticals that can benefit from this approach. An Amazon SageMaker JumpStart foundation model is the engine behind the chatbot, while a context stuffing approach using the RAG technique is used to ensure that the responses more accurately reference internal documents.

To learn more about working with generative AI on AWS, refer to Announcing New Tools for Building with Generative AI on AWS. For more in-depth guidance on using the RAG technique with AWS services, refer to Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models. Since the approach in this blog is LLM agnostic, any LLM can be used for inference. In our next post, we’ll outline ways to implement this solution using Amazon Bedrock and the Amazon Titan LLM.


About the Authors

Abhishek Maligehalli Shivalingaiah is a Senior AI Services Solution Architect at AWS. He is passionate about building applications using generative AI, Amazon Kendra, and NLP. He has around 10 years of experience in building data and AI solutions to create value for customers and enterprises. He has even built a (personal) chatbot for fun to answer questions about his career and professional journey. Outside of work, he enjoys making portraits of family and friends, and loves creating artworks.

Medha Aiyah is an Associate Solutions Architect at AWS, based in Austin, Texas. She recently graduated from the University of Texas at Dallas in December 2022 with her Master of Science in Computer Science with a specialization in Intelligent Systems focusing on AI/ML. She is interested in learning more about AI/ML and utilizing AWS services to discover solutions customers can benefit from.

Hugo Tse is an Associate Solutions Architect at AWS based in Seattle, Washington. He holds a Master’s degree in Information Technology from Arizona State University and a bachelor’s degree in Economics from the University of Chicago. He is a member of the Information Systems Audit and Control Association (ISACA) and International Information System Security Certification Consortium (ISC)2. He enjoys helping customers benefit from technology.

Ayman Ishimwe is an Associate Solutions Architect at AWS based in Seattle, Washington. He holds a Master’s degree in Software Engineering and IT from Oakland University. He has prior experience in software development, specifically in building microservices for distributed web applications. He is passionate about helping customers build robust and scalable solutions on AWS cloud services following best practices.

Shervin Suresh is an Associate Solutions Architect at AWS based in Austin, Texas. He has graduated with a Masters in Software Engineering with a Concentration in Cloud Computing and Virtualization and a Bachelors in Computer Engineering from San Jose State University. He is passionate about leveraging technology to help improve the lives of people from all backgrounds.

Visualize an Amazon Comprehend analysis with a word cloud in Amazon QuickSight

Searching for insights in a repository of free-form text documents can be like finding a needle in a haystack. A traditional approach might be to use word counting or other basic analysis to parse documents, but with the power of Amazon AI and machine learning (ML) tools, we can gather deeper understanding of the content.

Amazon Comprehend is a fully managed service that uses natural language processing (NLP) to extract insights about the content of documents. Amazon Comprehend develops insights by recognizing the entities, key phrases, sentiment, themes, and custom elements in a document. Amazon Comprehend can create new insights based on understanding the document structure and entity relationships. For example, with Amazon Comprehend, you can scan an entire document repository for key phrases.

Amazon Comprehend lets non-ML experts easily do tasks that normally take hours of time. Amazon Comprehend eliminates much of the time needed to clean, build, and train your own model. For building deeper custom models in NLP or any other domain, Amazon SageMaker enables you to build, train, and deploy models in a much more conventional ML workflow if desired.

In this post, we use Amazon Comprehend and other AWS services to analyze and extract new insights from a repository of documents. Then, we use Amazon QuickSight to generate a simple yet powerful word cloud visual to easily spot themes or trends.

Overview of solution

The following diagram illustrates the solution architecture.

To begin, we gather the data to be analyzed and load it into an Amazon Simple Storage Service (Amazon S3) bucket in an AWS account. In this example, we use text-formatted files. The data is then analyzed by Amazon Comprehend. Amazon Comprehend creates a JSON-formatted output that needs to be transformed and processed into a database format using AWS Glue. We verify the data and extract specifically formatted data tables using Amazon Athena for a QuickSight analysis with a word cloud. For more information about visualizations, refer to Visualizing data in Amazon QuickSight.

Prerequisites

For this walkthrough, you need an AWS account with access to the services used in this solution: Amazon S3, Amazon Comprehend, AWS Glue, Amazon Athena, and Amazon QuickSight.

Upload data to an S3 bucket

Upload your data to an S3 bucket. For this post, we use UTF-8 formatted text of the US Constitution as the input file. Then you’re ready to analyze the data and create visualizations.

Analyze data using Amazon Comprehend

Many types of text-based and image information can be processed using Amazon Comprehend. In addition to text files, you can use Amazon Comprehend one-step classification and entity recognition to accept image files, PDF files, and Microsoft Word files as input, which are not discussed in this post.

To analyze your data, complete the following steps:

  1. On the Amazon Comprehend console, choose Analysis jobs in the navigation pane.
  2. Choose Create analysis job.
  3. Enter a name for your job.
  4. For Analysis type, choose Key phrases.
  5. For Language, choose English.
  6. For Input data location, specify the folder you created as a prerequisite.
  7. For Output data location, specify the folder you created as a prerequisite.
  8. Choose Create an IAM role.
  9. Enter a suffix for the role name.
  10. Choose Create job.

The job will run and the status will be displayed on the Analysis jobs page.
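
If you prefer to start the same analysis programmatically rather than through the console, the following boto3 sketch shows the equivalent call; the job name, bucket paths, and role ARN are placeholders:

import boto3

comprehend = boto3.client("comprehend")

response = comprehend.start_key_phrases_detection_job(
    JobName="constitution-key-phrases",  # placeholder job name
    LanguageCode="en",
    DataAccessRoleArn="arn:aws:iam::111122223333:role/ComprehendDataAccessRole",  # placeholder
    InputDataConfig={
        "S3Uri": "s3://<your-bucket>/input/",   # input data folder
        "InputFormat": "ONE_DOC_PER_FILE",
    },
    OutputDataConfig={"S3Uri": "s3://<your-bucket>/output/"},  # output data folder
)

print(response["JobId"], response["JobStatus"])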

Wait for the analysis job to complete. Amazon Comprehend will create a file and place it in the output data folder you provided. The file is in .gz or GZIP format.

This file needs to be downloaded and converted to a non-compressed format. You can download an object from the data folder or S3 bucket using the Amazon S3 console.

  1. On the Amazon S3 console, select the object and choose Download. If you want to download the object to a specific folder, choose Download on the Actions menu.
  2. After you download the file to your local computer, open the zipped file and save it as an uncompressed file.

The uncompressed file must be uploaded to the output folder before the AWS Glue crawler can process it. For this example, we upload the uncompressed file into the same output folder that we use in later steps.

  1. On the Amazon S3 console, navigate to your S3 bucket and choose Upload.
  2. Choose Add files.
  3. Choose the uncompressed files from your local computer.
  4. Choose Upload.

After you upload the file, delete the original zipped file.

  1. On the Amazon S3 console, select the original zipped file and choose Delete.
  2. Confirm the file name to permanently delete the file by entering the file name in the text box.
  3. Choose Delete objects.

This will leave one file remaining in the output folder: the uncompressed file.

Convert JSON data to table format using AWS Glue

In this step, you prepare the Amazon Comprehend output to be used as input into Athena. The Amazon Comprehend output is in JSON format. You can use AWS Glue to convert JSON into a database structure to ultimately be read by QuickSight.

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Choose Create crawler.
  3. Enter a name for your crawler.
  4. Choose Next.
  5. For Is your data already mapped to Glue tables, select Not yet.
  6. Add a data source.
  7. For S3 path, enter the location of the Amazon Comprehend output data folder.

Be sure to add the trailing / to the path name. AWS Glue will search the folder path for all files.

  1. Select Crawl all sub-folders.
  2. Choose Add an S3 data source.

  1. Create a new AWS Identity and Access Management (IAM) role for the crawler.
  2. Enter a name for the IAM role.
  3. Choose Update chosen IAM role to be sure the new role is assigned to the crawler.
  4. Choose Next to enter the output (database) information.
  5. Choose Add database.
  6. Enter a database name.
  7. Choose Next.
  8. Choose Create crawler.
  9. Choose Run crawler to run the crawler.

You can monitor the crawler status on the AWS Glue console.

Use Athena to prepare tables for QuickSight

Athena will extract data from the database tables the AWS Glue crawler created to provide a format that QuickSight will use to create the word cloud.

  1. On the Athena console, choose Query editor in the navigation pane.
  2. For Data source, choose AwsDataCatalog.
  3. For Database, choose the database the crawler created.

To create a table compatible for QuickSight, the data must be unnested from the arrays.

  1. The first step is to create a temporary table with the relevant Amazon Comprehend data:
CREATE TABLE temp AS
SELECT keyphrases, nested
FROM output
CROSS JOIN UNNEST(output.keyphrases) AS t (nested)
  2. The following statement limits the results to phrases of at least three words and groups them by frequency:
CREATE TABLE tableforquicksight AS
SELECT COUNT(*) AS count, nested.text
FROM temp
WHERE nested.Score > .9 AND 
 length(nested.text) - length(replace(nested.text, ' ', '')) + 1 > 2
GROUP BY nested.text
ORDER BY count desc
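
These statements can also be run outside the query editor; a minimal boto3 sketch looks like the following (the database name and results location are placeholders):

import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT * FROM tableforquicksight LIMIT 10",
    QueryExecutionContext={"Database": "<your-glue-database>"},                    # placeholder
    ResultConfiguration={"OutputLocation": "s3://<your-bucket>/athena-results/"},  # placeholder
)

print("Query execution ID:", response["QueryExecutionId"])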

Use QuickSight to visualize output

Finally, you can create the visual output from the analysis.

  1. On the QuickSight console, choose New analysis.
  2. Choose New dataset.
  3. For Create a dataset, choose From new data sources.
  4. Choose Athena as the data source.
  5. Enter a name for the data source and choose Create data source.

  6. Choose Visualize.

Make sure QuickSight has access to the S3 buckets where the Athena tables are stored.

  1. On the QuickSight console, choose the user profile icon and choose Manage QuickSight.

  2. Choose Security & permissions.
  3. Look for the section QuickSight access to AWS services.

By configuring access to AWS services, QuickSight can access the data in those services. Access by users and groups can be controlled through the options.

  4. Verify Amazon S3 is granted access.

Now you can create the word cloud.

  1. Choose the word cloud under Visual types.
  2. Drag text to Group by and count to Size.


Choose the options menu (three dots) in the visualization to access the edit options. For example, you might want to hide the term “other” from the display. You can also edit items such as the title and subtitle for your visual. To download the word cloud as a PDF, choose Download on the QuickSight toolbar.

Clean up

To avoid incurring ongoing charges, delete any unused data and processes or resources provisioned on their respective service consoles.

Conclusion

Amazon Comprehend uses NLP to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. You can use Amazon Comprehend to create new products based on understanding the structure of documents. For example, with Amazon Comprehend, you can scan an entire document repository for key phrases.

This post described the steps to build a word cloud to visualize a text content analysis from Amazon Comprehend using AWS tools and QuickSight to visualize the data.

Let’s stay in touch via the comments section!


About the Authors

Kris Gedman is the US East sales leader for Retail & CPG at Amazon Web Services. When not working, he enjoys spending time with his friends and family, especially summers on Cape Cod. Kris is a temporarily retired Ninja Warrior but he loves watching and coaching his two sons for now.

Clark Lefavour is a Solutions Architect leader at Amazon Web Services, supporting enterprise customers in the East region. Clark is based in New England and enjoys spending time architecting recipes in the kitchen.

Amazon SageMaker simplifies the Amazon SageMaker Studio setup for individual users

Today, we are excited to announce the simplified Quick setup experience in Amazon SageMaker. With this new capability, individual users can launch Amazon SageMaker Studio with default presets in minutes.

SageMaker Studio is an integrated development environment (IDE) for machine learning (ML). ML practitioners can perform all ML development steps—from preparing their data to building, training, and deploying ML models—within a single, integrated visual interface. You also get access to a large collection of models and pre-built solutions that you can deploy with a few clicks.

To use SageMaker Studio or other personal apps such as Amazon SageMaker Canvas, or to collaborate in shared spaces, AWS customers need to first set up a SageMaker domain. A SageMaker domain consists of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations. When a user is onboarded to a SageMaker domain, they are assigned a user profile that they can use to launch their apps. User authentication can be via AWS IAM Identity Center (successor to AWS Single Sign-On) or AWS Identity and Access Management (IAM).

Setting up a SageMaker domain and associated user profiles requires understanding the concepts of IAM roles, domains, authentication, and VPCs, and going through a number of configuration steps. To complete these configuration steps, data scientists and developers typically work with their IT admin teams who provision SageMaker Studio and set up the right guardrails.

Customers told us that the onboarding process can sometimes be time consuming, delaying data scientists and ML teams from getting started with SageMaker Studio. We listened and simplified the onboarding experience!

Introducing the simplified Quick Studio setup

The new Quick Studio setup experience for SageMaker provides a new onboarding and administration experience that makes it easy for individual users to set up and manage SageMaker Studio. Data scientists and ML admins can set up SageMaker Studio in minutes with a single click. SageMaker takes care of provisioning the SageMaker domain with default presets, including setting up the IAM role, IAM authentication, and public internet mode. ML admins can alter SageMaker Studio settings for the created domain and customize the UI further at any time. Let’s take a look at how it works.

Prerequisites

To use the Quick Studio setup, you need the following:

  • An AWS account
  • An IAM role with permissions to create the resources needed to set up a SageMaker domain

Use the Quick Studio setup option

Let’s discuss a scenario where a new user wants to access SageMaker Studio. The user experience includes the following steps:

  1. In your AWS account, navigate to the SageMaker console and choose Set up for single user.

SageMaker starts preparing the SageMaker domain. This process typically takes a few minutes. The new domain’s name is prefixed with QuickSetupDomain-.

As soon as the SageMaker domain is ready, a notification appears on the screen stating “The SageMaker Domain is ready” and the user profile under the domain is also created successfully.

  2. Choose Launch next to the created user profile and choose Studio.

Because it’s the first time SageMaker Studio is getting launched for this user profile, SageMaker creates a new JupyterServer app, which takes a few minutes.

A few minutes later, the Studio IDE loads and you’re presented with the SageMaker Studio Home page.

Components of the Quick Studio setup

When using the Quick Studio setup, SageMaker creates the following resources:

  • A new IAM role with the appropriate permissions for using SageMaker Studio, Amazon Simple Storage Service (Amazon S3), and SageMaker Canvas. You can modify the permissions of the created IAM role at any time based on your use case or persona-specific requirements.
  • Another IAM role prefixed with AmazonSagemakerCanvasForecastRole-, which enables permissions for the SageMaker Canvas time series forecasting feature.
  • A SageMaker Studio domain and a user profile for the domain with unique names. IAM is used as the authentication mode. The IAM role created is used as the default SageMaker execution role for the domain and user profile. You can launch any of the personal apps available, such as SageMaker Studio and SageMaker Canvas, which are enabled by default.
  • An EFS volume, which serves as the file system for SageMaker Studio. Apart from Amazon EFS, a new S3 bucket with prefix sagemaker-studio- is created for notebook sharing.

SageMaker Studio also uses the default VPC and its associated subnets. If there is no default VPC, or if the default VPC has no subnets, SageMaker selects one of the existing VPCs that has associated subnets. If there is no VPC, you are prompted to create one on the Amazon VPC console. The VPC and all subnets under it are used to set up Amazon EFS.
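
If you want to verify the resources that the Quick setup created, a short boto3 sketch such as the following can list the domain and its user profiles (assuming your credentials have the necessary SageMaker read permissions):

import boto3

sm = boto3.client("sagemaker")

# Find the domain created by the Quick setup (its name starts with "QuickSetupDomain-").
for domain in sm.list_domains()["Domains"]:
    if domain["DomainName"].startswith("QuickSetupDomain-"):
        details = sm.describe_domain(DomainId=domain["DomainId"])
        print(details["DomainName"], details["AuthMode"], details["VpcId"])
        for profile in sm.list_user_profiles(DomainIdEquals=domain["DomainId"])["UserProfiles"]:
            print("User profile:", profile["UserProfileName"])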

Conclusion

Now, a single click is all it takes to get started with SageMaker Studio. The Quick Studio setup for individual users is available in all AWS commercial Regions where SageMaker is currently available.

Try out this new feature on the SageMaker console and let us know what you think. We always look forward to your feedback! You can send it through your usual AWS Support contacts or post it on the AWS Forum for SageMaker.


About the authors

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Anastasia Tzeveleka is a Machine Learning and AI Specialist Solutions Architect at AWS. She works with customers in EMEA and helps them architect machine learning solutions at scale using AWS services. She has worked on projects in different domains including natural language processing (NLP), MLOps, and low-code/no-code tools.

Unlocking language barriers: Translate application logs with Amazon Translate for seamless support

Application logs are an essential piece of information that provides crucial insights into the inner workings of an application. This includes valuable information such as events, errors, and user interactions that would aid an application developer or an operations support engineer in debugging and providing support. However, when these logs are presented in languages other than English, it creates a significant hurdle for developers who can’t read the content, and hinders the support team’s ability to identify and address issues promptly.

In this post, we explore a solution for how you can unlock language barriers using Amazon Translate, a fully managed neural machine translation service for translating text to and from English across a wide range of supported languages. The solution complements your existing logging workflows by automatically translating all your application logs in Amazon CloudWatch in real time, which can alleviate the challenges posed by non-English application logs.

Solution overview

This solution shows you how you can use three key services to automate the translation of your application logs in an event-driven manner:

  • CloudWatch Logs is used to monitor, store, and access your log files generated from various sources such as AWS services and your applications
  • Amazon Translate is used to perform the translation of text to and from English
  • AWS Lambda is a compute service that lets you run code to retrieve application logs and translate them through the use of the Amazon Translate SDK

The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

  1. A custom or third-party application is hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance and the generated application logs are uploaded to CloudWatch Logs via the CloudWatch Logs agent.
  2. Each log entry written to CloudWatch Logs triggers the Lambda function subscribed to the CloudWatch log group.
  3. The function processes the contents of the log entry and uses Amazon Translate SDK translate_text to translate the log content.
  4. The translated log content is returned to the function.
  5. The function writes the translated log content back to CloudWatch Logs in a different log group.

The entire process happens automatically in real time, and your developers will be able to access the translated application logs from the CloudWatch log groups with no change in how your existing application writes logs to CloudWatch.

Prerequisites

To follow through the instructions in this solution, you need an AWS account with an AWS Identity and Access Management (IAM) user who has permission to AWS CloudFormation, Amazon Translate, CloudWatch, Lambda, and IAM.

Deploy the solution

To get started, launch the following CloudFormation template to create a Lambda function, two CloudWatch log groups, and IAM role. Proceed to deploy with the default settings. This template takes about 1 minute to complete.

After the stack is created successfully, you can review the Lambda function by navigating to the Lambda console and locating the function translate-application-logs.

You can observe that there is a CloudWatch Logs trigger added to the function.

You can view the details of the trigger configuration by navigating to the Configuration tab and choosing Triggers in the navigation pane.

You can confirm that the trigger has been configured to subscribe to log events from the log group /applicationlogs. This is where your non-English application logs will be written to.

Next, choose Environment variables in the navigation pane.

Two environment variables are provided here:

  • source_language – The original language that the application log is in (for example, ja for Japanese)
  • target_language – The target language to translate the application log to (for example, en for English)

For a list of supported languages, refer to Supported languages and language codes.

Next, go to the Code tab and review the function logic:

import json, boto3, gzip, base64, os

translate = boto3.client(service_name='translate', region_name=os.environ['AWS_REGION'], use_ssl=True)
logs = boto3.client('logs')
    
def lambda_handler(event, context):
    # retrieve log messages
    encoded_zipped_data = event['awslogs']['data']
    zipped_data = base64.b64decode(encoded_zipped_data)
    data = gzip.decompress(zipped_data)
    json_log = json.loads(data)
    logGroup = json_log['logGroup']+'-'+os.environ['target_language']
    logStream = json_log['logStream']
    
    # check if log group exists, create if not
    dlg = logs.describe_log_groups(logGroupNamePrefix=logGroup)
    if len(dlg['logGroups']) == 0:
        logs.create_log_group(logGroupName=logGroup)

    # check if log stream exists, create if not    
    dls = logs.describe_log_streams(logGroupName=logGroup, logStreamNamePrefix=logStream)
    if len(dls['logStreams']) == 0:
        logs.create_log_stream(logGroupName=logGroup, logStreamName=logStream)

    # translate log event messages from source language to target language
    for logevent in json_log['logEvents']:
        logevent['message'] = translate.translate_text(Text=logevent['message'], SourceLanguageCode=os.environ['source_language'], TargetLanguageCode=os.environ['target_language']).get('TranslatedText')
        del logevent['id']

    # write translated log events back to a different log group in CloudWatch
    logs.put_log_events(
        logGroupName = logGroup,
        logStreamName = logStream,
        logEvents = json_log['logEvents']
    )
    
    # return success
    return {
        'statusCode': 200,
        'body': 'Translation success!'
    }

Test the solution

Finally, to test the solution, choose the created log group and log stream on the CloudWatch console and create a log message.

After creating your log messages, you will be able to see them translated immediately in the target log group.
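
If you prefer to test programmatically, the following sketch writes a sample Japanese log event to the /applicationlogs log group with boto3. The log stream name and the message text are placeholders for this test, and the source_language environment variable is assumed to be set to ja.

import time

import boto3

logs = boto3.client("logs")

log_group = "/applicationlogs"  # log group the Lambda trigger subscribes to
log_stream = "test-stream"      # hypothetical stream name for this test

# Create the log stream if it doesn't exist yet
existing = logs.describe_log_streams(logGroupName=log_group, logStreamNamePrefix=log_stream)
if not existing["logStreams"]:
    logs.create_log_stream(logGroupName=log_group, logStreamName=log_stream)

# Write a sample Japanese log message; the Lambda function should translate it
logs.put_log_events(
    logGroupName=log_group,
    logStreamName=log_stream,
    logEvents=[{"timestamp": int(time.time() * 1000), "message": "アプリケーションでエラーが発生しました"}],
)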

Clean up

To clean up the resources created in this post, delete the CloudFormation stack via the CloudFormation console.

Conclusion

This post addressed the challenge faced by developers and support teams when application logs are presented in languages other than English, making it difficult for them to debug and provide support. The proposed solution uses Amazon Translate to automatically translate non-English logs in CloudWatch, and provides step-by-step guidance on deploying the solution in your environment. Through this implementation, developers can now seamlessly bridge the language barrier, empowering them to address issues swiftly and effectively.

Try out this implementation and let us know your thoughts in the comments.


About the author

Xan Huang is a Senior Solutions Architect with AWS and is based in Singapore. He works with major financial institutions to design and build secure, scalable, and highly available solutions in the cloud. Outside of work, Xan spends most of his free time with his family and documenting his daughter’s growing up journey.

Accelerate client success management through email classification with Hugging Face on Amazon SageMaker

This is a guest post from Scalable Capital, a leading FinTech in Europe that offers digital wealth management and a brokerage platform with a trading flat rate.

As a fast-growing company, Scalable Capital’s goals are to not only build an innovative, robust, and reliable infrastructure, but to also provide the best experiences for our clients, especially when it comes to client services.

Scalable receives hundreds of email inquiries from our clients on a daily basis. By implementing a modern natural language processing (NLP) model, the response process has become much more efficient, and waiting times for clients have been reduced tremendously. The machine learning (ML) model classifies new incoming customer requests as soon as they arrive and redirects them to predefined queues, which allows our dedicated client success agents to focus on the contents of the emails according to their skills and provide appropriate responses.

In this post, we demonstrate the technical benefits of using Hugging Face transformers deployed with Amazon SageMaker, such as training and experimentation at scale, and increased productivity and cost-efficiency.

Problem statement

Scalable Capital is one of the fastest growing FinTechs in Europe. With the aim to democratize investment, the company provides its clients with easy access to the financial markets. Clients of Scalable can actively participate in the market through the company’s brokerage trading platform, or use Scalable Wealth Management to invest in an intelligent and automated fashion. In 2021, Scalable Capital experienced a tenfold increase of its client base, from tens of thousands to hundreds of thousands.

To provide our clients with a top-class (and consistent) user experience across products and client service, the company was looking for automated solutions to generate efficiencies for a scalable solution while maintaining operational excellence. Scalable Capital’s data science and client service teams identified that one of the largest bottlenecks in servicing our clients was responding to email inquiries. Specifically, the bottleneck was the classification step, in which employees had to read and label request texts on a daily basis. After the emails were routed to their proper queues, the respective specialists quickly engaged and resolved the cases.

To streamline this classification process, the data science team at Scalable built and deployed a multitask NLP model using state-of-the-art transformer architecture, based on the pre-trained distilbert-base-german-cased model published by Hugging Face. distilbert-base-german-cased uses the knowledge distillation method to pretrain a smaller general-purpose language representation model than the original BERT base model. The distilled version achieves comparable performance to the original version, while being smaller and faster. To facilitate our ML lifecycle process, we decided to adopt SageMaker to build, deploy, serve, and monitor our models. In the following section, we introduce our project architecture design.

Solution overview

Scalable Capital’s ML infrastructure consists of two AWS accounts: one as an environment for the development stage and the other one for the production stage.

The following diagram shows the workflow for our email classifier project, but can also be generalized to other data science projects.

Email classification project diagram

The workflow consists of the following components:

  • Model experimentation – Data scientists use Amazon SageMaker Studio to carry out the first steps in the data science lifecycle: exploratory data analysis (EDA), data cleaning and preparation, and building prototype models. When the exploratory phase is complete, we turn to VSCode hosted by a SageMaker notebook as our remote development tool to modularize and productionize our code base. To explore different types of models and model configurations, and at the same time to keep track of our experimentations, we use SageMaker Training and SageMaker Experiments.
  • Model build – After we decide on a model for our production use case, in this case a multi-task distilbert-base-german-cased model, fine-tuned from the pretrained model from Hugging Face, we commit and push our code to the GitHub develop branch. The GitHub merge event triggers our Jenkins CI pipeline, which in turn starts a SageMaker Pipelines job with test data. This acts as a test to make sure that the code runs as expected. A test endpoint is deployed for testing purposes.
  • Model deployment – After making sure that everything is running as expected, data scientists merge the develop branch into the primary branch. This merge event now triggers a SageMaker Pipelines job using production data for training purposes. Afterwards, model artifacts are produced and stored in an output Amazon Simple Storage Service (Amazon S3) bucket, and a new model version is logged in the SageMaker model registry. Data scientists examine the performance of the new model, then approve if it’s in line with expectations. The model approval event is captured by Amazon EventBridge, which then deploys the model to a SageMaker endpoint in the production environment.
  • MLOps – Because the SageMaker endpoint is private and can’t be reached by services outside of the VPC, an AWS Lambda function and Amazon API Gateway public endpoint are required to communicate with the CRM. Whenever new emails arrive in the CRM inbox, the CRM invokes the API Gateway public endpoint, which in turn triggers the Lambda function to invoke the private SageMaker endpoint. The function then relays the classification back to the CRM through the API Gateway public endpoint (a minimal sketch of such a function follows this list). To monitor the performance of our deployed model, we implement a feedback loop between the CRM and the data scientists to keep track of prediction metrics from the model. On a monthly basis, the CRM updates the historical data used for experimentation and model training. We use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as a scheduler for our monthly retrain.
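
The following is a minimal sketch of such a Lambda function, assuming a hypothetical endpoint name email-classifier and a JSON request body containing the batch of emails; it illustrates the pattern and is not the exact code used by Scalable.

import json
import os

import boto3

# SageMaker runtime client; the function runs inside the VPC and reaches the private endpoint
runtime = boto3.client("sagemaker-runtime")

ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "email-classifier")  # hypothetical endpoint name


def lambda_handler(event, context):
    # API Gateway passes the batch of unclassified emails in the request body
    emails = json.loads(event["body"])

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(emails),
    )
    predictions = json.loads(response["Body"].read())

    # Relay the classifications back to the CRM through API Gateway
    return {"statusCode": 200, "body": json.dumps(predictions)}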

In the following sections, we break down the data preparation, model experimentation, and model deployment steps in more detail.

Data preparation

Scalable Capital uses a CRM tool for managing and storing email data. Relevant email contents consist of subject, body, and the custodian banks. There are three labels to assign to each email: which line of business the email is from, which queue is appropriate, and the specific topic of the email.

Before we start training any NLP models, we ensure that the input data is clean and the labels are assigned according to expectation.

To retrieve clean inquiry contents from Scalable clients, we remove extra text and symbols from the raw email data, such as email signatures, impressums (legal notices), quotes of previous messages in email chains, CSS symbols, and so on. Otherwise, our trained models might experience degraded performance.
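
As an illustration, a simplified cleaning step could strip quoted replies and signature blocks with a few string operations and regular expressions. The markers and patterns below are assumptions for demonstration only and are far less exhaustive than a production cleaner.

import re

SIGNATURE_MARKERS = ["Mit freundlichen Grüßen", "Best regards", "--"]  # assumed signature markers


def clean_email(body: str) -> str:
    # Drop quoted lines from previous messages in the email chain
    lines = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    text = "\n".join(lines)

    # Cut everything after the first signature marker
    for marker in SIGNATURE_MARKERS:
        idx = text.find(marker)
        if idx != -1:
            text = text[:idx]

    # Remove leftover HTML/CSS tags and collapse whitespace
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text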

Labels for emails evolve over time as Scalable client service teams add new ones and refine or remove existing ones to accommodate business needs. To make sure that labels for training data as well as expected classifications for prediction are up to date, the data science team works in close collaboration with the client service team to ensure the correctness of the labels.

Model experimentation

We start our experiment with the readily available pre-trained distilbert-base-german-cased model published by Hugging Face. Because the pre-trained model is a general-purpose language representation model, we can adapt the architecture to perform specific downstream tasks—such as classification and question answering—by attaching appropriate heads to the neural network. In our use case, the downstream task we are interested in is sequence classification. Without modifying the existing architecture, we decide to fine-tune three separate pre-trained models for each of our required categories. With the SageMaker Hugging Face Deep Learning Containers (DLCs), starting and managing NLP experiments are made simple with Hugging Face containers and the SageMaker Experiments API.

The following is a code snippet of train.py:

from transformers import AutoConfig, DistilBertForSequenceClassification, Trainer

config = AutoConfig.from_pretrained("distilbert-base-german-cased")  # load original config
config.num_labels = num_labels  # adapt original config to a specific number of labels (default is 2)
# instantiate a pretrained model
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-german-cased", config=config)

trainer = Trainer(
    model=model,  # the instantiated Transformers model to be trained
    args=training_args,  # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=val_dataset  # evaluation dataset
)
trainer.train()

The following code is the Hugging Face estimator:

from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    hyperparameters = hyperparameters
)

To validate the fine-tuned models, we use the F1-score due to the imbalanced nature of our email dataset, but we also compute other metrics such as accuracy, precision, and recall. For the SageMaker Experiments API to register the training job’s metrics, we first need to log the metrics to the training job’s local console, from which they are picked up by Amazon CloudWatch. Then we define the correct regex format to capture the CloudWatch logs. The metric definitions include the name of the metrics and regex validation for extracting the metrics from the training job:

metric_definitions = [
    {"Name": "train:loss", "Regex": "'loss': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "learning_rate", "Regex": "'learning_rate': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:loss", "Regex": "'eval_loss': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:accuracy", "Regex": "'train_accuracy': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:accuracy", "Regex": "'eval_accuracy': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:precision", "Regex": "'train_precision': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:precision", "Regex": "'eval_precision': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:recall", "Regex": "'train_recall': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:recall", "Regex": "'eval_recall': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "train:f1", "Regex": "'train_f1': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:f1", "Regex": "'eval_f1': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:runtime", "Regex": "'eval_runtime': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "val:samples_per_second", "Regex": "'eval_samples_per_second': ([0-9]+(.|e-)[0-9]+),?"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9]+(.|e-)[0-9]+),?"},
]
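
These metric definitions are then passed to the estimator so SageMaker can extract the metrics from the training job’s CloudWatch logs. The following sketch reuses the estimator configuration shown earlier; train_s3_uri and val_s3_uri are placeholder S3 locations.

from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.26.0",
    pytorch_version="1.13.1",
    py_version="py39",
    hyperparameters=hyperparameters,
    metric_definitions=metric_definitions,  # regex definitions from above
)

# Start the training job with the prepared train and validation channels
huggingface_estimator.fit({"train": train_s3_uri, "validation": val_s3_uri})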

As part of the training iteration for the classifier model, we use a confusion matrix and classification report to evaluate the result. The following figure shows the confusion matrix for line of business prediction.

Confusion Matrix

The following screenshot shows an example of the classification report for line of business prediction.

Classification Report

As a next iteration of our experiment, we’ll take advantage of multi-task learning to improve our model. Multi-task learning is a form of training where a model learns to solve multiple tasks simultaneously, because the shared information among tasks can improve learning efficiencies. By attaching two more classification heads to the original distilbert architecture, we can carry out multi-task fine-tuning, which attains reasonable metrics for our client service team.
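
As a rough sketch of the idea (not Scalable’s production code), a shared DistilBERT encoder with one classification head per label category could look like the following; the label counts are assumptions.

from torch import nn
from transformers import DistilBertModel


class MultiTaskDistilBert(nn.Module):
    """Shared DistilBERT encoder with one classification head per task."""

    def __init__(self, num_lob_labels: int, num_queue_labels: int, num_topic_labels: int):
        super().__init__()
        self.encoder = DistilBertModel.from_pretrained("distilbert-base-german-cased")
        hidden_size = self.encoder.config.dim
        self.lob_head = nn.Linear(hidden_size, num_lob_labels)      # line of business
        self.queue_head = nn.Linear(hidden_size, num_queue_labels)  # queue
        self.topic_head = nn.Linear(hidden_size, num_topic_labels)  # topic

    def forward(self, input_ids, attention_mask):
        # Use the [CLS] token representation as the shared sentence embedding
        hidden_states = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        cls = hidden_states[:, 0]
        return {
            "line_of_business": self.lob_head(cls),
            "queue": self.queue_head(cls),
            "topic": self.topic_head(cls),
        }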

Model deployment

In our use case, the email classifier is to be deployed to an endpoint, to which our CRM pipeline can send a batch of unclassified emails and get back predictions. Because we have additional logic, such as input data cleaning and multi-task predictions, in addition to Hugging Face model inference, we need to write a custom inference script that adheres to the SageMaker standard.

The following is a code snippet of inference.py:

import json

import pandas as pd
import torch

# load_from_artifact, preprocess, and postprocess are helper functions defined elsewhere in inference.py


def model_fn(model_dir):
    model = load_from_artifact(model_dir)

    return model

def transform_fn(model, input_data, content_type, accept):
    if content_type == "application/json":
        data = json.loads(input_data)
        data = pd.DataFrame(data)
    else:
        raise ValueError(f"Unsupported content type: {content_type}")

    data = preprocess(data)

    # Inference
    with torch.no_grad():
        predictions = model(data)

    predictions = postprocess(predictions)

    if content_type == 'application/json':
        return json.dumps(predictions.to_dict(orient="records"))
    else:
        raise NotImplementedError

When everything is up and ready, we use SageMaker Pipelines to manage our training pipeline and attach it to our infrastructure to complete our MLOps setup.

To monitor the performance of the deployed model, we build a feedback loop to enable CRM to provide us with the status of classified emails when cases are closed. Based on this information, we make adjustments to improve the deployed model.

Conclusion

In this post, we shared how SageMaker helps the data science team at Scalable manage the lifecycle of a data science project efficiently, namely the email classifier project. The lifecycle starts with the initial phase of data analysis and exploration with SageMaker Studio; moves on to model experimentation and deployment with SageMaker training, inference, and Hugging Face DLCs; and completes with a training pipeline with SageMaker Pipelines integrated with other AWS services. Thanks to this infrastructure, we are able to iterate and deploy new models more efficiently, and are therefore able to improve existing processes within Scalable as well as our clients’ experiences.

To learn more about Hugging Face and SageMaker, refer to the following resources:


About the Authors

Dr. Sandra Schmid is Head of Data Analytics at Scalable GmbH. She is responsible for data-driven approaches and use cases in the company together with her teams. Her key focus is finding the best combination of machine learning and data science models and business goals in order to gain as much business value and efficiencies out of data as possible.

Huy Dang is a Data Scientist at Scalable GmbH. His responsibilities include data analytics, building and deploying machine learning models, as well as developing and maintaining infrastructure for the data science team. In his spare time, he enjoys reading, hiking, rock climbing, and staying up to date with the latest machine learning developments.

Mia Chang is an ML Specialist Solutions Architect for Amazon Web Services. She works with customers in EMEA and shares best practices for running AI/ML workloads on the cloud, drawing on her background in applied mathematics, computer science, and AI/ML. She focuses on NLP-specific workloads, and shares her experience as a conference speaker and a book author. In her free time, she enjoys yoga, board games, and brewing coffee.

Moritz Guertler is an Account Executive in the Digital Native Businesses segment at AWS. He focuses on customers in the FinTech space and supports them in accelerating innovation through secure and scalable cloud infrastructure.

Falcon 180B foundation model from TII is now available via Amazon SageMaker JumpStart

Today, we are excited to announce that the Falcon 180B foundation model developed by Technology Innovation Institute (TII) is available for customers through Amazon SageMaker JumpStart to deploy with one-click for running inference. With a 180-billion-parameter size and trained on a massive 3.5-trillion-token dataset, Falcon 180B is the largest and one of the most performant models with openly accessible weights. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Falcon 180B model via SageMaker JumpStart.

What is Falcon 180B

Falcon 180B is a model released by TII that follows previous releases in the Falcon family. It’s a scaled-up version of Falcon 40B, and it uses multi-query attention for better scalability. It’s an auto-regressive language model that uses an optimized transformer architecture. It was trained on 3.5 trillion tokens of data, primarily consisting of web data from RefinedWeb (approximately 85%). The model has two versions: 180B and 180B-Chat. 180B is a raw, pre-trained model, which should be further fine-tuned for most use cases. 180B-Chat is better suited to taking generic instructions. The Chat model has been fine-tuned on chat and instructions datasets together with several large-scale conversational datasets.

The model is made available under the Falcon-180B TII License and Acceptable Use Policy.

Falcon 180B was trained by TII on Amazon SageMaker, on a cluster of approximately 4K A100 GPUs. It used a custom distributed training codebase named Gigatron, which uses 3D parallelism with ZeRO, and custom, high-performance Triton kernels. The distributed training architecture used Amazon Simple Storage Service (Amazon S3) as the sole unified service for data loading and checkpoint writing and reading, which particularly contributed to the workload reliability and operational simplicity.

What is SageMaker JumpStart

With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances within a network isolated environment, and customize models using Amazon SageMaker for model training and deployment.

You can now discover and deploy Falcon 180B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security. Falcon 180B is discoverable and can be deployed in Regions where the requisite instances are available. At present, ml.p4de instances are available in US East (N. Virginia) and US West (Oregon).

Discover models

You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.

From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. You can find Falcon 180B in the Foundation Models: Text Generation carousel.

You can also find other model variants by choosing Explore all Text Generation Models or searching for Falcon.

You can choose the model card to view details about the model such as license, data used to train, and how to use. You will also find two buttons, Deploy and Open Notebook, which will help you use the model (the following screenshot shows the Deploy option).

Deploy models

When you choose Deploy, the model deployment will start. Alternatively, you can deploy through the example notebook that shows up by choosing Open Notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using a notebook, we start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

my_model = JumpStartModel(model_id="huggingface-llm-falcon-180b-chat-bf16")
predictor = my_model.deploy()

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To learn more, refer to the API documentation. After it’s deployed, you can run inference against the deployed endpoint through a SageMaker predictor. See the following code:

payload = {
    "inputs": "User: Hello!nFalcon: ",
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload)

Inference parameters control the text generation process at the endpoint. The max_new_tokens parameter controls the size of the output generated by the model. Note that this is not the same as the number of words, because the vocabulary of the model is not the same as the English language vocabulary and each token may not be an English language word. The temperature parameter controls the randomness in the output; a higher temperature results in more creative and hallucinated outputs. All the inference parameters are optional.

This 180B parameter model is 335GB and requires even more GPU memory to sufficiently perform inference in 16-bit precision. Currently, JumpStart only supports this model on ml.p4de.24xlarge instances. It is possible to deploy an 8-bit quantized model on a ml.p4d.24xlarge instance by providing the env={"HF_MODEL_QUANTIZE": "bitsandbytes"} keyword argument to the JumpStartModel constructor and specifying instance_type="ml.p4d.24xlarge" to the deploy method. However, please note that per-token latency is approximately 5x slower for this quantized configuration.
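
Based on the options described in the preceding paragraph, the quantized deployment looks like the following:

from sagemaker.jumpstart.model import JumpStartModel

# 8-bit quantized deployment on ml.p4d.24xlarge (expect roughly 5x slower per-token latency)
my_model = JumpStartModel(
    model_id="huggingface-llm-falcon-180b-chat-bf16",
    env={"HF_MODEL_QUANTIZE": "bitsandbytes"},
)
predictor = my_model.deploy(instance_type="ml.p4d.24xlarge")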

The following table lists all the Falcon models available in SageMaker JumpStart along with the model IDs, default instance types, maximum number of total tokens (sum of the number of input tokens and number of generated tokens) supported, and the typical response latency per token for each of these models.

| Model Name | Model ID | Default Instance Type | Max Total Tokens | Latency per Token* |
| --- | --- | --- | --- | --- |
| Falcon 7B | huggingface-llm-falcon-7b-bf16 | ml.g5.2xlarge | 2048 | 34 ms |
| Falcon 7B Instruct | huggingface-llm-falcon-7b-instruct-bf16 | ml.g5.2xlarge | 2048 | 34 ms |
| Falcon 40B | huggingface-llm-falcon-40b-bf16 | ml.g5.12xlarge | 2048 | 57 ms |
| Falcon 40B Instruct | huggingface-llm-falcon-40b-instruct-bf16 | ml.g5.12xlarge | 2048 | 57 ms |
| Falcon 180B | huggingface-llm-falcon-180b-bf16 | ml.p4de.24xlarge | 2048 | 45 ms |
| Falcon 180B Chat | huggingface-llm-falcon-180b-chat-bf16 | ml.p4de.24xlarge | 2048 | 45 ms |

*per-token latency is provided for the median response time of the example prompts provided in this blog; this value will vary based on length of input and output sequences.

Inference and example prompts for Falcon 180B

Falcon models can be used for text completion for any piece of text. Through text generation, you can perform a variety of tasks, such as answering questions, language translation, sentiment analysis, and many more. The endpoint accepts the following input payload schema:

payload = {
    "inputs": str,
    (optional)"parameters": {
        "do_sample": bool,
        "max_new_tokens": int,
        "repetition_penalty": float,
        "return_full_text": bool,
        "stop": List[str],
        "seed": int,
        "temperature": float,
        "top_k": int,
        "top_p": float,
        "truncate": int,
        "typical_p": float,
        "best_of": int,
        "watermark": bool,
        "details": bool,
        "decoder_input_details": bool,
    }
}

You can explore the definition of these client parameters and their default values within the text-generation-inference repository.

The following are some sample example prompts and the text generated by the model. All outputs here are generated with inference parameters {"max_new_tokens": 768, "stop": ["<|endoftext|>", "###"]}.
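
For reference, these parameters are passed in the payload the same way as shown earlier. A minimal call for the first prompt, using the predictor created during deployment, might look like this:

payload = {
    "inputs": "Building a website can be done in 10 simple steps:",
    "parameters": {"max_new_tokens": 768, "stop": ["<|endoftext|>", "###"]},
}
response = predictor.predict(payload)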

Input
Building a website can be done in 10 simple steps:

Output
- Choose a domain name
- Register your domain name
- Choose a web host
- Choose a website builder
- Create your website
- Add content to your website
- Add pages to your website
- Add images to your website
- Add videos to your website
- Add social media to your website
1. Choose a domain name
A domain name is the address of your website. It is what people will type into their browser to find your website. For example, the domain name of this website is.
When choosing a domain name, you should choose something that is easy to remember and that represents your business or website. You should also make sure that the domain name is available. You can check if a domain name is available by using a domain name search tool.
2. Register your domain name
Once you have chosen a domain name, you need to register it. This can be done through a domain name registrar. A domain name registrar is a company that manages the registration of domain names.
When registering your domain name, you will need to provide your contact information and choose a payment method. You will also need to choose a domain name extension. The most common domain name extensions are .com, .net, and .org.
3. Choose a web host
A web host is a company that provides space on a server for your website. When choosing a web host, you should choose one that is reliable and that offers good customer support. You should also make sure that the web host offers the features that you need for your website.
4. Choose a website builder
A website builder is a tool that allows you to create a website without having to learn how to code. There are many different website builders available, so you should choose one that is easy to use and that offers the features that you need.
5. Create your website
Once you have chosen a website builder, you can start creating your website. You will need to choose a template for your website and then add content to your website.
6. Add content to your website
The content of your website is what will make your website unique. You should add content that is relevant to your business or website. You can add text, images, videos, and other types of content to your website.
7. Add pages to your website
A website is made up of pages. You can add as many pages as you want to your website. Each page should have a different purpose. For example, you might have a page for your home page, a page for your about page, and a page for your contact page.
8. Add images to your website
Images can make your website more visually appealing. You can add images to your website by uploading them to your website or by using a image hosting service.
9. Add videos to your website
Videos can also make your website more visually appealing. You can add videos to your website by uploading them to your website or by using a video hosting service.
10. Add social media to your website
Social media can help you promote your website. You can add social media buttons to your website so that people can share your website on social media. You can also add social media feeds to your website so that people can see your latest social media posts.

You may notice this pretrained model generates long text sequences that are not necessarily ideal for dialog use cases. Before we show how the fine-tuned chat model performs for a larger set of dialog-based prompts, the next two examples illustrate how to use Falcon models with few-shot in-context learning, where we provide a few training samples to the model. Note that “few-shot learning” does not adjust model weights; we only perform inference on the deployed model during this process while providing a few examples within the input context to help guide the model output.

Input
Tweet: "I hate it when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been :+1:"
Sentiment: Positive
###
Tweet: "This is the link to the article"
Sentiment: Neutral
###
Tweet: "This new music video was incredible"
Sentiment:

Output
 Positive
###
Input
Translate English to French:
sea otter => loutre de mer
###
peppermint => menthe poivrée
###
plush girafe => girafe peluche
###
cheese =>

Output
 fromage
###

Inference and example prompts for Falcon 180B-Chat

With Falcon 180B-Chat models, optimized for dialogue use cases, the input to the chat model endpoints may contain previous history between the chat assistant and the user. You can ask questions contextual to the conversation that has happened so far. You can also provide the system configuration, such as personas, which define the chat assistant’s behavior. Input payload to the endpoint is the same as the Falcon 180B model except the inputs string value should use the following format:

System: <An optional system prompt>
User: <Optional user message history>
Falcon: <Optional chat history Falcon response>
User: <Current user message>
Falcon:
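
A small helper function (hypothetical, not part of the SageMaker SDK) can assemble this format from a message history:

def build_falcon_chat_prompt(messages, system_prompt=None):
    # messages is a list of (role, text) tuples, where role is "User" or "Falcon"
    lines = []
    if system_prompt:
        lines.append(f"System: {system_prompt}")
    for role, text in messages:
        lines.append(f"{role}: {text}")
    lines.append("Falcon:")
    return "\n".join(lines)


prompt = build_falcon_chat_prompt(
    [("User", "What is Amazon SageMaker JumpStart?")],
    system_prompt="You are a helpful assistant",
)
payload = {"inputs": prompt, "parameters": {"max_new_tokens": 256, "stop": ["\nUser:", "<|endoftext|>", " User:", "###"]}}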

The following are some sample example prompts and the text generated by the model. All outputs are generated with inference parameters {"max_new_tokens":256, "stop": ["\nUser:", "<|endoftext|>", " User:", "###"]}.

Input
System: You are a helpful assistant
User: What is Amazon SageMaker JumpStart?
Falcon: 

Output 
Amazon SageMaker JumpStart is a capability of Amazon SageMaker that provides a set of solutions to common machine learning problems. These solutions include pre-built machine learning models, algorithms, and pre-configured model training and deployment settings. JumpStart also provides sample notebooks that demonstrate how to use these solutions for various use cases. The goal of JumpStart is to help users get started with machine learning quickly and easily, without the need for extensive machine learning expertise.
User:
Input
User: Simply put, the theory of relativity states that
Falcon: 

Output 
1. The laws of physics are the same for all observers in uniform motion relative to one another.
2. The speed of light in a vacuum is the same for all observers, regardless of their motion or the motion of the light source.
3. Time and space are relative, meaning that they can vary depending on the observer's frame of reference.
4. Mass and energy are interchangeable, as expressed by the famous equation E=mc².
User:
Input
User: Building a website can be done in 10 simple steps:
Falcon: 

Output
1. Determine the purpose and goals of your website.
2. Choose a domain name and register it.
3. Select a web hosting provider and sign up for a plan.
4. Choose a website builder or content management system (CMS) to use.
5. Select a theme or template for your website.
6. Customize the design and layout of your website.
7. Add content to your website, including text, images, and videos.
8. Set up navigation menus and links to make your website easy to navigate.
9. Test your website to ensure it is functioning properly and is mobile-friendly.
10. Launch your website and promote it to attract visitors.
User:
Input
System: Always answer with a Haiku.
User: I am deploying a large language model on Amazon SageMaker JumpStart. What should I do?
Falcon: 

Output
"JumpStart" your dreams,
Deploy with ease, scale with glee,
SageMaker awaits.
Input
System: Generate a table summarizing the options outlined in this email.
User: Team, we have to make a decision on the product plan for Goocrux v2. The first option will take us 3 weeks of development, and will achieve performance of 80ms, but means that the UI has to change. The second option will take us 4 weeks of development, and will achieve performance of 60ms, without any UI change. The third option will take the longest -- 6 weeks of development, will achieve performance of 30ms, without any UI change. A markdown table with four columns: (1) Option ID, (2) Development time, (3) Performance, (4) UI Change.
Falcon: 

Output
| Option ID | Development time | Performance | UI Change |
| --- | --- | --- | --- |
| 1 | 3 weeks | 80ms | Yes |
| 2 | 4 weeks | 60ms | No |
| 3 | 6 weeks | 30ms | No |
Input
System: You always translate English to French.
User: sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>
Falcon: 

Output
=> fromage
Input
System: Classify the user prompt as Negative, Positive, or Neutral sentiment. Use a short response.
User: Tweet: "This new music video was awesome" 
Falcon: 

Output
"Positive"
Input
User: What is the recipe of mayonnaise?
Falcon: 

Output
1. In a mixing bowl, whisk together 1 egg yolk, 1 tablespoon of Dijon mustard, and 1 tablespoon of lemon juice until well combined.
2. While whisking continuously, slowly drizzle in 1 cup of vegetable oil until the mixture thickens and emulsifies.
3. Season with salt and pepper to taste.
4. Store in an airtight container in the refrigerator for up to one week.
User:

In the following example, the user has had a conversation with the assistant about tourist sites in Paris. Next, the user is inquiring about the first option recommended by the chat assistant.

Input
System: You are a helpful trip planner.
User: I am going to Paris, what should I see?
Falcon: 1. Eiffel Tower: No trip to Paris is complete without a visit to the iconic Eiffel Tower.
2. Louvre Museum: Home to some of the world's most famous art, including the Mona Lisa.
3. Notre-Dame Cathedral: A stunning example of Gothic architecture and a must-see for history buffs.
4. Champs-Élysées: One of the most famous avenues in the world, lined with shops, cafes, and theaters.
5. Montmartre: A charming neighborhood known for its bohemian vibe, street art, and stunning views of the city.
6. Palace of Versailles: A short train ride from Paris, this opulent palace was once the seat of French royalty.
7. Musée d'Orsay: Housed in a former train station, this museum boasts an impressive collection of Impressionist and Post-Impressionist art.
8. Sainte-Chapelle: A hidden gem in the heart of Paris, this medieval chapel is famous for its stunning stained glass windows.
9. Luxembourg Gardens: A beautiful park in the heart of the city, perfect for a picnic or a leisurely stroll.
User: What is so great about #1?
Falcon: 

Output
#1 is the Eiffel Tower, which is one of the most famous landmarks in the world. It is a symbol of Paris and a must-see attraction for many visitors. The tower offers stunning views of the city from its observation decks, and it is especially beautiful when lit up at night. Additionally, the Eiffel Tower has a rich history and is an engineering marvel, making it a fascinating destination for anyone interested in architecture or history.
User:

Clean up

After you’re done running the notebook, make sure to delete all resources that you created in the process so your billing is stopped. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Falcon 180B in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.

Resources


About the Authors

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Olivier Cruchant is a Principal Machine Learning Specialist Solutions Architect at AWS, based in France. Olivier helps AWS customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.

Karl Albertsen leads Amazon SageMaker’s foundation model hub, algorithms, and partnerships teams.

Amazon SageMaker Domain in VPC only mode to support SageMaker Studio with auto shutdown Lifecycle Configuration and SageMaker Canvas with Terraform

Amazon SageMaker Domain supports SageMaker machine learning (ML) environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio is a fully integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models, improving data science team productivity by up to 10x. SageMaker Canvas expands access to machine learning by providing business analysts with a visual interface that allows them to generate accurate ML predictions on their own—without requiring any ML experience or having to write a single line of code.

HashiCorp Terraform is an infrastructure as code (IaC) tool that lets you organize your infrastructure in reusable code modules. AWS customers rely on IaC to design, develop, and manage their cloud infrastructure, such as SageMaker Domains. IaC ensures that customer infrastructure and services are consistent, scalable, and reproducible while following best practices in the area of development operations (DevOps). Using Terraform, you can develop and manage your SageMaker Domain and its supporting infrastructure in a consistent and repeatable manner.

In this post, we demonstrate the Terraform implementation to deploy a SageMaker Domain and the Amazon Virtual Private Cloud (Amazon VPC) it associates with. The solution will use Terraform to create:

  • A VPC with subnets, security groups, and VPC endpoints to support VPC only mode for the SageMaker Domain.
  • A SageMaker Domain in VPC only mode with a user profile.
  • An AWS Key Management Service (AWS KMS) key to encrypt the SageMaker Studio’s Amazon Elastic File System (Amazon EFS) volume.
  • A Lifecycle Configuration attached to the SageMaker Domain to automatically shut down idle Studio notebook instances.
  • A SageMaker Domain execution role and IAM policies to enable SageMaker Studio and Canvas functionalities.

The solution described in this post is available at this GitHub repo.

Solution overview

The following image shows SageMaker Domain in VPC only mode.

sagemaker_domain_vpc_only

By launching SageMaker Domain in your VPC, you can control the data flow from your SageMaker Studio and Canvas environments. This allows you to restrict internet access, monitor and inspect traffic using standard AWS networking and security capabilities, and connect to other AWS resources through VPC endpoints.

VPC requirements to use VPC only mode

Creating a SageMaker Domain in VPC only mode requires a VPC with the following configurations:

  1. At least two private subnets, each in a different Availability Zone, to ensure high availability.
  2. Ensure your subnets have the required number of IP addresses. We recommend between two and four IP addresses per user. The total IP address capacity for a Studio domain is the sum of available IP addresses for each subnet provided when the domain is created.
  3. Set up one or more security groups with inbound and outbound rules that together allow the following traffic:
    • NFS traffic over TCP on port 2049 between the domain and the Amazon EFS volume.
    • TCP traffic within the security group. This is required for connectivity between the JupyterServer app and the KernelGateway apps. You must allow access to at least ports in the range 8192–65535.
  4. Create a gateway endpoint for Amazon Simple Storage Service (Amazon S3). SageMaker Studio needs to access Amazon S3 from your VPC using Gateway VPC endpoints. After you create the gateway endpoint, you need to add it as a target in your route table for traffic destined from your VPC to Amazon S3.
  5. Create interface VPC endpoints (AWS PrivateLink) to allow Studio to access the following services with the corresponding service names. You must also associate a security group for your VPC with these endpoints to allow all inbound traffic from port 443:
    • SageMaker API: com.amazonaws.region.sagemaker.api. This is required to communicate with the SageMaker API.
    • SageMaker runtime: com.amazonaws.region.sagemaker.runtime. This is required to run Studio notebooks and to train and host models.
    • SageMaker Feature Store: com.amazonaws.region.sagemaker.featurestore-runtime. This is required to use SageMaker Feature Store.
    • SageMaker Projects: com.amazonaws.region.servicecatalog. This is required to use SageMaker Projects.

Additional VPC endpoints to use SageMaker Canvas

In addition to the previously mentioned VPC endpoints, to use SageMaker Canvas, you need to also create the following interface VPC endpoints:

  • Amazon Forecast and Amazon Forecast Query: com.amazonaws.region.forecast and com.amazonaws.region.forecastquery. These are required to use Amazon Forecast.
  • Amazon Rekognition: com.amazonaws.region.rekognition. This is required to use Amazon Rekognition.
  • Amazon Textract: com.amazonaws.region.textract. This is required to use Amazon Textract.
  • Amazon Comprehend: com.amazonaws.region.comprehend. This is required to use Amazon Comprehend.
  • AWS Security Token Service (AWS STS): com.amazonaws.region.sts. This is required because SageMaker Canvas uses AWS STS to connect to data sources.
  • Amazon Athena and AWS Glue: com.amazonaws.region.athena and com.amazonaws.region.glue. This is required to connect to AWS Glue Data Catalog through Amazon Athena.
  • Amazon Redshift: com.amazonaws.region.redshift-data. This is required to connect to the Amazon Redshift data source.

To view all VPC endpoints for each service you can use with SageMaker Canvas, please go to Configure Amazon SageMaker Canvas in a VPC without internet access.

AWS KMS encryption for SageMaker Studio’s EFS volume

The first time a user on your team onboards to SageMaker Studio, SageMaker creates an EFS volume for the team. A home directory is created in the volume for each user who onboards to Studio as part of your team. Notebook files and data files are stored in these directories.

You can encrypt your SageMaker Studio’s EFS volume with a KMS key so your home directories’ data are encrypted at rest. This Terraform solution creates a KMS key and uses it to encrypt SageMaker Studio’s EFS volume.

SageMaker Domain Lifecycle Configuration to automatically shut down idle Studio notebooks

sagemaker_auto_shutdown

Lifecycle Configurations are shell scripts triggered by Amazon SageMaker Studio lifecycle events, such as starting a new Studio notebook. You can use Lifecycle Configurations to automate customization for your Studio environment.

This Terraform solution creates a SageMaker Lifecycle Configuration to detect and stop idle resources that incur costs within Studio using an auto-shutdown Jupyter extension. Under the hood, the following resources are created or configured to achieve the desired result:

  1. Create an S3 bucket and upload the latest version of the auto-shutdown extension sagemaker_studio_autoshutdown-0.1.5.tar.gz. Later, the auto-shutdown script will run the s3 cp command to download the extension file from the S3 bucket on Jupyter Server start-ups. Please refer to the following GitHub repos for more information regarding the auto-shutdown extension and auto-shutdown script.
  2. Create an aws_sagemaker_studio_lifecycle_config resource “auto_shutdown”. This resource encodes autoshutdown-script.sh in Base64 and creates a Lifecycle Configuration for the SageMaker Domain.
  3. In the SageMaker Domain default user settings, specify the Lifecycle Configuration ARN and set it as the default.

SageMaker execution role IAM permissions

As a managed service, SageMaker performs operations on your behalf on the AWS hardware that is managed by SageMaker. SageMaker can perform only operations that the user permits.

A SageMaker user can grant these permissions with an IAM role (referred to as an execution role). When you create a SageMaker Studio domain, SageMaker allows you to create the execution role by default. You can restrict access to user profiles by changing the SageMaker user profile role. This Terraform solution attaches the following IAM policies to the SageMaker execution role:

  • SageMaker managed AmazonSageMakerFullAccess policy. This policy grants the execution role full access to use SageMaker Studio.
  • A customer managed IAM policy to access the KMS key used to encrypt the SageMaker Studio’s EFS volume.
  • SageMaker managed AmazonSageMakerCanvasFullAccess and AmazonSageMakerCanvasAIServicesAccess policies. These policies grant the execution role full access to use SageMaker Canvas.
  • In order to enable time series analysis in SageMaker Canvas, you also need to add the IAM trust policy for Amazon Forecast.

Solution walkthrough

In this blog post, we demonstrate how to deploy the Terraform solution. Prior to making the deployment, please ensure that you satisfy the following prerequisites:

Prerequisites

  • An AWS account
  • An IAM user with administrative access

Deployment steps

To give users following this guide a unified deployment experience, we demonstrate the deployment process with AWS CloudShell. Using CloudShell, a browser-based shell, you can quickly run scripts with the AWS Command Line Interface (AWS CLI), experiment with service APIs using the AWS CLI, and use other tools to increase your productivity.

To deploy the Terraform solution, complete the following steps:

CloudShell launch settings

  • Sign in to the AWS Management Console and select the CloudShell service.
  • In the navigation bar, in the Region selector, choose US East (N. Virginia).

Your browser will open the CloudShell terminal.

Install Terraform

The next steps should be executed in a CloudShell terminal.

Check this Hashicorp guide for up-to-date instructions to install Terraform for Amazon Linux:

  • Install yum-config-manager to manage your repositories.
sudo yum install -y yum-utils
  • Use yum-config-manager to add the official HashiCorp Linux repository.
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo
  • Install Terraform from the new repository.
sudo yum -y install terraform
  • Verify that the installation worked by listing Terraform’s available subcommands.
terraform -help

Expected output:

Usage: terraform [-version] [-help] <command> [args]

The available commands for execution are listed below.
The most common, useful commands are shown first, followed by
less common or more advanced commands. If you’re just getting
started with Terraform, stick with the common commands. For the
other commands, please read the help and docs before usage.

…

Clone the code repo

Perform the following steps in a CloudShell terminal.

  • Clone the repo and navigate to the sagemaker-domain-vpconly-canvas-with-terraform folder:
git clone https://github.com/aws-samples/sagemaker-domain-vpconly-canvas-with-terraform.git

cd sagemaker-domain-vpconly-canvas-with-terraform
  • Download the auto-shutdown extension and place it in the assets/auto_shutdown_template folder:
wget https://github.com/aws-samples/sagemaker-studio-auto-shutdown-extension/raw/main/sagemaker_studio_autoshutdown-0.1.5.tar.gz -P assets/auto_shutdown_template

Deploy the Terraform solution

In the CloudShell terminal, run the following Terraform commands:

terraform init

You should see a success message like:

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work...

Now you can run:

terraform plan

After you are satisfied with the resources the plan outlines to be created, you can run:

terraform apply

Enter “yes” when prompted to confirm the deployment.

If successfully deployed, you should see an output that looks like:

Apply complete! Resources: X added, 0 changed, 0 destroyed.

Accessing SageMaker Studio and Canvas

We now have a Studio domain associated with our VPC and a user profile in this domain.

sagemaker_domain

To use the SageMaker Studio console, on the Studio Control Panel, locate your user name (it should be defaultuser) and choose Open Studio.

We made it! Now you can use your browser to connect to the SageMaker Studio environment. After a few minutes, Studio finishes creating your environment, and you’re greeted with the launcher screen.

studio_landing_page

To use the SageMaker Canvas console, on the Canvas Control Panel, locate your user name (should be defaultuser) and choose Open Canvas.

Now you can use your browser to connect to the SageMaker Canvas environment. After a few minutes, Canvas finishes creating your environment, and you’re greeted with the launcher screen.

canvas_landing_page

Feel free to explore the full functionality that SageMaker Studio and Canvas have to offer! Please refer to the Conclusion section for additional workshops and tutorials you can use to learn more about SageMaker.

Clean up

Run the following command to clean up your resources:

terraform destroy

Tip: If you set the Amazon EFS retention policy to “Retain” (the default), you will run into issues during “terraform destroy” because Terraform tries to delete the subnets and VPC while the EFS volume and its associated security groups (created by SageMaker) still exist. To fix this, first delete the EFS volume manually, and then delete the subnets and VPC manually in the AWS console.

Conclusion

The solution in this post provides you the ability to create a SageMaker Domain to support ML environments, including SageMaker Studio and SageMaker Canvas with Terraform. SageMaker Studio provides a fully managed IDE that removes the heavy lifting in the ML process. With SageMaker Canvas, our business users can easily explore and build ML models to make accurate predictions without writing any code. With the ability to launch Studio and Canvas inside a VPC and the use of a KMS key to encrypt the EFS volume, customers can use SageMaker ML environments with enhanced security. Auto shutdown Lifecycle Configuration helps customers save costs on idle Studio notebook instances.

Go test this solution and let us know what you think. For more information about how to use SageMaker Studio and SageMaker Canvas, see the following:


About the Author

Chen Yang is a Machine Learning Engineer at Amazon Web Services. She is part of the AWS Professional Services team, and has been focusing on building secure machine learning environments for customers. In her spare time, she enjoys running and hiking in the Pacific Northwest.

Implement smart document search index with Amazon Textract and Amazon OpenSearch

For modern companies that deal with enormous volumes of documents such as contracts, invoices, resumes, and reports, efficiently processing and retrieving pertinent data is critical to maintaining a competitive edge. However, traditional methods of storing and searching for documents can be time-consuming and often result in a large effort to find a specific document, especially when the documents include handwriting. What if there was a way to process documents intelligently and make them searchable with high accuracy?

This is made possible with Amazon Textract, AWS’s Intelligent Document Processing service, coupled with the fast search capabilities of OpenSearch. In this post, we’ll take you on a journey to rapidly build and deploy a document search indexing solution that helps your organization to better harness and extract insights from documents.

Whether you’re in Human Resources looking for specific clauses in employee contracts, or a financial analyst sifting through a mountain of invoices to extract payment data, this solution is tailored to empower you to access the information you need with unprecedented speed and accuracy.

With the proposed solution, your documents are automatically ingested, their content parsed and subsequently indexed into a highly responsive and scalable OpenSearch index.

We’ll cover how technologies such as Amazon Textract, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon OpenSearch Service can be integrated into a workflow that seamlessly processes documents. Then we dive into indexing this data into OpenSearch and demonstrate the search capabilities that become available at your fingertips.

Whether your organization is taking the first steps into the digital transformation era or is an established giant seeking to turbocharge information retrieval, this guide is your compass to navigating the opportunities that AWS Intelligent Document Processing and OpenSearch offer.

The implementation used in this post utilizes the Amazon Textract IDP CDK constructs – AWS Cloud Development Kit (CDK) components to define infrastructure for Intelligent Document Processing (IDP) workflows – which allow you to build use case-specific, customizable IDP workflows. The IDP CDK constructs and samples are a collection of components that enable the definition of IDP processes on AWS, and they are published to GitHub. The main concepts used are the AWS Cloud Development Kit (CDK) constructs, the actual CDK stacks, and AWS Step Functions. The workshop Use machine learning to automate and process documents at scale is a good starting point to learn more about customizing workflows and using the other sample workflows as a base for your own.

Solution overview

In this solution, we focus on indexing documents into an OpenSearch index for quick search-and-retrieval of information and documents. Documents in PDF, TIFF, JPEG or PNG format are put in an Amazon Simple Storage Service (Amazon S3) bucket and subsequently indexed into OpenSearch using this Step Functions workflow.

Figure 1: The Step Functions OpenSearch workflow

The OpenSearchWorkflow-Decider looks at the document and verifies that the document is one of the supported mime types (PDF, TIFF, PNG or JPEG). It consists of one AWS Lambda function.

The DocumentSplitter generates chunks of at most 2,500 pages from each document. This means that even though Amazon Textract supports documents of up to 3,000 pages, you can pass in documents with many more pages; the process still works, puts the pages into OpenSearch, and creates the correct page numbers. The DocumentSplitter is implemented as an AWS Lambda function.
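
The chunking itself can be thought of as a simple page-range split. The following is a minimal sketch of the idea (not the construct’s actual code), which keeps the start page of each chunk so that correct page numbers can later be written to OpenSearch:

def split_into_chunks(total_pages: int, max_chunk_size: int = 2500):
    """Yield (start_page, end_page) tuples, 1-based and inclusive."""
    for start in range(1, total_pages + 1, max_chunk_size):
        end = min(start + max_chunk_size - 1, total_pages)
        yield (start, end)

# Example: a 6,000-page document becomes three chunks.
print(list(split_into_chunks(6000)))
# [(1, 2500), (2501, 5000), (5001, 6000)]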

The Map State processes each chunk in parallel.

The TextractAsync task calls Amazon Textract using the asynchronous Application Programming Interface (API) following best practices with Amazon Simple Notification Service (Amazon SNS) notifications and OutputConfig to store the Amazon Textract JSON output to a customer Amazon S3 bucket. It consists of two AWS Lambda functions: one to submit the document for processing and one that is triggered by the Amazon SNS notification.
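
For reference, the asynchronous Amazon Textract call that this task wraps looks roughly like the following boto3 sketch; the bucket, object key, topic ARN, and role ARN are placeholders, and in the actual solution the construct manages these details for you:

import boto3

textract = boto3.client("textract")

# Sketch of an asynchronous Textract submission; ARNs and bucket names are placeholders.
response = textract.start_document_text_detection(
    DocumentLocation={"S3Object": {"Bucket": "my-input-bucket", "Name": "uploads/sample.pdf"}},
    NotificationChannel={
        "SNSTopicArn": "arn:aws:sns:us-east-1:123412341234:textract-notifications",
        "RoleArn": "arn:aws:iam::123412341234:role/TextractSNSPublishRole",
    },
    OutputConfig={"S3Bucket": "my-textract-output-bucket", "S3Prefix": "textract-json/"},
)
print(response["JobId"])  # the second Lambda function later fetches the results for this JobId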

Because the TextractAsync task can produce multiple paginated output files, the TextractAsyncToJSON2 process combines them into one JSON file.

The Step Functions context is enriched with information that should also be searchable in the OpenSearch index in the SetMetaData step. The sample implementation adds ORIGIN_FILE_NAME, START_PAGE_NUMBER, and ORIGIN_FILE_URI. You can add any information to enrich the search experience, like information from other backend systems, specific IDs or classification information.

The GenerateOpenSearchBatch takes the generated Amazon Textract output JSON, combines it with the information from the context set by SetMetaData and prepares a file that is optimized for batch import into OpenSearch.
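
Conceptually, the batch file follows the OpenSearch bulk format, in which an action line is followed by the document to index. The following hedged sketch shows how such a payload could be built from per-page text and the metadata fields set earlier; field names other than the ones mentioned above are illustrative:

import json

def build_bulk_lines(pages, origin_file_name, origin_file_uri, start_page_number, index="papers-index"):
    # Illustrative sketch of an OpenSearch bulk payload; index and field names are assumptions.
    lines = []
    for offset, page_text in enumerate(pages):
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({
            "content": page_text,
            "ORIGIN_FILE_NAME": origin_file_name,
            "ORIGIN_FILE_URI": origin_file_uri,
            "page_number": start_page_number + offset,
        }))
    return "\n".join(lines) + "\n"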

In the OpenSearchPushInvoke step, this batch import file is sent into the OpenSearch index and made available for search. This AWS Lambda function is connected to the OpenSearch cluster through the aws-lambda-opensearch construct from the AWS Solutions library, using m6g.large.search instances, OpenSearch version 2.7, and an Amazon Elastic Block Store (Amazon EBS) volume of type General Purpose 2 (GP2) with 200 GB. You can change the OpenSearch configuration according to your requirements.

The final TaskOpenSearchMapping step clears the context, which otherwise could exceed the Step Functions Quota of Maximum input or output size for a task, state, or execution.

Prerequisites

To deploy the samples, you need an AWS account, the AWS Cloud Development Kit (AWS CDK), a current Python version, and Docker. You need permissions to deploy AWS CloudFormation templates, push to the Amazon Elastic Container Registry (Amazon ECR), and create AWS Identity and Access Management (IAM) roles, AWS Lambda functions, Amazon S3 buckets, AWS Step Functions state machines, an Amazon OpenSearch Service cluster, and an Amazon Cognito user pool. Make sure your AWS CLI environment is set up with the corresponding permissions.

You can also spin up an AWS Cloud9 instance with AWS CDK, Python, and Docker pre-installed to initiate the deployment.

Walkthrough

Deployment

  1. After you set up the prerequisites, you need to first clone the repository:
git clone https://github.com/aws-solutions-library-samples/guidance-for-low-code-intelligent-document-processing-on-aws.git
  2. Then cd into the repository folder and install the dependencies:
cd guidance-for-low-code-intelligent-document-processing-on-aws/

pip install -r requirements.txt
  3. Deploy the OpenSearchWorkflow stack:
cdk deploy OpenSearchWorkflow

The deployment takes around 25 minutes with the default configuration settings from the GitHub samples, and creates a Step Functions workflow, which is invoked when a document is placed in an Amazon S3 bucket/prefix and subsequently processed until the content of the document is indexed in an OpenSearch cluster.

The following is a sample output, including useful links and information, generated from the cdk deploy OpenSearchWorkflow command:

OpenSearchWorkflow.CognitoUserPoolLink = https://us-east-1.console.aws.amazon.com/cognito/v2/idp/user-pools/us-east-1_1234abcdef/users?region=us-east-1
OpenSearchWorkflow.DocumentQueueLink = https://us-east-1.console.aws.amazon.com/sqs/v2/home?region=us-east-1#/queues/https%3A%2F%2Fsqs.us-east-1.amazonaws.com%2F123412341234%2FOpenSearchWorkflow-ExecutionThrottleDocumentQueueABC1234-ABCDEFG1234.fifo
OpenSearchWorkflow.DocumentUploadLocation = s3://opensearchworkflow-opensearchworkflowbucketabcdef1234/uploads/
OpenSearchWorkflow.OpenSearchDashboard = https://search-idp-cdk-opensearch-abcdef1234.us-east-1.es.amazonaws.com/states/_dashboards
OpenSearchWorkflow.OpenSearchLink = https://us-east-1.console.aws.amazon.com/aos/home?region=us-east-1#/opensearch/domains/idp-cdk-opensearch
OpenSearchWorkflow.StepFunctionFlowLink = https://us-east-1.console.aws.amazon.com/states/home?region=us-east-1#/statemachines/view/arn:aws:states:us-east-1:123412341234:stateMachine:OpenSearchWorkflow12341234

This information is also available in the AWS CloudFormation Console.

When a new document is placed under the OpenSearchWorkflow.DocumentUploadLocation, a new Step Functions workflow is started for this document.

To check the status of this document, the OpenSearchWorkflow.StepFunctionFlowLink provides a link to the list of Step Functions executions in the AWS Management Console, displaying the status of the document processing for each document uploaded to Amazon S3. The tutorial Viewing and debugging executions on the Step Functions console provides an overview of the components and views in the AWS Console.

Testing

  1. First test using a sample file.
aws s3 cp s3://amazon-textract-public-content/idp-cdk-samples/moby-dick-hidden-paystub-and-w2.pdf $(aws cloudformation list-exports --query 'Exports[?Name==`OpenSearchWorkflow-DocumentUploadLocation`].Value' --output text)
  2. After selecting the link to the Step Functions workflow, or opening the AWS Management Console and going to the Step Functions service page, you can look at the different workflow invocations.

Figure 2: The Step Functions executions list

  3. Take a look at the currently running sample document execution, where you can follow the execution of the individual workflow tasks.

Figure 3: One document Step Functions workflow execution

Search

Once the process has finished, we can validate that the document is indexed in the OpenSearch index.

  1. To do so, first create an Amazon Cognito user. Amazon Cognito is used for authentication of users against the OpenSearch index. Select the link named OpenSearchWorkflow.CognitoUserPoolLink in the output from the cdk deploy command (or look at the AWS CloudFormation output in the AWS Management Console).

Figure 4: The Cognito user pool

  2. Next, select the Create user button, which directs you to a page to enter a username and a password for accessing the OpenSearch Dashboard.

Figure 5: The Cognito create user dialog

  3. After choosing Create user, you can continue to the OpenSearch Dashboard by clicking the OpenSearchWorkflow.OpenSearchDashboard link from the CDK deployment output. Log in using the previously created username and password. The first time you log in, you have to change the password.
  4. Once logged in to the OpenSearch Dashboard, select the Stack Management section, followed by Index Patterns, to create a search index.

Figure 6: OpenSearch Dashboards Stack Management


Figure 7: OpenSearch Index Patterns overview

  5. The default name for the index is papers-index, and an index pattern name of papers-index* will match it.

Figure 8: Define the OpenSearch index pattern

  6. After clicking Next step, select timestamp as the Time field and choose Create index pattern.

Figure 9: OpenSearch index pattern time field

  7. Now, from the menu, select Discover.

Figure 10: OpenSearch Discover

In most cases, you need to change the time span according to your last ingest. The default is 15 minutes, and often there has been no activity in the last 15 minutes. In this example, the time span was changed to 15 days to visualize the ingest.

Figure 11: OpenSearch timespan change

  8. Now you can start to search. A novel was indexed, so you can search for terms like call me Ishmael and see the results.

Figure 12: OpenSearch search term

In this case, the term call me Ishmael appears on page 6 of the document at the given Uniform Resource Identifier (URI), which points to the Amazon S3 location of the file. This makes it faster to identify documents and find information across a large corpus of PDF, TIFF or image documents, compared to manually skipping through them.

Running at scale

In order to estimate the scale and duration of an indexing process, the implementation was tested with 93,997 documents and a total of 1,583,197 pages (an average of 16.84 pages per document, with the largest file having 3,755 pages), all of which were indexed into OpenSearch. Processing all files and indexing them into OpenSearch took 5.5 hours in the US East (N. Virginia, us-east-1) Region using the default Amazon Textract service quotas. The graph below shows an initial test at 18:00, followed by the main ingest at 21:00, with everything done by 2:30.

Figure 13: OpenSearch indexing overview

Figure 13: OpenSearch indexing overview

For the processing, the tcdk.SFExecutionsStartThrottle was set to an executions_concurrency_threshold=550, which means that concurrent document processing workflows are capped at 550 and excess requests are queued to an Amazon SQS First-In-First-Out (FIFO) queue, which is subsequently drained when current workflows finish. The threshold of 550 is based on the Textract service quota of 600 in the us-east-1 Region. Therefore, the queue depth and the age of the oldest message are metrics worth monitoring.
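
As a hedged example, the following boto3 sketch pulls both queue metrics from Amazon CloudWatch; the queue name is a placeholder in the format of the sample output shown earlier:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")
queue_name = "OpenSearchWorkflow-ExecutionThrottleDocumentQueueABC1234-ABCDEFG1234.fifo"  # placeholder

# Maximum age of the oldest message and queue depth over the last 6 hours, in 5-minute buckets.
for metric in ["ApproximateAgeOfOldestMessage", "ApproximateNumberOfMessagesVisible"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/SQS",
        MetricName=metric,
        Dimensions=[{"Name": "QueueName", "Value": queue_name}],
        StartTime=datetime.utcnow() - timedelta(hours=6),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Maximum"],
    )
    print(metric, [point["Maximum"] for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"])])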

Figure 14: Amazon SQS monitoring

Figure 14: Amazon SQS monitoring

In this test, all documents were uploaded to Amazon S3 at once, so the Approximate Number of Messages Visible shows a steep increase and then a slow decline as no new documents are ingested. The Approximate Age Of Oldest Message increases until all messages are processed. The Amazon SQS MessageRetentionPeriod is set to 14 days. For very long-running backlog processing that could exceed 14 days, start by processing a smaller subset of representative documents and monitor the duration of execution to estimate how many documents you can pass in before exceeding 14 days. The Amazon SQS CloudWatch metrics look similar for a use case of processing a large backlog of documents that is ingested at once and then processed fully. If your use case is a steady flow of documents, both metrics, the Approximate Number of Messages Visible and the Approximate Age Of Oldest Message, will be more linear. You can also use the threshold parameter to mix a steady load with backlog processing and allocate capacity according to your processing needs.

Another metric to monitor is the health of the OpenSearch cluster, which you should set up according to the Operational best practices for Amazon OpenSearch Service. The default deployment uses m6g.large.search instances.

Figure 15: OpenSearch monitoring

Figure 15: OpenSearch monitoring

Here is a snapshot of the key performance indicators (KPIs) for the OpenSearch cluster: no errors, and a constant indexing data rate and latency.

The Step Functions workflow executions show the state of processing for each individual document. If you see executions in a Failed state, select them to view the details. A good resource to monitor is the AWS CloudWatch Automatic Dashboard for Step Functions, which exposes some of the Step Functions CloudWatch metrics.

Figure 16: Step Functions monitoring executions succeeded

Figure 16: Step Functions monitoring executions succeeded

In this AWS CloudWatch Dashboard graph, you see the successful Step Functions executions over time.

Figure 17: OpenSearch monitoring executions failed

Figure 17: OpenSearch monitoring executions failed

And this one shows the failed executions. These are worth investigating through the AWS Console Step Functions overview.

The following screenshot shows one example of a failed execution due to the origin file having a size of 0 bytes, which makes sense because the file has no content and could not be processed. It is important to filter failed processes and visualize failures so that you can go back to the source document and validate the root cause.

Figure 18: Step Functions failed workflow

Figure 18: Step Functions failed workflow

Other failures might include documents that are not of mime type: application/pdf, image/png, image/jpeg, or image/tiff because other document types are not supported by Amazon Textract.

Cost

The total cost of ingesting 1,583,278 pages was split across the AWS services used in the implementation. The following numbers are approximate, because your actual cost and processing duration vary depending on the size of documents, the number of pages per document, the density of information in the documents, and the AWS Region. Amazon DynamoDB consumed $0.55, Amazon S3 $3.33, OpenSearch Service $14.71, Step Functions $17.92, AWS Lambda $28.95, and Amazon Textract $1,849.97. Also, keep in mind that the deployed Amazon OpenSearch Service cluster is billed by the hour and will accumulate higher cost when run over a period of time.

Modifications

Most likely, you will want to modify the implementation and customize it for your use case and documents. The workshop Use machine learning to automate and process documents at scale presents a good overview of how to manipulate the actual workflows, change the flow, and add new components. To add custom fields to the OpenSearch index, look at the SetMetaData task in the workflow, which uses the set-manifest-meta-data-opensearch AWS Lambda function to add meta-data to the context; this meta-data is then added as fields to the OpenSearch index. Any meta-data information will become part of the index.
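
For illustration only, a SetMetaData-style AWS Lambda handler that adds extra fields could look like the following sketch; the manifest structure and the custom field names are assumptions, and the actual set-manifest-meta-data-opensearch function in the samples may differ:

def lambda_handler(event, context):
    # Hedged sketch: enrich the workflow context with additional meta-data fields.
    manifest = event.get("manifest", {})
    meta_data = manifest.get("meta-data", [])
    # Fields appended here end up as searchable fields in the OpenSearch index.
    meta_data.append({"key": "DEPARTMENT", "value": "finance"})      # placeholder custom field
    meta_data.append({"key": "CLASSIFICATION", "value": "invoice"})  # placeholder custom field
    manifest["meta-data"] = meta_data
    event["manifest"] = manifest
    return event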

Cleaning up

Delete the example resources if you no longer need them, to avoid incurring future costs, using the following command:

cdk destroy OpenSearchWorkflow

in the same environment as the cdk deploy command. Beware that this removes everything, including the OpenSearch cluster, all documents, and the Amazon S3 bucket. If you want to retain that information, back up your Amazon S3 bucket and create an index snapshot from your OpenSearch cluster. If you processed many files, you may have to empty the Amazon S3 bucket first using the AWS Management Console (after you have taken a backup or synced the files to a different bucket if you want to retain the information), because the cleanup function can time out; then destroy the AWS CloudFormation stack.

Conclusion

In this post, we showed you how to deploy a full-stack solution that ingests a large number of documents into an OpenSearch index, ready to be used for search use cases. The individual components of the implementation were discussed, as well as scaling considerations, cost, and modification options. All code is accessible as open source on GitHub as IDP CDK samples and as IDP CDK constructs to build your own solutions from scratch. As a next step, you can start to modify the workflow, add information to the documents in the search index, and explore the IDP workshop. Please comment below on your experience and ideas to expand the current solution.


About the Author

Martin Schade is a Senior ML Product SA with the Amazon Textract team. He has over 20 years of experience with internet-related technologies, engineering, and architecting solutions. He joined AWS in 2014, first guiding some of the largest AWS customers on the most efficient and scalable use of AWS services, and later focused on AI/ML with a focus on computer vision. Currently, he’s obsessed with extracting information from documents.

Read More

Semantic image search for articles using Amazon Rekognition, Amazon SageMaker foundation models, and Amazon OpenSearch Service

Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can.

Publishers can have repositories containing millions of images and in order to save money, they need to be able to reuse these images across articles. Finding the image that best matches an article in repositories of this scale can be a time-consuming, repetitive, manual task that can be automated. It also relies on the images in the repository being tagged correctly, which can also be automated (for a customer success story, refer to Aller Media Finds Success with KeyCore and AWS).

In this post, we demonstrate how to use Amazon Rekognition, Amazon SageMaker JumpStart, and Amazon OpenSearch Service to solve this business problem. Amazon Rekognition makes it easy to add image analysis capability to your applications without any machine learning (ML) expertise and comes with various APIs to fulfill use cases such as object detection, content moderation, face detection and analysis, and text and celebrity recognition, which we use in this example. SageMaker JumpStart is a low-code service that comes with pre-built solutions, example notebooks, and many state-of-the-art, pre-trained models from publicly available sources that are straightforward to deploy with a single click into your AWS account. These models have been packaged to be securely and easily deployable via Amazon SageMaker APIs. The new SageMaker JumpStart Foundation Hub allows you to easily deploy large language models (LLMs) and integrate them with your applications. OpenSearch Service is a fully managed service that makes it simple to deploy, scale, and operate OpenSearch. OpenSearch Service allows you to store vectors and other data types in an index, and offers rich functionality that allows you to search for documents using vectors and measure their semantic relatedness, which we use in this post.

The end goal of this post is to show how we can surface a set of images that are semantically similar to some text, be that an article or a TV synopsis.

The following screenshot shows an example of taking a mini article as your search input, rather than using keywords, and being able to surface semantically similar images.

Overview of solution

The solution is divided into two main sections. First, you extract label and celebrity metadata from the images, using Amazon Rekognition. You then generate an embedding of the metadata using a LLM. You store the celebrity names, and the embedding of the metadata in OpenSearch Service. In the second main section, you have an API to query your OpenSearch Service index for images using OpenSearch’s intelligent search capabilities to find images that are semantically similar to your text.

This solution uses our event-driven services Amazon EventBridge, AWS Step Functions, and AWS Lambda to orchestrate the process of extracting metadata from the images using Amazon Rekognition. Amazon Rekognition will perform two API calls to extract labels and known celebrities from the image.

The Amazon Rekognition celebrity detection API returns a number of elements in the response. For this post, you use the following:

  • Name, Id, and Urls – The celebrity name, a unique Amazon Rekognition ID, and list of URLs such as the celebrity’s IMDb or Wikipedia link for further information.
  • MatchConfidence – A match confidence score that can be used to control API behavior. We recommend applying a suitable threshold to this score in your application to choose your preferred operating point. For example, by setting a threshold of 99%, you can eliminate more false positives but may miss some potential matches.

Your second API call, the Amazon Rekognition label detection API, returns a number of elements in the response. You use the following:

  • Name – The name of the detected label
  • Confidence – The level of confidence in the label assigned to a detected object

A key concept in semantic search is embeddings. A word embedding is a numerical representation of a word or group of words, in the form of a vector. When you have many vectors, you can measure the distance between them, and vectors which are close in distance are semantically similar. Therefore, if you generate an embedding of all of your images’ metadata, and then generate an embedding of your text, be that an article or tv synopsis for example, using the same model, you can then find images which are semantically similar to your given text.
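
As a small illustration of this idea, the following sketch computes the cosine similarity between two vectors with NumPy; the three-dimensional vectors are toy values, as real embeddings have thousands of dimensions:

import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: close to 1.0 means semantically similar."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" for illustration only.
article_vector = [0.9, 0.1, 0.3]
image_vector_car = [0.8, 0.2, 0.25]
image_vector_beach = [0.1, 0.9, 0.7]
print(cosine_similarity(article_vector, image_vector_car))    # higher: semantically close
print(cosine_similarity(article_vector, image_vector_beach))  # lower: semantically distant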

There are many models available within SageMaker JumpStart to generate embeddings. For this solution, you use GPT-J 6B Embedding from Hugging Face. It produces high-quality embeddings and has one of the top performance metrics according to Hugging Face’s evaluation results. Amazon Bedrock is another option, still in preview, where you could choose Amazon Titan Text Embeddings model to generate the embeddings.

You use the GPT-J pre-trained model from SageMaker JumpStart to create an embedding of the image metadata and store this as a k-NN vector in your OpenSearch Service index, along with the celebrity name in another field.
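
A hedged sketch of how such a document could be written to the index with awswrangler (the same library used for the query later in this post) might look like the following; the domain endpoint, the image_uri field, and the example values are placeholders, and get_vector refers to the helper function shown later in this post:

import awswrangler as wr

# Placeholder endpoint; the "images" index is assumed to map image_vector as a k-NN vector field.
os_client = wr.opensearch.connect(host="https://search-my-domain.us-east-1.es.amazonaws.com")

embedding = get_vector({"text_inputs": ["Person standing next to a car outdoors"]})

wr.opensearch.index_documents(
    client=os_client,
    index="images",
    documents=[{
        "celebrities": "Werner Vogels",               # names returned by Amazon Rekognition
        "image_vector": embedding,                    # GPT-J embedding of the image metadata
        "image_uri": "s3://my-image-bucket/img.jpg",  # placeholder field for illustration
    }],
)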

The second part of the solution is to return the top 10 images to the user that are semantically similar to their text, be this an article or tv synopsis, including any celebrities if present. When choosing an image to accompany an article, you want the image to resonate with the pertinent points from the article. SageMaker JumpStart hosts many summarization models which can take a long body of text and reduce it to the main points from the original. For the summarization model, you use the AI21 Labs Summarize model. This model provides high-quality recaps of news articles and the source text can contain roughly 10,000 words, which allows the user to summarize the entire article in one go.

To detect if the text contains any names, potentially known celebrities, you use Amazon Comprehend which can extract key entities from a text string. You then filter by the Person entity, which you use as an input search parameter.

Then you take the summarized article and generate an embedding to use as another input search parameter. It’s important to note that you use the same model deployed on the same infrastructure to generate the embedding of the article as you did for the images. You then use Exact k-NN with scoring script so that you can search by two fields: celebrity names and the vector that captured the semantic information of the article. Refer to this post, Amazon OpenSearch Service’s vector database capabilities explained, on the scalability of Score script and how this approach on large indexes may lead to high latencies.

Walkthrough

The following diagram illustrates the solution architecture.

Following the numbered labels:

  1. You upload an image to an Amazon S3 bucket
  2. Amazon EventBridge listens to this event, and then triggers an AWS Step function execution
  3. The Step Function takes the image input, extracts the label and celebrity metadata
  4. The AWS Lambda function takes the image metadata and generates an embedding
  5. The Lambda function then inserts the celebrity name (if present) and the embedding as a k-NN vector into an OpenSearch Service index
  6. Amazon S3 hosts a simple static website, served by an Amazon CloudFront distribution. The front-end user interface (UI) allows you to authenticate with the application using Amazon Cognito to search for images
  7. You submit an article or some text via the UI
  8. Another Lambda function calls Amazon Comprehend to detect any names in the text
  9. The function then summarizes the text to get the pertinent points from the article
  10. The function generates an embedding of the summarized article
  11. The function then searches OpenSearch Service image index for any image matching the celebrity name and the k-nearest neighbors for the vector using cosine similarity
  12. Amazon CloudWatch and AWS X-Ray give you observability into the end-to-end workflow to alert you of any issues.

Extract and store key image metadata

The Amazon Rekognition DetectLabels and RecognizeCelebrities APIs give you the metadata from your images—text labels you can use to form a sentence to generate an embedding from. The article gives you a text input that you can use to generate an embedding.
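
For reference, a hedged boto3 sketch of these two calls could look like the following; the bucket, object key, and confidence thresholds are placeholders rather than the solution's actual configuration:

import boto3

rekognition = boto3.client("rekognition")
image = {"S3Object": {"Bucket": "my-image-bucket", "Name": "images/sample.jpg"}}  # placeholders

# Text labels describing the scene, used to form a sentence for the embedding.
labels = rekognition.detect_labels(Image=image, MaxLabels=25, MinConfidence=80)
label_names = [label["Name"] for label in labels["Labels"]]

# Known celebrities in the image, stored as a separate searchable field.
celebs = rekognition.recognize_celebrities(Image=image)
celebrity_names = [celeb["Name"] for celeb in celebs["CelebrityFaces"]]

print(label_names, celebrity_names)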

Generate and store word embeddings

The following figure demonstrates plotting the vectors of our images in a 2-dimensional space, where for visual aid, we have classified the embeddings by their primary category.

You also generate an embedding of this newly written article, so that you can search OpenSearch Service for the nearest images to the article in this vector space. Using the k-nearest neighbors (k-NN) algorithm, you define how many images to return in your results.

Zoomed in to the preceding figure, the vectors are ranked based on their distance from the article and then return the K-nearest images, where K is 10 in this example.

OpenSearch Service offers the capability to store large vectors in an index, and also offers the functionality to run queries against the index using k-NN, such that you can query with a vector to return the k-nearest documents that have vectors in close distance using various measurements. For this example, we use cosine similarity.

Detect names in the article

You use Amazon Comprehend, an AI natural language processing (NLP) service, to extract key entities from the article. In this example, you use Amazon Comprehend to extract entities and filter by the entity Person, which returns any names that Amazon Comprehend can find in the journalist story, with just a few lines of code:

import boto3

comprehend_client = boto3.client("comprehend")

def get_celebrities(payload):
    # Detect named entities in the article text with Amazon Comprehend
    response = comprehend_client.detect_entities(
        Text=' '.join(payload["text_inputs"]),
        LanguageCode="en",
    )
    # Keep only PERSON entities and concatenate the names into one search string
    celebrities = ""
    for entity in response["Entities"]:
        if entity["Type"] == "PERSON":
            celebrities += entity["Text"] + " "
    return celebrities

In this example, you upload an image to Amazon Simple Storage Service (Amazon S3), which triggers a workflow where you are extracting metadata from the image including labels and any celebrities. You then transform that extracted metadata into an embedding and store all of this data in OpenSearch Service.

Summarize the article and generate an embedding

Summarizing the article is an important step to make sure that the word embedding is capturing the pertinent points of the article, and therefore returning images that resonate with the theme of the article.

The AI21 Labs Summarize model is simple to use, without any prompt and with just a few lines of code:

import os
import ai21  # AI21 Labs SDK

def summarise_article(payload):
    # SageMaker endpoint hosting the AI21 Summarize model
    sagemaker_endpoint_summarise = os.environ["SAGEMAKER_ENDPOINT_SUMMARIZE"]
    response = ai21.Summarize.execute(
        source=payload,
        sourceType="TEXT",
        destination=ai21.SageMakerDestination(sagemaker_endpoint_summarise)
    )
    response_summary = response.summary
    return response_summary

You then use the GPT-J model to generate the embedding:

import json
import os
import boto3

sm_runtime_client = boto3.client("sagemaker-runtime")

def get_vector(payload_summary):
    # SageMaker endpoint hosting the GPT-J 6B embedding model
    sagemaker_endpoint = os.environ["SAGEMAKER_ENDPOINT_VECTOR"]
    response = sm_runtime_client.invoke_endpoint(
        EndpointName=sagemaker_endpoint,
        ContentType="application/json",
        Body=json.dumps(payload_summary).encode("utf-8"),
    )
    response_body = json.loads(response["Body"].read())
    return response_body["embedding"][0]

You then search OpenSearch Service for your images.

The following is an example snippet of that query:

import awswrangler as wr

# os_client is the OpenSearch connection, created elsewhere with wr.opensearch.connect(...)
def search_document_celeb_context(person_names, vector):
    # Return the top 10 documents: match on celebrity names and score the image_vector
    # k-NN field against the article vector using cosine similarity
    results = wr.opensearch.search(
        client=os_client,
        index="images",
        search_body={
            "size": 10,
            "query": {
                "script_score": {
                    "query": {
                        "match": {"celebrities": person_names}
                    },
                    "script": {
                        "lang": "knn",
                        "source": "knn_score",
                        "params": {
                            "field": "image_vector",
                            "query_value": vector,
                            "space_type": "cosinesimil"
                        }
                    }
                }
            }
        },
    )
    # Drop the large vector field before returning the results
    return results.drop(columns=["image_vector"]).to_dict()

The architecture contains a simple web app to represent a content management system (CMS).

For an example article, we used the following input:

“Werner Vogels loved travelling around the globe in his Toyota. We see his Toyota come up in many scenes as he drives to go and meet various customers in their home towns.”

None of the images have any metadata with the word “Toyota,” but the semantics of the word “Toyota” are synonymous with cars and driving. Therefore, with this example, we can demonstrate how we can go beyond keyword search and return images that are semantically similar. In the above screenshot of the UI, the caption under the image shows the metadata Amazon Rekognition extracted.

You could include this solution in a larger workflow where you use the metadata you already extracted from your images to start using vector search along with other key terms, such as celebrity names, to return the best resonating images and documents for your search query.

Conclusion

In this post, we showed how you can use Amazon Rekognition, Amazon Comprehend, SageMaker, and OpenSearch Service to extract metadata from your images and then use ML techniques to discover them automatically using celebrity and semantic search. This is particularly important within the publishing industry, where speed matters in getting fresh content out quickly and to multiple platforms.

For more information about working with media assets, refer to Media intelligence just got smarter with Media2Cloud 3.0.


About the Author

Mark Watkins is a Solutions Architect within the Media and Entertainment team, supporting his customers in solving many data and ML problems. Away from professional life, he loves spending time with his family and watching his two little ones grow up.

Read More