Automate document validation and fraud detection in the mortgage underwriting process using AWS AI services: Part 1

In this three-part series, we present a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case.

This solution addresses a growing global wave of mortgage fraud, which is worsening as more people present fraudulent proofs to qualify for loans. Data suggests that high-risk and suspected fraudulent mortgage activity is on the rise, with a 52% increase in suspected fraudulent mortgage applications since 2013 (source: Equifax).

Part 1 of this series discusses the most common challenges associated with the manual lending process. We provide concrete guidance on addressing these challenges with AWS AI and ML services to detect document tampering, identify and categorize patterns for fraudulent scenarios, and integrate with business-defined rules while minimizing the human expertise required for fraud detection.

In Part 2, we demonstrate how to train and host a computer vision model for tampering detection and localization on Amazon SageMaker. In Part 3, we show how to automate detecting fraud in mortgage documents with an ML model and business-defined rules using Amazon Fraud Detector.

Challenges associated with the manual lending process

Organizations in the lending and mortgage industry receive thousands of applications, ranging from new mortgage applications to refinancing an existing mortgage. These documents are increasingly susceptible to document fraud as fraudsters attempt to exploit the system and qualify for mortgages in several illegal ways. To be eligible for a mortgage, the applicant must provide the lender with documents verifying their employment, assets, and debts. Changing borrowing rules and interest rates can drastically alter an applicant’s credit affordability. Fraudsters range from blundering novices to near-perfect masters when creating fraudulent loan application documents. Fraudulent paperwork includes but is not limited to altering or falsifying paystubs, inflating information about income, misrepresenting job status, and forging letters of employment and other key mortgage underwriting documents. These fraud attempts can be challenging for mortgage lenders to capture.

The significant challenges associated with the manual lending process include, but are not limited to, the following:

  • The necessity for a borrower to visit the branch
  • Operational overhead
  • Data entry errors
  • Automation and time to resolution

Finally, the underwriting process, or the analysis of creditworthiness and the loan decision, takes additional time if done manually. That said, the manual consumer lending process has some advantages, such as applying human judgment to loan decisions that require it. This solution provides automation and risk mitigation in mortgage underwriting, which helps reduce time and cost compared to the manual process.

Solution overview

Document validation is a critical type of input for mortgage fraud decisions. Understanding the risk profile of the supporting mortgage documents and driving insights from this data can significantly improve risk decisions and is central to any underwriter’s fraud management strategy.

The following diagram represents each stage in a mortgage document fraud detection pipeline. We walk through each of these stages and how they contribute to underwriting accuracy, starting with capturing documents to classify and extract the required content, then detecting tampered documents, and finally using an ML model to detect potential fraud classified according to business-driven rules.

Conceptual Architecture

In the following sections, we discuss the stages of the process in detail.

Document classification

With intelligent document processing (IDP), we can automatically process financial documents using AWS AI services such as Amazon Textract and Amazon Comprehend.

Additionally, we can use the Amazon Textract Analyze Lending API in processing mortgage documents. Analyze Lending uses pre-trained ML models to automatically extract, classify, and validate information in mortgage-related documents with high speed and accuracy while reducing human error. As depicted in the following figure, Analyze Lending receives a loan document and then splits it into pages, classifying them according to the type of document. The document pages are then automatically routed to Amazon Textract text processing operations for accurate data extraction and analysis.
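
As an illustration, the following minimal sketch shows how you might call Analyze Lending with the AWS SDK for Python (Boto3); the bucket and document names are hypothetical placeholders, and a production pipeline would typically rely on an Amazon SNS notification rather than polling:

import time

import boto3

textract = boto3.client("textract")

# Start an asynchronous lending analysis job on a mortgage package stored in Amazon S3
response = textract.start_lending_analysis(
    DocumentLocation={"S3Object": {"Bucket": "my-mortgage-docs", "Name": "loan-package.pdf"}}
)
job_id = response["JobId"]

# Poll until the job finishes (for brevity only; prefer SNS notifications in production)
while True:
    result = textract.get_lending_analysis(JobId=job_id)
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

# Each result entry carries the page number and its predicted document type
for page in result.get("Results", []):
    print(page["Page"], page["PageClassification"]["PageType"])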

Amazon Textract Analyze Lending API

The Analyze Lending API offers the following benefits:

  • Automated end-to-end processing of mortgage packages
  • Pre-trained ML models across a variety of document types in a mortgage application package
  • Ability to scale on demand and reduce reliance on human reviewers
  • Improved decision-making and significantly lower operating costs

Tampering detection

We use a computer vision model deployed on SageMaker for our end-to-end image forgery detection and localization solution, which means it takes a testing image as input and predicts pixel-level forgery likelihood as output.

Most research studies focus on four image forgery techniques: splicing, copy-move, removal, and enhancement. Both splicing and copy-move involve adding image content to the target (forged) image; in splicing, the added content is obtained from a different image, whereas in copy-move, it comes from the target image itself. Removal, or inpainting, removes a selected image region (for example, hiding an object) and fills the space with new pixel values estimated from the background. Finally, image enhancement is a vast collection of local manipulations, such as sharpening and brightness adjustment.

Depending on the characteristics of the forgery, different clues can serve as the foundation for detection and localization. These clues include JPEG compression artifacts, edge inconsistencies, noise patterns, color consistency, visual similarity, EXIF consistency, and camera model. However, real-life forgeries are more complex and often use a sequence of manipulations to hide the tampering. Most existing methods focus on image-level detection (whether or not an image is forged) rather than on localizing or highlighting the forged area of the document image to aid the underwriter in making informed decisions.

We walk through the implementation details of training and hosting a computer vision model for tampering detection and localization on SageMaker in Part 2 of this series. The conceptual CNN-based architecture of the model is depicted in the following diagram. The model extracts image manipulation trace features for a testing image and identifies anomalous regions by assessing how different a local feature is from its reference features. It detects forged pixels by identifying local anomalous features as a predicted mask of the testing image.

Computer vision tampering detection

Fraud detection

We use Amazon Fraud Detector, a fully managed AI service, to automate the detection of potentially fraudulent activity. This is achieved by generating fraud predictions from the data extracted from the mortgage documents, using ML fraud models trained on the customer’s historical (fraud) data. You can use the predictions to trigger business rules in relation to underwriting decisions.

Amazon Fraud Detector Process

Defining the fraud prediction logic involves the following components:

  • Event types – Define the structure of the event
  • Models – Define the algorithm and data requirements for predicting fraud
  • Variables – Represent a data element associated with the fraud detection event
  • Rules – Tell Amazon Fraud Detector how to interpret the variable values during fraud prediction
  • Outcomes – The results generated from a fraud prediction
  • Detector version – Contains fraud prediction logic for the fraud detection event
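
After a detector version is deployed, generating a prediction is a single API call. The following is a minimal, hypothetical sketch using Boto3; the detector ID, event type, entity, and variable names are illustrative assumptions rather than values from this solution:

from datetime import datetime, timezone

import boto3

frauddetector = boto3.client("frauddetector")

response = frauddetector.get_event_prediction(
    detectorId="mortgage_fraud_detector",      # assumed detector name
    eventId="application-12345",
    eventTypeName="mortgage_application",      # assumed event type
    eventTimestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    entities=[{"entityType": "applicant", "entityId": "applicant-678"}],
    eventVariables={                           # values extracted from the mortgage documents
        "income": "85000",
        "employer_name": "AnyCompany",
    },
)

print(response["modelScores"])   # fraud model scores
print(response["ruleResults"])   # outcomes of the matched business rules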

The following diagram illustrates the architecture of this component.

Amazon Fraud Detector Detailed Process

After you deploy your model, you may evaluate its performance scores and metrics based on the prediction explanations. This helps identify top risk indicators and analyze fraud patterns across the data.

Third-party validation

We integrate the solution with third-party providers (via API) to validate the extracted information from the documents, such as personal and employment information. This is particularly useful to cross-validate details in addition to document tampering detection and fraud detection based on the historical pattern of applications.

The following architecture diagram illustrates a batch-oriented fraud detection pipeline in mortgage application processing using various AWS services.

Fraud Detection End to End Architecture

The workflow includes the following steps:

  1. The user uploads the scanned documents into Amazon Simple Storage Service (Amazon S3).
  2. The upload triggers an AWS Lambda function (Invoke Document Analysis) that calls the Amazon Textract API for text extraction, as shown in the sketch following this list. Additionally, we can use the Amazon Textract Analyze Lending API to automatically extract, classify, and validate information.
  3. On completion of text extraction, a notification is sent via Amazon Simple Notification Service (Amazon SNS).
  4. The notification triggers a Lambda function (Get Document Analysis), which invokes Amazon Comprehend for custom document classification.
  5. Document analysis results that have a low confidence score are routed to human reviewers using Amazon Augmented AI (Amazon A2I).
  6. Output from Amazon Textract and Amazon Comprehend is aggregated using a Lambda function (Analyze & Classify Document).
  7. A SageMaker inference endpoint is called for a fraud prediction mask of the input documents.
  8. Amazon Fraud Detector is called for a fraud prediction score using the data extracted from the mortgage documents.
  9. The results from Amazon Fraud Detector and the SageMaker inference endpoint are aggregated into the loan origination application.
  10. The status of the document processing job is tracked in Amazon DynamoDB.
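
The following is a minimal sketch of the Invoke Document Analysis Lambda function from step 2; the SNS topic and IAM role ARNs are assumed placeholders:

import boto3

textract = boto3.client("textract")

def lambda_handler(event, context):
    # The S3 upload event carries the bucket and object key of the scanned document
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Start asynchronous analysis; completion is reported to the SNS topic (step 3)
    response = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS", "TABLES"],
        NotificationChannel={
            "SNSTopicArn": "arn:aws:sns:us-east-1:123456789012:textract-status",  # assumed topic
            "RoleArn": "arn:aws:iam::123456789012:role/TextractSNSRole",          # assumed role
        },
    )
    return {"JobId": response["JobId"]}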

Conclusion

This post walked through an automated solution to detect document tampering and fraud in the mortgage underwriting process using Amazon Fraud Detector and other Amazon AI and ML services. This solution allows you to detect fraudulent attempts closer to the time of fraud occurrence and helps underwriters with an effective decision-making process. The flexibility of the implementation allows you to define business-driven rules to classify and capture the fraudulent attempts customized to specific business needs.

In Part 2 of this series, we provide the implementation details for detecting document tampering using SageMaker. In Part 3, we demonstrate how to implement the solution on Amazon Fraud Detector.


About the authors


Anup Ravindranath is a Senior Solutions Architect at Amazon Web Services (AWS) based in Toronto, Canada, working with Financial Services organizations. He helps customers transform their businesses and innovate on the cloud.

Vinnie Saini is a Senior Solutions Architect at Amazon Web Services (AWS) based in Toronto, Canada. She has been helping Financial Services customers transform on the cloud, with AI- and ML-driven solutions built on strong foundational pillars of architectural excellence.


Perform batch transforms with Amazon SageMaker Jumpstart Text2Text Generation large language models

Today we are excited to announce that you can now perform batch transforms with Amazon SageMaker JumpStart large language models (LLMs) for Text2Text Generation. Batch transforms are useful in situations where the responses don’t need to be real time and therefore you can do inference in batch for large datasets in bulk. For batch transform, a batch job is run that takes batch input as a dataset and a pre-trained model, and outputs predictions for each data point in the dataset. Batch transform is cost-effective because unlike real-time hosted endpoints that have persistent hardware, batch transform clusters are torn down when the job is complete and therefore the hardware is only used for the duration of the batch job.

In some use cases, real-time inference requests can be grouped in small batches for batch processing to create real-time or near-real-time responses. For example, if you need to process a continuous stream of data with low latency and high throughput, invoking a real-time endpoint for each request separately would require more resources and can take longer to process all the requests because the processing is being done serially. A better approach would be to group some of the requests and call the real-time endpoint in batch inference mode, which processes your requests in one forward pass of the model and returns the bulk response for the request in real time or near-real time. The latency of the response will depend upon how many requests you group together and the instance memory size; therefore, you can tune the batch size per your business requirements for latency and throughput. We call this real-time batch inference because it combines the concept of batching while still providing real-time responses. With real-time batch inference, you can achieve a balance between low latency and high throughput, enabling you to process large volumes of data in a timely and efficient manner.

JumpStart batch transform for Text2Text Generation models allows you to pass batch hyperparameters through environment variables to further increase throughput and minimize latency.

JumpStart provides pretrained, open-source models for a wide range of problem types to help you get started with machine learning (ML). You can incrementally train and tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for ML with Amazon SageMaker. You can access the pre-trained models, solution templates, and examples through the JumpStart landing page in Amazon SageMaker Studio. You can also access JumpStart models using the SageMaker Python SDK.

In this post, we demonstrate how to use the state-of-the-art pre-trained text2text FLAN T5 models from Hugging Face for batch transform and real-time batch inference.

Solution overview

The notebook showing batch transform of pre-trained Text2Text FLAN T5 models from Hugging Face is available in the following GitHub repository. This notebook uses data from the Hugging Face cnn_dailymail dataset for a text summarization task using the SageMaker SDK.

The following are the key steps for implementing batch transform and real-time batch inference:

  1. Set up prerequisites.
  2. Select a pre-trained model.
  3. Retrieve artifacts for the model.
  4. Specify batch transform job hyperparameters.
  5. Prepare data for the batch transform.
  6. Run the batch transform job.
  7. Evaluate the summarization using a ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score.
  8. Perform real-time batch inference.

Set up prerequisites

Before you run the notebook, you must complete some initial setup steps. Let’s set up the SageMaker execution role so it has permissions to run AWS services on your behalf:

import boto3
import sagemaker
from sagemaker.session import Session

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

Select a pre-trained model

We use the huggingface-text2text-flan-t5-large model as a default model. Optionally, you can retrieve the list of available Text2Text models on JumpStart and choose your preferred model, as shown in the sketch after the following cell. This method provides a straightforward way to select different model IDs using the same notebook. For demonstration purposes, we use the huggingface-text2text-flan-t5-large model:

model_id, model_version = (
    "huggingface-text2text-flan-t5-large",
    "*",
)
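
The following minimal sketch lists the Text2Text Generation model IDs available in JumpStart, using the notebook utilities from the SageMaker Python SDK:

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Filter the JumpStart model catalog down to Text2Text Generation models
filter_value = "task == text2text"
text2text_models = list_jumpstart_models(filter=filter_value)
print(text2text_models)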

Retrieve artifacts for the model

With SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the deploy_image_uri and model_uri for the pre-trained model:

from sagemaker import image_uris, model_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor

inference_instance_type = "ml.p3.2xlarge"

# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
)

Specify batch transform job hyperparameters

You may pass any subset of hyperparameters as environment variables to the batch transform job. You can also pass these hyperparameters in a JSON payload. However, if you’re setting environment variables for hyperparameters like the following code shows, then the advanced hyperparameters from the individual examples in the JSON lines payload will not be used. If you want to use hyperparameters from the payload, you may want to set the hyper_params_dict parameter to None instead.

# Specify the batch job hyperparameters here; if you want to treat each example's hyperparameters differently, pass hyper_params_dict as None
hyper_params = {"batch_size":4, "max_length":50, "top_k": 50, "top_p": 0.95, "do_sample": True}
hyper_params_dict = {"HYPER_PARAMS":str(hyper_params)}

Prepare data for batch transform

Now we’re ready to load the cnn_dailymail dataset from Hugging Face:

from datasets import load_dataset

cnn_test = load_dataset('cnn_dailymail', '3.0.0', split='test')

We go over each data entry and create the input data in the required format. We create an articles.jsonl file as a test data file containing articles that need to be summarized as input payload. As we create this file, we append the prompt "Briefly summarize this text:" to each test input row. If you want to have different hyperparameters for each test input, you can append those hyperparameters as part of creating the dataset.

We create highlights.jsonl as the ground truth file containing highlights of each article stored in the test file articles.jsonl. We store both test files in an Amazon Simple Storage Service (Amazon S3) bucket. See the following code:

#You can specify a prompt here
prompt = "Briefly summarize this text: "
#Provide the test data and the ground truth file name
test_data_file_name = "articles.jsonl"
test_reference_file_name = 'highlights.jsonl'

test_articles = []
test_highlights =[]

# We will go over each data entry and create the data in the input required format as described above
for id, test_entry in enumerate(cnn_test):
    article = test_entry['article']
    highlights = test_entry['highlights']
    # Create a payload like this if you want to have different hyperparameters for each test input
    # payload = {"id": id,"text_inputs": f"{prompt}{article}", "max_length": 100, "temperature": 0.95}
    # Note that if you specify hyperparameter for each payload individually, you may want to ensure that hyper_params_dict is set to None instead
    payload = {"id": id,"text_inputs": f"{prompt}{article}"}
    test_articles.append(payload)
    test_highlights.append({"id":id, "highlights": highlights})

import json

with open(test_data_file_name, "w") as outfile:
    for entry in test_articles:
        outfile.write("%s\n" % json.dumps(entry))

with open(test_reference_file_name, "w") as outfile:
    for entry in test_highlights:
        outfile.write("%s\n" % json.dumps(entry))

# Uploading the data
s3 = boto3.client("s3")
s3.upload_file(test_data_file_name, output_bucket, output_prefix + "/batch_input/" + test_data_file_name)

Run the batch transform job

When you start a batch transform job, SageMaker launches the necessary compute resources to process the data, including CPU or GPU instances depending on the selected instance type. During the batch transform job, SageMaker automatically provisions and manages the compute resources required to process the data, including instances, storage, and networking resources. When the batch transform job is complete, the compute resources are automatically cleaned up by SageMaker. This means that the instances and storage used during the job are stopped and removed, freeing up resources and minimizing cost. See the following code:

# Creating the Batch transformer object
batch_transformer = model.transformer(
    instance_count=1,
    instance_type=inference_instance_type,
    output_path=s3_output_data_path,
    assemble_with="Line",
    accept="text/csv",
    max_payload=1,
    env = hyper_params_dict
)

# Making the predictions on the input data
batch_transformer.transform(s3_input_data_path, content_type="application/jsonlines", split_type="Line")

batch_transformer.wait()

The following is one example record from the articles.jsonl test file. Note that each record in this file has an ID that matches the records in the predict.jsonl file, which contains the summarized records output by the Hugging Face Text2Text model. Similarly, the ground truth file has a matching ID for each data record. The matching ID across the test file, ground truth file, and output file allows linking input records with output records for easy interpretation of the results.

The following is the example input record provided for summarization:

{"id": 0, "text_inputs": "Briefly summarize this text: (CNN)The Palestinian Authority officially became the 123rd member of the International Criminal Court on Wednesday, a step that gives the court jurisdiction over alleged crimes in Palestinian territories. The formal accession was marked with a ceremony at The Hague, in the Netherlands, where the court is based. The Palestinians signed the ICC's founding Rome Statute in January, when they also accepted its jurisdiction over alleged crimes committed "in the occupied Palestinian territory, including East Jerusalem, since June 13, 2014." Later that month, the ICC opened a preliminary examination into the situation in Palestinian territories, paving the way for possible war crimes investigations against Israelis. As members of the court, Palestinians may be subject to counter-charges as well. Israel and the United States, neither of which is an ICC member, opposed the Palestinians' efforts to join the body. But Palestinian Foreign Minister Riad al-Malki, speaking at Wednesday's ceremony, said it was a move toward greater justice. "As Palestine formally becomes a State Party to the Rome Statute today, the world is also a step closer to ending a long era of impunity and injustice," he said, according to an ICC news release. "Indeed, today brings us closer to our shared goals of justice and peace." Judge Kuniko Ozaki, a vice president of the ICC, said acceding to the treaty was just the first step for the Palestinians. "As the Rome Statute today enters into force for the State of Palestine, Palestine acquires all the rights as well as responsibilities that come with being a State Party to the Statute. These are substantive commitments, which cannot be taken lightly," she said. Rights group Human Rights Watch welcomed the development. "Governments seeking to penalize Palestine for joining the ICC should immediately end their pressure, and countries that support universal acceptance of the court's treaty should speak out to welcome its membership," said Balkees Jarrah, international justice counsel for the group. "What's objectionable is the attempts to undermine international justice, not Palestine's decision to join a treaty to which over 100 countries around the world are members." In January, when the preliminary ICC examination was opened, Israeli Prime Minister Benjamin Netanyahu described it as an outrage, saying the court was overstepping its boundaries. The United States also said it "strongly" disagreed with the court's decision. "As we have said repeatedly, we do not believe that Palestine is a state and therefore we do not believe that it is eligible to join the ICC," the State Department said in a statement. It urged the warring sides to resolve their differences through direct negotiations. "We will continue to oppose actions against Israel at the ICC as counterproductive to the cause of peace," it said. But the ICC begs to differ with the definition of a state for its purposes and refers to the territories as "Palestine." While a preliminary examination is not a formal investigation, it allows the court to review evidence and determine whether to investigate suspects on both sides. Prosecutor Fatou Bensouda said her office would "conduct its analysis in full independence and impartiality." The war between Israel and Hamas militants in Gaza last summer left more than 2,000 people dead. The inquiry will include alleged war crimes committed since June. 
The International Criminal Court was set up in 2002 to prosecute genocide, crimes against humanity and war crimes. CNN's Vasco Cotovio, Kareem Khadder and Faith Karimi contributed to this report."}

The following is the predicted output with summarization:

{'id': 0, 'generated_texts': ['The Palestinian Authority officially became a member of the International Criminal Court on Wednesday, a step that gives the court jurisdiction over alleged crimes in Palestinian territories.']}

The following is the ground truth summarization for model evaluation purposes:

{"id": 0, "highlights": "Membership gives the ICC jurisdiction over alleged crimes committed in Palestinian territories since last June .nIsrael and the United States opposed the move, which could open the door to war crimes investigations against Israelis ."}

Next, we use the ground truth and predicted outputs for model evaluation.

Evaluate the model using a ROUGE score

ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation in natural language processing. The metrics compare an automatically produced summary or translation against a reference (human-produced) summary or translation or a set of references.

In the following code, we combine the predicted and original summaries by joining them on the common key id and use this to compute the ROUGE score:

import ast

import evaluate
import pandas as pd

# Downloading the predictions
s3.download_file(
    output_bucket, output_prefix + "/batch_output/" + "articles.jsonl.out", "predict.jsonl"
)

with open('predict.jsonl', 'r') as json_file:
    json_list = list(json_file)

# Creating the prediction list for the dataframe
predict_dict_list = []
for predict in json_list:
    if len(predict) > 1:
        predict_dict = ast.literal_eval(predict)
        predict_dict_req = {"id": predict_dict["id"], "prediction": predict_dict["generated_texts"][0]}
        predict_dict_list.append(predict_dict_req)

# Creating the predictions dataframe
predict_df = pd.DataFrame(predict_dict_list)

test_highlights_df = pd.DataFrame(test_highlights)

# Combining the predict dataframe with the original summarization on id to compute the rouge score
df_merge = test_highlights_df.merge(predict_df, on="id", how="left")

rouge = evaluate.load('rouge')
results = rouge.compute(predictions=list(df_merge["prediction"]), references=list(df_merge["highlights"]))
print(results)
{'rouge1': 0.32749078992945646, 'rouge2': 0.126038645005132, 'rougeL': 0.22764277967933363, 'rougeLsum': 0.28162915746368966}

Perform real-time batch inference

Next, we show you how to run real-time batch inference on the endpoint by providing the inputs as a list. We use the same model ID and dataset as earlier, except we take a few records from the test dataset and use them to invoke a real-time endpoint.

The following code shows how to create and deploy a real-time endpoint for real-time batch inference:

from sagemaker.utils import name_from_base
endpoint_name = name_from_base(f"jumpstart-example-{model_id}")
# deploy the Model. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name
)

Next, we prepare our input payload. For this, we use the data that we prepared earlier, extract the first 10 test inputs, and append the hyperparameters that we want to use to the text inputs. We provide this payload to the real-time invoke_endpoint. The response payload is then returned as a list of responses. See the following code:

# Provide all the text inputs to the model as a list
text_inputs = [entry["text_inputs"] for entry in test_articles[0:10]]

# The information about the different parameters is provided above
payload = {
    "text_inputs": text_inputs,
    "max_length": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
    "batch_size": 4,
}


def query_endpoint_with_json_payload(encoded_json, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


query_response = query_endpoint_with_json_payload(
    json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
)


def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions


generated_text_list = parse_response_multiple_texts(query_response)
print(*generated_text_list, sep='\n')

Clean up

After you have tested the endpoint, make sure you delete the SageMaker inference endpoint and delete the model to avoid incurring charges.
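
Assuming model_predictor is the Predictor returned by model.deploy() earlier, a minimal cleanup sketch looks like the following:

# Delete the SageMaker model and endpoint to stop incurring charges
model_predictor.delete_model()
model_predictor.delete_endpoint()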

Conclusion

In this post, we performed a batch transform to showcase the Hugging Face Text2Text Generation model for summarization tasks. Batch transform is advantageous in obtaining inferences from large datasets without requiring a persistent endpoint. We linked input records with inferences to aid in result interpretation. We used the ROUGE score to compare the test data summarization with the model-generated summarization.

Additionally, we demonstrated real-time batch inference, where you can send a small batch of data to a real-time endpoint to achieve a balance between latency and throughput for scenarios like streaming input data. Real-time batch inference helps increase throughput for real-time requests.

Try out the batch transform with Text2Text Generation models in SageMaker today and let us know your feedback!


About the authors

Hemant Singh is a Machine Learning Engineer with experience in Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He got his masters from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.

Rachna Chadha is a Principal Solutions Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that the ethical and responsible use of AI can improve society in future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.


Index your Confluence content using the new Confluence connector V2 for Amazon Kendra

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.

Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to pull together data across several structured and unstructured repositories to index and search on.

One such unstructured data repository is Confluence. Confluence is a team workspace that gives knowledge worker teams a place to create, capture, and collaborate on any project or idea. Team spaces help teams structure, organize, and share work, so every team member has visibility into institutional knowledge and access to the information they need.

There are two Confluence offerings:

  • Cloud – This is offered as a software as a service (SaaS) product. It’s always on, continuously updated, and highly secure.
  • Data Center (self-managed) – Here, you host Confluence on your infrastructure, which could be on premises or the cloud. This allows you to keep data within your network and manage it yourself.

We’re excited to announce that you can now use the new Amazon Kendra connector V2 for Confluence to search information stored in your Confluence account, both in the cloud and in your data center. In this post, we show how to index information stored in Confluence and use the Amazon Kendra intelligent search function. In addition, the ML-powered intelligent search can accurately find information in unstructured documents with natural language narrative content, for which keyword search is not very effective.

What’s new for this version

This version supports OAuth 2.0 authentication in addition to basic authentication for the Cloud edition. For the Data Center (on-premises) edition, we have added OAuth2 in addition to basic authentication and personal access tokens for showing search results based on user access rights. You can benefit from the following features:

  • You can now crawl comments in addition to spaces, pages, blogs, and attachments
  • You now have fine-grained choices for your sync scope—you can specify pages, blogs, comments, and attachments
  • You can choose to import identities (or not)
  • This version offers regex support for choosing entity titles as well as file types
  • You have the choice of multiple Sync modes

Solution overview

With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index a Confluence repository using the Amazon Kendra connector for Confluence. The solution consists of the following steps:

  1. Choose an authentication mechanism.
  2. Configure an app on Confluence and get the connection details.
  3. Store the details in AWS Secrets Manager.
  4. Create a Confluence data source V2 via the Amazon Kendra console.
  5. Index the data in the Confluence repository.
  6. Run a sample query to test the solution.

Prerequisites

To try out the Amazon Kendra connector for Confluence, you need the following:

Choose an authentication mechanism

Choose your preferred authentication method:

  • Basic – This works on both the Cloud and Data Center editions. You need a user ID and a password to configure this method.
  • Personal access token – This option only works for the Data Center edition.
  • OAuth2 – This is more involved and works for both Cloud and Data Center editions.

Gather authentication details

In this section, we show the steps to gather your authentication details depending on your authentication method.

Basic authentication

For basic authentication with the Data Center edition, all you need is your login and password. Make sure your login has privileges to gather all content.

For Cloud edition, your user ID serves as your user login. For your password, you need to get a token. Complete the following steps:

  1. Log in to https://id.atlassian.com/manage-profile/security/api-tokens and choose Create API token.
  2. For Label, enter a name for the token.
  3. Choose Create.
  4. Copy the value and save it to use as your password.

Personal access token

This authentication method works for on premises (Data Center) only. Complete the following steps to acquire authentication details:

  1. Log in to your Confluence URL using the user ID and password that you want Amazon Kendra to use while retrieving content.
  2. Choose the profile icon and choose Settings.
  3. Choose Personal Access Tokens in the navigation pane, then choose Create token.

create token

  4. For Token name, enter a name.
  5. For Expiry date, deselect Automatic expiry.
  6. Choose Create.
  7. Copy the token and save it in a safe place.

To configure Secrets Manager, we use the login URL and this value.

OAuth2 authentication for Confluence Cloud edition

This authentication method follows the full OAuth2.0 (3LO) documentation from Confluence. We first create and configure an app on Confluence and enable it for OAuth2. The process is slightly different for the Cloud and Data Center editions. We then get an authorization token and exchange this for an access token. Finally, we get the client ID, client secret, and client code. Complete the following steps:

  1. Log in to the Confluence app.
  2. Navigate to https://developer.atlassian.com/.
  3. Next to My apps, choose Create and choose OAuth2 Integration.
  4. For Name, enter a name.
  5. Choose Create.
  6. Choose Authorization in the navigation pane.
  7. Choose Add next to your authorization type.
  8. For Callback URL, enter the URL you use to log in to Confluence.
  9. Choose Save changes.

save changes

  10. Under Authorization URL generator, choose Add APIs.

add apis

  11. Next to User identity API, choose Add, then choose Configure.

add permissions

  12. Choose Edit Scopes to configure read scopes for the app.
  13. Select View active user profile and View user profiles.

edit scopes

  14. Choose Permissions in the navigation pane.
  15. Next to Confluence API, choose Add, then choose Configure.
  16. On the Classic scopes tab, choose Edit Scopes.
  17. Select all read, search, and download scopes.
  18. Choose Save.

granular scopes

  19. On the Granular scopes tab, choose Edit Scopes.
  20. Search for read and select all the scopes found.
  21. Choose Save.

scope choice confirmation

  22. Choose Authorization in the navigation pane.
  23. Next to your authorization type, choose Configure.

configure authorization type

You should see three URLs listed.

generated urls

  24. Copy the code for Granular Confluence API authorization URL.

The following is example code:

https://auth.atlassian.com/authorize?
audience=api.atlassian.com
&client_id=YOUR_CLIENT_ID
&scope=REQUESTED_SCOPE%20REQUESTED_SCOPE_TWO

&redirect_uri=https://YOUR_APP_CALLBACK_URL
&state=YOUR_USER_BOUND_VALUE
&response_type=code
&prompt=consent

  25. If you want to generate a refresh token so that you don’t have to repeat this process, add offline_access (or %20offline_access) to the end of all the scopes in the URL (for example, &scope=REQUESTED_SCOPE%20REQUESTED_SCOPE_TWO%20offline_access).
  26. If you’re okay generating a new token each time, just enter the URL in your browser.
  27. Choose Accept.

choose accept

You’re redirected to your Confluence home page.

  28. Inspect the browser URL and locate code=xxxxx.
  29. Copy this code and save it.

This is the authorization code that we use to exchange with the access token.

copy authorization code

  30. Return to the Atlassian developer console and choose Settings in the navigation pane.
  31. Copy the values of the client ID and client secret and save them.

We need these values to make a call to exchange the authorization token with the access token.

postman utility

Next, we use the Postman utility to post the authorization code to get the access token. You can use alternate tools like curl to do this as well.

  1. The URL to post the authorization code is https://auth.atlassian.com/oauth/token.
  2. The JSON body to post is as follows:
    {"grant_type": "authorization_code",
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "code": "YOUR_AUTHORIZATION_CODE",
    "redirect_uri": "https://YOUR_APP_CALLBACK_URL"}

The grant_type parameter is hard-coded. We collected the values for client_id and client_secret in a previous step. The value for code is the authorization code we collected earlier.
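
If you prefer scripting over Postman, the following minimal sketch performs the same exchange with the Python requests library; replace the placeholder values with your own:

import requests

token_response = requests.post(
    "https://auth.atlassian.com/oauth/token",
    json={
        "grant_type": "authorization_code",
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
        "code": "YOUR_AUTHORIZATION_CODE",
        "redirect_uri": "https://YOUR_APP_CALLBACK_URL",
    },
)
tokens = token_response.json()
print(tokens["access_token"])  # also contains "refresh_token" if you requested offline_access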

A successful response will return the access token. If you added offline access to the URL earlier, you also get a refresh token.

return response with access token

  3. Save the access token to use when setting up Secrets Manager.

The access token generated this way is valid only for 1 hour. If you need a new token, you can start all over again. However, if you saved the refresh token, you can use Postman as before to post to the following URL: https://auth.atlassian.com/oauth/token. Use the following JSON format for the body of the request:

{"grant_type": "refresh_token",
"client_id": "YOUR_CLIENT_ID",
"client_secret": "YOUR_CLIENT_SECRET",
"refresh_token": "YOUR_REFRESH_TOKEN"}

The call will return a new access token.

new access token

OAuth2 authentication for Confluence Data Center edition

If using the Data Center edition with OAuth2 authentication, complete the following steps:

  1. Log in to Confluence Data Center edition.
  2. Choose the gear icon, then choose General configuration.
  3. In the navigation pane, choose Application links, then choose Create link.
  4. In the Create link pop-up window, select External application and Incoming, then choose Continue.
  5. For Name, enter a name.
  6. For Redirect URL, enter https://httpbin.org/.
  7. Choose Save.
  8. Copy and save the values for the client ID and client secret.
  9. On a separate browser tab, open the URL https://example-app.com/pkce.
  10. Choose Generate Random String and Calculate Hash.
  11. Copy the value under Code Challenge.

  12. Return to your original tab.
  13. Use the following URL to get the authorization code:
    https://<confluence url>/rest/oauth2/latest/authorize
    ?client_id=CLIENT_ID
    &redirect_uri=REDIRECT_URI
    &response_type=code
    &scope=SCOPE
    &code_challenge=CODE_CHALLENGE
    &code_challenge_method=S256

Use the client ID you copied earlier, and https://httpbin.org for the redirect URI. For CODE_CHALLENGE, enter the code challenge value you copied earlier.

  14. Choose Allow.

You’re redirected to httpbin.org.

  15. Save the code to use in the next step.

  16. To get the access token and refresh token, use a tool such as curl or Postman to post the following values to https://<your confluence URL>/rest/oauth2/latest/token:
    grant_type: authorization_code
    client_id: YOUR_CLIENT_ID
    client_secret: YOUR_CLIENT_SECRET
    code: YOUR_AUTHORIZATION_CODE
    code_verifier: CODE_VERIFIER
    redirect_uri: YOUR_REDIRECT_URL

Use the client ID, client secret, and authorization code you saved earlier. For CODE_VERIFIER, enter the value from when you generated the code challenge.

  17. Copy the access token and refresh token to use later.

copy access and refresh tokens

The access token and refresh token are valid only for 1 hour. To refresh the token, post the following code to the same URL to get new values:

grant_type: refresh_token
client_id: YOUR_CLIENT_ID
client_secret: YOUR_CLIENT_SECRET
refresh_token: REFRESH_TOKEN
redirect_uri: YOUR_REDIRECT_URL

The new tokens are valid for 1 hour.

new tokens

Store Confluence credentials in Secrets Manager

To store your Confluence credentials in Secrets Manager, complete the following steps:

  1. On the Secrets Manager console, choose Store a new secret.
  2. Select Other type of secret.

other type

  3. Depending on the type of secret, enter the key-values as follows:
    • For Confluence Cloud basic authentication, enter the following key-value pairs (note that the password is not the login password, but the token you created earlier):
      "username" : "<your login username>",
      
      "password" : "<your token value>"

    • For Confluence Cloud OAuth authentication, enter the following key-value pairs:
      "confluenceAppKey" : “<your clientid>”
      
      "confluenceAppSecret" : “<your client Secret>”
      
      "confluenceAccessToken" : “<your access token>”
      
      "confluenceRefreshToken" : “<your refresh token>”

    • For Confluence Data Center basic authentication, enter the following key-value pairs:
      "username" : "<login username>"
      
      "password" : "<login password>"

    • For Confluence Data Center personal access token authentication, enter the following key-value pairs:
      "patToken" :"<your personal access token>"

    • For Confluence Data Center OAuth authentication, enter the following key-value pairs:
      "confluenceAppKey" : "<your client id>"
      
      "confluenceAppSecret" : “<your Client Secret>”
      
      "confluenceAccessToken" : “<your Access Token>"
      
      "confluenceRefreshToken" : “<your refresh token>”

  4. Choose Next.

choose next

  5. For Secret name, enter a name (for example, AmazonKendra-my-confluence-secret).
  6. Enter an optional description.
  7. Choose Next.

configure secret

  8. In the Configure rotation section, keep all settings at their defaults and choose Next.

configure rotation

  9. On the Review page, choose Store.
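
Alternatively, you can create the same secret programmatically. The following minimal sketch (shown for the Confluence Cloud OAuth key-value pairs) uses Boto3:

import json

import boto3

secretsmanager = boto3.client("secretsmanager")

secretsmanager.create_secret(
    Name="AmazonKendra-my-confluence-secret",
    Description="Confluence credentials for the Amazon Kendra connector",
    SecretString=json.dumps({
        "confluenceAppKey": "<your client id>",
        "confluenceAppSecret": "<your client secret>",
        "confluenceAccessToken": "<your access token>",
        "confluenceRefreshToken": "<your refresh token>",
    }),
)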

Configure the Amazon Kendra connector for Confluence

To configure the Amazon Kendra connector, complete the following steps:

  1. On the Amazon Kendra console, choose Create an Index.

create an index

  2. For Index name, enter a name for the index (for example, my-confluence-index).
  3. Enter an optional description.
  4. For Role name, enter an IAM role name.
  5. Configure optional encryption settings and tags.
  6. Choose Next.

specify index details

  7. In the Configure user access control section, leave the settings at their defaults and choose Next.

configure user access control

  8. In the Specify provisioning section, select Developer edition and choose Next.

specify provisioning

  9. On the review page, choose Create.

This creates and propagates the IAM role and then creates the Amazon Kendra index, which can take up to 30 minutes.

index created
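
If you prefer the API over the console, the following minimal sketch creates an equivalent index with Boto3; the role ARN is an assumed placeholder for an IAM role you have already created:

import boto3

kendra = boto3.client("kendra")

response = kendra.create_index(
    Name="my-confluence-index",
    Edition="DEVELOPER_EDITION",
    RoleArn="arn:aws:iam::123456789012:role/AmazonKendra-index-role",  # assumed role ARN
)
print(response["Id"])  # index ID to use when adding data sources and querying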

Create a Confluence data source

Complete the following steps to create your data source:

  1. On the Amazon Kendra console, choose Data sources in the navigation pane.
  2. Under Confluence connector V2.0, choose Add connector.

  3. For Data source name, enter a name (for example, my-Confluence-data-source).
  4. Enter an optional description.
  5. Choose Next.

specify data source details

  6. Choose either Confluence Cloud or Confluence Server depending on your data source.
  7. For Authentication, choose your authentication option.
  8. Select Identity crawler is on.
  9. For IAM role, choose Create a new role.
  10. For Role name, enter a name (for example, AmazonKendra-my-confluence-datasource-role).
  11. Choose Next.

define access and security

For Confluence Data Center and Cloud editions, we can add additional optional information (not shown) like the VPC. For the Data Center edition only, we can add additional information for the web proxy. There is also an additional authentication option using a personal access token, which is valid only for the Data Center edition and not the Cloud edition.

  12. For Sync scope, select all the content to sync.
  13. For Sync mode, select Full sync.
  14. For Frequency, choose Run on demand.
  15. Choose Next.

configure sync settings

  16. Optionally, you can set mapping fields.

Mapping fields is a useful exercise where you can substitute field names with values that are user-friendly and fit your organization’s vocabulary.

  17. For this post, keep all defaults and choose Next.

set field mappings

  18. Review the settings and choose Add data source.
  19. To sync the data source, choose Sync now.

sync data source

A banner message appears when the sync is complete.

Test the solution

Now that you have ingested the content from your Confluence account into your Amazon Kendra index, you can test some queries. For the purposes of our test, we have created a Confluence website with two teams: team1 with the member Analyst1 and team2 with the member Analyst2.

  1. On the Amazon Kendra console, navigate to your index and choose Search indexed content.
  2. Enter a sample search query and review your search results (your results will vary based on the contents of your account).

simple search

The Confluence connector also crawls local identity information from Confluence. You can use this feature to narrow down your query by user. Confluence offers comprehensive visibility options: users can choose whether their content is visible to other users, at a space level, or by groups. When you filter your searches by user, the query returns only those documents that the user had access to at the time of ingestion.

  3. To use this feature, expand Test query with user name or groups and choose Apply user name or groups.
  4. Enter the user name of your user and choose Apply.

Note that for Confluence Cloud edition, the user name is the email ID.
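
You can issue the same user-filtered query through the API. The following minimal sketch uses Boto3; the index ID, query text, and user name are placeholders:

import boto3

kendra = boto3.client("kendra")

response = kendra.query(
    IndexId="<your index id>",
    QueryText="vacation policy",                     # sample query
    UserContext={"UserId": "analyst1@example.com"},  # for Cloud edition, the user name is the email ID
)
for item in response["ResultItems"]:
    print(item.get("DocumentTitle", {}).get("Text"))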

apply user name or groups

Rerun your search query.

This returns a filtered set of results. Notice that we get back just 62 results.

filtered results

We now go back, restrict Bob Straham’s access to just his own workspace, and run the search again.

bob's results

Notice that we get just a subset of the results because the search is restricted to just Bob’s content.

When fronting Amazon Kendra with an application, such as one built using Experience Builder, you can pass the user identity (in the form of the email ID for Cloud edition or the user name for Data Center edition) to Amazon Kendra to ensure that each user only sees content specific to their user ID. Alternatively, you can use AWS IAM Identity Center (successor to AWS Single Sign-On) to control the user context being passed to Amazon Kendra to limit queries by user.

Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your Confluence account.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Confluence V2, delete that data source.

Conclusion

With the new Confluence connector V2 for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.

To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Confluence, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the author

Ashish Lagwankar is a Senior Enterprise Solutions Architect at AWS. His core interests include AI/ML, serverless, and container technologies. Ashish is based in the Boston, MA, area and enjoys reading, outdoors, and spending time with his family.


Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

This is a guest blog post co-written with Vik Pant and Kyle Bassett from PwC.

With organizations increasingly investing in machine learning (ML), ML adoption has become an integral part of business transformation strategies. A recent PwC CEO survey found that 84% of Canadian CEOs agree that artificial intelligence (AI) will significantly change their business within the next 5 years, making this technology more critical than ever. However, implementing ML in production comes with various considerations, notably being able to navigate the world of AI safely, strategically, and responsibly. One of the first steps, and notably a great challenge, on the path to becoming AI powered is effectively developing ML pipelines that can scale sustainably in the cloud. Thinking of ML in terms of pipelines that generate and maintain models, rather than models by themselves, helps build versatile and resilient prediction systems that are better able to withstand meaningful changes in relevant data over time.

Many organizations start their journey into the world of ML with a model-centric viewpoint. In the early stages of building an ML practice, the focus is on training supervised ML models, which are mathematical representations of relationships between inputs (independent variables) and outputs (dependent variables) that are learned from data (typically historical). Models are mathematical artifacts that take input data, perform calculations and computations on them, and generate predictions or inferences.

Although this approach is a reasonable and relatively simple starting point, it isn’t inherently scalable or intrinsically sustainable due to the manual and ad hoc nature of model training, tuning, testing, and trialing activities. Organizations with greater maturity in the ML domain adopt an ML operations (MLOps) paradigm that incorporates continuous integration, continuous delivery, continuous deployment, and continuous training. Central to this paradigm is a pipeline-centric viewpoint for developing and operating industrial-strength ML systems.

In this post, we start with an overview of MLOps and its benefits, describe a solution to simplify its implementation, and provide details on the architecture. We finish with a case study highlighting the benefits realized by a large AWS and PwC customer who implemented this solution.

Background

An MLOps pipeline is a set of interrelated sequences of steps that are used to build, deploy, operate, and manage one or more ML models in production. Such a pipeline encompasses the stages involved in building, testing, tuning, and deploying ML models, including but not limited to data preparation, feature engineering, model training, evaluation, deployment, and monitoring. As such, an ML model is the product of an MLOps pipeline, and a pipeline is a workflow for creating one or more ML models. Such pipelines support structured and systematic processes for building, calibrating, assessing, and implementing ML models, and the models themselves generate predictions and inferences. By automating the development and operationalization of stages of pipelines, organizations can reduce the time to delivery of models, increase the stability of the models in production, and improve collaboration between teams of data scientists, software engineers, and IT administrators.

Solution overview

AWS offers a comprehensive portfolio of cloud-native services for developing and running MLOps pipelines in a scalable and sustainable manner. Amazon SageMaker comprises a comprehensive portfolio of capabilities as a fully managed MLOps service to enable developers to create, train, deploy, operate, and manage ML models in the cloud. SageMaker covers the entire MLOps workflow, from collecting and preparing data to training models with built-in high-performance algorithms and sophisticated automated ML (AutoML) experiments, so that companies can choose specific models that fit their business priorities and preferences. SageMaker enables organizations to collaboratively automate the majority of their MLOps lifecycle so that they can focus on business results without risking project delays or escalating costs. In this way, SageMaker allows businesses to focus on results without worrying about the infrastructure, development, and maintenance associated with powering industrial-strength prediction services.

SageMaker includes Amazon SageMaker JumpStart, which offers out-of-the-box solution patterns for organizations seeking to accelerate their MLOps journey. Organizations can start with pre-trained and open-source models that can be fine-tuned to meet their specific needs through retraining and transfer learning. Additionally, JumpStart provides solution templates designed to tackle common use cases, as well as example Jupyter notebooks with prewritten starter code. These resources can be accessed by simply visiting the JumpStart landing page within Amazon SageMaker Studio.

PwC has built a pre-packaged MLOps accelerator that further speeds up time to value and increases return on investment for organizations that use SageMaker. This MLOps accelerator enhances the native capabilities of JumpStart by integrating complementary AWS services. With a comprehensive suite of technical artifacts, including infrastructure as code (IaC) scripts, data processing workflows, service integration code, and pipeline configuration templates, PwC’s MLOps accelerator simplifies the process of developing and operating production-class prediction systems.

Architecture overview

The architecture of the PwC MLOps accelerator prioritizes cloud-native serverless services from AWS. The entry point into the accelerator is any collaboration tool, such as Slack, that a data scientist or data engineer can use to request an AWS environment for MLOps. Such a request is parsed and then fully or semi-automatically approved using workflow features in that collaboration tool. After a request is approved, its details are used to parameterize IaC templates. The source code for these IaC templates is managed in AWS CodeCommit. The parameterized templates are submitted to AWS CloudFormation for modeling, provisioning, and managing stacks of AWS and third-party resources.
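As an illustration of the hand-off from an approved request to AWS CloudFormation, a parameterized template might be submitted with a boto3 call along these lines; the stack name, template file, and parameter names are hypothetical:

import boto3

cfn = boto3.client("cloudformation")

# Template source is managed in CodeCommit; here we assume it has been
# fetched to a local file (names and parameters are illustrative)
with open("mlops_environment.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="mlops-env-data-science-team-a",
    TemplateBody=template_body,
    Parameters=[
        {"ParameterKey": "TeamName", "ParameterValue": "data-science-team-a"},
        {"ParameterKey": "Environment", "ParameterValue": "dev"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the template creates IAM roles
)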

The following diagram illustrates the workflow.

After AWS CloudFormation provisions an environment for MLOps on AWS, the environment is ready for use by data scientists, data engineers, and their collaborators. The PwC accelerator includes predefined AWS Identity and Access Management (IAM) roles related to MLOps activities and tasks. These roles specify the services and resources in the MLOps environment that various users can access based on their job profiles. After accessing the MLOps environment, users can work in any of the SageMaker modalities to perform their duties, including SageMaker notebook instances, Amazon SageMaker Autopilot experiments, and Studio. You can benefit from all SageMaker features and functions, including model training, tuning, evaluation, deployment, and monitoring.

The accelerator also includes connections with Amazon DataZone for sharing, searching, and discovering data at scale across organizational boundaries to generate and enrich models. Similarly, data for training, testing, validating, and detecting model drift can be sourced from a variety of services, including Amazon Redshift, Amazon Relational Database Service (Amazon RDS), Amazon Elastic File System (Amazon EFS), and Amazon Simple Storage Service (Amazon S3). Prediction systems can be deployed in many ways, including as SageMaker endpoints directly, SageMaker endpoints wrapped in AWS Lambda functions (as sketched below), and SageMaker endpoints invoked through custom code on Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Compute Cloud (Amazon EC2). Amazon CloudWatch monitors the MLOps environment comprehensively, observing alarms, logs, and events data from across the complete stack (applications, infrastructure, network, and services).
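As a sketch of one of these deployment patterns, a SageMaker endpoint wrapped in a Lambda function might look like the following; the endpoint name and the JSON payload contract are illustrative assumptions:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Assumed contract: the caller posts a JSON body with model features
    payload = json.loads(event["body"])
    response = runtime.invoke_endpoint(
        EndpointName="mlops-demo-endpoint",  # assumption: your endpoint name
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    prediction = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(prediction)}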

The following diagram illustrates this architecture.

Case study

In this section, we share an illustrative case study from a large insurance company in Canada. It focuses on the transformative impact of the implementation of PwC Canada’s MLOps accelerator and JumpStart templates.

This client partnered with PwC Canada and AWS to address challenges with inefficient model development and ineffective deployment processes, lack of consistency and collaboration, and difficulty in scaling ML models. The implementation of this MLOps accelerator in concert with JumpStart templates achieved the following:

  • End-to-end automation – Automation nearly halved the amount of time for data preprocessing, model training, hyperparameter tuning, and model deployment and monitoring
  • Collaboration and standardization – Standardized tools and frameworks to promote consistency across the organization nearly doubled the rate of model innovation
  • Model governance and compliance – They implemented a model governance framework to ensure that all ML models met regulatory requirements and adhered to the company’s ethical guidelines, which reduced risk management costs by 40%
  • Scalable cloud infrastructure – They invested in scalable infrastructure to effectively manage massive data volumes and deploy multiple ML models simultaneously, reducing infrastructure and platform costs by 50%
  • Rapid deployment – The prepackaged solution reduced time to production by 70%

By delivering MLOps best practices through rapid deployment packages, our client was able to de-risk their MLOps implementation and unlock the full potential of ML for a range of business functions, such as risk prediction and asset pricing. Overall, the synergy between the PwC MLOps accelerator and JumpStart enabled our client to streamline, scale, secure, and sustain their data science and data engineering activities.

It should be noted that the PwC and AWS solution is not industry specific and is relevant across industries and sectors.

Conclusion

SageMaker and its accelerators allow organizations to enhance the productivity of their ML program. There are many benefits, including but not limited to the following:

  • Collaboratively create IaC, MLOps, and AutoML use cases to realize business benefits from standardization
  • Enable efficient experimental prototyping, with and without code, to turbocharge AI from development to deployment with IaC, MLOps, and AutoML
  • Automate tedious, time-consuming tasks such as feature engineering and hyperparameter tuning with AutoML
  • Employ a continuous model monitoring paradigm to align the risk of ML model usage with enterprise risk appetite

Please contact the authors of this post, AWS Advisory Canada, or PwC Canada to learn more about JumpStart and PwC’s MLOps accelerator.


About the Authors

Vik Pant is a Partner in the Cloud & Data practice at PwC Canada. He earned a PhD in Information Science from the University of Toronto. He is convinced that there is a telepathic connection between his biological neural network and the artificial neural networks that he trains on SageMaker. Connect with him on LinkedIn.

Kyle is a Partner in the Cloud & Data practice at PwC Canada. Along with his crack team of tech alchemists, he weaves enchanting MLOps solutions that mesmerize clients with accelerated business value. Armed with the power of artificial intelligence and a sprinkle of wizardry, Kyle turns complex challenges into digital fairy tales, making the impossible possible. Connect with him on LinkedIn.

Francois Chevallier is a Principal Advisory Consultant with AWS Professional Services Canada and the Canadian practice lead for Data and Innovation Advisory. He guides customers to establish and implement their overall cloud journey and their data programs, focusing on vision, strategy, business drivers, governance, target operating models, and roadmaps. Connect with him on LinkedIn.

Deploy generative AI models from Amazon SageMaker JumpStart using the AWS CDK

The seeds of a machine learning (ML) paradigm shift have existed for decades, but with the ready availability of virtually infinite compute capacity, a massive proliferation of data, and the rapid advancement of ML technologies, customers across industries are rapidly adopting and using ML technologies to transform their businesses.

Just recently, generative AI applications have captured everyone’s attention and imagination. We are truly at an exciting inflection point in the widespread adoption of ML, and we believe every customer experience and application will be reinvented with generative AI.

Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Like all AI, generative AI is powered by ML models—very large models that are pre-trained on vast corpora of data and commonly referred to as foundation models (FMs).

The size and general-purpose nature of FMs make them different from traditional ML models, which typically perform specific tasks, like analyzing text for sentiment, classifying images, and forecasting trends.

With traditional ML models, in order to achieve each specific task, you need to gather labeled data, train a model, and deploy that model. With foundation models, instead of gathering labeled data for each model and training multiple models, you can adapt the same pre-trained FM to various tasks. You can also customize FMs to perform domain-specific functions that are differentiating to your businesses, using only a small fraction of the data and compute required to train a model from scratch.

Generative AI has the potential to disrupt many industries by revolutionizing the way content is created and consumed. Original content production, code generation, customer service enhancement, and document summarization are typical use cases of generative AI.

Amazon SageMaker JumpStart provides pre-trained, open-source models for a wide range of problem types to help you get started with ML. You can incrementally train and tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for ML with Amazon SageMaker.

With over 600 pre-trained models available and growing every day, JumpStart enables developers to quickly and easily incorporate cutting-edge ML techniques into their production workflows. You can access the pre-trained models, solution templates, and examples through the JumpStart landing page in Amazon SageMaker Studio. You can also access JumpStart models using the SageMaker Python SDK. For information about how to use JumpStart models programmatically, see Use SageMaker JumpStart Algorithms with Pretrained Models.

In April 2023, AWS unveiled Amazon Bedrock, which provides a way to build generative AI-powered apps via pre-trained models from startups including AI21 Labs, Anthropic, and Stability AI. Amazon Bedrock also offers access to Titan foundation models, a family of models trained in-house by AWS. With the serverless experience of Amazon Bedrock, you can easily find the right model for your needs, get started quickly, privately customize FMs with your own data, and easily integrate and deploy them into your applications using the AWS tools and capabilities you’re familiar with (including integrations with SageMaker ML features like Amazon SageMaker Experiments to test different models and Amazon SageMaker Pipelines to manage your FMs at scale) without having to manage any infrastructure.

In this post, we show how to deploy image and text generative AI models from JumpStart using the AWS Cloud Development Kit (AWS CDK). The AWS CDK is an open-source software development framework to define your cloud application resources using familiar programming languages like Python.

We use the Stable Diffusion model for image generation and the FLAN-T5-XL model for natural language understanding (NLU) and text generation from Hugging Face in JumpStart.

Solution overview

The web application is built on Streamlit, an open-source Python library that makes it easy to create and share beautiful, custom web apps for ML and data science. We host the web application using Amazon Elastic Container Service (Amazon ECS) with AWS Fargate and it is accessed via an Application Load Balancer. Fargate is a technology that you can use with Amazon ECS to run containers without having to manage servers or clusters or virtual machines. The generative AI model endpoints are launched from JumpStart images in Amazon Elastic Container Registry (Amazon ECR). Model data is stored on Amazon Simple Storage Service (Amazon S3) in the JumpStart account. The web application interacts with the models via Amazon API Gateway and AWS Lambda functions as shown in the following diagram.

API Gateway provides the web application and other clients a standard RESTful interface, while shielding the Lambda functions that interface with the model. This simplifies the client application code that consumes the models. The API Gateway endpoints are publicly accessible in this example, which leaves room to extend this architecture with different API access controls and integrations with other applications.
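For instance, a client could call the REST interface with a plain HTTP request along these lines; the URL and payload shape are assumptions for illustration (see the Lambda code in the repository for the exact contract):

import requests

# The invoke URL comes from the API Gateway stack output (hypothetical here)
api_url = "https://<api-id>.execute-api.<region>.amazonaws.com/prod/"
response = requests.post(api_url, json={"prompt": "A watercolor painting of a lighthouse"})
print(response.status_code, response.text)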

In this post, we walk you through the following steps:

  1. Install the AWS Command Line Interface (AWS CLI) and AWS CDK v2 on your local machine.
  2. Clone and set up the AWS CDK application.
  3. Deploy the AWS CDK application.
  4. Use the image generation AI model.
  5. Use the text generation AI model.
  6. View the deployed resources on the AWS Management Console.

We provide an overview of the code in this project in the appendix at the end of this post.

Prerequisites

You must have the following prerequisites:

You can deploy the infrastructure in this tutorial from your local computer, or you can use AWS Cloud9 as your deployment workstation. AWS Cloud9 comes pre-loaded with the AWS CLI, AWS CDK, and Docker. If you opt for AWS Cloud9, create the environment from the AWS Management Console.

The estimated cost to complete this post is $50, assuming you leave the resources running for 8 hours. Make sure you delete the resources you create in this post to avoid ongoing charges.

Install the AWS CLI and AWS CDK on your local machine

If you don’t already have the AWS CLI on your local machine, refer to Installing or updating the latest version of the AWS CLI and Configuring the AWS CLI.

Install the AWS CDK Toolkit globally using the following node package manager command:

$ npm install -g aws-cdk

Run the following command to verify the correct installation and print the version number of the AWS CDK:

$ cdk --version

Make sure you have Docker installed on your local machine. Issue the following command to verify the version:

$ docker --version

Clone and set up the AWS CDK application

On your local machine, clone the AWS CDK application with the following command:

$ git clone https://github.com/aws-samples/generative-ai-sagemaker-cdk-demo.git

Navigate to the project folder:

$ cd generative-ai-sagemaker-cdk-demo

Before we deploy the application, let’s review the directory structure:

.
├── LICENSE
├── README.md
├── app.py
├── cdk.json
├── code
│   ├── lambda_txt2img
│   │   └── txt2img.py
│   └── lambda_txt2nlu
│       └── txt2nlu.py
├── construct
│   └── sagemaker_endpoint_construct.py
├── images
│   ├── architecture.png
│   ├── ...
├── requirements-dev.txt
├── requirements.txt
├── source.bat
├── stack
│   ├── __init__.py
│   ├── generative_ai_demo_web_stack.py
│   ├── generative_ai_txt2img_sagemaker_stack.py
│   ├── generative_ai_txt2nlu_sagemaker_stack.py
│   └── generative_ai_vpc_network_stack.py
├── tests
│   ├── __init__.py
│   └── ...
└── web-app
    ├── Dockerfile
    ├── Home.py
    ├── configs.py
    ├── img
    │   └── sagemaker.png
    ├── pages
    │   ├── 2_Image_Generation.py
    │   └── 3_Text_Generation.py
    └── requirements.txt

The stack folder contains the code for each stack in the AWS CDK application. The code folder contains the code for the Lambda functions. The repository also contains the web application located under the folder web-app.

The cdk.json file tells the AWS CDK Toolkit how to run your application.
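For a Python AWS CDK app like this one, cdk.json typically points the toolkit at the app entry point, along the lines of the following simplified example:

{
  "app": "python3 app.py"
}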

This application was tested in the us-east-1 Region, but it should work in any Region that has the required services and inference instance type ml.g4dn.4xlarge specified in app.py.

Set up a virtual environment

This project is set up like a standard Python project. Create a Python virtual environment using the following code:

$ python3 -m venv .venv

Use the following command to activate the virtual environment:

$ source .venv/bin/activate

If you’re on a Windows platform, activate the virtual environment as follows:

% .venv\Scripts\activate.bat

After the virtual environment is activated, upgrade pip to the latest version:

$ python3 -m pip install --upgrade pip

Install the required dependencies:

$ pip install -r requirements.txt

Before you deploy any AWS CDK application, you need to bootstrap a space in your account and the Region you’re deploying into. To bootstrap in your default Region, issue the following command:

$ cdk bootstrap

If you want to deploy into a specific account and Region, issue the following command:

$ cdk bootstrap aws://ACCOUNT-NUMBER/REGION

For more information about this setup, visit Getting started with the AWS CDK.

AWS CDK application stack structure

The AWS CDK application contains multiple stacks, as shown in the following diagram.

You can list the stacks in your AWS CDK application with the following command:

$ cdk list

GenerativeAiTxt2imgSagemakerStack
GenerativeAiTxt2nluSagemakerStack
GenerativeAiVpcNetworkStack
GenerativeAiDemoWebStack

The following are other useful AWS CDK commands:

  • cdk ls – Lists all stacks in the app
  • cdk synth – Emits the synthesized AWS CloudFormation template
  • cdk deploy – Deploys this stack to your default AWS account and Region
  • cdk diff – Compares the deployed stack with current state
  • cdk docs – Opens the AWS CDK documentation

The next section shows you how to deploy the AWS CDK application.

Deploy the AWS CDK application

The AWS CDK application will be deployed to the default Region based on your workstation configuration. If you want to force the deployment in a specific Region, set your AWS_DEFAULT_REGION environment variable accordingly.

At this point, you can deploy the AWS CDK application. First you launch the VPC network stack:

$ cdk deploy GenerativeAiVpcNetworkStack

If you are prompted, enter y to proceed with the deployment. You should see a list of AWS resources that are being provisioned in the stack. This step takes around 3 minutes to complete.

Then you launch the web application stack:

$ cdk deploy GenerativeAiDemoWebStack

The AWS CDK analyzes the stack and displays the list of resources it will create. Enter y to proceed with the deployment. This step takes around 5 minutes.

Note down the WebApplicationServiceURL from the output to use later. You can also retrieve it on the AWS CloudFormation console, under the GenerativeAiDemoWebStack stack outputs.

Now, launch the image generation AI model endpoint stack:

$ cdk deploy GenerativeAiTxt2imgSagemakerStack

This step takes around 8 minutes. When the image generation model endpoint is deployed, you can use it.

Use the image generation AI model

The first example demonstrates how to utilize Stable Diffusion, a powerful generative modeling technique that enables the creation of high-quality images from text prompts.

  1. Access the web application using the WebApplicationServiceURL from the output of GenerativeAiDemoWebStack in your browser.
  2. In the navigation pane, choose Image Generation.
  3. The SageMaker Endpoint Name and API GW Url fields will be pre-populated, but you can change the prompt for the image description if you’d like.
  4. Choose Generate image.
  5. The application will make a call to the SageMaker endpoint. It takes a few seconds. A picture with the characteristics in your image description will be displayed.

Use the text generation AI model

The second example centers around using the FLAN-T5-XL model, which is a foundation or large language model (LLM), to achieve in-context learning for text generation while also addressing a broad range of natural language understanding (NLU) and natural language generation (NLG) tasks.

Some environments might limit the number of endpoints you can launch at a time. If this is the case, you can launch one SageMaker endpoint at a time. To stop a SageMaker endpoint in the AWS CDK app, you have to destroy the deployed endpoint stack before launching the other endpoint stack. To take down the image generation AI model endpoint, issue the following command:

$ cdk destroy GenerativeAiTxt2imgSagemakerStack

Then launch the text generation AI model endpoint stack:

$ cdk deploy GenerativeAiTxt2nluSagemakerStack

Enter y at the prompts.

After the text generation model endpoint stack is launched, complete the following steps:

  1. Go back to the web application and choose Text Generation in the navigation pane.
  2. The Input Context field is pre-populated with a conversation between a customer and an agent regarding an issue with the customer’s phone, but you can enter your own context if you’d like.
  3. Below the context, you will find some pre-populated queries on the drop-down menu. Choose a query and choose Generate Response.
  4. You can also enter your own query in the Input Query field and then choose Generate Response.

View the deployed resources on the console

On the AWS CloudFormation console, choose Stacks in the navigation pane to view the stacks deployed.

On the Amazon ECS console, you can see the clusters on the Clusters page.

On the AWS Lambda console, you can see the functions on the Functions page.

On the API Gateway console, you can see the API Gateway endpoints on the APIs page.

On the SageMaker console, you can see the deployed model endpoints on the Endpoints page.

When the stacks are launched, some parameters are generated. These are stored in the AWS Systems Manager Parameter Store. To view them, choose Parameter Store in the navigation pane on the AWS Systems Manager console.
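As a sketch, the web application (or any other client) can read these parameters with boto3. The parameter name below matches the one created by the image generation endpoint stack; the rest is illustrative:

import boto3

ssm = boto3.client("ssm")
endpoint_name = ssm.get_parameter(Name="txt2img_sm_endpoint")["Parameter"]["Value"]
print(f"Image generation endpoint: {endpoint_name}")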

Clean up

To avoid unnecessary cost, clean up all the infrastructure created with the following command on your workstation:

$ cdk destroy --all

Enter y at the prompt. This step takes around 10 minutes. Check if all resources are deleted on the console. Also delete the assets S3 buckets created by the AWS CDK on the Amazon S3 console as well as the assets repositories on Amazon ECR.

Conclusion

As demonstrated in this post, you can use the AWS CDK to deploy generative AI models in JumpStart. We showed an image generation example and a text generation example using a user interface powered by Streamlit, Lambda, and API Gateway.

You can now build your generative AI projects using pre-trained AI models in JumpStart. You can also extend this project to fine-tune the foundation models for your use case and control access to API Gateway endpoints.

We invite you to test the solution and contribute to the project on GitHub. Share your thoughts on this tutorial in the comments!

License summary

This sample code is made available under a modified MIT license. See the LICENSE file for more information. Also, review the respective licenses for the Stable Diffusion and FLAN-T5-XL models on Hugging Face.


About the authors

Hantzley Tauckoor is an APJ Partner Solutions Architecture Leader based in Singapore. He has 20 years’ experience in the ICT industry spanning multiple functional areas, including solutions architecture, business development, sales strategy, consulting, and leadership. He leads a team of Senior Solutions Architects that enable partners to develop joint solutions, build technical capabilities, and steer them through the implementation phase as customers migrate and modernize their applications to AWS.

Kwonyul Choi is a CTO at BABITALK, a Korean beauty care platform startup, based in Seoul. Prior to this role, Kwonyul worked as a Software Development Engineer at AWS with a focus on the AWS CDK and Amazon SageMaker.

Arunprasath Shankar is a Senior AI/ML Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.

Satish Upreti is a Migration Lead PSA and Security SME in the partner organization in APJ. Satish has 20 years of experience spanning on-premises private cloud and public cloud technologies. Since joining AWS in August 2020 as a migration specialist, he provides extensive technical advice and support to AWS partners to plan and implement complex migrations.


Appendix: Code walkthrough

In this section, we provide an overview of the code in this project.

AWS CDK application

The main AWS CDK application is contained in the app.py file in the root directory. The project consists of multiple stacks, so we have to import the stacks:

#!/usr/bin/env python3
import aws_cdk as cdk

from stack.generative_ai_vpc_network_stack import GenerativeAiVpcNetworkStack
from stack.generative_ai_demo_web_stack import GenerativeAiDemoWebStack
from stack.generative_ai_txt2nlu_sagemaker_stack import GenerativeAiTxt2nluSagemakerStack
from stack.generative_ai_txt2img_sagemaker_stack import GenerativeAiTxt2imgSagemakerStack

We define our generative AI models and get the related URIs from SageMaker:

from script.sagemaker_uri import *
import boto3

region_name = boto3.Session().region_name
env={"region": region_name}

#Text to Image model parameters
TXT2IMG_MODEL_ID = "model-txt2img-stabilityai-stable-diffusion-v2-1-base"
TXT2IMG_INFERENCE_INSTANCE_TYPE = "ml.g4dn.4xlarge" 
TXT2IMG_MODEL_TASK_TYPE = "txt2img"
TXT2IMG_MODEL_INFO = get_sagemaker_uris(model_id=TXT2IMG_MODEL_ID,
                                        model_task_type=TXT2IMG_MODEL_TASK_TYPE, 
                                        instance_type=TXT2IMG_INFERENCE_INSTANCE_TYPE,
                                        region_name=region_name)

#Text to NLU image model parameters
TXT2NLU_MODEL_ID = "huggingface-text2text-flan-t5-xl"
TXT2NLU_INFERENCE_INSTANCE_TYPE = "ml.g4dn.4xlarge" 
TXT2NLU_MODEL_TASK_TYPE = "text2text"
TXT2NLU_MODEL_INFO = get_sagemaker_uris(model_id=TXT2NLU_MODEL_ID,
                                        model_task_type=TXT2NLU_MODEL_TASK_TYPE,
                                        instance_type=TXT2NLU_INFERENCE_INSTANCE_TYPE,
                                        region_name=region_name)

The function get_sagemaker_uris retrieves all the model information from JumpStart. See script/sagemaker_uri.py.

Then, we instantiate the stacks:

app = cdk.App()

network_stack = GenerativeAiVpcNetworkStack(app, "GenerativeAiVpcNetworkStack", env=env)
GenerativeAiDemoWebStack(app, "GenerativeAiDemoWebStack", vpc=network_stack.vpc, env=env)

GenerativeAiTxt2nluSagemakerStack(app, "GenerativeAiTxt2nluSagemakerStack", env=env, model_info=TXT2NLU_MODEL_INFO)
GenerativeAiTxt2imgSagemakerStack(app, "GenerativeAiTxt2imgSagemakerStack", env=env, model_info=TXT2IMG_MODEL_INFO)

app.synth()

The first stack to launch is the VPC stack, GenerativeAiVpcNetworkStack. The web application stack, GenerativeAiDemoWebStack, is dependent on the VPC stack. The dependency is done through parameter passing vpc=network_stack.vpc.

See app.py for the full code.

VPC network stack

In the GenerativeAiVpcNetworkStack stack, we create a VPC with a public subnet and a private subnet spanning across two Availability Zones:

        self.output_vpc = ec2.Vpc(self, "VPC",
            nat_gateways=1,
            ip_addresses=ec2.IpAddresses.cidr("10.0.0.0/16"),
            max_azs=2,
            subnet_configuration=[
                ec2.SubnetConfiguration(name="public",subnet_type=ec2.SubnetType.PUBLIC,cidr_mask=24),
                ec2.SubnetConfiguration(name="private",subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,cidr_mask=24)
            ]
        )

See /stack/generative_ai_vpc_network_stack.py for the full code.

Demo web application stack

In the GenerativeAiDemoWebStack stack, we launch Lambda functions and respective API Gateway endpoints through which the web application interacts with the SageMaker model endpoints. See the following code snippet:

        # Defines an AWS Lambda function for Image Generation service
        lambda_txt2img = _lambda.Function(
            self, "lambda_txt2img",
            runtime=_lambda.Runtime.PYTHON_3_9,
            code=_lambda.Code.from_asset("code/lambda_txt2img"),
            handler="txt2img.lambda_handler",
            role=role,
            timeout=Duration.seconds(180),
            memory_size=512,
            vpc_subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS
            ),
            vpc=vpc
        )
        
        # Defines an Amazon API Gateway endpoint for Image Generation service
        txt2img_apigw_endpoint = apigw.LambdaRestApi(
            self, "txt2img_apigw_endpoint",
            handler=lambda_txt2img
        )

The web application is containerized and hosted on Amazon ECS with Fargate. See the following code snippet:

        # Create Fargate service
        fargate_service = ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "WebApplication",
            cluster=cluster,            # Required
            cpu=2048,                   # Default is 256 (512 is 0.5 vCPU, 2048 is 2 vCPU)
            desired_count=1,            # Default is 1
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=image, 
                container_port=8501,
                ),
            #load_balancer_name="gen-ai-demo",
            memory_limit_mib=4096,      # Default is 512
            public_load_balancer=True)  # Default is True

See /stack/generative_ai_demo_web_stack.py for the full code.

Image generation SageMaker model endpoint stack

The GenerativeAiTxt2imgSagemakerStack stack creates the image generation model endpoint from JumpStart and stores the endpoint name in Systems Manager Parameter Store. This parameter will be used by the web application. See the following code:

        endpoint = SageMakerEndpointConstruct(self, "TXT2IMG",
                                    project_prefix = "GenerativeAiDemo",
                                    
                                    role_arn= role.role_arn,

                                    model_name = "StableDiffusionText2Img",
                                    model_bucket_name = model_info["model_bucket_name"],
                                    model_bucket_key = model_info["model_bucket_key"],
                                    model_docker_image = model_info["model_docker_image"],

                                    variant_name = "AllTraffic",
                                    variant_weight = 1,
                                    instance_count = 1,
                                    instance_type = model_info["instance_type"],

                                    environment = {
                                        "MMS_MAX_RESPONSE_SIZE": "20000000",
                                        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
                                        "SAGEMAKER_PROGRAM": "inference.py",
                                        "SAGEMAKER_REGION": model_info["region_name"],
                                        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
                                    },

                                    deploy_enable = True
        )
        
        ssm.StringParameter(self, "txt2img_sm_endpoint", parameter_name="txt2img_sm_endpoint", string_value=endpoint.endpoint_name)

See /stack/generative_ai_txt2img_sagemaker_stack.py for the full code.

NLU and text generation SageMaker model endpoint stack

The GenerativeAiTxt2nluSagemakerStack stack creates the NLU and text generation model endpoint from JumpStart and stores the endpoint name in Systems Manager Parameter Store. This parameter will also be used by the web application. See the following code:

        endpoint = SageMakerEndpointConstruct(self, "TXT2NLU",
                                    project_prefix = "GenerativeAiDemo",
                                    
                                    role_arn= role.role_arn,

                                    model_name = "HuggingfaceText2TextFlan",
                                    model_bucket_name = model_info["model_bucket_name"],
                                    model_bucket_key = model_info["model_bucket_key"],
                                    model_docker_image = model_info["model_docker_image"],

                                    variant_name = "AllTraffic",
                                    variant_weight = 1,
                                    instance_count = 1,
                                    instance_type = model_info["instance_type"],

                                    environment = {
                                        "MODEL_CACHE_ROOT": "/opt/ml/model",
                                        "SAGEMAKER_ENV": "1",
                                        "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600",
                                        "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
                                        "SAGEMAKER_PROGRAM": "inference.py",
                                        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code/",
                                        "TS_DEFAULT_WORKERS_PER_MODEL": "1"
                                    },

                                    deploy_enable = True
        )
        
        ssm.StringParameter(self, "txt2nlu_sm_endpoint", parameter_name="txt2nlu_sm_endpoint", string_value=endpoint.endpoint_name)

See /stack/generative_ai_txt2nlu_sagemaker_stack.py for the full code.

Web application

The web application is located in the /web-app directory. It is a Streamlit application that is containerized as per the Dockerfile:

FROM python:3.9
EXPOSE 8501
WORKDIR /app
COPY requirements.txt ./requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD streamlit run Home.py \
    --server.headless true \
    --browser.serverAddress="0.0.0.0" \
    --server.enableCORS false \
    --browser.gatherUsageStats false

To learn more about Streamlit, see Streamlit documentation.

Instruction fine-tuning for FLAN T5 XL with Amazon SageMaker Jumpstart

Generative AI is in the midst of a period of stunning growth. Increasingly capable foundation models are being released continuously, with large language models (LLMs) being one of the most visible model classes. LLMs are models composed of billions of parameters trained on extensive corpora of text, up to hundreds of billions or even a trillion tokens. These models have proven extremely effective for a wide range of text-based tasks, from question answering to sentiment analysis.

The power of LLMs comes from their capacity to learn and generalize from extensive and diverse training data. The initial training of these models is performed with a variety of objectives, supervised, unsupervised, or hybrid. Text completion or imputation is one of the most common unsupervised objectives: given a chunk of text, the model learns to accurately predict what comes next (for example, predict the next sentence). Models can also be trained in a supervised fashion using labeled data to accomplish a set of tasks (for example, is this movie review positive, negative, or neutral). Whether the model is trained for text completion or some other task, it is frequently not the task customers want to use the model for.

To improve the performance of a pre-trained LLM on a specific task, we can tune the model using examples of the target task in a process known as instruction fine-tuning. Instruction fine-tuning uses a set of labeled examples in the form of {prompt, response} pairs to further train the pre-trained model in adequately predicting the response given the prompt. This process modifies the weights of the model.
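As a toy illustration with hypothetical values, a single {prompt, response} pair for a sentiment task might look like the following:

{"prompt": "Classify the sentiment of this review: The battery life is superb.", "response": "positive"}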

This post describes how to perform instruction fine-tuning of an LLM, namely FLAN T5 XL, using Amazon SageMaker Jumpstart. We demonstrate how to accomplish this using both the Jumpstart UI and a notebook in Amazon SageMaker Studio. You can find the accompanying notebook in the amazon-sagemaker-examples GitHub repository.

Solution overview

The target task in this post is to, given a chunk of text in the prompt, return questions that are related to the text but can’t be answered based on the information it contains. This is a useful task to identify missing information in a description or identify whether a query needs more information to be answered.

FLAN T5 models are instruction fine-tuned on a wide range of tasks to increase the zero-shot performance of these models on many common tasks[1]. Additional instruction fine-tuning for a particular customer task can further increase the accuracy of these models, especially if the target task wasn’t previously used to train a FLAN T5 model, as is the case for our task.

In our example task, we’re interested in generating relevant but unanswerable questions. To this end, we use a subset of version 2 of the Stanford Question Answering Dataset (SQuAD2.0)[2] to fine-tune the model. This dataset contains questions posed by human annotators on a set of Wikipedia articles. In addition to questions with answers, SQuAD2.0 contains about 50,000 unanswerable questions. Such questions are plausible but can’t be directly answered from the articles’ content. We only use the unanswerable questions. Our data is structured as a JSON Lines file, with each line containing a context and a question.
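For illustration, a single line of that JSON Lines file might look like the following (a hypothetical record, not drawn from the dataset):

{"context": "Amazon S3 offers industry-leading scalability, data availability, security, and performance.", "question": "How much does Amazon S3 cost per gigabyte?"}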

Screenshot of a few entries of the SQuADv2 dataset.

Prerequisites

To get started, all you need is an AWS account in which you can use Studio. You will need to create a user profile for Studio if you don’t already have one.

Fine-tune FLAN-T5 with the Jumpstart UI

To fine-tune the model with the Jumpstart UI, complete the following steps:

  1. On the SageMaker console, open Studio.
  2. Under SageMaker JumpStart in the navigation pane, choose Models, notebooks, solutions.

You will see a list of foundation models, including FLAN T5 XL, which is marked as fine-tunable.

  3. Choose View model.

The JumpStart UI with FLAN-T5 XL.

  4. Under Data source, you can provide the path to your training data. The source for the data used in this post is provided by default.
  5. You can keep the default values for the deployment configuration (including instance type), security, and the hyperparameters, but you should increase the number of epochs to at least three to get good results.
  6. Choose Train to train the model.

The JumpStart train UI for the FLAN-T5 XL model.

You can track the status of the training job in the UI.

Jumpstart UI for training in progress.

  7. When training is complete (after about 53 minutes in our case), choose Deploy to deploy the fine-tuned model.

JumpStart UI training complete.

After the endpoint is created (a few minutes), you can open a notebook and start using your fine-tuned model.

Fine-tune FLAN-T5 using a Python notebook

Our example notebook shows how to use Jumpstart and SageMaker to programmatically fine-tune and deploy a FLAN T5 XL model. It can be run in Studio or locally.

In this section, we first walk through some general setup. Then you fine-tune the model using the SQuADv2 datasets. Next, you deploy the pre-trained version of the model behind a SageMaker endpoint, and do the same with the fine-tuned model. Finally, you can query the endpoints and compare the quality of the output of the pre-trained and fine-tuned model. You will find that the output of the fine-tuned model is of much higher quality.

Set up prerequisites

Begin by installing and upgrading the necessary packages. Restart the kernel after running the following code:

!pip install nest-asyncio==1.5.5 --quiet
!pip install ipywidgets==8.0.4 --quiet
!pip install --upgrade sagemaker --quiet

Next, obtain the execution role associated with the current notebook instance:

import boto3
import sagemaker

# Get current region, role, and default bucket
aws_region = boto3.Session().region_name
aws_role = sagemaker.session.Session().get_caller_identity_arn()
output_bucket = sagemaker.Session().default_bucket()

# This will be useful for printing
newline, bold, unbold = "\n", "\033[1m", "\033[0m"
print(f"{bold}aws_region:{unbold} {aws_region}")
print(f"{bold}aws_role:{unbold} {aws_role}")
print(f"{bold}output_bucket:{unbold} {output_bucket}")

You can define a convenient drop-down menu that will list the model sizes available for fine-tuning:

import IPython
from ipywidgets import Dropdown
from sagemaker.jumpstart.filters import And
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Default model choice
model_id = "huggingface-text2text-flan-t5-xl"

# Identify FLAN T5 models that support fine-tuning
filter_value = And(
    "task == text2text", "framework == huggingface", "training_supported == true"
)
model_list = [m for m in list_jumpstart_models(filter=filter_value) if "flan-t5" in m]

# Display the model IDs in a dropdown, for user to select
dropdown = Dropdown(
    value=model_id,
    options=model_list,
    description="FLAN T5 models available for fine-tuning:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(IPython.display.Markdown("### Select a pre-trained model from the dropdown below"))
display(dropdown)

Jumpstart automatically retrieves appropriate training and inference instance types for the model that you chose:

from sagemaker.instance_types import retrieve_default

model_id, model_version = dropdown.value, "*"

# Instance types for training and inference
training_instance_type = retrieve_default(
    model_id=model_id, model_version=model_version, scope="training"
)
inference_instance_type = retrieve_default(
    model_id=model_id, model_version=model_version, scope="inference"
)
print(f"{bold}model_id:{unbold} {model_id}")
print(f"{bold}training_instance_type:{unbold} {training_instance_type}")
print(f"{bold}inference_instance_type:{unbold} {inference_instance_type}")

If you chose FLAN T5 XL, you will see the following output:

model_id: huggingface-text2text-flan-t5-xl

training_instance_type: ml.p3.16xlarge

inference_instance_type: ml.g5.2xlarge

You’re now ready to start fine-tuning.

Retrain the model on the fine-tuning dataset

After your setup is done, complete the following steps:

Use the following code to retrieve the URI for the artifacts needed:

from sagemaker import image_uris, model_uris, script_uris

# Training instance will use this image
train_image_uri = image_uris.retrieve(
    region=aws_region,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)
# Pre-trained model
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)
# Script to execute on the training instance
train_script_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)
print(f"{bold}image uri:{unbold} {train_image_uri}")
print(f"{bold}model uri:{unbold} {train_model_uri}")
print(f"{bold}script uri:{unbold} {train_script_uri}")

The training data is located in a public Amazon Simple Storage Service (Amazon S3) bucket.

Use the following code to point to the location of the data and set up the output location in a bucket in your account:

from sagemaker.s3 import S3Downloader

# We will use the train split of SQuAD2.0
original_data_file = "train-v2.0.json"

# The data was mirrored in the following bucket
original_data_location = f"s3://sagemaker-sample-files/datasets/text/squad2.0/{original_data_file}"
S3Downloader.download(original_data_location, ".")

# Output location for the fine-tuned model artifacts; any S3 prefix in your
# account works (this variable is used by the training and deployment code later)
output_location = f"s3://{output_bucket}/demo-fine-tune-flan-t5/"

The original data is not in a format that corresponds to the task for which you are fine-tuning the model, so you can reformat it:

import json

local_data_file = "task-data.jsonl"  # any name with .jsonl extension

with open(original_data_file) as f:
    data = json.load(f)

with open(local_data_file, "w") as f:
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            # iterate over questions for a given paragraph
            for qas in paragraph["qas"]:
                if qas["is_impossible"]:
                    # the question is relevant, but cannot be answered
                    example = {"context": paragraph["context"], "question": qas["question"]}
                    json.dump(example, f)
                    f.write("\n")

template = {
    "prompt": "Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}",
    "completion": "{question}",
}
with open("template.json", "w") as f:
    json.dump(template, f)

from sagemaker.s3 import S3Uploader

train_data_location = f"s3://{output_bucket}/train_data"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"{bold}training data:{unbold} {train_data_location}")

Now you can define some hyperparameters for the training:

from sagemaker import hyperparameters

# Retrieve the default hyper-parameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# We will override some default hyperparameters with custom values
hyperparameters["epochs"] = "3"
# hyperparameters["max_input_length"] = "300"  # data inputs will be truncated at this length
# hyperparameters["max_output_length"] = "40"  # data outputs will be truncated at this length
# hyperparameters["generation_max_length"] = "40"  # max length of generated output
print(hyperparameters)

You are now ready to launch the training job:

from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

model_name = "-".join(model_id.split("-")[2:])  # get the most informative part of ID
training_job_name = name_from_base(f"js-demo-{model_name}-{hyperparameters['epochs']}")
print(f"{bold}job name:{unbold} {training_job_name}")

training_metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9\.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9\.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9\.]+)"},
]

# Create SageMaker Estimator instance
sm_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    model_uri=train_model_uri,
    source_dir=train_script_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    volume_size=300,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=output_location,
    metric_definitions=training_metric_definitions,
)

# Launch a SageMaker training job over data located in the given S3 path
# Training jobs can take hours, it is recommended to set wait=False,
# and monitor job status through the SageMaker console
sm_estimator.fit({"training": train_data_location}, job_name=training_job_name, wait=False)

Depending on the size of the fine-tuning data and model chosen, the fine-tuning could take up to a couple of hours.

You can monitor performance metrics such as training and validation loss using Amazon CloudWatch during training. Conveniently, you can also fetch the most recent snapshot of metrics by running the following code:

from sagemaker import TrainingJobAnalytics

# This can be called while the job is still running
df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()
df.head(10)

When the training job launches in the earlier cell, you see output similar to the following:

model uri: s3://sagemaker-us-west-2-802376408542/avkan/training-huggingface-text2text-huggingface-text2text-flan-t5-xl-repack.tar.gz
job name: jumpstart-demo-xl-3-2023-04-06-08-16-42-738
INFO:sagemaker:Creating training-job with name: jumpstart-demo-xl-3-2023-04-06-08-16-42-738

When the training is complete, you have a fine-tuned model at model_uri. Let’s use it!

You can create two inference endpoints: one for the original pre-trained model, and one for the fine-tuned model. This allows you to compare the output of both versions of the model. In the next step, you deploy an inference endpoint for the pre-trained model. Then you deploy an endpoint for your fine-tuned model.

Deploy the pre-trained model

Let’s start by deploying the pre-trained model. First, retrieve the inference Docker image URI; this is the base Hugging Face container image. Use the following code:

from sagemaker import image_uris

# Retrieve the inference docker image URI. This is the base HuggingFace container image
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="inference",
    instance_type=inference_instance_type,
)

You can now create the endpoint and deploy the pre-trained model. Note that you need to pass the Predictor class when deploying model through the Model class to be able to run inference through the SageMaker API. See the following code:

from sagemaker import model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# Retrieve the URI of the pre-trained model
pre_trained_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

pre_trained_name = name_from_base(f"jumpstart-demo-pre-trained-{model_id}")

# Create the SageMaker model instance of the pre-trained model
if ("small" in model_id) or ("base" in model_id):
    deploy_source_uri = script_uris.retrieve(
        model_id=model_id, model_version=model_version, script_scope="inference"
    )
    pre_trained_model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        entry_point="inference.py",
        model_data=pre_trained_model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=pre_trained_name,
    )
else:
    # For those large models, we already repack the inference script and model
    # artifacts for you, so the `source_dir` argument to Model is not required.
    pre_trained_model = Model(
        image_uri=deploy_image_uri,
        model_data=pre_trained_model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=pre_trained_name,
    )

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {pre_trained_model_uri}")
print("Deploying an endpoint ...")

# Deploy the pre-trained model. Note that we need to pass Predictor class when we deploy model
# through Model class, for being able to run inference through the SageMaker API
pre_trained_predictor = pre_trained_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=pre_trained_name,
)
print(f"{newline}Deployed an endpoint {pre_trained_name}")

The endpoint creation and model deployment can take a few minutes, then your endpoint is ready to receive inference calls.

Deploy the fine-tuned model

Let’s deploy the fine-tuned model to its own endpoint. The process is almost identical to the one we used earlier for the pre-trained model. The only difference is that we use the fine-tuned model name and URI:

from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

fine_tuned_name = name_from_base(f"jumpstart-demo-fine-tuned-{model_id}")
fine_tuned_model_uri = f"{output_location}{training_job_name}/output/model.tar.gz"

# Create the SageMaker model instance of the fine-tuned model
fine_tuned_model = Model(
    image_uri=deploy_image_uri,
    model_data=fine_tuned_model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=fine_tuned_name,
)

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {fine_tuned_model_uri}")
print("Deploying an endpoint ...")

# Deploy the fine-tuned model.
fine_tuned_predictor = fine_tuned_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=fine_tuned_name,
)
print(f"{newline}Deployed an endpoint {fine_tuned_name}")

When this process is complete, both pre-trained and fine-tuned models are deployed behind their own endpoints. Let’s compare their outputs.

Generate output and compare the results

Define some utility functions to query the endpoint and parse the response:

import boto3
import json

# Parameters of (output) text generation. A great introduction to generation
# parameters can be found at https://huggingface.co/blog/how-to-generate
parameters = {
    "max_length": 40,  # restrict the length of the generated text
    "num_return_sequences": 5,  # we will inspect several model outputs
    "num_beams": 10,  # use beam search
}

# Helper functions for running inference queries
def query_endpoint_with_json_payload(payload, endpoint_name):
    encoded_json = json.dumps(payload).encode("utf-8")
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text


def generate_questions(endpoint_name, text):
    expanded_prompt = prompt.replace("{context}", text)
    payload = {"text_inputs": expanded_prompt, **parameters}
    query_response = query_endpoint_with_json_payload(payload, endpoint_name=endpoint_name)
    generated_texts = parse_response_multiple_texts(query_response)
    for i, generated_text in enumerate(generated_texts):
        print(f"Response {i}: {generated_text}{newline}")

In the next code snippet, we define the prompt and the test data. The prompt describes our target task, which is to generate questions that are related to the provided text but can’t be answered based on it.

The test data consists of three different paragraphs: one on the Australian city of Adelaide from the first two paragraphs of its Wikipedia page, one regarding Amazon Elastic Block Store (Amazon EBS) from the Amazon EBS documentation, and one on Amazon Comprehend from the Amazon Comprehend documentation. We expect the model to identify questions related to these paragraphs but that can’t be answered with the information provided therein.

prompt = "Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}"

test_paragraphs = [
"""
Adelaide is the capital city of South Australia, the state's largest city and the fifth-most populous city in Australia.
"Adelaide" may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre.
The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide
region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language.

Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and
the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of
the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the north to Sellicks Beach in the south.
""",
"""
Amazon Elastic Block Store (Amazon EBS) provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices. You can mount these volumes as devices on your instances. EBS volumes that are attached to an instance are exposed as storage volumes that persist independently from the life of the instance. You can create a file system on top of these volumes, or use them in any way you would use a block device (such as a hard drive). You can dynamically change the configuration of a volume attached to an instance.

We recommend Amazon EBS for data that must be quickly accessible and requires long-term persistence. EBS volumes are particularly well-suited for use as the primary storage for file systems, databases, or for any applications that require fine granular updates and access to raw, unformatted, block-level storage. Amazon EBS is well suited to both database-style applications that rely on random reads and writes, and to throughput-intensive applications that perform long, continuous reads and writes.
""",
"""
Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. 
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. 
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. 
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.
"""
]

You can now test the endpoints using the example paragraphs:

print(f"{bold}Prompt:{unbold} {repr(prompt)}")
for paragraph in test_paragraphs:
print("-" * 80)
print(paragraph)
print("-" * 80)
print(f"{bold}pre-trained{unbold}")
generate_questions(pre_trained_name, paragraph)
print(f"{bold}fine-tuned{unbold}")
generate_questions(fine_tuned_name, paragraph)

Test data: Adelaide

We use the following context:

Adelaide is the capital city of South Australia, the state's largest city and the fifth-most populous city in Australia.
"Adelaide" may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre.
The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide
region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language.

Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and
the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of
the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the north to Sellicks Beach in the south.

The pre-trained model responses are as follows:

Response 0: What is the area of the city centre and surrounding parklands called in the Kaurna language?
Response 1: What is the area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language?
Response 2: What is the area of the city centre and surrounding parklands called in Kaurna?
Response 3: What is the capital city of South Australia?
Response 4: What is the area of the city centre and surrounding parklands known as in the Kaurna language?

The fine-tuned model responses are as follows:

Response 0: What is the second most populous city in Australia?
Response 1: What is the fourth most populous city in Australia?
Response 2: What is the population of Gawler?
Response 3: What is the largest city in Australia?
Response 4: What is the fifth most populous city in the world?

Test data: Amazon EBS

We use the following context:

Amazon Elastic Block Store (Amazon EBS) provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices. You can mount these volumes as devices on your instances. EBS volumes that are attached to an instance are exposed as storage volumes that persist independently from the life of the instance. You can create a file system on top of these volumes, or use them in any way you would use a block device (such as a hard drive). You can dynamically change the configuration of a volume attached to an instance.

We recommend Amazon EBS for data that must be quickly accessible and requires long-term persistence. EBS volumes are particularly well-suited for use as the primary storage for file systems, databases, or for any applications that require fine granular updates and access to raw, unformatted, block-level storage. Amazon EBS is well suited to both database-style applications that rely on random reads and writes, and to throughput-intensive applications that perform long, continuous reads and writes.

The pre-trained model responses are as follows:

Response 0: What is the difference between Amazon EBS and Amazon Elastic Block Store (Amazon EBS)?
Response 1: What is the difference between Amazon EBS and Amazon Elastic Block Store?
Response 2: What is the difference between Amazon EBS and Amazon Simple Storage Service (Amazon S3)?
Response 3: What is Amazon Elastic Block Store (Amazon EBS)?
Response 4: What is the difference between Amazon EBS and a hard drive?

The fine-tuned model responses are as follows:

Response 0: What type of applications are not well suited to Amazon EBS?
Response 1: What behaves like formatted block devices?
Response 2: What type of applications are not suited to Amazon EBS?
Response 3: What type of applications are not well suited for Amazon EBS?
Response 4: What type of applications are not suited for Amazon EBS?

Test data: Amazon Comprehend

We use the following context:

Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. 
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. 
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. 
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.

The pre-trained model responses are as follows:

Response 0: What does Amazon Comprehend use to extract insights about the content of documents?
Response 1: How does Amazon Comprehend extract insights about the content of documents?
Response 2: What does Amazon Comprehend use to develop insights about the content of documents?
Response 3: How does Amazon Comprehend develop insights about the content of documents?
Response 4: What does Amazon Comprehend use to extract insights about the content of a document?

The fine-tuned model responses are as follows:

Response 0: What does Amazon Comprehend use to extract insights about the structure of documents?
Response 1: How does Amazon Comprehend recognize sentiments in a document?
Response 2: What does Amazon Comprehend use to extract insights about the content of social networking feeds?
Response 3: What does Amazon Comprehend use to extract insights about the content of documents?
Response 4: What type of files does Amazon Comprehend reject as input?

The difference in output quality between the pre-trained model and the fine-tuned model is stark. The questions produced by the fine-tuned model touch on a wider range of topics, and they are consistently meaningful, which isn't always the case for the pre-trained model, as illustrated by the Amazon EBS example.

Although this doesn’t constitute a formal and systematic evaluation, it’s clear that the fine-tuning process has improved the quality of the model’s responses on this task.
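One lightweight way to make such a check more systematic would be to ask an extractive question answering model trained on SQuAD 2.0, which can abstain from answering, how many of the generated questions it considers unanswerable from the given context. The following is purely an illustrative sketch and is not part of the original notebook; the model choice and score threshold are assumptions:

from transformers import pipeline

# Illustrative sketch only: a SQuAD 2.0 QA model can abstain when a question
# has no answer in the context, which matches our task definition.
# The model choice and score threshold below are assumptions.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def unanswerable_rate(questions, context, threshold=0.5):
    """Return the fraction of questions the QA model cannot answer from the context."""
    unanswerable = 0
    for question in questions:
        result = qa(question=question, context=context, handle_impossible_answer=True)
        # An empty answer means the model abstained (predicted "no answer")
        if result["answer"] == "" or result["score"] < threshold:
            unanswerable += 1
    return unanswerable / len(questions)

# A higher rate indicates closer adherence to the task of generating
# questions that are related to the text but cannot be answered from it.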

Clean up

Lastly, remember to clean up and delete the endpoints:

# Delete resources
pre_trained_predictor.delete_model()
pre_trained_predictor.delete_endpoint()
fine_tuned_predictor.delete_model()
fine_tuned_predictor.delete_endpoint()

Conclusion

In this post, we showed how to use instruction fine-tuning with FLAN T5 models using the JumpStart UI or a Jupyter notebook running in Studio. We provided code explaining how to retrain the model using data for the target task and deploy the fine-tuned model behind an endpoint. The target task in this post was to identify questions that are related to a chunk of text provided in the input but can't be answered based on the information provided in that text. We demonstrated that a model fine-tuned for this specific task returns better results than a pre-trained model.

Now that you know how to instruction fine-tune a model with JumpStart, you can create powerful models customized for your application. Gather some data for your use case, upload it to Amazon S3, and use either the Studio UI or the notebook to tune a FLAN T5 model!


About the authors

Laurent Callot is a Principal Applied Scientist and manager at AWS AI Labs who has worked on a variety of machine learning problems, from foundational models and generative AI to forecasting, anomaly detection, causality, and AI Ops.

Andrey Kan is a Senior Applied Scientist at AWS AI Labs with interests and experience in different fields of machine learning. These include research on foundation models, as well as ML applications for graphs and time series.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from the University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers at NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Baris Kurt is an Applied Scientist at AWS AI Labs. His interests are in time series anomaly detection and foundation models. He loves developing user-friendly ML systems.

Jonas Kübler is an Applied Scientist at AWS AI Labs. He is working on foundation models with the goal of facilitating use-case-specific applications.

Introducing an image-to-speech Generative AI application using Amazon SageMaker and Hugging Face

Vision loss comes in various forms. For some, it's present from birth; for others, it's a slow descent over time that comes with many expiration dates: the day you can't see pictures, recognize yourself or your loved ones' faces, or even read your mail. In our previous blog post, Enable the Visually Impaired to Hear Documents using Amazon Textract and Amazon Polly, we showed you our text-to-speech application called "Read for Me". Accessibility has come a long way, but what about images?

At the 2022 AWS re:Invent conference in Las Vegas, we demonstrated "Describe For Me" at the AWS Builders' Fair, a website that helps the visually impaired understand images through image captioning, facial recognition, and text-to-speech, a technology we refer to as "Image to Speech." Through the use of multiple AI/ML services, "Describe For Me" generates a caption for an input image and reads it back in a clear, natural-sounding voice in a variety of languages and dialects.

In this blog post, we walk you through the solution architecture behind "Describe For Me" and the design considerations of our solution.

Solution Overview

The following reference architecture shows the workflow of a user taking a picture with a phone and playing an MP3 file that captions the image.

Reference Architecture for the described solution.

The workflow includes the following steps:

  1. AWS Amplify distributes the DescribeForMe web app consisting of HTML, JavaScript, and CSS to end users’ mobile devices.
  2. The Amazon Cognito Identity pool grants temporary access to the Amazon S3 bucket.
  3. The user uploads an image file to the Amazon S3 bucket using the AWS SDK through the web app.
  4. The DescribeForMe web app invokes the backend AI services by sending the Amazon S3 object key in the payload to Amazon API Gateway.
  5. Amazon API Gateway instantiates an AWS Step Functions workflow. The state machine orchestrates the artificial intelligence/machine learning (AI/ML) services Amazon Rekognition, Amazon SageMaker, Amazon Textract, Amazon Translate, and Amazon Polly using AWS Lambda functions.
  6. The AWS Step Functions workflow creates an audio file as output and stores it in Amazon S3 in MP3 format.
  7. A pre-signed URL with the location of the audio file stored in Amazon S3 is sent back to the user’s browser through Amazon API Gateway. The user’s mobile device plays the audio file using the pre-signed URL.
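
To make step 7 concrete, the following is a minimal sketch of how a Lambda function behind Amazon API Gateway might generate the pre-signed URL for the MP3 file. The bucket name and event shape are hypothetical placeholders; the actual implementation in Describe For Me may differ:

import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Hypothetical bucket and event shape for illustration; in practice the
    # object key comes from the Step Functions workflow output.
    bucket = "describeforme-audio-output"
    key = event["audio_key"]  # e.g. "output/<image-name>.mp3"

    # Generate a pre-signed URL so the browser can download the MP3
    # without AWS credentials; the link expires after one hour.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=3600,
    )
    return {"statusCode": 200, "body": json.dumps({"audio_url": url})}

Because the URL is pre-signed, the web app can hand it directly to an HTML audio element without any further authentication.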

Solution Walkthrough

In this section, we focus on the design considerations behind why we chose:

  1. parallel processing within an AWS Step Functions workflow
  2. the unified sequence-to-sequence pre-trained machine learning model OFA (One For All) from Hugging Face, deployed to Amazon SageMaker, for image captioning
  3. Amazon Rekognition for facial recognition

For a more detailed overview of why we chose a serverless architecture, a synchronous workflow, an express Step Functions workflow, and a headless architecture, and of the benefits gained, see our previous blog post Enable the Visually Impaired to Hear Documents using Amazon Textract and Amazon Polly.

Parallel Processing

Using parallel processing within the Step Functions workflow reduced compute time by up to 48%. Once the user uploads the image to the S3 bucket, Amazon API Gateway instantiates an AWS Step Functions workflow. The following three Lambda functions then process the image within the Step Functions workflow in parallel (a minimal sketch of the parallel fan-out follows the list).

  • The first Lambda function, called describe_image, analyzes the image using the OFA_IMAGE_CAPTION model hosted on a SageMaker real-time endpoint to provide an image caption.
  • The second Lambda function, called describe_faces, first checks whether there are faces using Amazon Rekognition's Detect Faces API, and if so, calls the Compare Faces API. The reason for this is that Compare Faces throws an error if no faces are found in the image. Calling Detect Faces first is also faster than simply running Compare Faces and handling errors, so for images without faces, processing time is shorter.
  • The third Lambda function, called extract_text, handles text extraction utilizing Amazon Textract and Amazon Comprehend.
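
The following is a minimal sketch of how this fan-out can be expressed as a Step Functions Parallel state and registered with boto3. The Lambda function ARNs, IAM role, and state machine name are hypothetical placeholders, not the actual Describe For Me definition:

import json
import boto3

def lambda_branch(name, arn):
    # Helper that builds a single-step branch invoking one Lambda function
    return {"StartAt": name, "States": {name: {"Type": "Task", "Resource": arn, "End": True}}}

# Sketch of a Parallel state that runs the three Lambda functions at once.
# All ARNs and names below are hypothetical placeholders.
definition = {
    "StartAt": "ProcessImage",
    "States": {
        "ProcessImage": {
            "Type": "Parallel",
            "End": True,
            "Branches": [
                lambda_branch("DescribeImage", "arn:aws:lambda:us-east-1:123456789012:function:describe_image"),
                lambda_branch("DescribeFaces", "arn:aws:lambda:us-east-1:123456789012:function:describe_faces"),
                lambda_branch("ExtractText", "arn:aws:lambda:us-east-1:123456789012:function:extract_text"),
            ],
        }
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="DescribeForMeWorkflow",  # hypothetical name
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",  # hypothetical role
    type="EXPRESS",  # the solution uses an express workflow
)

A Parallel state passes the same input to every branch and returns an array containing each branch's output, which a downstream step can combine into the final description.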

Executing the Lambda functions in succession would also work, but running them in parallel is faster and more efficient. The following table shows the compute time saved for three sample images.

People | Sequential Time | Parallel Time | Time Savings (%) | Caption
0 | 1869 ms | 1702 ms | 8% | A tabby cat curled up in a fluffy white bed.
1 | 4277 ms | 2197 ms | 48% | A woman in a green blouse and black cardigan smiles at the camera. I recognize one person: Kanbo.
4 | 6603 ms | 3904 ms | 40% | People standing in front of the Amazon Spheres. I recognize 3 people: Kanbo, Jack, and Ayman.

Image Caption

Hugging Face is an open-source community and data science platform that allows users to share, build, train, and deploy machine learning models. After exploring the models available in the Hugging Face model hub, we chose to use the OFA model because, as described by the authors, it is "a task-agnostic and modality-agnostic framework that supports Task Comprehensiveness".

OFA is a step toward "One For All": a unified multimodal pre-trained model that can transfer effectively to a number of downstream tasks. While the OFA model supports many tasks, including visual grounding, language understanding, and image generation, we used it for image captioning in the Describe For Me project to perform the image-to-text portion of the application. Check out the official OFA repository and the OFA paper (ICML 2022) to learn how it unifies architectures, tasks, and modalities through a simple sequence-to-sequence learning framework.

To integrate OFA into our application, we cloned the repo from Hugging Face and containerized the model to deploy it to a SageMaker endpoint. The notebook in this repo is an excellent guide for deploying the OFA large model in a Jupyter notebook in SageMaker. After containerizing your inference script, the model is ready to be deployed behind a SageMaker endpoint as described in the SageMaker documentation. Once the model is deployed, it sits behind an HTTPS endpoint that can be integrated with the describe_image Lambda function, which analyzes the image to create the image caption. We deployed the OFA tiny model because it is smaller and can be deployed in a shorter period of time while achieving similar performance.
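
As a rough sketch, deploying the containerized model behind a SageMaker real-time endpoint and invoking it from the describe_image Lambda function could look like the following. The Amazon ECR image URI, model artifact path, instance type, endpoint name, and content type are hypothetical placeholders that depend on your inference container:

import boto3
import sagemaker
from sagemaker.model import Model

# Assumes this runs in a SageMaker environment with an execution role
role = sagemaker.get_execution_role()

# Hypothetical image URI and model artifact; in practice these point to the
# containerized OFA inference image in Amazon ECR and the model weights in Amazon S3.
model = Model(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/ofa-inference:latest",
    model_data="s3://describeforme-models/ofa-tiny/model.tar.gz",
    role=role,
)

# Deploy the model behind a real-time HTTPS endpoint
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # hypothetical instance type
    endpoint_name="ofa-image-caption",  # hypothetical endpoint name
)

# From the describe_image Lambda function, the endpoint can then be invoked
# with the SageMaker runtime client; the content type depends on the container.
runtime = boto3.client("sagemaker-runtime")
with open("example.jpg", "rb") as f:
    image_bytes = f.read()

response = runtime.invoke_endpoint(
    EndpointName="ofa-image-caption",
    ContentType="application/x-image",
    Body=image_bytes,
)
caption = response["Body"].read().decode("utf-8")
print(caption)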

Examples of image-to-speech content generated by "Describe For Me" are shown below:

The aurora borealis, or northern lights, fill the night sky above a silhouette of a house.

A dog sleeps on a red blanket on a hardwood floor, next to an open suitcase filled with toys.

A tabby cat curled up in a fluffy white bed.

Facial recognition

Amazon Rekognition Image provides the DetectFaces operation that looks for key facial features such as eyes, nose, and mouth to detect faces in an input image. In our solution, we leverage this functionality to detect any people in the input image. If a person is detected, we then use the CompareFaces operation to compare the face in the input image with the faces that "Describe For Me" has been trained with and describe the person by name. We chose Rekognition for facial detection because of its high accuracy and how simple it was to integrate into our application using its out-of-the-box capabilities.
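
A minimal sketch of this detect-then-compare logic follows. The bucket and key names are hypothetical, and a full implementation would compare the input against each known reference face (or use a Rekognition face collection with the SearchFacesByImage operation):

import boto3

rekognition = boto3.client("rekognition")

def describe_faces(bucket, key, reference_bucket, reference_key):
    # Detect faces first: CompareFaces raises an error when the target image
    # contains no faces, and DetectFaces is the faster call.
    detected = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    if not detected["FaceDetails"]:
        return []  # no faces found; skip CompareFaces entirely

    # Compare the input image against a known reference face.
    # Bucket and key names are hypothetical placeholders.
    comparison = rekognition.compare_faces(
        SourceImage={"S3Object": {"Bucket": reference_bucket, "Name": reference_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": key}},
        SimilarityThreshold=90,
    )
    return comparison["FaceMatches"]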

A group of people posing for a picture in a room. I recognize 4 people: Jack, Kanbo, Alak, and Trac. There was text found in the image as well. It reads: AWS re: Invent

Potential Use Cases

Alternate Text Generation for Web Images

All images on a website are required to have alternative text so that screen readers can speak them for the visually impaired. It's also good for search engine optimization (SEO). Creating alt captions can be time consuming, as a copywriter is tasked with providing them within a design document. The Describe For Me API could automatically generate alt text for images. It could also be utilized as a browser plugin to automatically add image captions to images missing alt text on any website.

Audio Description for Video

Audio description provides a narration track for video content to help the visually impaired follow along with movies. As image captioning becomes more robust and accurate, a workflow involving the creation of an audio track based on descriptions for key parts of a scene could become possible. Amazon Rekognition can already detect scene changes, logos, and credit sequences, as well as celebrities. A future version of Describe For Me would allow for automating this key feature for films and videos.

Conclusion

In this post, we discussed how to use AWS services, including AI and serverless services, to help the visually impaired understand images. You can learn more about the Describe For Me project and use it by visiting describeforme.com. Learn more about the unique features of Amazon SageMaker, Amazon Rekognition, and the AWS partnership with Hugging Face.

Third Party ML Model Disclaimer for Guidance

This guidance is for informational purposes only. You should still perform your own independent assessment, and take measures to ensure that you comply with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the third-party Machine Learning model referenced in this guidance. AWS has no control or authority over the third-party Machine Learning model referenced in this guidance, and does not make any representations or warranties that the third-party Machine Learning model is secure, virus-free, operational, or compatible with your production environment and standards. AWS does not make any representations, warranties or guarantees that any information in this guidance will result in a particular outcome or result.


About the Authors

Jack Marchetti is a Senior Solutions Architect at AWS focused on helping customers modernize and implement serverless, event-driven architectures. Jack is legally blind and resides in Chicago with his wife Erin and cat Minou. He is also a screenwriter and director with a primary focus on Christmas movies and horror. View Jack's filmography at his IMDb page.

Alak Eswaradass is a Senior Solutions Architect at AWS based in Chicago, Illinois. She is passionate about helping customers design cloud architectures utilizing AWS services to solve business challenges. Alak is enthusiastic about using SageMaker to solve a variety of ML use cases for AWS customers. When she's not working, Alak enjoys spending time with her daughters and exploring the outdoors with her dogs.

Kandyce Bohannon is a Senior Solutions Architect based out of Minneapolis, MN. In this role, Kandyce works as a technical advisor to AWS customers as they modernize technology strategies, especially related to data and DevOps, to implement best practices in AWS. Additionally, Kandyce is passionate about mentoring future generations of technologists and showcasing women in technology through the AWS She Builds Tech Skills program.

Trac Do is a Solutions Architect at AWS. In his role, Trac works with enterprise customers to support their cloud migrations and application modernization initiatives. He is passionate about learning customers' challenges and solving them with robust and scalable solutions using AWS services. Trac currently lives in Chicago with his wife and 3 boys. He is a big aviation enthusiast and is in the process of completing his Private Pilot License.
