Text summarization with Amazon SageMaker and Hugging Face

In this post, we show you how to implement one of the most downloaded Hugging Face pre-trained models used for text summarization, DistilBART-CNN-12-6, within a Jupyter notebook using Amazon SageMaker and the SageMaker Hugging Face Inference Toolkit. Based on the steps shown in this post, you can try summarizing text from the WikiText-2 dataset managed by fast.ai, available at the Registry of Open Data on AWS.

Global data volumes are growing at zettabyte scale as companies and consumers expand their use of digital products and online services. To better understand this growing data, machine learning (ML) natural language processing (NLP) techniques for text analysis have evolved to address use cases involving text summarization, entity recognition, classification, translation, and more. AWS offers pre-trained AWS AI services that can be integrated into applications using API calls and require no ML experience. For example, Amazon Comprehend can perform NLP tasks such as custom entity recognition, sentiment analysis, key phrase extraction, topic modeling, and more to gather insights from text. It can perform text analysis on a wide variety of languages for its various features.

Text summarization is a helpful technique in understanding large amounts of text data because it creates a subset of contextually meaningful information from source documents. You can apply this NLP technique to longer-form text documents and articles, enabling quicker consumption and more effective document indexing, for example to summarize call notes from meetings.

Hugging Face is a popular open-source library for NLP, with over 49,000 pre-trained models in more than 185 languages and support for different frameworks. AWS and Hugging Face have a partnership that enables seamless integration through SageMaker with a set of AWS Deep Learning Containers (DLCs) for training and inference in PyTorch or TensorFlow, and Hugging Face estimators and predictors for the SageMaker Python SDK. These capabilities in SageMaker help developers and data scientists get started with NLP on AWS more easily. Processing text with transformers in deep learning frameworks such as PyTorch is typically a complex and time-consuming task for data scientists, often leading to frustration and lost efficiency when developing NLP projects. The rise of AI communities like Hugging Face, combined with the power of ML services in the cloud like SageMaker, accelerates and simplifies the development of these text processing tasks. SageMaker helps you build, train, deploy, and operationalize Hugging Face models.

Text summarization overview

You can apply text summarization to identify key sentences within a document or identify key sentences across multiple documents. Text summarization can produce two types of summaries: extractive and abstractive. Extractive summaries don’t contain any machine-generated text and are a collection of important sentences selected from the input document. Abstractive summaries contain new human-readable phrases and sentences generated by the text summarization model. Most text summarization systems are based on extractive summarization because accurate abstractive text summarization is difficult to achieve.

Hugging Face has over 400 pre-trained state-of-the-art text summarization models available, implementing different combinations of NLP techniques. These models are trained on different datasets, uploaded and maintained by technology companies and members of the Hugging Face community. You can filter the models by most downloaded or most liked, and load them directly using the Hugging Face Transformers summarization pipeline API. The Hugging Face Transformers library simplifies the NLP implementation process so that high-performance NLP models can be fine-tuned to deliver text summaries, without requiring extensive ML operations knowledge.

Hugging Face text summarization models on AWS

SageMaker offers business analysts, data scientists, and MLOps engineers a choice of tools to design and operate ML workloads on AWS. These tools provide you with faster implementation and testing of ML models to achieve your optimal outcomes.

Using the SageMaker Hugging Face Inference Toolkit, an open-source library, we outline three different ways to implement and host Hugging Face text summarization models in a Jupyter notebook:

  • Hugging Face summarization pipeline – Create a Hugging Face summarization pipeline using the “summarization” task identifier to use a default text summarization model for inference within your Jupyter notebook. These pipelines abstract away the complex code, offering novice ML practitioners a simple API to quickly implement text summarization without configuring an inference endpoint. The pipeline also allows the ML practitioner to select a specific pre-trained model and its associated tokenizer, as shown in the second snippet below. Tokenizers prepare text as input for the model by splitting it into words or subwords, which are then converted to IDs through a lookup table. For simplicity, the first code snippet covers the default case when using pipelines. The DistilBART-CNN-12-6 model is one of the most downloaded summarization models on Hugging Face and is the default model for the summarization pipeline. The last line calls the pre-trained model to summarize the passed text, constrained by the two length arguments provided.

    from transformers import pipeline
    
    summarizer = pipeline("summarization")
    summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)

  • SageMaker endpoint with pre-trained model – Create a SageMaker endpoint with a pre-trained model from the Hugging Face Model Hub and deploy it on an inference endpoint, such as the ml.m5.xlarge instance in the following code snippet. This method allows experienced ML practitioners to quickly select specific open-source models, fine-tune them, and deploy the models onto high-performing inference instances.

    from sagemaker.huggingface import HuggingFaceModel
    from sagemaker import get_execution_role
    
    role = get_execution_role()
    
    # Hub Model configuration. https://huggingface.co/models
    hub = {
      'HF_MODEL_ID':'sshleifer/distilbart-cnn-12-6',
      'HF_TASK':'summarization'
    }
    
    # create Hugging Face Model Class
    huggingface_model = HuggingFaceModel(
        transformers_version='4.17.0',
        pytorch_version='1.10.2',
        py_version='py38',
        env=hub,
        role=role,
    )
    
    # deploy model to SageMaker Inference
    predictor = huggingface_model.deploy(initial_instance_count=1,instance_type="ml.m5.xlarge")
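
    After the endpoint is deployed, you can invoke it through the returned predictor. The following is a minimal sketch; the "inputs" and "parameters" keys follow the default request format of the SageMaker Hugging Face Inference Toolkit.

    # invoke the deployed summarization endpoint
    predictor.predict({
        "inputs": "An apple a day, keeps the doctor away",
        "parameters": {"min_length": 5, "max_length": 20}
    })
    
    # clean up the endpoint when you no longer need it to avoid ongoing charges
    # predictor.delete_endpoint()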

  • SageMaker endpoint with a trained model – Create a SageMaker model endpoint with a trained model stored in an Amazon Simple Storage Service (Amazon S3) bucket and deploy it on an inference endpoint. This method allows experienced ML practitioners to quickly deploy their own models stored on Amazon S3 onto high-performing inference instances. The model itself is downloaded from Hugging Face and compressed, and then can be uploaded to Amazon S3. This step is demonstrated in the following code snippet:

    from sagemaker.huggingface import HuggingFaceModel
    from sagemaker import get_execution_role
    
    role = get_execution_role()
    
    # create Hugging Face Model Class
    huggingface_model = HuggingFaceModel(
        transformers_version='4.17.0',
        pytorch_version='1.10.2',
        py_version='py38',
        model_data='s3://my-trained-model/artifacts/model.tar.gz',
        role=role,
    )
    
    # deploy model to SageMaker Inference
    predictor = huggingface_model.deploy(initial_instance_count=1,instance_type="ml.m5.xlarge")

AWS has several resources available to assist you in deploying your ML workloads. The Machine Learning Lens of the AWS Well-Architected Framework recommends best practices for ML workloads, including optimizing resources and reducing cost. These recommended design principles help ensure that well-architected ML workloads on AWS are deployed to production. Amazon SageMaker Inference Recommender helps you select the right instance to deploy your ML models at optimal inference performance and cost. Inference Recommender speeds up model deployment and reduces time to market by automating load testing and optimizing model performance across ML instances.

In the next sections, we demonstrate how to load a trained model from an S3 bucket and deploy it to a suitable inference instance.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Load the Hugging Face model to SageMaker for text summarization inference

Use the following code to download the Hugging Face pre-trained text summarization model DistilBART-CNN-12-6 and its tokenizer, and save them locally in SageMaker to your Jupyter notebook directory:

from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig

PRE_TRAINED_MODEL_NAME='sshleifer/distilbart-cnn-12-6'
hf_cache_dir = './hf_cache'  # local directory used to cache the downloaded model files

model = BartForConditionalGeneration.from_pretrained(PRE_TRAINED_MODEL_NAME, cache_dir=hf_cache_dir)
model.save_pretrained('./models/bart_model/')

tokenizer = BartTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME)
tokenizer.save_pretrained('./models/bart_tokenizer/')

Compress the saved text summarization model and its tokenizer into tar.gz format (together with the code/ directory containing your inference script, if you use one) and upload the compressed model artifact to an S3 bucket:

! tar -C models/ -czf model.tar.gz code/ bart_tokenizer/ bart_model/
from sagemaker.s3 import S3Uploader

file_key = 'model.tar.gz'
model_artifact = S3Uploader.upload(file_key,'s3://my-trained-model/artifacts')

Select an inference Docker container image to perform the text summarization inference. Define the Linux OS, PyTorch framework, and Hugging Face Transformer version and specify the Amazon Elastic Compute Cloud (Amazon EC2) instance type to run the container.

The Docker image is available in the Amazon Elastic Container Registry (Amazon ECR) of the same AWS account, and the link for that container image is returned as a URI.

from sagemaker.image_uris import retrieve

deploy_instance_type = 'ml.m5.xlarge'

pytorch_inference_image_uri = retrieve('huggingface',
                                       region=region,
                                       version='4.6.1',
                                       instance_type=deploy_instance_type,
                                       base_framework_version='pytorch1.8.1',
                                       image_scope='inference')

Define the text summarization model to be deployed by the selected container image performing inference. In the following code snippet, the compressed model uploaded to Amazon S3 is deployed:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker import get_execution_role

role = get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://my-trained-model/artifacts/model.tar.gz", # path to your trained sagemaker model
   image_uri=pytorch_inference_image_uri,
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.6.1", # transformers version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1, 
   instance_type="ml.m5.xlarge"
)

Test the deployed text summarization model on a sample input:

# example request; the expected key depends on your inference script (the default toolkit format uses "inputs")
data = {
   "text": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}

# request
predictor.predict(data)

Use Inference Recommender to evaluate the optimal EC2 instance for the inference task

Next, create multiple payload samples of input text in JSON format and compress them into a single payload file. These payload samples are used by the Inference Recommender to compare inference performance between different EC2 instance types. Each of the sample payloads must match the JSON format shown earlier. You can get examples from the WikiText-2 dataset managed by fast.ai, available at the Registry of Open Data on AWS.
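
One simple way to build the payload archive is sketched below; the file names and sample texts are illustrative, and each JSON body matches the {"text": ...} request format shown earlier.

import json
import os
import tarfile

# write a few sample request bodies in the same format used to test the endpoint
os.makedirs("sample-payload", exist_ok=True)
samples = [
    "Text summarization creates a short, coherent version of a longer document.",
    "Amazon SageMaker helps you build, train, and deploy machine learning models.",
]
for i, text in enumerate(samples):
    with open(f"sample-payload/payload_{i}.json", "w") as f:
        json.dump({"text": text}, f)

# compress all samples into a single archive for Inference Recommender
with tarfile.open("payload.tar.gz", "w:gz") as tar:
    for name in os.listdir("sample-payload"):
        tar.add(os.path.join("sample-payload", name), arcname=name)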

Upload the compressed text summarization model artifact and the compressed sample payload file to the S3 bucket. We uploaded the model in an earlier step, but for clarity we include the code to upload it again:

import sagemaker

bucket = sagemaker.Session().default_bucket()

prefix = "sagemaker/inference-recommender"

model_archive_name = "model.tar.gz"
payload_archive_name = "payload.tar.gz"

sample_payload_url = sagemaker.Session().upload_data(
    payload_archive_name, bucket=bucket, key_prefix=prefix + "/inference"
)
model_url = sagemaker.Session().upload_data(
    model_archive_name, bucket=bucket, key_prefix=prefix + "/model"
)

Review the list of standard ML models available on SageMaker across common model zoos, such as NLP and computer vision. Select an NLP model to perform the text summarization inference:

import boto3
import pandas as pd

inference_client = boto3.client("sagemaker", region)

list_model_metadata_response = inference_client.list_model_metadata()

domains = []
frameworks = []
framework_versions = []
tasks = []
models = []

for model_summary in list_model_metadata_response["ModelMetadataSummaries"]:
    domains.append(model_summary["Domain"])
    tasks.append(model_summary["Task"])
    models.append(model_summary["Model"])
    frameworks.append(model_summary["Framework"])
    framework_versions.append(model_summary["FrameworkVersion"])

data = {
    "Domain": domains,
    "Task": tasks,
    "Framework": frameworks,
    "FrameworkVersion": framework_versions,
    "Model": models,
}

df = pd.DataFrame(data)

pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 1000)
pd.set_option("display.colheader_justify", "center")
pd.set_option("display.precision", 3)

display(df.sort_values(by=["Domain", "Task", "Framework", "FrameworkVersion"]))

The following example uses the bert-base-cased NLP model. Register the text summarization model into the SageMaker model registry with the correctly identified domain, framework, and task from the previous step. The parameters for this example are shown at the beginning of the following code snippet.

Note the range of EC2 instance types to be evaluated by Inference Recommender under SupportedRealtimeInferenceInstanceTypes in the following code. Make sure that the service limits for the AWS account allow the deployment of these types of inference nodes.

ml_domain = "NATURAL_LANGUAGE_PROCESSING"
ml_task = "FILL_MASK"
model_name = "bert-base-cased"
dlc_uri = pytorch_inference_image_uri
framework = 'PYTORCH'
framework_version='1.6.0'

import uuid

inference_client = boto3.client("sagemaker", region)

model_package_group_name = uuid.uuid1()

model_package_group_response = inference_client.create_model_package_group(
    ModelPackageGroupName=str(model_package_group_name), ModelPackageGroupDescription="description"
)

model_package_version_response = inference_client.create_model_package(
    ModelPackageGroupName=str(model_package_group_name),
    ModelPackageDescription="InferenceRecommenderDemo",
    Domain=ml_domain,
    Task=ml_task,
    SamplePayloadUrl=sample_payload_url,
    InferenceSpecification={
        "Containers": [
            {
                "ContainerHostname": "huggingface-pytorch",
                "Image": dlc_uri,
                "ModelDataUrl": model_url,
                "Framework": framework,
                "FrameworkVersion": framework_version,
                "NearestModelName": model_name,
                "Environment": {
                    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
                    "SAGEMAKER_PROGRAM": "inference.py",
                    "SAGEMAKER_REGION": region,
                    "SAGEMAKER_SUBMIT_DIRECTORY": model_url,
                },
            },
        ],
        "SupportedRealtimeInferenceInstanceTypes": [
            "ml.t2.xlarge",
            "ml.c5.xlarge",
            "ml.m5.xlarge",
            "ml.m5d.xlarge",
            "ml.r5.xlarge",
            "ml.inf1.xlarge",
        ],
        "SupportedContentTypes": [
            "application/json",
        ],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)

Create an Inference Recommender default job using the ModelPackageVersion resulting from the previous step. The uuid Python library is used to generate a unique name for the job.

from sagemaker import get_execution_role

client = boto3.client("sagemaker", region)

role = get_execution_role()
default_job = uuid.uuid1()
default_response = client.create_inference_recommendations_job(
    JobName=str(default_job),
    JobDescription="Job Description",
    JobType="Default",
    RoleArn=role,
    InputConfig={"ModelPackageVersionArn": model_package_version_response["ModelPackageArn"]},
)

You can get the status of the Inference Recommender job by running the following code:

inference_recommender_job = client.describe_inference_recommendations_job(
        JobName=str(default_job)
)
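
Optionally, you can poll the job status until it finishes before reading the recommendations. The following is a simple sketch; adjust the polling interval as needed.

import time

while True:
    inference_recommender_job = client.describe_inference_recommendations_job(
        JobName=str(default_job)
    )
    if inference_recommender_job["Status"] in ("COMPLETED", "FAILED", "STOPPED"):
        print("Job ended with status:", inference_recommender_job["Status"])
        break
    time.sleep(60)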

When the job status is COMPLETED, compare the inference latency, runtime, and other metrics of the EC2 instance types evaluated by the Inference Recommender default job. Select the most suitable instance type based on your use case requirements.

data = [
    {**x["EndpointConfiguration"], **x["ModelConfiguration"], **x["Metrics"]}
    for x in inference_recommender_job["InferenceRecommendations"]
]
df = pd.DataFrame(data)
df.drop("VariantName", inplace=True, axis=1)
pd.set_option("max_colwidth", 400)
df.head()

Conclusion

SageMaker offers multiple ways to use Hugging Face models; for more examples, check out the AWS Samples GitHub. Depending on the complexity of the use case and the need to fine-tune the model, you can select the optimal way to use these models. The Hugging Face pipelines can be a good starting point to quickly experiment and select suitable models. When you need to customize and parameterize the selected models, you can download the models and deploy them to customized inference endpoints. To fine-tune the model more for a specific use case, you’ll need to train the model after downloading it.

NLP models in general, including text summarization models, perform better after being trained on a dataset that is specific to the use case. The MLOps and model monitoring features of SageMaker help ensure that the deployed model continues to perform within expectations. In this post, we used Inference Recommender to evaluate the best suited instance type to deploy the text summarization model. These recommendations can optimize performance and cost for your ML use case.


About the Authors

Dr. Nidal AlBeiruti is a Senior Solutions Architect at Amazon Web Services, with a passion for machine learning solutions. Nidal has over 25 years of experience working in a variety of global IT roles at different levels and verticals. Nidal acts as a trusted advisor for many AWS customers to support and accelerate their cloud adoption journey.

Darren Ko is a Solutions Architect based in London. He advises UK and Ireland SMB customers on rearchitecting and innovating on the cloud. Darren is interested in applications built with serverless architectures and he is passionate about solving sustainability challenges with machine learning.

Read More

Take your intelligent search experience to the next level with Amazon Kendra hierarchical facets

Unstructured data continues to grow in many organizations, making it a challenge for users to get the information they need. Amazon Kendra is a highly accurate, intelligent search service powered by machine learning (ML). Amazon Kendra uses deep learning and reading comprehension to deliver precise answers, and returns a list of ranked documents that match the search query for you to choose from. To help users interactively narrow down the list of relevant documents, you can assign metadata at the time of document ingestion to provide filtering and faceting capabilities.

In a search solution with a growing number of documents, simple faceting or filtering isn’t always sufficient to enable users to really pinpoint documents with the information they’re looking for. Amazon Kendra now features hierarchical facets, with a more granular view of the scope of the search results. Hierarchical facets offer filtering options with more details about the number of results expected for each option, and allows users to further narrow their search, pinpointing their documents of interest quickly.

In this post, we demonstrate what hierarchical facets in Amazon Kendra can do. We first ingest a set of documents, along with their metadata, into an Amazon Kendra index. We then make search queries using both simple and hierarchical facets, and add filtering to get straight to the documents of interest.

Solution overview

Instead of presenting each facet individually as a list, hierarchical facets enable defining a parent-child relationship between facets to shape the scope of the search results. With this, you see the number of results that not only have a particular facet, but also have each of the sub-facets. Let’s take the example of a repository of AWS documents of types User_Guides, Reference_Guides and Release_Notes, regarding compute, storage, and database technologies.

First let’s look at non-hierarchical facets from the response to a search query:

Technology
  Databases:23
  Storage:22
  Compute:15
Document_Type
  User_Guides:37
  Reference_Guides:18
  Release_Notes:5

Here we know the number of search results in each of the technologies, as well as in each of the document types. However, we don’t know, for example, how many results to expect from User_Guides related to Storage, except that the number can’t exceed 22, the smaller of the two counts (User_Guides:37 and Storage:22).

Now let’s look at hierarchical facets from the response to the same search query:

Technology
  Databases:23
    Document_Type
      User_Guides:12
      Reference_Guides:7
      Release_Notes:4
  Storage:22
    Document_Type
      User_Guides:16
      Reference_Guides:6
  Compute:15
    Document_Type
      User_Guides:9
      Reference_Guides:5
      Release_Notes:1

With hierarchical facets, we get more information in terms of the number of results from each document type about each technology. With this additional information, we know that there are 16 results from User_Guides related to Storage.

In the subsequent sections, we use this example to demonstrate the use of hierarchical facets to narrow down search results, along with step-by-step instructions you can follow to try this out in your own AWS account. If you just want to read about this feature without running it yourself, you can refer to the Python script facet-search-query.py used in this post and its output output.txt, and then jump to the section Search and filtering with facets without hierarchy.

Prerequisites

To deploy and experiment with the solution in this post, make sure that you have the following:

Set up the infrastructure and run the Python script to query the Amazon Kendra index

To set up the solution, complete the following steps:

  1. Use the AWS Management Console for Amazon S3 to create an S3 bucket to use as a data source to store the sample documents.
  2. On the AWS Management Console, start CloudShell by choosing the shell icon on the navigation bar.
    Alternatively, you can run the Python script from any computer that has the AWS SDK for Python (Boto3) installed and an AWS account with access to the Amazon Kendra index. Make sure to update Boto3 on your computer. For simplicity, the step-by-step instructions in this post focus on CloudShell.
  3. After CloudShell starts, download facet-search-query.py to your local machine.
  4. Upload the script to your CloudShell by switching to the CloudShell tab, choosing the Actions menu, and choosing Upload file.
  5. Download hierarchical-facets-data.zip to your local machine, unzip it, and upload the entire directory structure to your S3 bucket.
  6. If you’re not using an existing Amazon Kendra index, create a new Amazon Kendra index.
  7. On the Amazon Kendra console, open your index.
  8. In the navigation pane, choose Facet definition.
  9. Choose Add field.
  10. Configure the field Document_Type and choose Add.
  11. Configure the field Technology and choose Add.
  12. Configure your S3 bucket as a data source to the Amazon Kendra index you just created.
  13. Sync the data source and wait for the sync to complete.
  14. Switch to the CloudShell tab.
  15. Update Boto3 by running pip3 install boto3==1.23.1 --upgrade.
    This ensures that CloudShell has a version of Boto3 that supports hierarchical facets.
  16. Edit facet-search-query.py and replace REPLACE-WITH-YOUR-AMAZON-KENDRA-INDEX-ID with your Amazon Kendra index ID.
    You can get the index ID by opening your index details on the Amazon Kendra console.
  17. In the CloudShell prompt, run facet-search-query.py using the command python3 facet-search-query.py | tee output.txt.

If this step fails with the error Unknown parameter in Facets[0]: “Facets”, must be one of: DocumentAttributeKey, the CloudShell environment is still using an older version of Boto3. Choose the Actions menu and choose Delete AWS CloudShell home directory, then repeat the steps to download facet-search-query.py, update Boto3, edit facet-search-query.py, and run it again. If you have any other data in the CloudShell home directory, back it up before deleting the home directory.

For convenience, all the steps are included in one Python script. You can read facet-search-query.py and experiment by copying parts of this script and making your own scripts. Edit output.txt to observe the search results.
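
The query examples in the following sections assume that a Kendra client and a few variables are already defined, as they are in facet-search-query.py. The following is a minimal sketch of that setup; the index ID is a placeholder, and max_results is an assumed value used by the MaxResults facet parameter later in the post.

import boto3

kclient = boto3.client("kendra")
indexid = "REPLACE-WITH-YOUR-AMAZON-KENDRA-INDEX-ID"
kquery = "How to encrypt data?"
max_results = 10  # maximum number of facet values to return per facet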

Search and filtering with facets without hierarchy

Let’s start by querying with facets having no hierarchy. In this case, the facets parameter used in the query only provides the information that the results in the response should be faceted using two attributes: Technology and Document_Type. See the following code:

fac0 = [
    { "DocumentAttributeKey":"Technology" },
    { "DocumentAttributeKey":"Document_Type" }
]

This is used as a parameter to the query API call:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac0)

The formatted version of the response is as follows:

Query:  How to encrypt data?
Number of results: 62
Document Title:  developerguide
Document Attributes:
  Document_Type: User_Guides
  Technology: Databases
Document Excerpt:
  4. Choose the option that you want for encryption at rest. Whichever
  option you choose, you can't   change it after the cluster is
  created. • To encrypt data at rest in this cluster, choose Enable
  encryption. • If you don't want to encrypt data at rest in this
  cluster, choose Disable encryption.
----------------------------------------------------------------------
Facets:
  Technology
    Databases:23
    Storage:22
    Compute:16
  Document_Type
    User_Guides:37
    Reference_Guides:19
    Release_Notes:5
======================================================================

The first result from the response is from a User_Guide about Databases. The facets below the result show the number of results for Technology and Document_Type present in the response.

Let’s narrow down these results to be only from User_Guides and Storage by setting the filter as follows:

att_filter0 = {
    "AndAllFilters": [
        {
            "EqualsTo":{
                "Key": "Technology",
                "Value": {
                    "StringValue": "Storage"
                }
            }
        },
        {
            "EqualsTo":{
                "Key": "Document_Type",
                "Value": {
                    "StringValue": "User_Guides"
                }
            }
        }
    ]
}

Now let’s make a query call using the facets without hierarchy and the preceding filter:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac0, AttributeFilter=att_filter0)

A formatted version of the response is as follows:

Query:  How to encrypt data?
Query Filter: Technology: Storage AND Document_Type: User_Guides
Number of results: 18
Document Title:  efs-ug
Document Attributes:
  Document_Type: User_Guides
  Technology: Storage
Document Excerpt:
  ,             "Action": [                 "kms:Describe*",
  "kms:Get*",                 "kms:List*",
  "kms:RevokeGrant"             ],             "Resource": "*"
  }     ] }   Encrypting data in transit You can encrypt data in
  transit using an Amazon EFS file sys
----------------------------------------------------------------------
Facets:
  Technology
    Storage:16
  Document_Type
    User_Guides:16

The response contains 16 results from User_Guides on Storage. Based on the non-hierarchical facets in the response without filters, we only knew to expect fewer than 22 results.

Search and filtering with hierarchical facets with Document_Type as a sub-facet of Technology

Now let’s run a query using hierarchical facets, with Document_Type defined as a sub-facet of Technology. This hierarchical relationship is important for a Technology-focused user such as an engineer. Note the nested facets in the following definition. The MaxResults parameter limits how many facet values are returned. For our example, there are only three values each for Technology and Document_Type, so this parameter isn’t particularly useful here, but it becomes useful when the number of facet values is high.

fac1 = [{
    "DocumentAttributeKey":"Technology",
    "Facets":[{
        "DocumentAttributeKey":"Document_Type",
        "MaxResults": max_results
    }],
}]

The query API call is made as follows:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac1)

The formatted version of the response is as follows:

Document Attributes:
  Document_Type: User_Guides
  Technology: Databases
Document Excerpt:
  4. Choose the option that you want for encryption at rest. Whichever
  option you choose, you can't   change it after the cluster is
  created. • To encrypt data at rest in this cluster, choose Enable
  encryption. • If you don't want to encrypt data at rest in this
  cluster, choose Disable encryption.
----------------------------------------------------------------------
Facets:
  Technology
    Databases:23
      Document_Type
        User_Guides:12
        Reference_Guides:7
        Release_Notes:4
    Storage:22
      Document_Type
        User_Guides:16
        Reference_Guides:6
    Compute:16
      Document_Type
        User_Guides:9
        Reference_Guides:6
        Release_Notes:1
======================================================================

The results are classified as per the Technology facet followed by Document_Type. In this case, looking at the facets, we know that 16 results are from User_Guides about Storage and 7 are from Reference_Guides related to Databases.

Let’s narrow down these results to be only from Reference_Guides related to Databases using the following filter:

att_filter1 = {
    "AndAllFilters": [
        {
            "EqualsTo":{
                "Key": "Technology",
                "Value": {
                    "StringValue": "Databases"
                }
            }
        },
        {
            "EqualsTo":{
                "Key": "Document_Type",
                "Value": {
                    "StringValue": "Reference_Guides"
                }
            }
        }
    ]
}

Now let’s make a query API call using the hierarchical facets with this filter:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac1, AttributeFilter=att_filter1)

The formatted response to this is as follows:

Query:  How to encrypt data?
Query Filter: Technology: Databases AND Document_Type: Reference_Guides
Number of results: 7
Document Title:  redshift-api
Document Attributes:
  Document_Type: Reference_Guides
  Technology: Databases
Document Excerpt:
  ...Constraints: Maximum length of 2147483647.   Required: No
  KmsKeyId   The AWS Key Management Service (KMS) key ID of the
  encryption key that you want to use to encrypt data in the cluster.
  Type: String   Length Constraints: Maximum length of 2147483647.
  Required: No LoadSampleData   A flag...
----------------------------------------------------------------------
Facets:
  Technology
    Databases:7
      Document_Type
        Reference_Guides:7
======================================================================

From the facets of this response, there are seven results, all from Reference_Guides related to Databases, exactly as we knew before making the query.

Search and filtering with hierarchical facets with Technology as a sub-facet of Document_Type

You can choose the hierarchical relationship between different facets at the time of querying. Let’s define Technology as the sub-facet of Document_Type, as shown in the following code. This hierarchical relationship would be important for a Document_Type-focused user such as a technical writer.

fac2 = [{
    "DocumentAttributeKey":"Document_Type",
    "Facets":[{
        "DocumentAttributeKey":"Technology",
        "MaxResults": max_results
    }]
}]

The query API call is made as follows:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac2)

The formatted response to this is as follows:

Query:  How to encrypt data?
Number of results: 62
Document Title:  developerguide
Document Attributes:
  Document_Type: User_Guides
  Technology: Databases
Document Excerpt:
  4. Choose the option that you want for encryption at rest. Whichever
  option you choose, you can't   change it after the cluster is
  created. • To encrypt data at rest in this cluster, choose Enable
  encryption. • If you don't want to encrypt data at rest in this
  cluster, choose Disable encryption.
----------------------------------------------------------------------
Facets:
  Document_Type
    User_Guides:37
      Technology
        Storage:16
        Databases:12
        Compute:9
    Reference_Guides:19
      Technology
        Databases:7
        Compute:6
        Storage:6
    Release_Notes:5
      Technology
        Databases:4
        Compute:1
======================================================================

The results are classified by Document_Type followed by Technology. In other words, reversing the hierarchical relationship transposes the matrix of result counts shown by the preceding facets. Six results are from Reference_Guides related to Compute. Let’s define the filter as follows:

att_filter2 = {
    "AndAllFilters": [
        {
            "EqualsTo":{
                "Key": "Document_Type",
                "Value": {
                    "StringValue": "Reference_Guides"
                }
            }
        },
        {
            "EqualsTo":{
                "Key": "Technology",
                "Value": {
                    "StringValue": "Compute"
                }
            }
        }
    ]
}

We use this filter to make the query API call:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac2, AttributeFilter=att_filter2)

The formatted response to this is as follows:

Query:  How to encrypt data?
Query Filter: Document_Type: Reference_Guides AND Technology:Compute
Number of results: 7
Document Title:  ecr-api
Document Attributes:
  Document_Type: Reference_Guides
  Technology: Compute
Document Excerpt:
  When you use AWS KMS to encrypt your data, you can either use the
  default AWS managed AWS KMS key for Amazon ECR, or specify your own
  AWS KMS key, which you already created. For more information, see
  Protecting data using server-side encryption with an AWS KMS key
  stored in AWS Key Management Service
----------------------------------------------------------------------
Facets:
  Document_Type
    Reference_Guides:6
      Technology
        Compute:6
======================================================================

The results contain six Reference_Guides related to Compute, exactly as we knew before running the query.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Amazon S3, delete that data source. If you created an Amazon S3 bucket to store the data used, delete that as well.

Conclusion

You can use Amazon Kendra hierarchical facets to define a hierarchical relationship between attributes to provide granular information about the scope of the results in the response to a query. This enables you to make an informed filtering choice to narrow down the search results and find the documents you’re looking for quickly.

To learn more about facets and filters in Amazon Kendra, refer to Filtering queries.

For more information on how you can automatically create, modify, or delete metadata, which you can use for faceting the search results, refer to Customizing document metadata during the ingestion process and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the Authors

Abhinav Jawadekar is a Principal Solutions Architect focused on Amazon Kendra in the AI/ML language services team at AWS. Abhinav works with AWS customers and partners to help them build intelligent search solutions on AWS.

Ji Kim is a Software Development Engineer at Amazon Web Services and is a member of the Amazon Kendra team.

Read More

Easily customize your notifications while using Amazon Lookout for Metrics

We are excited to announce that you can now add filters to alerts and also edit existing alerts while using Amazon Lookout for Metrics. With this launch, you can add filters to your alerts configuration to only get notifications for anomalies that matter the most to you. You can also modify existing alerts as per your needs for notification as anomalies evolve.

Lookout for Metrics uses machine learning (ML) to automatically monitor the metrics that are most important to businesses with greater speed and accuracy. The service also makes it easier to diagnose the root cause of anomalies like unexpected dips in revenue, high rates of abandoned shopping carts, spikes in payment transaction failures, increases in new user signups, and many more. Lookout for Metrics goes beyond simple anomaly detection. It allows developers to set up autonomous monitoring for important metrics to detect anomalies and identify their root cause in a matter of few clicks, using the same technology used by Amazon internally to detect anomalies in its metrics—all with no ML experience required.

Alerts are an optional feature that allows you to set up notifications on anomalies in your datasets, which are sent through Amazon Simple Notification Service (Amazon SNS) and AWS Lambda functions. Previously, when you set up an alert, you were notified of all detected anomalies above the severity score you selected, which made it challenging to quickly identify the most relevant anomalies for your business. Now, by implementing filters and edits in the alert system, different business units within your organization are able to specify the types of alerts they receive. Your developers can benefit from this feature by receiving alerts on anomalies that are related to the development of their service, while your business analysts and business managers can track anomalies related to the status of their business, such as a location that is underperforming. For example, you may set up an alert to get notified when there is a spike or drop in your revenue. But you may only be interested in a specific store location and in a particular product. The filtering capability allows you to get alerted only when a revenue anomaly fits the criteria you have set.

Solution overview

In this post, we demonstrate how to create an alert with filters and how the configured filters publish notifications only for anomalies matching the filter criteria. The alert filters are based on metrics and dimensions that are present in the dataset definition for the anomaly detector. The solution enables you to use alert filters to get targeted notifications for anomalies detected in your data. The following diagram illustrates the solution architecture.

Provision resources with AWS CloudFormation

You can use the provided AWS CloudFormation stack to set up resources for the walkthrough. It contains resources to continuously generate live data and publish them to Amazon S3, create a detector (named TestAlertFilters) and add a dataset (named AlertFiltersDataset) to the detector. Complete the following steps:

  1. Choose Launch Stack:
  2. Choose Next.
  3. Enter a stack name (for example, L4MAlertFiltersStack).
  4. Enter the values for the detector (TestAlertFilters) and dataset (AlertFiltersDataset).
  5. Choose Next.
  6. Leave the settings for Configure stack options at their defaults and choose Next.
  7. Select the acknowledgement check box and choose Create stack.

Activate the detector created by the CloudFormation template

To set up your detector, complete the following steps:

  1. On the Lookout for Metrics console, choose Detectors in the navigation pane.
  2. Select the detector TestAlertFilters and choose View details.
  3. To activate the detector, you can either choose Activate at the top or choose Activate detector under How it works.
  4. Choose Activate to confirm if you want to activate the detector for continuous detection.

A confirmation message shows that the detector is activating. Activation can take up to 1 hour to complete. In the meantime, we can proceed with alert configuration.
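
If you prefer to check the activation status programmatically instead of refreshing the console, you can describe the detector with the AWS SDK for Python (Boto3). The following is a minimal sketch; the detector ARN is a placeholder for your own.

import boto3

lookoutmetrics = boto3.client("lookoutmetrics")

detector_arn = "arn:aws:lookoutmetrics:<REGION>:<ACCOUNT-ID>:AnomalyDetector:TestAlertFilters"
status = lookoutmetrics.describe_anomaly_detector(AnomalyDetectorArn=detector_arn)["Status"]
print(status)  # for example, ACTIVATING or ACTIVE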

Configure your alert

We now configure an alert to get notifications for anomalies detected by the detector. Alert filters are optional configurations, and you can select up to 5 measures and 5 dimensions while adding filters. In this post, we walk through creating an alert with filters. Complete the following steps:

  1. On your detector details page, choose Add alerts.
  2. Confirm your alert name.
    Lookout for Metrics populates the configuration fields with the metrics and dimensions supplied during dataset creation. In this release, the Severity score field is optional, whereas previously it was a required field. By default, we start with a severity score of 70, which you can change or remove.
  3. To add a measure, choose Add criteria and choose Measure.
  4. For Measure EQUALS, choose the revenue measure.
  5. Choose Add criteria again and choose Dimension.

    You can choose up to 5 dimension filters. For this post, we configure two.
  6. For Dimension, choose the marketplace dimension.
  7. For Equals, add the values US and CA.
  8. Add category as your second dimension with the values fashion and jewellery.
  9. For Severity score, enter 20.
  10. For Channel, choose Amazon SNS.
  11. Choose your SNS topic (for this post, we use the SNS topic to which we already subscribed our email to receive the alert notifications).
  12. Choose your format (for this post, we choose Long Text).
  13. Under Service access, select Use an existing service role and choose your role.
  14. Choose Add alert.

    A message appears when the alert is created successfully.
  15. Select the alert and choose View details.

You can review the alert filters and other details. The Filter criteria explains how the configured filters are used to filter anomalies before publishing alert notifications.

If you want to modify the alert configuration, select the alert on the Alerts page and choose Edit.

Alternatively, you can open the alert details page and choose Edit.

You’re redirected to the Edit page, where you can modify the alert configuration as required. You can modify the same configurations you set when you created the alert, but you can’t change the alert name while editing.
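
If you prefer to automate the same configuration instead of using the console, you can create the alert with the CreateAlert API. The following is a minimal sketch using Boto3, assuming an existing detector, SNS topic, and IAM role (all ARNs are placeholders); the filter values mirror the console walkthrough above.

import boto3

lookoutmetrics = boto3.client("lookoutmetrics")

response = lookoutmetrics.create_alert(
    AlertName="testRevenueForFashionOrJewelleryInUSOrCA",
    AlertSensitivityThreshold=20,  # severity score threshold
    AnomalyDetectorArn="arn:aws:lookoutmetrics:<REGION>:<ACCOUNT-ID>:AnomalyDetector:TestAlertFilters",
    Action={
        "SNSConfiguration": {
            "RoleArn": "arn:aws:iam::<ACCOUNT-ID>:role/<YOUR-ALERT-ROLE>",
            "SnsTopicArn": "arn:aws:sns:<REGION>:<ACCOUNT-ID>:<YOUR-SNS-TOPIC>",
            "SnsFormat": "LONG_TEXT",
        }
    },
    AlertFilters={
        "MetricList": ["revenue"],
        "DimensionFilterList": [
            {"DimensionName": "marketplace", "DimensionValueList": ["US", "CA"]},
            {"DimensionName": "category", "DimensionValueList": ["fashion", "jewellery"]},
        ],
    },
)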

Review and analyze the results

When Lookout for Metrics detects anomalies in your data, it sends a notification if alerts were configured on that detector. If the anomaly group details match the filter criteria (measure filter, dimension filter, and severity score) of the alert, a notification is published.

For this example, we created two alerts on the detector, testAlertWithNoFilters and testRevenueForFashionOrJewelleryInUSOrCA, and injected anomalies in our data. We also enabled email subscription on the SNS topic used for alert notification publishing. The following screenshots show the details for each alert.

The following is an example of an anomaly notification for testRevenueForFashionOrJewelleryInUSOrCA:

{
"Type" : "Notification",
 "MessageId" : "0b0a7bfe-d029-5f4f-b706-20f644793c3d",
 "TopicArn" : "arn:aws:sns:us-west-2:488415817882:filterAlertsDemoTopic",
 "Message" : "[Amazon LookoutForMetrics] The anomaly detector TestAlertFilters detected 
             an anomaly in revenue with a severity score of 77.3 on May 25, 2022 at 8:05 PM.
              \nAnomalous graphs were detected for the following:\n
              \nrevenue for: jewellery, thirdParty, CA, regular, priority\n
              \nrevenue for: electronics, self, MX, premium, overnight\n
              \nrevenue for: electronics, self, US, regular, overnight\n
              \nTo view the anomaly, visit the Lookout for Metrics console at: 
              https://us-west-2.console.aws.amazon.com/lookoutmetrics/home?region=us-west-2#arn:aws:lookoutmetrics:us-west-2:488415817882:AnomalyDetector:TestAlertFilters/anomalies/anomaly/bd0a07e1-c520-46bd-aaa3-dcc00583d707
              \nTo modify settings for this alert: https://us-west-2.console.aws.amazon.com/lookoutmetrics/home?region=us-west-2#arn:aws:lookoutmetrics:us-west-2:488415817882:AnomalyDetector:TestAlertFilters/alerts/alertDetails/arn:aws:lookoutmetrics:us-west-2:488415817882:Alert:testRevenueForFashionOrJewelleryInUSOrCA",
 "Timestamp" : "2022-05-25T20:31:12.330Z",
 "SignatureVersion" : "1",
 "Signature" : "pFDZj3TwLrL9rqjkRiVgbWjcrPhxz5PDV485d6NroLXWhrviX7sUEQqOIL5j8YYd0SFBjFEkrZKZ27RSbd+33sRhJ52mmd1eR23cZQP68+iIVdpeWubcPgGnqxoOa3APE1WZr4SmVK/bgJAjX1RXn0rKZvPzwDkxPD2fZB4gnbqPJ8GBw/1dxU5qfJzRpkqc87d1gpvQIwMpb5uUROuPZEQVyaR/By0BTsflkE2Sz2mOeZQkMaXz3q9dwX/qDxyR9q6gNviMagGtOLwtb6StN8/PUYlvK9fCBcJnJxg0bdmMtnXiXWdl1O7J50Wqj4Tkl8amph97UlVAnComoe649g==",
 "SigningCertURL" : "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-7ff5318490ec183fbaddaa2a969abfda.pem",
 "UnsubscribeURL" : "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-west-2:488415817882:filterAlertsDemoTopic:8f24ae74-b160-44c7-8bc9-96a30e27d365"
}

The following is an example of an anomaly notification for testAlertWithNoFilters:

{
 "Type" : "Notification",
 "MessageId" : "fcc70263-f2c1-52ed-81ec-596b8c399b67",
 "TopicArn" : "arn:aws:sns:us-west-2:488415817882:filterAlertsDemoTopic",
 "Message" : "[Amazon LookoutForMetrics] The anomaly detector TestAlertFilters detected 
             an anomaly in revenue with a severity score of 77.59 on May 25, 2022 at 6:35 PM.
              \nAnomalous graphs were detected for the following:\n
              \nrevenue for: jewellery, self, UK, regular, overnight\n
              \nrevenue for: jewellery, thirdParty, JP, premium, overnight\n
              \nrevenue for: electronics, thirdParty, DE, premium, priority\n
              \nTo view the anomaly, visit the Lookout for Metrics console at: 
              https://us-west-2.console.aws.amazon.com/lookoutmetrics/home?region=us-west-2#arn:aws:lookoutmetrics:us-west-2:488415817882:AnomalyDetector:TestAlertFilters/anomalies/anomaly/194c87f4-3312-420c-8920-12fbfc9b1700
              \nTo modify settings for this alert: https://us-west-2.console.aws.amazon.com/lookoutmetrics/home?region=us-west-2#arn:aws:lookoutmetrics:us-west-2:488415817882:AnomalyDetector:TestAlertFilters/alerts/alertDetails/arn:aws:lookoutmetrics:us-west-2:488415817882:Alert:testAlertWithNoFilters",
 "Timestamp" : "2022-05-25T19:00:08.374Z",
 "SignatureVersion" : "1",
 "Signature" : "e4+BHo4eh8wNbfQMaR3L8MWY2wkpqxoxKKrj2h/QROQHvhcnYfucYchjfppgjM8LNIF7Oo4QfuP6qcLj9DlghiMZ80qpzHyAH6vmIDfSjK7Bz23i8rnIMyKJIVRFN8z69YlC9vfsp3MayWyyMJcskeVJ1bzsdkDIeA5gkT1le8yh/9nhbsgwm+bowNjsnl+/sFwk6QZJlplYB27sOqegrm73nH/CrmTe4FcPtekCRysSECwMLKazPJqR1uiGagnWfUeyTptRg9rVQVQJJdmOUwlv8vodR96s52btAegpY4iZZLUJ87vs1PwOwVfTTIHf+pdnwPUuFupzejUEudP7sQ==",
 "SigningCertURL" : "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-7ff5318490ec183fbaddaa2a969abfda.pem",
 "UnsubscribeURL" : "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-west-2:488415817882:filterAlertsDemoTopic:8f24ae74-b160-44c7-8bc9-96a30e27d365"
}

We didn’t receive the notification for this anomaly through the testRevenueForFashionOrJewelleryInUSOrCA alert because the anomaly group details don’t match the filter criteria for the marketplace dimension. For our filter criteria on the measure revenue, the marketplace dimension must equal US or CA, and the category dimension must equal fashion or jewellery, with a severity threshold of 20.

Although the anomaly detected matches the filter criteria for the measure, severity score, and category dimension, it doesn’t match the criteria for the marketplace dimension, so the alert wasn’t published.

Based on the notifications we received, we can confirm that Lookout for Metrics detected anomalies and verified the alert filter-based notifications.

Clean up

After you complete the testing, you can delete the CloudFormation stack created by the template. Deleting the stack cleans up all the resources created for this test. To delete the stack, open the AWS CloudFormation console, select the stack L4MAlertFiltersStack, and choose Delete.

Deletion of the stack doesn’t delete the S3 bucket created by the template because it’s not empty; you have to delete it manually.
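
If you prefer to empty and delete the bucket programmatically rather than on the console, the following is a minimal sketch using Boto3; the bucket name is a placeholder for the one you created.

import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("<YOUR-L4M-DATA-BUCKET>")

# delete all objects created for this walkthrough, then delete the bucket itself
bucket.objects.all().delete()
bucket.delete()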

Conclusion

You can now easily customize your notification experience by adding filters and editing existing alerts to reduce noise and focus on the metrics that matter the most to your business.

To learn more about this capability, see Working with Alerts. You can use this capability in all Regions where Lookout for Metrics is publicly available. For more information about Region availability, see AWS Regional Services.


About the Authors

Alex Kim is a Sr. Product Manager for AWS AI Services. His mission is to deliver AI/ML solutions to all customers who can benefit from it. In his free time, he enjoys all types of sports and discovering new places to eat.

Utkarsh Dubey is a Software Development Engineer in the Lookout for Metrics team. His interests lie in building scalable distributed systems. In his spare time, he enjoys traveling and catching up with friends.

Read More

Use a pre-signed URL to provide your business analysts with secure access to Amazon SageMaker Canvas

Agility and security have historically been two aspects of IT of paramount importance for any company. With the simplification of access to advanced IT technologies thanks to low-code and no-code (LCNC) tools, an even bigger number of people must be enabled to access resources, without impacting security. For many companies, the solution has been to develop a company web portal, which simplifies access to cloud applications and resources, by redirecting to or embedding applications, so that employees can have a single point of access to the services they use most.

In this post, we suggest an architecture for a company with an existing web portal to generate a pre-signed URL that redirects to Amazon SageMaker Canvas, a visual point-and-click interface for business analysts to build machine learning (ML) models and generate accurate predictions without writing code or having any previous ML experience, so that analysts can access Canvas without having to log in via the AWS Management Console.

Solution overview

The solution architecture is composed of three main parts:

  • The company web portal, with its own system for authentication of users and other resources.
  • An AWS Lambda function, responsible for calling the Amazon SageMaker SDK. This function is called via its function URL, a simple way to assign an HTTP(S) endpoint directly to the Lambda function, without the need for a REST API.
  • The Canvas app.

The following diagram illustrates the solution workflow.

The flow has four steps:

  1. The business analyst accesses the company portal, (optionally) authenticates, then chooses to generate a Canvas URL.
  2. The Lambda function receives information about the user from the company portal, and uses it to call SageMaker via an AWS SDK to generate a presigned Canvas URL. For this post, we use the AWS SDK for Python (Boto3).
  3. The generated URL is sent back to the business analyst through the company portal.
  4. The business analyst can then choose that link to access Canvas directly, without having to access the console.

Prerequisites

Before you implement the solution architecture, make sure that you have correctly onboarded to an Amazon SageMaker Studio domain using AWS Identity and Access Management (IAM). For instructions, refer to Onboard to Amazon SageMaker Domain Using IAM. IAM as the authentication method is a strict requirement, because the CreatePresignedDomainUrl API requires it and won’t work with AWS Single Sign-On authentication for your domain. Also, make sure you have created at least one user profile for your Studio domain.

Deploy the solution

The first step is to create the Lambda function.

  1. On the Lambda console, choose Create function.
  2. For Name, enter a name (for this post, canvas-presignedURL).
  3. For Runtime, choose Python 3.9.
  4. For Architecture, select your preferred architecture (for this post, we select arm64).
  5. Under Permissions, expand Change default execution role.
  6. Select Create a new role with basic Lambda permissions.
    We change the Lambda permissions in a later step.
  7. Under Advanced settings, select Enable function URL.
  8. For Auth type, select NONE.
    For this post, we don’t provide authentication details to our requests. However, this isn’t a best practice and it’s not advised for production workloads. We suggest using IAM authentication for your Lambda function, or another method for authentication and authorization such as Amazon Cognito.
  9. If your domain runs in a VPC, select Enable VPC to access those private resources.
  10. Choose Create function.
    Function creation takes a few seconds to complete. You can now set up the permissions to run SageMaker calls.
  11. On the Configuration tab, choose Permissions in the left pane.
  12. Choose your role name.

    You’re redirected to the IAM console.
  13. Choose Add permissions.
  14. Choose Create inline policy.
  15. For Service, choose SageMaker.
  16. For Actions, choose CreatePresignedDomainUrl.
  17. For Resources, select Any in this account.
  18. Choose Review.
  19. Enter a name for the policy (for this post, CanvasPresignedURLsFromLambda).
  20. Choose Create policy.
    The policy is now created and assigned to the role. You can close the IAM console tab and return to the Lambda console. Now it’s time to change our code base to run a call to SageMaker. We use the Boto3 call create_presigned_domain_url.
  21. On the Code tab, replace the code inside the lambda_function.py file with the following:
    import json
    import boto3
    
    sagemaker = boto3.client('sagemaker')
    SESSION_EXPIRATION_IN_SECONDS = 8*60*60 # the session will be valid for 8 hours
    URL_TIME_TO_LIVE_IN_SECONDS = 60 # the URL is only valid for 60 seconds
    
    def lambda_handler(event, context):
        
        # Parse the event body
        body = json.loads(event['body'])
        
        # Pass the domain ID and user profile name as part of the request
        domain_id = body['domain_id']
        user_profile_name = body['user_profile_name']
        
        # Call the service to create the URL
        response = sagemaker.create_presigned_domain_url(
            DomainId=domain_id,
            UserProfileName=user_profile_name,
            SessionExpirationDurationInSeconds=SESSION_EXPIRATION_IN_SECONDS,
            ExpiresInSeconds=URL_TIME_TO_LIVE_IN_SECONDS
        )
        studio_url = response['AuthorizedUrl']
        
        # Add the redirect to Canvas
        canvas_url = studio_url + '&redirect=Canvas'
        
        # Return to the app
        return {
            'statusCode': 200,
            'body': json.dumps(canvas_url)
        }

    The preceding code consists of three main steps:

    • Parsing the body of the request and retrieving the Studio domain ID and user profile name
    • Calling the API with this information
    • Adding the redirection to Canvas and returning the result

    Now that the function is ready, let’s test it.

  22. Choose Deploy, then choose Test.
  23. In the test event configuration, provide the following event JSON, substituting the correct values:
    {
      "body": "{\"domain_id\": \"<YOUR-DOMAIN-ID>\",\"user_profile_name\": \"<YOUR-USER-PROFILE>\"}"
    }

  24. Save the test event and choose Test again.

Your result should now be available in the body of your response.

You can now test this with your HTTP request tool of choice, such as curl or Postman, to integrate it into your existing company web portal. The following screenshot shows a Postman POST request to the Lambda function URL created in the previous steps, with the response payload containing the presigned URL.
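As a minimal sketch, the following Python snippet sends the same request programmatically. It assumes the requests library is installed; the function URL, domain ID, and user profile name are placeholders you need to replace with your own values.

import json
import requests  # assumption: the requests library is installed

# Placeholder: replace with the Lambda function URL created in the previous steps
FUNCTION_URL = "https://<YOUR-FUNCTION-URL-ID>.lambda-url.<YOUR-REGION>.on.aws/"

# Placeholders: replace with your Studio domain ID and user profile name
payload = {
    "domain_id": "<YOUR-DOMAIN-ID>",
    "user_profile_name": "<YOUR-USER-PROFILE>",
}

# The Lambda handler reads the JSON string from the request body
response = requests.post(FUNCTION_URL, json=payload)
response.raise_for_status()

# The response body contains the presigned URL for SageMaker Canvas
canvas_url = response.json()
print(canvas_url)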

The following screenshot shows an example of a (simplified) company web portal that, upon login, generates a pre-signed URL to access Amazon SageMaker Canvas.

Conclusion

In this post, we discussed a solution to help business analysts experience no-code ML via Canvas in a secured and unified way through their company web portal, without the need to allow access via the console. We used a Lambda function to generate a presigned URL, which the business analyst can use directly in their browser.

To make this solution production-ready, we suggest considering how to implement authentication and authorization, either via IAM authentication of Lambda functions with function URLs, or more advanced solutions based on Amazon API Gateway, such as API Gateway Lambda authorizers. For more information, refer to Security and auth model for Lambda function URLs.

If you haven’t built your company web portal yet, you might want to check out AWS Amplify Studio, a visual development environment that lets developers easily build and ship complete web and mobile apps in hours instead of weeks. With Amplify Studio, you can quickly build an app backend, create rich user interface (UI) components, and connect the UI to the backend with minimal coding.

To learn more about Canvas, check out Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts.


About the Author

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has fallen in love with it since then.

Read More

Enable business analysts to access Amazon SageMaker Canvas without using the AWS Management Console with AWS SSO

IT has evolved in recent years: thanks to low-code and no-code (LCNC) technologies, an increasing number of people with varying backgrounds require access to tools and platforms that were previously the prerogative of more tech-savvy individuals in the company, such as engineers or developers.

Among those LCNC technologies, we recently announced Amazon SageMaker Canvas, a visual point-and-click interface for business analysts to build machine learning (ML) models and generate accurate predictions without writing code or having any previous ML experience.

To enable agility for those new users while ensuring security of the environments, many companies have chosen to adopt single sign-on technology, such as AWS Single Sign-On. AWS SSO is a cloud-based single sign-on service that makes it easy to centrally manage SSO access to all your AWS accounts and cloud applications. It includes a user portal where end-users can find and access all their assigned AWS accounts and cloud applications in one place, including custom applications that support Security Assertion Markup Language (SAML) 2.0.

In this post, we walk you through the necessary steps to configure Canvas as a custom SAML 2.0 application in AWS SSO, so that your business analysts can seamlessly access Canvas with their credentials from AWS SSO or other existing identity providers (IdPs), without the need to do so via the AWS Management Console.

Solution overview

To establish a connection from AWS SSO to the Amazon SageMaker Studio domain app, you must complete the following steps:

  1. Create a user profile in Studio for every AWS SSO user that should access Canvas.
  2. Create a custom SAML 2.0 application in AWS SSO and assign it to the users.
  3. Create the necessary AWS Identity and Access Management (IAM) SAML provider and AWS SSO role.
  4. Map the necessary information from AWS SSO to the SageMaker domain via attribute mappings.
  5. Access the Canvas application from AWS SSO.

Prerequisites

To connect Canvas to AWS SSO, you must have the following prerequisites set up:

Create a Studio domain user profile

In a Studio domain, every user has their own user profile. Studio apps like the Studio IDE, RStudio, and Canvas are created from these user profiles, and are bound to the user profile that created them.

For AWS SSO to access the Canvas app for a given user profile, you have to map the user profile name to the user name in AWS SSO. This way, the AWS SSO user name—and therefore the user profile name—can be passed automatically by AWS SSO to Canvas.

In this post, we assume that AWS SSO users are already available, created during the prerequisites of onboarding to AWS SSO. You need a user profile for each AWS SSO user that you want to onboard to your Studio domain and therefore to Canvas.

To retrieve this information, navigate to the Users page on the AWS SSO console. Here you can see the user name of your user, in our case davide-gallitelli.

With this information, you can now go to your Studio domain and create a new user profile called exactly davide-gallitelli.

If you have another IdP, you can use any information provided by it to name your user profile, as long as it’s unique for your domain. Just make sure you map it correctly according to AWS SSO attribute mapping.

Create the custom SAML 2.0 application in AWS SSO

The next step is to create a custom SAML 2.0 application in AWS SSO.

  1. On the AWS SSO console, choose Applications in the navigation pane.
  2. Choose Add a new application.
  3. Choose Add a custom SAML 2.0 application.
  4. Download the AWS SSO SAML metadata file, which you use during IAM configuration.
  5. For Display name, enter a name, such as SageMaker Canvas followed by your Region.
  6. For Description, enter an optional description.
  7. For Application start URL, leave as is.
  8. For Relay state, enter https://YOUR-REGION.console.aws.amazon.com/sagemaker/home?region=YOUR-REGION#/studio/canvas/open/YOUR-STUDIO-DOMAIN-ID.
  9. For Session duration, choose your session duration. We suggest 8 hours.
    The Session duration value represents the amount of time you want the user session to last before authentication is required again. One hour is the most secure, whereas more time means less need for interaction. We choose 8 hours in this case, equivalent to one work day.
  10. For Application ACS URL, enter https://signin.aws.amazon.com/saml.
  11. For Application SAML audience, enter urn:amazon:webservices.
    After your settings are saved, your application configuration should look similar to the following screenshot.
    You can now assign your users to this application, so that the application appears in their AWS SSO portal after login.
  12. On the Assigned users tab, choose Assign users.
  13. Choose your users.

Optionally, if you want to enable many data scientists and business analysts in your company to use Canvas, the fastest and easiest way is to use AWS SSO groups. To do so, we create two AWS SSO groups: business-analysts and data-scientists. We assign the users to these groups according to their roles, and then give both groups access to the application.

Configure your IAM SAML provider and AWS SSO role

To configure your IAM SAML provider, complete the following steps:

  1. On the IAM console, choose Identity providers in the navigation pane.
  2. Choose Add provider.
  3. For Provider type, select SAML.
  4. For Provider name, enter a name, such as AWS_SSO_Canvas.
  5. Upload the metadata document you downloaded earlier.
  6. Note the ARN to use in a later step.

    We also need to create a new role for AWS SSO to use to access the application.
  7. On the IAM console, choose Roles in the navigation pane.
  8. Choose Create role.
  9. For Trusted entity type, select SAML 2.0 federation.
  10. For SAML 2.0-based provider, choose the provider you created (AWS_SSO_Canvas).
  11. Don’t select either of the two SAML 2.0 access methods.
  12. For Attribute, choose SAML:sub_type.
  13. For Value, enter persistent.
  14. Choose Next.

    We need to give AWS SSO the permission to create a Studio domain presigned URL, which we need to perform the redirect to Canvas.
  15. On the Permissions policies page, choose Create policy.
  16. On the Create policy tab, choose JSON and enter the following code:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreatePresignedDomainUrlWithPrincipalTag",
                    "sagemaker:CreatePresignedDomainUrl"
                ],
                "Resource": "*"
            }
        ]
    }

  17. Choose Next:Tags and provide tags if needed.
  18. Choose Next:Review.
  19. Name the policy, for example CanvasSSOPresignedURL.
  20. Choose Create policy.
  21. Return to the Add permissions page and search for the policy you created.
  22. Select the policy, then choose Next.
  23. Name the role, for example AWS_SSO_Canvas_Role, and provide an optional description.
  24. On the review page, edit the trust policy to match the following code:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": "<ARN OF THE SAML PROVIDER FROM IAM>"
                },
                "Action": [
                    "sts:AssumeRoleWithSAML",
                    "sts:SetSourceIdentity",
                    "sts:TagSession"
                ],
                "Condition": {
                    "StringEquals": {
                        "SAML:sub_type": "persistent",
                        "SAML:aud": "https://signin.aws.amazon.com/saml"
                    }
                }
            }
        ]
    }

  25. Save the changes, then choose Create role.
  26. Note the ARN of this role as well, to use in the following section.

Configure the attribute mappings in AWS SSO

The final step is to configure the attribute mappings. The attributes you map here become part of the SAML assertion that is sent to the application. You can choose which user attributes in your application map to corresponding user attributes in your connected directory. For more information, refer to Attribute mappings.

  1. On the AWS SSO console, navigate to the application you created.
  2. On the Attribute mappings tab, configure the following mappings:
    • Subject maps to ${user:email}
    • https://aws.amazon.com/SAML/Attributes/RoleSessionName maps to ${user:email}
    • https://aws.amazon.com/SAML/Attributes/PrincipalTag:SageMakerStudioUserProfileName maps to ${user:subject}
    • https://aws.amazon.com/SAML/Attributes/Role maps to <ARN OF THE SAML PROVIDER FROM IAM>, <ARN OF THE CANVAS SSO ROLE FROM IAM>
  3. Choose Save changes.

You’re done!

Access the Canvas application from AWS SSO

On the AWS SSO console, note down the user portal URL. We suggest you log out of your AWS account first, or open an incognito browser window. Navigate to the user portal URL, log in with the credentials you set for the AWS SSO user, then choose your Canvas application.

You’re automatically redirected to the Canvas application.

Conclusion

In this post, we discussed a solution to enable business analysts to experience no-code ML via Canvas in a secured and unified way through a single sign-on portal. To do this, we configured Canvas as a custom SAML 2.0 application within AWS SSO. Business analysts are now one click away from using Canvas and solving new challenges with no-code ML. This provides the security needed by cloud engineering and security teams, while allowing for the agility and independence of business analyst teams. A similar process can be replicated with any IdP by reproducing these steps and adapting them to your specific SSO solution.

To learn more about Canvas, check out Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts. Canvas also enables easy collaboration with data science teams. To learn more, see Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas. For IT administrators, we suggest checking out Setting up and managing Amazon SageMaker Canvas (for IT administrators).


About the Author

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has fallen in love with it since then.

Read More

Create, train, and deploy a billion-parameter language model on terabytes of data with TensorFlow and Amazon SageMaker

The increasing size of language models has been one of the biggest trends in natural language processing (NLP) in recent years. Since 2018, we’ve seen unprecedented development and deployment of ever-larger language models, including BERT and its variants, GPT-2, T-NLG, and GPT-3 (175 billion parameters).

These models have pushed the boundaries of possible architectural innovations. We face several challenges when training large-scale deep learning models, especially the new wave of generative pre-trained transformers. These challenges include hardware limitations and trade-offs with computation and efficiency. To overcome these challenges of model and data parallelism, AWS offers a wide range of capabilities.

In this post, we introduce two main approaches: data parallelization and model parallelization using Amazon SageMaker, and discuss their pros and cons.

The model

For the language model, we use Transformers, introduced in the paper Attention Is All You Need. Transformers are deep learning models designed to deliberately avoid the pitfalls of RNNs by relying on a self-attention mechanism to draw global dependencies between input and output. The Transformer model architecture allows for significantly better parallelization and can achieve high performance in relatively short training time. Built on the success of Transformers, BERT, introduced in the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, added bidirectional pre-training for language representation. Inspired by the Cloze task, BERT is pre-trained with masked language modeling (MLM), in which the model learns to recover the original words for randomly masked tokens. The BERT model is also pretrained on the next sentence prediction (NSP) task to predict if two sentences are in correct reading order. Since its advent in 2018, BERT and its variations have been widely used in language models.

We begin by creating two embedding layers for token and positional embedding. The input embeddings are the sum of the token embeddings and position embeddings.

import tensorflow as tf


class TokenAndPositionEmbedding(tf.keras.layers.Layer):
    """
    Creates two separate embedding layers: one for tokens and one for token index (positions).
    """
    def __init__(self, maxlen, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        self.token_emb = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = tf.keras.layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]

        # positions are represented by a token's index
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)

        # token embedding
        x = self.token_emb(x)

        # return sum as input 
        return x + positions
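As a quick sanity check, you can run a toy batch through the layer. The hyperparameter values below are illustrative only, not the settings used for the full model:

# Illustrative hyperparameters for a quick shape check
maxlen, vocab_size, embed_dim = 128, 50000, 256

embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)

# A dummy batch of 4 token sequences, each of length maxlen
dummy_tokens = tf.random.uniform((4, maxlen), maxval=vocab_size, dtype=tf.int32)
print(embedding_layer(dummy_tokens).shape)  # (4, 128, 256)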

Then we define a transformer decoder block with two sub-layers: a multi-head self-attention layer, and a simple fully connected feed-forward network followed by layer normalization and dropout:

class TransformerBlock(tf.keras.layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()

        # self-attention layer
        self.att = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim)

        # feed-forward layers
        self.ffn = [tf.keras.layers.Dense(ff_dim, activation="relu"), tf.keras.layers.Dense(embed_dim)]

        # layer normalization
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        # dropout
        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)

    def call(self, inputs):
        # get batch size and sequence length from the input shape
        input_shape = tf.shape(inputs)
        batch_size = input_shape[0]
        seq_len = input_shape[1]

        # decoder causal mask
        causal_mask = causal_attention_mask(batch_size, seq_len, seq_len, tf.bool)

        # self-attention forward pass
        attention_output = self.att(inputs, inputs, attention_mask=causal_mask)

        # dropout, residual connection, and normalization around the attention block
        attention_output = self.dropout1(attention_output)
        out1 = self.layernorm1(inputs + attention_output)

        # feed-forward layers, dropout, and normalization around the feed-forward block
        ffn_output = self.ffn[0](out1)
        ffn_output = self.ffn[1](ffn_output)
        out2 = self.dropout2(ffn_output)

        return self.layernorm2(out1 + out2)
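The preceding block relies on a causal_attention_mask helper that isn't shown here. A common way to implement it (a sketch; the exact helper in our script may differ) builds a lower-triangular mask so that each position can only attend to itself and earlier positions:

def causal_attention_mask(batch_size, n_dest, n_src, dtype):
    """Builds a (batch_size, n_dest, n_src) lower-triangular attention mask."""
    i = tf.range(n_dest)[:, None]
    j = tf.range(n_src)
    m = i >= j - n_src + n_dest
    mask = tf.cast(m, dtype)
    mask = tf.reshape(mask, [1, n_dest, n_src])
    # tile the mask along the batch dimension
    mult = tf.concat(
        [tf.expand_dims(batch_size, -1), tf.constant([1, 1], dtype=tf.int32)], 0
    )
    return tf.tile(mask, mult)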

Finally, we create our language model with the preceding embedding layer and transformer blocks:

class MyModel(tf.keras.Model):
    def __init__(self, maxlen, vocab_size, embed_dim, num_heads, feed_forward_dim, num_layers, learning_rate):
        super(MyModel, self).__init__()

        # embedding layer
        self.embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)

        # transformer blocks
        self.transformer_blocks = [
            TransformerBlock(embed_dim, num_heads, feed_forward_dim)
            for i in range(num_layers)
        ]

        # last dense layer
        self.dense = tf.keras.layers.Dense(vocab_size)
        
    def call(self, inputs, training=None):
        x_emb = self.embedding_layer(inputs)
        x = x_emb        
        for transformer_block in self.transformer_blocks:
            x = transformer_block(x)
        outputs = self.dense(x)
        return [outputs, x_emb]


def init_train_settings(maxlen, vocab_size, embed_dim, num_heads, feed_forward_dim, num_layers, learning_rate):
    """
    Creates model, optimizer and loss function 
    """
    model = MyModel(maxlen, vocab_size, embed_dim, num_heads, feed_forward_dim, num_layers, learning_rate) 
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    return model, optimizer, loss_fn

Depending on your hyperparameters, you can scale this model from thousands of parameters to billions of parameters. The primary challenge with billion-parameter models is that you can’t host the model in one instance and need to distribute the model over several nodes for training and inference.
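To get a feel for how the hyperparameters drive the parameter count, you can instantiate the model and count its weights after a dummy forward pass builds them. The values below are illustrative examples, not the settings used in our experiments:

# Illustrative sketch: build a small model and count its trainable parameters
maxlen, vocab_size = 512, 50000
model, optimizer, loss_fn = init_train_settings(
    maxlen=maxlen, vocab_size=vocab_size, embed_dim=256,
    num_heads=8, feed_forward_dim=1024, num_layers=6, learning_rate=3e-4)

dummy_batch = tf.random.uniform((1, maxlen), maxval=vocab_size, dtype=tf.int32)
model(dummy_batch)  # a forward pass creates the weights
print(f"Trainable parameters: {model.count_params():,}")

# Increasing embed_dim, feed_forward_dim, and num_layers (for example, toward
# 4096, 16384, and 48) pushes the parameter count into the billions, which no
# longer fits on a single GPU.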

The dataset

In our experiments, we used the Pile dataset. The Pile is an 800 GiB English text dataset designed for training large-scale language models. It is created from 22 diverse and high-quality datasets, including both established NLP datasets and newly introduced ones.

The dataset is created from a variety of data sources, including books; GitHub repositories; webpages; chat logs; and medical, physics, math, computer science, and philosophy papers. Specifically, it uses the following sources: Pile-CC, PubMed Central, ArXiv, GitHub, the FreeLaw Project, Stack Exchange, the US Patent and Trademark Office, PubMed, Ubuntu, IRC, HackerNews, YouTube, PhilPapers, Books3, Project Gutenberg (PG-19), OpenSubtitles, English Wikipedia, DM Mathematics, EuroParl, the Enron Emails corpus, and NIH ExPorter. It also includes OpenWebText2 and BookCorpus2, which are extensions of the original OpenWebText and BookCorpus datasets, respectively. The diversity in data sources can improve the general cross-domain knowledge and consequently improve downstream generalization capabilities.

The primary challenge with this dataset is the sheer size; the dataset has 825 GiB of text, which translates into 4.2 TiB of preprocessed and compressed datapoints. Similar to the challenges we face with training and hosting the models, training a model with this dataset on a single instance will take a lot of time and isn’t practical.

Our solution is to break down the dataset into approximately 1 GiB chunks of data, load and preprocess the features in TensorFlow Dataset objects, and store them in Amazon Elastic File System (Amazon EFS). TensorFlow datasets provide an easy-to-use and high-performance data pipeline that integrates well with our models. Amazon EFS is an easy-to-use service that enables us to build a shared file system that scales automatically as files are added and deleted. In addition, Amazon EFS is capable of bursting to higher throughput levels when needed, which is critical in our data and model training pipeline.
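As a rough illustration of this pattern (the EFS path and tokenize_fn below are hypothetical placeholders, and the preprocessing in our real pipeline is more involved), each chunk can be preprocessed into a tf.data.Dataset, saved to a directory on the mounted EFS volume, and loaded back lazily at training time:

import tensorflow as tf

EFS_ROOT = "/mnt/efs/pile_chunks"  # hypothetical EFS mount point

def preprocess_and_save_chunk(raw_texts, chunk_id, tokenize_fn):
    # tokenize_fn is a placeholder for the tokenizer used in the real pipeline
    ds = tf.data.Dataset.from_tensor_slices(raw_texts).map(
        tokenize_fn, num_parallel_calls=tf.data.AUTOTUNE)
    tf.data.experimental.save(ds, f"{EFS_ROOT}/chunk-{chunk_id:05d}")

def load_chunk(chunk_id, element_spec):
    # element_spec must match the spec of the saved dataset
    return tf.data.experimental.load(
        f"{EFS_ROOT}/chunk-{chunk_id:05d}", element_spec=element_spec)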

Next, we look into distributed training strategies to tackle these challenges.

Distributed training

In this project, we faced two challenges: scaling model size and data volume. Increasing the model size and the number of trainable parameters may result in better accuracy, but there's a limit to the model size you can fit into a single GPU's memory, or even across multiple GPUs in a single instance. In addition, bigger models take more time to train.

You can tackle these challenges in two different ways: data parallelism and model parallelism. With data parallelism, we perform stochastic gradient descent (SGD) by distributing the records of a mini-batch over different devices to speed up the training. However, data parallel training comes with the extra complexity of averaging the mini-batch gradients across all devices, a step called AllReduce, which becomes harder as the training cluster grows. When using data parallelism, we must be able to fit the model and a single datapoint on a device (CPU or GPU), which is a limiting factor in our experiments because a model of this size is much larger than a single GPU's memory.

Another solution is to use model parallelism, which splits the model over multiple devices. Model parallelism is the process of splitting a model up between multiple devices or nodes (such as GPU-equipped instances) and creating an efficient pipeline to train the model across these devices to maximize GPU utilization.

Data parallelization

Parallelizing the data is the most common approach to multi-GPU or distributed training. You can batch your data, send it to multiple devices (each hosting a replicated model), then aggregate the results. We experimented with two packages for data parallelization: Horovod and the SageMaker distributed data parallel library.

Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. To use Horovod, we went through the following process:

  1. Initialize by running hvd.init().
  2. Associate each device with a single process. The first process or worker is associated with the first device, the second process is associated with the second device, and so on.
  3. Adjust the learning rate based on the number of devices.
  4. Wrap the optimizer in hvd.DistributedOptimizer.
  5. Broadcast the initial variable states from the first worker with rank 0 to all other processes. This is necessary to ensure consistent initialization of all workers when training is started with random weights or restored from a checkpoint.
  6. Make sure that only device 0 can save checkpoints to prevent other workers from corrupting them.

The following is the training script:

import horovod.tensorflow as hvd
# Initialize Horovod
hvd.init()

# Pin GPU to be used to process local rank (one GPU per process)
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

# Build model
...

@tf.function
def training_step(texts, labels, first_batch):
    with tf.GradientTape() as tape:
        predictions = model(texts, training=True)
        loss = loss_fn(labels, predictions[0])

    # Horovod: add Horovod Distributed GradientTape.
    tape = hvd.DistributedGradientTape(tape)

    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))

    # Horovod: broadcast initial variable states from rank 0 to all other processes.
    # This is necessary to ensure consistent initialization of all workers when
    # training is started with random weights or restored from a checkpoint.
    #
    # Note: broadcast should be done after the first gradient step to ensure optimizer
    # initialization.
    if first_batch:
        hvd.broadcast_variables(model.variables, root_rank=0)
        hvd.broadcast_variables(opt.variables(), root_rank=0)

    return loss

# Horovod: adjust number of steps based on number of GPUs.
for batch, (texts, labels) in enumerate(dataset.take(10000 // hvd.size())):
    loss = training_step(texts, labels, batch == 0)

    if batch % 10 == 0 and hvd.local_rank() == 0:
        print('Step #%d\tLoss: %.6f' % (batch, loss))

# Horovod: save checkpoints only on worker 0 to prevent other workers from
# corrupting it.
if hvd.rank() == 0:
    checkpoint.save(checkpoint_dir)

The SageMaker data parallel library enables us to scale our training with near-linear efficiency, speeding up our training with minimal code changes. The library performs a custom AllReduce operation and optimizes device-to-device communication by fully utilizing AWS's network infrastructure and Amazon Elastic Compute Cloud (Amazon EC2) instance topology. To use the SageMaker data parallel library, we went through the following process (a minimal sketch of these changes follows the list):

  1. Import the library with import smdistributed.dataparallel.tensorflow as sdp and initialize it by running sdp.init().
  2. Associate each device with a single smdistributed.dataparallel process through its local rank; sdp.local_rank() gives us the local rank of each device. The leader is rank 0, and workers are rank 1, 2, 3, and so on.
  3. Adjust the learning rate based on the number of devices.
  4. Wrap tf.GradientTape with DistributedGradientTape to perform AllReduce.
  5. Broadcast the initial model variables from the leader node to all the worker nodes.
  6. Make sure that only device 0 can save checkpoints.
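The following is a minimal sketch of those changes, based on the Horovod script shown earlier. It assumes the same model, optimizer, loss function, and checkpoint objects as that script, and is not a drop-in replacement for our full training script:

import smdistributed.dataparallel.tensorflow as sdp

# SMDataParallel: Initialize
sdp.init()

# Pin each process to a single GPU based on its local rank
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], 'GPU')

# Build model, optimizer, and loss as before
...

@tf.function
def training_step(texts, labels, first_batch):
    with tf.GradientTape() as tape:
        predictions = model(texts, training=True)
        loss = loss_fn(labels, predictions[0])

    # SMDataParallel: wrap tf.GradientTape with DistributedGradientTape for AllReduce
    tape = sdp.DistributedGradientTape(tape)

    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))

    # SMDataParallel: broadcast initial variables from the leader (rank 0) once
    if first_batch:
        sdp.broadcast_variables(model.variables, root_rank=0)
        sdp.broadcast_variables(opt.variables(), root_rank=0)

    return loss

# SMDataParallel: save checkpoints only on the leader to prevent corruption
if sdp.rank() == 0:
    checkpoint.save(checkpoint_dir)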

Model parallelization

We can adjust the hyperparameters to keep the model small enough to train using a single GPU, or we can use model parallelism to split the model between multiple GPUs across multiple instances. Increasing a model’s number of trainable parameters can result in better accuracy, but there’s a limit to the maximum model size you can fit in a single GPU memory. We used the SageMaker distributed model parallel library to train our larger models. The steps are as follows:

  1. Import and initialize the library with smp.init().
  2. The Keras model needs to inherit from smp.DistributedModel instead of the Keras Model class.
  3. Set drop_remainder=True in the tf.Dataset.batch() method to ensure that the batch size is always divisible by the number of microbatches.
  4. Random operations in the data pipeline all need to use the same seed: smp.dp_rank(), for example, shuffle(ds, seed=smp.dp_rank()). This ensures consistency of data samples across devices that hold different model partitions.
  5. Forward and backward logic needs to be in a step function with smp.step decoration.
  6. Perform postprocessing on the outputs across microbatches using StepOutput methods such as reduce_mean. The smp.step function must have a return value that depends on the output of smp.DistributedModel.

The training script is as follows:

import smdistributed.modelparallel.tensorflow as smp

# SMP: Initialize
smp.init()

# SMP: Define smp.DistributedModel the same way as Keras sub-classing API
class MyModel(smp.DistributedModel):
    def __init__(self, maxlen, vocab_size, embed_dim, num_heads, feed_forward_dim, num_layers, learning_rate):
        super(MyModel, self).__init__()
        
        self.embedding_layer = gpt_model.TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
        self.transformer_blocks = [
            gpt_model.TransformerBlock(embed_dim, num_heads, feed_forward_dim)
            for i in range(num_layers)
        ]
        self.dense = tf.keras.layers.Dense(vocab_size)
        
    def call(self, inputs, training=None):
        x_emb = self.embedding_layer(inputs)
        x = x_emb

        for transformer_block in self.transformer_blocks:
            x = transformer_block(x)
        outputs = self.dense(x)
        return [outputs, x_emb]


# SMP: Define smp.step. Return any tensors needed outside
@smp.step
def get_grads(texts, labels):
    predictions = model(texts, training=True)
    loss = loss_fn(labels, predictions[0])
    grads = optimizer.get_gradients(loss, model.trainable_variables)
    return grads, loss, predictions[0]

@tf.function
def train_step(texts, labels):
    gradients, loss, predictions = get_grads(texts, labels)
    # SMP: Accumulate the gradients across microbatches
    gradients = [g.accumulate() for g in gradients]
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    
    # SMP: Average the loss across microbatches
    train_loss(loss.reduce_mean())
    # SMP: Merge predictions across microbatches
    train_accuracy(labels, predictions.merge())
    return loss.reduce_mean()

histories = []

for _ in range(epochs):
    train_loss.reset_states()
    train_accuracy.reset_states()

    for texts, labels in text_ds:
        for i in range(128):
            text = tf.expand_dims(texts[0][i], axis=0)
            label = tf.expand_dims(labels[0][i], axis=0)
            train_step(text, label)  

For a detailed guide to enable the TensorFlow training script for the SageMaker distributed model parallel library, refer to Modify a TensorFlow Training Script. For PyTorch, refer to Modify a PyTorch Training Script.

SageMaker Debugger

In the previous sections, we discussed how to optimize training using model and data parallelization techniques. With Amazon SageMaker Debugger, we can now capture performance profiling information from our training runs to determine how much the training has improved. By default, Debugger captures system metrics for each SageMaker training job, such as GPU and CPU utilization, memory, network, and I/O, at a sampling interval of 500 milliseconds. We can access the data as follows:

from smdebug.profiler.analysis.notebook_utils.training_job import TrainingJob
tj = TrainingJob('SMD-MP-demo-2022-01-21-06-43-23-841', "us-east-1")
tj.wait_for_sys_profiling_data_to_be_available()
system_metrics_reader = tj.get_systems_metrics_reader()
# The timeline charts in the next snippet also need the framework metrics reader
tj.wait_for_framework_profiling_data_to_be_available()
framework_metrics_reader = tj.get_framework_metrics_reader()

Debugger provides utilities to visualize the profiling data in different ways. In the following example, we see the total GPU and CPU utilization as well as the I/O wait time for the multi-GPU training job using Horovod. To generate these graphs, we run the following code:

from smdebug.profiler.analysis.notebook_utils.timeline_charts import TimelineCharts

view_timeline_charts = TimelineCharts(
    system_metrics_reader, 
    framework_metrics_reader,
    select_dimensions=["CPU", "GPU", "I/O"], 
    select_events=["total"],
    show_workers=False           
)

The GPU utilization frequently fluctuates between 0% and 100%, and high I/O wait times with low GPU utilization are an indicator of an I/O bottleneck. Furthermore, the total CPU utilization never exceeds 70%, which means that we can improve data preprocessing by increasing the number of worker processes.
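One generic way to add such preprocessing parallelism in a tf.data input pipeline (a sketch only; our actual pipeline is more involved, and raw_dataset, preprocess_fn, and batch_size are placeholders) is to parallelize the map stage and prefetch batches so preprocessing overlaps with training:

# Generic sketch: parallelize preprocessing and overlap it with training
dataset = (
    raw_dataset
    .map(preprocess_fn, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(batch_size)
    .prefetch(tf.data.AUTOTUNE)
)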

We can improve performance by switching from Horovod to the SageMaker distributed data parallel library. In the following graphs, we can see that the GPUs are utilized more efficiently and only drop to low utilization for short periods of time.

Training infrastructure

For training the models, we used 10 ml.p3.16xlarge instances with a SageMaker training job. SageMaker reduces the time and cost to train and tune machine learning (ML) models without the need to manage infrastructure. With SageMaker, you can easily train and tune ML models using built-in tools to manage and track training experiments, automatically choose optimal hyperparameters, debug training jobs, and monitor the utilization of system resources such as GPUs, CPUs, and network bandwidth. The data was hosted in Amazon EFS, which enabled the storage to grow and shrink automatically as we added and removed files, with no need for management or provisioning. Our primary objectives were to improve training speed and reduce costs.
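For reference, launching such a training job from the SageMaker Python SDK looks roughly like the following sketch. The entry point, role, and framework versions are placeholders, and the distribution block shown enables the SageMaker data parallel library (the model parallel library takes a different distribution configuration with additional parameters):

import sagemaker
from sagemaker.tensorflow import TensorFlow

# Placeholders: substitute your own training script, role, and supported versions
estimator = TensorFlow(
    entry_point="train.py",
    role=sagemaker.get_execution_role(),
    instance_count=10,
    instance_type="ml.p3.16xlarge",
    framework_version="2.6.0",
    py_version="py38",
    # Enable the SageMaker distributed data parallel library
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

estimator.fit()  # optionally pass the training data channels here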

Model scalability

Although this infrastructure is primarily used for language generation, with the GPT architecture and Pile dataset, you can use these techniques to train large-scale transformer models, which are useful in many domains beyond NLP. In machine learning itself, many computer vision tasks are now solved with large-parameter transformer architectures, which have been shown to outperform traditional convolutional neural networks (CNNs) on tasks like representation learning (see Advancing the state of the art in computer vision with self-supervised Transformers and 10x more efficient training) and large-scale mapping of images to text (such as CLIP). Large-parameter models are also breaking new ground in life sciences in fields like protein structure analysis and analysis of medical image data.

The solutions we detail in this post for distributed training and managing large models should apply to models in any of these domains as well.

Trade-offs

There has been an ongoing discussion in the research community regarding the risks of training large-scale language models, and whether enough thought has been put into the potential risks associated with developing them and strategies to mitigate these risks, some of which include the financial and environmental costs. According to a paper published in ACM, training a single BERT base model (without hyperparameter tuning) on GPUs was estimated to require as much energy as a trans-American flight. The environmental impacts scale with model size, and being able to efficiently fine-tune such models can potentially curtail the emissions significantly. AWS recently launched a new Customer Carbon Footprint Tool, available to all AWS customers at no cost, as part of Amazon’s efforts to increase sustainability and reduce carbon emissions. Running applications on the AWS Cloud can potentially decrease the carbon footprint (when compared to enterprise data centers that were surveyed in a 2019 report).

Conclusion

This post demonstrated a solution that facilitates the fine-tuning of language models with a billion parameters on the AWS Cloud using SageMaker.

For more information about model parallelism with SageMaker, refer to Train 175+ billion parameter NLP models with model parallel additions and Hugging Face on Amazon SageMaker and How Latent Space used the Amazon SageMaker model parallelism library to push the frontiers of large-scale transformers.

If you’d like help accelerating your use of ML in your products and processes, please contact the Amazon ML Solutions Lab.


About the Authors

Sia Gholami is a Senior Data Scientist at the Amazon ML Solutions Lab, where he builds AI/ML solutions for customers across various industries. He is passionate about natural language processing (NLP) and deep learning. Outside of work, Sia enjoys spending time in nature and playing tennis.

Mehdi Noori is a Manager and a Senior Applied Scientist at the Amazon ML Solutions Lab, where he works with customers across various industries and helps them accelerate their cloud migration journey and solve their ML problems using state-of-the-art solutions and technologies.

Muhyun Kim is a data scientist at the Amazon Machine Learning Solutions Lab. He solves customers' various business problems by applying machine learning and deep learning, and also helps them build their ML skills.

Danny Byrd is an Applied Scientist at the Amazon ML Solutions Lab. At the lab he’s helped customers develop advanced ML solutions, in ML specialties from computer vision to reinforcement learning. He’s passionate about pushing technology forward and unlocking new potential from AWS products along the way.

Francisco Calderon Rodriguez is a Data Scientist in the Amazon ML Solutions Lab. As a member of the ML Solutions Lab, he helps solve critical business problems for AWS customers using deep learning. In his spare time, Francisco likes to play music and guitar, play soccer with his daughters, and enjoy time with his family.

Yohei Nakayama is a Deep Learning Architect at the Amazon ML Solutions Lab. He works with customers across different verticals to accelerate their use of artificial intelligence and AWS Cloud services to solve their business challenges. He is interested in applying ML/AI technologies to the space industry.

Nathalie Rauschmayr is a Senior Applied Scientist at AWS, where she helps customers develop deep learning applications.

Read More

Identify potential root cause in business-critical anomalies using Amazon Lookout for Metrics

We are excited to launch a causal contribution analysis capability in Amazon Lookout for Metrics that helps you understand the potential root causes of business-critical anomalies in your data. Previously, you were only given the root causes for a single anomaly per measure, and you had to analyze the anomalies yourself to determine whether causal relationships existed between the detected anomalies in different measures. When focusing on a single anomaly, you can easily miss the anomaly's downstream (or upstream) impact. For example, you may see a spike in your checkout cart abandonment and know that your revenue will decrease. However, you may not know what caused the checkout carts to be abandoned at a higher rate. The causal contribution analysis feature can tell you that the spike in checkout cart abandonment may be due to spikes in transaction failures or sudden changes in prices due to promotion expiration.

Lookout for Metrics uses machine learning (ML) to automatically detect and diagnose anomalies in large datasets where deviations from normal are hard to detect and missed anomalies have business-critical impact. Lookout for Metrics reduces the time to implement AI/ML services for business-critical problems.

In this post, we discuss the new causal contribution analysis capability and its benefits.

Challenges in anomaly detection

Anomaly detection has two parts: detecting anomalies and identifying the root cause that triggered the anomalies, so that teams can take action to mitigate the problem.

Traditional business intelligence (BI) systems that use static threshold-based or rule-based anomalies have three problems. First, you might have millions of metrics to track across multiple data sources. Take digital advertisement, for example—you want to track metrics like impression, clicks, revenue, and shopping cart metrics across campaign IDs, product categories, geographies, and more. And it’s the same for any domain, be it retail, telecom, gaming, or financial services. With traditional BI tools, managing data across multiple sources, creating dashboards and reports, and adding alerts at a granular level requires a lot of manual work and isn’t scalable.

Second, these traditional BI tools work by setting up rules. You set up a range, and anything outside the range is treated as an anomaly that you're alerted on. If the range is too broad, you miss important alerts, and if it's too narrow, you receive too many false alerts.

These upper and lower bounds are also static; they don't change based on the time of day, day of the week, or season, and they need to be manually updated. You're likely to miss important anomalies and receive too many false alarms, or you lose trust in the tool and start ignoring these alerts altogether.

Lastly, BI reports and dashboards are often generated at the end of the hour, day, or week, when it's too late for you to act on a problem. And even when these results arrive, they don't answer the why. So developers, analysts, and business owners can spend weeks trying to identify the root cause of the anomaly, delaying meaningful action even further.

Causal inference in Lookout for Metrics

Although asking for the root cause of an unexpected event seems to be at the heart of the human way of understanding the world, statistical associations are often misinterpreted as a causal influence. That is, correlation doesn’t imply causation, and discerning the causes of events from observational data requires specialized causal inference methods.

The root cause analysis in Lookout for Metrics uses causal inference techniques to increase the visibility and interpretability of anomalies across measures. Lookout for Metrics is capable of not only identifying causal drivers, but also quantitatively attributing the anomalous events to them, providing a percentage score of likelihood among the probable causal drivers of an anomalous event. For example, Lookout for Metrics can now draw causal links between a drop in advertisement views (the anomaly) due to fewer clicks on your website, iOS, and Android apps (the causation), leading to a decline in revenue (the downstream impact). Suppose one or more potential root causes occur (website, iOS, Android). In that case, Lookout for Metrics can identify the most likely cause (for example, the website with a 90% likelihood) that led to the drop in advertisement views.

The scientific approach relies on a two-step procedure:

  1. Infer the causal relations between the measures.
  2. Based on the inferred causal structure, attribute the anomalies of the affected measure to the causing measures.

To infer the causal relations between the measures, we use a Granger causality method that takes the panel data structure of Lookout for Metrics into account. The existing Granger causality methods for panel data can't deal with dependencies across dimension value combinations (for instance, dependencies of revenue across different countries that we typically see in real data). For example, events such as Black Friday increase the revenue of multiple countries, so there is an external source that renders the revenue of different countries dependent. We therefore had to develop our own Granger causality[1] method for panel data that can deal with these types of dependencies.

Once the causal structure is available, we attribute the anomalies of the affected measure to its causing measures to quantify the cause-effect relationships.

Analyze anomalies on the Lookout for Metrics console

After Lookout for Metrics starts anomaly detection, you can look for the detected anomalies on the Anomalies page for the detector. When you choose an anomaly, you’re redirected to the details page for the observed anomaly.

The anomaly details page includes a Root cause analysis section. This section tries to explain this observed anomaly with respect to the other anomalies for the anomaly detector configured measures.

In the following example, “Revenue impacted” is the observed anomaly, and the potential causes include orders and non-configured measures. Orders contributes approximately 81.84% to the current anomaly in revenue, which in turn has a downstream impact on profit.

Choosing the potential cause orders takes us to the details of its observed anomaly. In this case, the possible causes for this anomaly are clicks and non-configured measures. Clicks could be one of the potential causes of this anomaly, but it gets a relatively low contribution score of 8.37%, and the detector doesn’t observe anything anomalous for it. In this case, Lookout for Metrics concludes that the orders anomaly is caused by external factors or measures that weren’t configured for monitoring during the detector setup phase. This anomaly in orders has a potential downstream impact on profit and revenue.

Choosing the potential downstream impact profit takes us to the details of its observed anomaly. In this case, the potential causes seem to be a mix of anomalies in revenue, orders, and non-configured measures, with respective contribution scores of 33%, 14%, and 53%. No downstream measures are affected by this anomaly.

For this example, the anomaly in profit can be partially explained by the anomalies in revenue and orders. Then the anomaly in revenue can be explained by the anomaly in orders with a high certainty.

Conclusion

The new causal contribution analysis capability in Lookout for Metrics detects the causal interaction between anomalies in your measures. To achieve this, the detector learns the causal relations between the measures in your data in a fully self-supervised manner and uses this causal information to trace anomalies back to their root causes. This feature can help you causally connect anomalies across measures and provides you with a tool to quickly diagnose and subsequently fix any issues in your system.

[1] L. Minorics, C. Turkmen, P. Bloebaum, D. Kernert, L. Callot, and D. Janzing. Testing Granger Non-Causality in Panels with Cross-Sectional Dependencies. AISTATS, 2022.

About the Authors

Lenon Minorics is an Applied Scientist focusing on causal inference and anomaly detection. Prior to Amazon, Lenon was an academic researcher in mathematics. His personal research interests include machine learning, causal inference, stochastics, and fractal geometry. In his free time, Lenon enjoys practicing all kinds of sports, especially Brazilian Jiu-Jitsu.

Shashank Srivastava is Senior Product Manager for Amazon AI vertical services. He is passionate about solving problems in AI in NLP, novelty detection, and data scarcity. In his free time, Shashank enjoys playing tennis and golf.

Caner Türkmen is an Applied Scientist at Amazon Web Services, where he works on problems at the intersection of machine learning, forecasting, and anomaly detection. Before joining AWS, he worked in the management consulting industry as a data scientist, serving the financial services and telecommunications industries on projects across the globe. Caner’s personal research interests span a range of topics, including probabilistic and Bayesian ML, stochastic processes, and their practical applications.

Alex Kim is a Sr. Product Manager for AWS AI Services. His mission is to deliver AI/ML solutions to all customers who can benefit from it. In his free time, he enjoys all types of sports and discovering new places to eat.

Read More